An Artificial Intelligence Model for Translating Natural Language into Functional de Novo Proteins
Presentation

Paper Authors

Timothy P. Riley, Mohammad S. Parsa, Pourya Kalantari, Ismail Naderi, Kiana Azimian, Nemya Begloo (310 AI, San Francisco, CA, USA)

Abstract

Traditional protein design is fundamentally constrained by known sequences and folds. To move beyond these limitations, we introduce an alternative: designing proteins directly from plain-language specifications. We trained MP4, a transformer-based model that maps natural language prompts to protein sequences, on a dataset of 3.2 billion points and 138k tokens. In a benchmark of 96 prompts representing a wide array of functions and contexts, MP4 improved simultaneously on three key metrics: sequence realism, predicted fold quality, and alignment to the requested function. This performance is notable because it was achieved using only text as input, a major departure from other models. Experimental validation supported the computational predictions: two de novo designs were expressible and thermostable, and high-resolution crystallography (1.30 Å and 1.77 Å) revealed that one possesses a novel fold. The designs were also functionally active, demonstrating both ATP binding and hydrolysis in vitro. This work shows that natural-language intent can be realized as functional proteins that express, crystallize, and catalyze. Although the approach is still in early development, with incomplete coverage and controllability, MP4 lowers the barrier to protein design and expands the space for creative exploration in molecular programming.

Talk Date

Oct 02, 2025