An Artificial Intelligence Model for Translating Natural Language into Functional de Novo Proteins

Paper Authors

Timothy P. Riley, Mohammad S. Parsa, Pourya Kalantari, Ismail Naderi, Kiana Azimian, Nemya Begloo (310 AI, San Francisco, CA, USA)

Presented by

Lukman Bukenya

Nuwagira Brighton

Abstract

Traditional protein design is fundamentally constrained by known sequences and folds. To move beyond these limitations, we introduce an alternative: designing proteins directly from plain-language specifications. We trained MP4, a transformer-based model that maps natural language prompts to protein sequences, on a dataset of 3.2 billion points and 138k tokens. In a benchmark of 96 prompts representing a wide array of functions and contexts, MP4 improved simultaneously on three key metrics: sequence realism, predicted fold quality, and alignment to the requested function. This performance is notable because it was achieved using only text as input, a major departure from other models. Experimental validation supported the computational predictions: two de novo designs were shown to be both expressible and thermostable, and high-resolution crystallography (1.30 Å and 1.77 Å) revealed that one possesses a novel fold. The designs were also functionally active, demonstrating both ATP binding and hydrolysis in vitro. This work demonstrates the realization of natural-language intent as functional proteins that express, crystallize, and catalyze. Although the underlying approach is still in early development, with incomplete coverage and controllability, MP4 lowers the barrier to protein design and expands the space for creative exploration in molecular programming.

