Biophysics-based protein language models for protein engineering.

Publication, Journal Article
Gelman, S; Johnson, B; Freschlin, CR; Sharma, A; D'Costa, S; Peters, J; Gitter, A; Romero, PA
Published in: Nature methods
September 2025

Protein language models trained on evolutionary data have emerged as powerful tools for predictive problems involving protein sequence, structure and function. However, these models overlook decades of research into biophysical factors governing protein function. We propose mutational effect transfer learning (METL), a protein language model framework that unites advanced machine learning and biophysical modeling. Using the METL framework, we pretrain transformer-based neural networks on biophysical simulation data to capture fundamental relationships between protein sequence, structure and energetics. We fine-tune METL on experimental sequence-function data to harness these biophysical signals and apply them when predicting protein properties like thermostability, catalytic activity and fluorescence. METL excels in challenging protein engineering tasks like generalizing from small training sets and position extrapolation, although existing methods that train on evolutionary signals remain powerful for many types of experimental assays. We demonstrate METL's ability to design functional green fluorescent protein variants when trained on only 64 examples, showcasing the potential of biophysics-based protein language models for protein engineering.

Published In

Nature methods

DOI

10.1038/s41592-025-02776-2

EISSN

1548-7105

ISSN

1548-7091

Publication Date

September 2025

Volume

22

Issue

9

Start / End Page

1868 / 1879

Related Subject Headings

  • Proteins
  • Protein Engineering
  • Neural Networks, Computer
  • Mutation
  • Models, Molecular
  • Machine Learning
  • Green Fluorescent Proteins
  • Developmental Biology
  • Biophysics
  • 31 Biological sciences
 

Citation

APA

Gelman, S., Johnson, B., Freschlin, C. R., Sharma, A., D’Costa, S., Peters, J., … Romero, P. A. (2025). Biophysics-based protein language models for protein engineering. Nature Methods, 22(9), 1868–1879. https://doi.org/10.1038/s41592-025-02776-2

Chicago

Gelman, Sam, Bryce Johnson, Chase R. Freschlin, Arnav Sharma, Sameer D’Costa, John Peters, Anthony Gitter, and Philip A. Romero. “Biophysics-based protein language models for protein engineering.” Nature Methods 22, no. 9 (September 2025): 1868–79. https://doi.org/10.1038/s41592-025-02776-2.

ICMJE

Gelman S, Johnson B, Freschlin CR, Sharma A, D’Costa S, Peters J, et al. Biophysics-based protein language models for protein engineering. Nature methods. 2025 Sep;22(9):1868–79.

MLA

Gelman, Sam, et al. “Biophysics-based protein language models for protein engineering.” Nature Methods, vol. 22, no. 9, Sept. 2025, pp. 1868–79. Epmc, doi:10.1038/s41592-025-02776-2.

NLM

Gelman S, Johnson B, Freschlin CR, Sharma A, D’Costa S, Peters J, Gitter A, Romero PA. Biophysics-based protein language models for protein engineering. Nature methods. 2025 Sep;22(9):1868–1879.
