Neural network extrapolation to distant regions of the protein fitness landscape.

Publication, Journal Article
Freschlin, CR; Fahlberg, SA; Heinzelman, P; Romero, PA
Published in: Nature communications
July 2024

Machine learning (ML) has transformed protein engineering by constructing models of the underlying sequence-function landscape to accelerate the discovery of new biomolecules. ML-guided protein design requires models, trained on local sequence-function information, to accurately predict distant fitness peaks. In this work, we evaluate neural networks' capacity to extrapolate beyond their training data. We perform model-guided design using a panel of neural network architectures trained on protein G (GB1)-Immunoglobulin G (IgG) binding data and experimentally test thousands of GB1 designs to systematically evaluate the models' extrapolation. We find each model architecture infers markedly different landscapes from the same data, which give rise to unique design preferences. We find simpler models excel in local extrapolation to design high fitness proteins, while more sophisticated convolutional models can venture deep into sequence space to design proteins that fold but are no longer functional. We also find that implementing a simple ensemble of convolutional neural networks enables robust design of high-performing variants in the local landscape. Our findings highlight how each architecture's inductive biases prime them to learn different aspects of the protein fitness landscape and how a simple ensembling approach makes protein engineering more robust.

Published In

Nature communications

DOI

10.1038/s41467-024-50712-3

EISSN

2041-1723

ISSN

2041-1723

Publication Date

July 2024

Volume

15

Issue

1

Start / End Page

6405

Related Subject Headings

  • Protein Engineering
  • Protein Binding
  • Neural Networks, Computer
  • Models, Molecular
  • Machine Learning
  • Immunoglobulin G
  • Bacterial Proteins
 

Citation

APA

Freschlin, C. R., Fahlberg, S. A., Heinzelman, P., & Romero, P. A. (2024). Neural network extrapolation to distant regions of the protein fitness landscape. Nature Communications, 15(1), 6405. https://doi.org/10.1038/s41467-024-50712-3

Chicago

Freschlin, Chase R., Sarah A. Fahlberg, Pete Heinzelman, and Philip A. Romero. “Neural network extrapolation to distant regions of the protein fitness landscape.” Nature Communications 15, no. 1 (July 2024): 6405. https://doi.org/10.1038/s41467-024-50712-3.

ICMJE

Freschlin CR, Fahlberg SA, Heinzelman P, Romero PA. Neural network extrapolation to distant regions of the protein fitness landscape. Nature communications. 2024 Jul;15(1):6405.

MLA

Freschlin, Chase R., et al. “Neural network extrapolation to distant regions of the protein fitness landscape.” Nature Communications, vol. 15, no. 1, July 2024, p. 6405. Epmc, doi:10.1038/s41467-024-50712-3.

NLM

Freschlin CR, Fahlberg SA, Heinzelman P, Romero PA. Neural network extrapolation to distant regions of the protein fitness landscape. Nature communications. 2024 Jul;15(1):6405.
