Neural networks to learn protein sequence-function relationships from deep mutational scanning data.

Publication, Journal Article
Gelman, S; Fahlberg, SA; Heinzelman, P; Romero, PA; Gitter, A
Published in: Proceedings of the National Academy of Sciences of the United States of America
November 2021

The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein's behavior and properties. We present a supervised deep learning framework to learn the sequence-function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network's internal representation affects its ability to learn the sequence-function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks' ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models' ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.
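
As a rough illustration of the kind of model the abstract describes, the sketch below one-hot encodes a sequence variant and scores it with a single convolutional layer, so the same filter weights are reused at every sequence position and a nonlinearity (ReLU) follows. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy sequence "MKTAY", the mutation notation, the kernel size, and the randomly initialized (untrained) weights are all hypothetical.

```python
import numpy as np

# The 20 standard amino acids, used as the one-hot alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def apply_mutations(wild_type, mutations):
    """Apply substitutions written like 'K2R' (reference, 1-based position, new residue)."""
    seq = list(wild_type)
    for mutation in mutations:
        ref, pos, alt = mutation[0], int(mutation[1:-1]), mutation[-1]
        if seq[pos - 1] != ref:
            raise ValueError(f"{mutation}: expected {ref} at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


def one_hot(seq):
    """Encode a sequence as a (length, 20) binary matrix."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    x[np.arange(len(seq)), [AA_INDEX[aa] for aa in seq]] = 1.0
    return x


def conv1d_forward(x, kernel, bias):
    """Valid 1D convolution followed by ReLU.

    x: (L, 20), kernel: (k, 20, F), bias: (F,) -> output: (L - k + 1, F).
    The same kernel is applied to every window, i.e. parameters are
    shared across sequence positions.
    """
    k = kernel.shape[0]
    windows = np.stack([x[i:i + k] for i in range(x.shape[0] - k + 1)])
    out = np.tensordot(windows, kernel, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)


def init_params(seq_len, kernel_size, n_filters, seed=0):
    """Random, untrained weights (hypothetical sizes, for illustration only)."""
    rng = np.random.default_rng(seed)
    n_windows = seq_len - kernel_size + 1
    return {
        "kernel": rng.normal(0.0, 0.1, (kernel_size, len(AMINO_ACIDS), n_filters)),
        "bias": np.zeros(n_filters),
        "w_out": rng.normal(0.0, 0.1, n_windows * n_filters),
        "b_out": 0.0,
    }


def predict_score(seq, params):
    """Map a sequence to one scalar functional score."""
    hidden = conv1d_forward(one_hot(seq), params["kernel"], params["bias"])
    return float(hidden.ravel() @ params["w_out"] + params["b_out"])


if __name__ == "__main__":
    wild_type = "MKTAY"  # toy sequence, not a real protein
    variant = apply_mutations(wild_type, ["K2R"])
    params = init_params(len(wild_type), kernel_size=3, n_filters=4)
    print(variant, predict_score(variant, params))
```

In a supervised setting, weights like these would be fit by minimizing the error between predicted scores and the measured deep mutational scanning scores; the training loop is omitted here.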

Published In

Proceedings of the National Academy of Sciences of the United States of America

DOI

10.1073/pnas.2104878118

EISSN

1091-6490

ISSN

0027-8424

Publication Date

November 2021

Volume

118

Issue

48

Start / End Page

e2104878118

Related Subject Headings

  • Structure-Activity Relationship
  • Sequence Analysis, Protein
  • Proteins
  • Neural Networks, Computer
  • Mutation
  • Machine Learning
  • Deep Learning
  • Biochemical Phenomena
  • Amino Acid Sequence
  • Algorithms
 

Citation

APA
Gelman, S., Fahlberg, S. A., Heinzelman, P., Romero, P. A., & Gitter, A. (2021). Neural networks to learn protein sequence-function relationships from deep mutational scanning data. Proceedings of the National Academy of Sciences of the United States of America, 118(48), e2104878118. https://doi.org/10.1073/pnas.2104878118

Chicago
Gelman, Sam, Sarah A. Fahlberg, Pete Heinzelman, Philip A. Romero, and Anthony Gitter. “Neural networks to learn protein sequence-function relationships from deep mutational scanning data.” Proceedings of the National Academy of Sciences of the United States of America 118, no. 48 (November 2021): e2104878118. https://doi.org/10.1073/pnas.2104878118.

ICMJE
Gelman S, Fahlberg SA, Heinzelman P, Romero PA, Gitter A. Neural networks to learn protein sequence-function relationships from deep mutational scanning data. Proceedings of the National Academy of Sciences of the United States of America. 2021 Nov;118(48):e2104878118.

MLA
Gelman, Sam, et al. “Neural networks to learn protein sequence-function relationships from deep mutational scanning data.” Proceedings of the National Academy of Sciences of the United States of America, vol. 118, no. 48, Nov. 2021, p. e2104878118. Epmc, doi:10.1073/pnas.2104878118.

NLM
Gelman S, Fahlberg SA, Heinzelman P, Romero PA, Gitter A. Neural networks to learn protein sequence-function relationships from deep mutational scanning data. Proceedings of the National Academy of Sciences of the United States of America. 2021 Nov;118(48):e2104878118.