SaLT&PepPr is an interface-predicting language model for designing peptide-guided protein degraders.
Protein-protein interactions (PPIs) are critical for biological processes and predicting the sites of these interactions is useful for both computational and experimental applications. We present a Structure-agnostic Language Transformer and Peptide Prioritization (SaLT&PepPr) pipeline to predict interaction interfaces from a protein sequence alone for the subsequent generation of peptidic binding motifs. Our model fine-tunes the ESM-2 protein language model (pLM) with a per-position prediction task to identify PPI sites using data from the PDB, and prioritizes motifs which are most likely to be involved within inter-chain binding. By only using amino acid sequence as input, our model is competitive with structural homology-based methods, but exhibits reduced performance compared with deep learning models that input both structural and sequence features. Inspired by our previous results using co-crystals to engineer target-binding "guide" peptides, we curate PPI databases to identify partners for subsequent peptide derivation. Fusing guide peptides to an E3 ubiquitin ligase domain, we demonstrate degradation of endogenous β-catenin, 4E-BP2, and TRIM8, and highlight the nanomolar binding affinity, low off-targeting propensity, and function-altering capability of our best-performing degraders in cancer cells. In total, our study suggests that prioritizing binders from natural interactions via pLMs can enable programmable protein targeting and modulation.
Duke Scholars
Altmetric Attention Stats
Dimensions Citation Stats
Published In
DOI
EISSN
ISSN
Publication Date
Volume
Issue
Start / End Page
Related Subject Headings
- Ubiquitin-Protein Ligases
- Proteins
- Peptides
- Amino Acid Sequence
- 32 Biomedical and clinical sciences
- 31 Biological sciences
Citation
Published In
DOI
EISSN
ISSN
Publication Date
Volume
Issue
Start / End Page
Related Subject Headings
- Ubiquitin-Protein Ligases
- Proteins
- Peptides
- Amino Acid Sequence
- 32 Biomedical and clinical sciences
- 31 Biological sciences