Scholars@Duke publication: Getting more from less: algorithms for rapid protein identification with multiple short peptide sequences.

Getting more from less: algorithms for rapid protein identification with multiple short peptide sequences.

Publication, Journal Article

Mackey, AJ; Haystead, TAJ; Pearson, WR

Published in: Mol Cell Proteomics

February 2002

We describe two novel sequence similarity search algorithms, FASTS and FASTF, that use multiple short peptide sequences to identify homologous sequences in protein or DNA databases. FASTS searches with peptide sequences of unknown order, as obtained by mass spectrometry-based sequencing, evaluating all possible arrangements of the peptides. FASTF searches with mixed peptide sequences, as generated by Edman sequencing of unseparated mixtures of peptides. FASTF deconvolutes the mixture, using a greedy heuristic that allows rapid identification of high scoring alignments while reducing the total number of explored alternatives. Both algorithms use the heuristic FASTA comparison strategy to accelerate the search but use alignment probability, rather than similarity score, as the criterion for alignment optimality. Statistical estimates are calculated using an empirical correction to a theoretical probability. These calculated estimates were accurate within a factor of 10 for FASTS and 1000 for FASTF on our test dataset. FASTS requires only 15-20 total residues in three or four peptides to robustly identify homologues sharing 50% or greater protein sequence identity. FASTF requires about 25% more sequence data than FASTS for equivalent sensitivity, but additional sequence data are usually available from mixed Edman experiments. Thus, both algorithms can identify homologues that diverged 100 to 500 million years ago, allowing proteomic identification from organisms whose genomes have not been sequenced.
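The idea of searching with peptides of unknown order can be illustrated with a toy sketch. The code below is not the FASTS implementation: it assumes ungapped placements, counts identical residues instead of computing an alignment probability, and tries every peptide ordering by brute force. The function names and the example sequences are hypothetical.

```python
from itertools import permutations


def identities(peptide, window):
    """Count positions where the peptide and the protein window agree."""
    return sum(a == b for a, b in zip(peptide, window))


def ordered_score(peptides, protein):
    """Best total identity count when the peptides are placed on the protein
    in the given order, ungapped and without overlapping one another.

    Toy stand-in for an alignment score; returns -inf if they cannot all fit.
    """
    n = len(protein)
    neg_inf = float("-inf")
    # prev[j]: best score for the peptides handled so far within protein[:j]
    prev = [0] * (n + 1)
    for pep in peptides:
        m = len(pep)
        cur = [neg_inf] * (n + 1)
        for j in range(m, n + 1):
            # Either this peptide ends exactly at position j, or it ends earlier.
            placed = prev[j - m] + identities(pep, protein[j - m:j])
            cur[j] = max(cur[j - 1], placed)
        prev = cur
    return prev[n]


def unordered_score(peptides, protein):
    """Evaluate every arrangement of the peptides and keep the best score."""
    return max(ordered_score(order, protein) for order in permutations(peptides))


# Hypothetical example: three 5-residue peptides supplied in scrambled order.
protein = "MKTAYIAKQRQISFVKSHFSRQ"
peptides = ["SFVKS", "KTAYI", "AKQRQ"]
best = unordered_score(peptides, protein)
```

Trying all orderings grows factorially with the number of peptides, which is tolerable only because the abstract describes searches with three or four peptides; a practical tool would also need a statistical measure in place of the raw identity count used here.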

Duke Scholars

Timothy Arthur James Haystead (Author), Pharmacology & Cancer Biology

Published In

Mol Cell Proteomics

DOI

10.1074/mcp.m100004-mcp200

ISSN

1535-9476

Publication Date

February 2002

Volume

1

Issue

2

Start / End Page

139 / 147

Location

United States

Related Subject Headings

Time Factors
Sequence Homology, Amino Acid
Sequence Alignment
Proteome
Molecular Sequence Data
Mass Spectrometry
Evolution, Molecular
Databases, Protein
Databases, Nucleic Acid
Biometry

Citation

APA: Mackey, A. J., Haystead, T. A. J., & Pearson, W. R. (2002). Getting more from less: algorithms for rapid protein identification with multiple short peptide sequences. Mol Cell Proteomics, 1(2), 139–147. https://doi.org/10.1074/mcp.m100004-mcp200

Chicago: Mackey, Aaron J., Timothy A. J. Haystead, and William R. Pearson. “Getting more from less: algorithms for rapid protein identification with multiple short peptide sequences.” Mol Cell Proteomics 1, no. 2 (February 2002): 139–47. https://doi.org/10.1074/mcp.m100004-mcp200.

ICMJE: Mackey AJ, Haystead TAJ, Pearson WR. Getting more from less: algorithms for rapid protein identification with multiple short peptide sequences. Mol Cell Proteomics. 2002 Feb;1(2):139–47.

MLA: Mackey, Aaron J., et al. “Getting more from less: algorithms for rapid protein identification with multiple short peptide sequences.” Mol Cell Proteomics, vol. 1, no. 2, Feb. 2002, pp. 139–47. Pubmed, doi:10.1074/mcp.m100004-mcp200.

NLM: Mackey AJ, Haystead TAJ, Pearson WR. Getting more from less: algorithms for rapid protein identification with multiple short peptide sequences. Mol Cell Proteomics. 2002 Feb;1(2):139–147.
