Matching isotopic distributions from metabolically labeled samples.

Publication, Journal Article

McIlwain, S; Page, D; Huttlin, EL; Sussman, MR

Published in: Bioinformatics

July 1, 2008

MOTIVATION: In recent years, stable isotopic labeling has become a standard approach for quantitative proteomic analyses. Among the many available isotopic labeling strategies, metabolic labeling is attractive for the excellent internal control it provides. However, analysis of data from metabolic labeling experiments can be complicated because the spacing between the labeled and unlabeled forms of each peptide depends on its sequence and therefore varies from analyte to analyte. As a result, one generally needs to know the sequence of a peptide to identify its matching isotopic distributions in an automated fashion. In some experimental situations it would be necessary or desirable to match pairs of labeled and unlabeled peaks from peptides of unknown sequence. This article addresses this largely overlooked problem in the analysis of quantitative mass spectrometry data by presenting an algorithm that not only identifies isotopic distributions within a mass spectrum, but also annotates matches between natural abundance (light) isotopic distributions and their metabolically labeled counterparts. The algorithm has two stages: first we annotate the isotopic peaks using a modified version of the IDM algorithm described last year; then we use a probabilistic classifier, supplemented by dynamic programming, to find the matched pairs of light and metabolically labeled isotopic distributions. Such a method is needed for high-throughput quantitative proteomic and metabolomic experiments measured via mass spectrometry.

RESULTS: The primary result of this article is that the dynamic programming approach performs well given perfect isotopic distribution annotations. With perfect annotations, our algorithm achieves a true positive rate of 99% and a false positive rate of 1%. When the isotopic distributions are annotated from 'expert' selected peaks, the same algorithm achieves a true positive rate of 77% and a false positive rate of 1%. Finally, when annotating from 'machine' selected peaks, which may contain noise, the dynamic programming algorithm gives a true positive rate of 36% and a false positive rate of 1%. It is important to mention that these rates arise from the requirement of exact annotations of both the light and heavy isotopic distributions: in our evaluations, a match is considered 'entirely incorrect' if it is missing even one peak or contains an extraneous peak. If we only require that the 'monoisotopic' peaks exist within the two matched distributions, our algorithm obtains a true positive rate of 45% and a false positive rate of 1% on the 'machine' selected data. Changes to the algorithm's scoring function and training example generation improve the 'monoisotopic' peak true positive rate to 65%, with a false positive rate of 2%. All results were obtained using 10-fold cross-validation on 41 mass spectra with a mass-to-charge range of 800-4000 m/z. A total of 713 isotopic distributions and 255 matched isotopic pairs were hand-annotated for this study.

AVAILABILITY: Programs are available via http://www.cs.wisc.edu/~mcilwain/IDM/.
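The abstract reports true positive rates under two criteria: a strict one, where a predicted pair counts only if both distributions match the hand annotation peak for peak, and a relaxed one, where only the 'monoisotopic' peaks must be present. The following is a minimal sketch of how such a comparison might be scored. It is not the authors' code: the `Distribution` class, the function names, the m/z tolerance `tol`, and the choice to treat the first listed peak as the monoisotopic one are all assumptions made for illustration. The false positive rate is left out because the abstract does not say how negative examples are counted.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Distribution:
    """Hypothetical container: peak m/z values of one isotopic distribution."""
    peaks: Tuple[float, ...]

    @property
    def monoisotopic(self) -> float:
        # Assumption: the first listed peak is the 'monoisotopic' one.
        return self.peaks[0]


Pair = Tuple[Distribution, Distribution]  # (light, heavy)


def exact_match(pred: Distribution, truth: Distribution, tol: float) -> bool:
    """Strict criterion: same number of peaks, each within tol of the annotation."""
    return len(pred.peaks) == len(truth.peaks) and all(
        abs(p - t) <= tol for p, t in zip(pred.peaks, truth.peaks)
    )


def mono_match(pred: Distribution, truth: Distribution, tol: float) -> bool:
    """Relaxed criterion: the annotated monoisotopic peak appears in the prediction."""
    return any(abs(p - truth.monoisotopic) <= tol for p in pred.peaks)


def pair_matches(pred: Pair, truth: Pair, strict: bool, tol: float = 0.01) -> bool:
    """A pair counts only if both its light and heavy distributions pass."""
    check = exact_match if strict else mono_match
    return check(pred[0], truth[0], tol) and check(pred[1], truth[1], tol)


def true_positive_rate(predicted: Sequence[Pair], annotated: Sequence[Pair],
                       strict: bool, tol: float = 0.01) -> float:
    """Fraction of annotated pairs recovered by at least one predicted pair."""
    if not annotated:
        return 0.0
    hits = sum(
        any(pair_matches(p, t, strict, tol) for p in predicted) for t in annotated
    )
    return hits / len(annotated)


# Toy data (invented): the predicted light distribution is missing one peak.
annotated = [(Distribution((1000.0, 1001.0, 1002.0)),
              Distribution((1012.0, 1013.0, 1014.0)))]
predicted = [(Distribution((1000.0, 1001.0)),
              Distribution((1012.0, 1013.0, 1014.0)))]

strict_tpr = true_positive_rate(predicted, annotated, strict=True)
relaxed_tpr = true_positive_rate(predicted, annotated, strict=False)
print(strict_tpr, relaxed_tpr)
```

In this toy case the single missing peak makes the pair count as entirely incorrect under the strict criterion while it still counts under the relaxed one, which is the kind of gap the abstract describes between its 36% and 45% figures on 'machine' selected peaks.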

Duke Scholars

Author David Page Biostatistics & Bioinformatics, Division of Biostatistics

Published In

Bioinformatics

DOI

10.1093/bioinformatics/btn190

EISSN

1367-4811

Publication Date

July 1, 2008

Volume

24

Issue

13

Start / End Page

i339 / i347

Location

England

Related Subject Headings

Sequence Analysis, Protein
Proteins
Peptide Mapping
Molecular Sequence Data
Mass Spectrometry
Isotope Labeling
Bioinformatics
Amino Acid Sequence
49 Mathematical sciences
46 Information and computing sciences

Citation

APA

McIlwain, S., Page, D., Huttlin, E. L., & Sussman, M. R. (2008). Matching isotopic distributions from metabolically labeled samples. Bioinformatics, 24(13), i339–i347. https://doi.org/10.1093/bioinformatics/btn190

Chicago

McIlwain, Sean, David Page, Edward L. Huttlin, and Michael R. Sussman. “Matching isotopic distributions from metabolically labeled samples.” Bioinformatics 24, no. 13 (July 1, 2008): i339–47. https://doi.org/10.1093/bioinformatics/btn190.

ICMJE

McIlwain S, Page D, Huttlin EL, Sussman MR. Matching isotopic distributions from metabolically labeled samples. Bioinformatics. 2008 Jul 1;24(13):i339–47.

MLA

McIlwain, Sean, et al. “Matching isotopic distributions from metabolically labeled samples.” Bioinformatics, vol. 24, no. 13, July 2008, pp. i339–47. PubMed, doi:10.1093/bioinformatics/btn190.

NLM

McIlwain S, Page D, Huttlin EL, Sussman MR. Matching isotopic distributions from metabolically labeled samples. Bioinformatics. 2008 Jul 1;24(13):i339–i347.
