Using dynamic programming to create isotopic distribution maps from mass spectra.
MOTIVATION: This article presents a method for identifying the isotopic distributions within a mass spectrum using a probabilistic classifier supplemented with dynamic programming. Such a system serves a variety of purposes, including generating robust and meaningful features from mass spectra for use in classification.

RESULTS: The primary result of this article is that the dynamic programming approach significantly improves the sensitivity of a probabilistic classifier for identifying isotopic distributions, without harming its specificity. When annotating isotopic distributions in spectra where an expert has performed the initial 'peak-picking' (removal of noise peaks), the dynamic programming approach gives a true positive rate of 96% and a false positive rate of 0.0%, whereas the classifier alone has a true positive rate of only 47% at a false positive rate of 0.0%. When annotating isotopic distributions in machine peak-picked spectra, which may contain many noise peaks, the dynamic programming approach gives a true positive rate of only 22.0%, but it keeps a low false positive rate of 1.0% and still outperforms the classifier alone. All of these rates require exact matches with the distributions in the annotated spectra: in our evaluation, a distribution is considered 'entirely incorrect' if it is missing even one peak or contains even one extraneous peak. We also compared our approach with the THRASH and AID-MS systems under a looser requirement: correctly identifying the distribution that contains the mono-isotopic mass. Under this measure, our dynamic programming approach achieves a true positive rate of 82% and a false positive rate of 1%, which again outperforms the classifier alone. The dynamic programming approach is more conservative than THRASH and AID-MS, yielding both fewer true and fewer false peaks, but its F-score is significantly better than those of THRASH and AID-MS. All results were obtained with 10-fold cross-validation on 99 sections of mass spectra containing a total of 214 hand-annotated isotopic distributions.

AVAILABILITY: Programs are available via http://www.cs.wisc.edu/~mcilwain/IDM.
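The strict exact-match criterion described in RESULTS can be illustrated with a minimal sketch. It is not the authors' evaluation code: the representation of a distribution as a frozenset of peak indices and the function names `score_exact` and `f_score` are assumptions made only for this example. The false positive rate is deliberately left out, because the abstract does not say how negatives are counted.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical representation: a distribution is the set of peak indices it contains.
Distribution = FrozenSet[int]


def score_exact(predicted: Iterable[Distribution],
                annotated: Iterable[Distribution]) -> Tuple[int, int, int]:
    """Count true positives, false positives, and false negatives.

    A predicted distribution counts as correct only if its peak set is
    identical to an annotated one; a single missing or extra peak makes
    it entirely incorrect.
    """
    pred = set(predicted)
    gold = set(annotated)
    tp = len(pred & gold)   # exact matches
    fp = len(pred - gold)   # predicted, but not exactly as annotated
    fn = len(gold - pred)   # annotated, but not exactly recovered
    return tp, fp, fn


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (the balanced F-score)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    annotated = [frozenset({1, 2, 3}), frozenset({7, 8})]
    # The second prediction has one extraneous peak (9), so it does not count.
    predicted = [frozenset({1, 2, 3}), frozenset({7, 8, 9})]
    tp, fp, fn = score_exact(predicted, annotated)
    print(tp, fp, fn, f_score(tp, fp, fn))
```

In this toy case the near-miss prediction is counted both as a false positive and as a missed annotation, which is why exact-match rates can look much lower than rates under the looser mono-isotopic criterion.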
McIlwain, S; Page, D; Huttlin, EL; Sussman, MR