
Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions.

Publication, Journal Article
Xing, C; Dunson, DB
Published in: PLoS computational biology
July 2011

Protein-protein interactions (PPIs) are essential to most fundamental cellular processes. Interest in reconstructing PPI networks has been increasing, but obtaining reliable predictions remains difficult. Notably, false positive rates can exceed 80%. Correcting errors separately at each generating source can be both time-consuming and inefficient, because a single test can rarely cover the errors introduced at multiple levels of data processing. We propose a novel Bayesian integration method, termed nonparametric Bayes ensemble learning (NBEL), that lowers the misclassification rate (both false positives and false negatives) by automatically up-weighting the most informative data sources while down-weighting less informative and biased ones. Extensive studies indicate that NBEL is significantly more robust than classic naïve Bayes to unreliable, error-prone, and contaminated data. On a large human data set, NBEL predicts many more PPIs than naïve Bayes, which suggests that previous studies may contain large numbers of false negatives as well as false positives. Validation on two high-quality human PPI data sets supports these observations. Our experiments demonstrate that it is feasible to predict PPIs computationally at high throughput with substantially reduced false positives and false negatives. The ability to predict large numbers of PPIs both reliably and automatically may encourage the use of computational approaches to correct data errors in general, and may speed up high-quality PPI prediction. Such reliable predictions may provide a solid platform for other studies, such as protein function prediction and the roles of PPIs in disease susceptibility.
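
As a rough illustration of the contrast the abstract draws, the sketch below combines per-source likelihood ratios for one protein pair, first as a plain naïve Bayes product and then with a simple fixed weight on each source. It is not the NBEL method, which according to the abstract determines the weighting automatically; the function names, the prior odds, the likelihood ratios, and the weights are all hypothetical values chosen for illustration.

```python
import math


def combine_odds(prior_odds, likelihood_ratios, weights=None):
    """Posterior odds of an interaction from per-source likelihood ratios.

    With weights=None this is the naive Bayes product of likelihood
    ratios. With weights, each source's log likelihood ratio is scaled
    before summing, so a weight of 0 ignores that source entirely.
    This fixed weighting is an illustrative assumption, not NBEL.
    """
    if weights is None:
        weights = [1.0] * len(likelihood_ratios)
    if len(weights) != len(likelihood_ratios):
        raise ValueError("one weight per source is required")
    log_odds = math.log(prior_odds)
    for lr, w in zip(likelihood_ratios, weights):
        log_odds += w * math.log(lr)
    return math.exp(log_odds)


def odds_to_probability(odds):
    """Convert odds to a probability in [0, 1)."""
    return odds / (1.0 + odds)


# Hypothetical inputs for a single protein pair.
prior_odds = 1.0 / 600.0             # assumed prior odds of interaction
likelihood_ratios = [20.0, 5.0, 50.0]  # third source assumed contaminated

naive = combine_odds(prior_odds, likelihood_ratios)
weighted = combine_odds(prior_odds, likelihood_ratios, weights=[1.0, 1.0, 0.2])

print(f"naive Bayes posterior probability: {odds_to_probability(naive):.3f}")
print(f"down-weighted posterior probability: {odds_to_probability(weighted):.3f}")
```

Down-weighting the third source shrinks its contribution to the posterior odds, so one unreliable data set can no longer dominate the call for this pair.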


Published In

PLoS computational biology

DOI

10.1371/journal.pcbi.1002110

EISSN

1553-7358

ISSN

1553-734X

Publication Date

July 2011

Volume

7

Issue

7

Start / End Page

e1002110

Related Subject Headings

  • Reproducibility of Results
  • ROC Curve
  • Proteins
  • Protein Interaction Mapping
  • Logistic Models
  • Humans
  • Databases, Protein
  • Computational Biology
  • Bioinformatics
  • Bayes Theorem
 

Citation

APA
Xing, C., & Dunson, D. B. (2011). Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions. PLoS Computational Biology, 7(7), e1002110. https://doi.org/10.1371/journal.pcbi.1002110
Chicago
Xing, Chuanhua, and David B. Dunson. “Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions.” PLoS Computational Biology 7, no. 7 (July 2011): e1002110. https://doi.org/10.1371/journal.pcbi.1002110.
MLA
Xing, Chuanhua, and David B. Dunson. “Bayesian inference for genomic data integration reduces misclassification rate in predicting protein-protein interactions.” PLoS Computational Biology, vol. 7, no. 7, July 2011, p. e1002110. Epmc, doi:10.1371/journal.pcbi.1002110.
