Small sets of interacting proteins suggest functional linkage mechanisms via Bayesian analogical reasoning.

Published

Journal Article

Proteins and protein complexes coordinate their activity to execute cellular functions. In a number of experimental settings, including synthetic genetic arrays, genetic perturbations and RNAi screens, scientists identify a small set of protein interactions of interest. A working hypothesis is often that these interactions are the observable phenotypes of some functional process, which is not directly observable. Confirmatory analysis requires finding other pairs of proteins whose interaction may be additional phenotypical evidence about the same functional process. Extant methods for finding additional protein interactions rely heavily on the information in the newly identified set of interactions. For instance, these methods leverage the attributes of the individual proteins directly, in a supervised setting, in order to find relevant protein pairs. A small set of protein interactions provides a small sample to train parameters of prediction methods, thus leading to low confidence.
We develop RBSets, a computational approach to ranking protein interactions rooted in analogical reasoning; that is, the ability to learn and generalize relations between objects. Our approach is tailored to situations where the training set of protein interactions is small, and leverages the attributes of the individual proteins indirectly, in a Bayesian ranking setting that is perhaps closest to propensity scoring in mathematical psychology. We find that RBSets leads to good performance in identifying additional interactions starting from a small evidence set of interacting proteins, for which an underlying biological logic in terms of functional processes and signaling pathways can be established with some confidence. Our approach is scalable and can be applied to large databases with minimal computational overhead.
Our results suggest that analogical reasoning within a Bayesian ranking problem is a promising new approach for real-time biological discovery.
Java code is available at: www.gatsby.ucl.ac.uk/~rbas.
Contact: airoldi@fas.harvard.edu; kheller@mit.edu; ricardo@stats.ucl.ac.uk

Cited Authors

  • Airoldi, EM; Heller, KA; Silva, R

Published Date

  • July 2011

Published In

  • Bioinformatics

Volume / Issue

  • 27 / 13

Start / End Page

  • i374 - i382

PubMed ID

  • 21685095

PubMed Central ID

  • 21685095

Electronic International Standard Serial Number (EISSN)

  • 1367-4811

International Standard Serial Number (ISSN)

  • 1367-4803

Digital Object Identifier (DOI)

  • 10.1093/bioinformatics/btr236

Language

  • eng