Enhanced protein fold recognition through a novel data integration approach.

Publication, Journal Article
Ying, Y; Huang, K; Campbell, C
Published in: BMC bioinformatics
August 2009

Protein fold recognition is a key step in protein three-dimensional (3D) structure discovery. There are multiple fold discriminatory data sources which use physicochemical and structural properties as well as further data sources derived from local sequence alignments. This raises the issue of finding the most efficient method for combining these different informative data sources and exploring their relative significance for protein fold classification. Kernel methods have been extensively used for biological data analysis. They can incorporate separate fold discriminatory features into kernel matrices which encode the similarity between samples in their respective data sources.

In this paper we consider the problem of integrating multiple data sources using a kernel-based approach. We propose a novel information-theoretic approach based on a Kullback-Leibler (KL) divergence between the output kernel matrix and the input kernel matrix so as to integrate heterogeneous data sources. One of the most appealing properties of this approach is that it can easily cope with multi-class classification and multi-task learning by an appropriate choice of the output kernel matrix. Based on the position of the output and input kernel matrices in the KL-divergence objective, there are two formulations which we respectively refer to as MKLdiv-dc and MKLdiv-conv. We propose to efficiently solve MKLdiv-dc by a difference of convex (DC) programming method and MKLdiv-conv by a projected gradient descent algorithm. The effectiveness of the proposed approaches is evaluated on a benchmark dataset for protein fold recognition and a yeast protein function prediction problem.

Our proposed methods MKLdiv-dc and MKLdiv-conv are able to achieve state-of-the-art performance on the SCOP PDB-40D benchmark dataset for protein fold prediction and provide useful insights into the relative significance of informative data sources. In particular, MKLdiv-dc further improves the fold discrimination accuracy to 75.19%, which is a more than 5% improvement over competitive Bayesian probabilistic and SVM margin-based kernel learning methods. Furthermore, we report a competitive performance on the yeast protein function prediction problem.
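
The abstract does not spell out the objective, so the following is only a minimal sketch of the general idea under stated assumptions: each kernel matrix is treated as the covariance of a zero-mean Gaussian, the input kernel is a convex combination of per-source kernels, and the output kernel marks pairs of samples that share a class label. The function names (`gaussian_kl`, `combine_kernels`, `output_kernel`), the `ridge` term, and the toy data are all hypothetical and not taken from the paper.

```python
import numpy as np

def gaussian_kl(A, B):
    """KL( N(0, A) || N(0, B) ) for symmetric positive-definite A and B."""
    n = A.shape[0]
    M = np.linalg.solve(B, A)            # B^{-1} A without forming the inverse
    _, logdet_A = np.linalg.slogdet(A)   # slogdet returns (sign, log|det|)
    _, logdet_B = np.linalg.slogdet(B)
    return 0.5 * (np.trace(M) - n + logdet_B - logdet_A)

def combine_kernels(kernels, weights, ridge=1e-3):
    """Convex combination of kernel matrices; the ridge is an assumed
    regulariser that keeps the result invertible."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    K = sum(w * Km for w, Km in zip(weights, kernels))
    return K + ridge * np.eye(K.shape[0])

def output_kernel(labels, ridge=1e-3):
    """Assumed output kernel: 1 where two samples share a label, else 0,
    plus a small ridge so the matrix is positive definite."""
    labels = np.asarray(labels)
    K = (labels[:, None] == labels[None, :]).astype(float)
    return K + ridge * np.eye(len(labels))

# Toy data: one source that tracks the labels, one that is pure noise.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1])
onehot = np.eye(2)[labels]
X_good = onehot + 0.1 * rng.standard_normal(onehot.shape)
X_noise = rng.standard_normal((4, 3))
kernels = [X_good @ X_good.T, X_noise @ X_noise.T]  # linear kernels
K_y = output_kernel(labels)

# Evaluate both argument orders of the divergence for a few weightings.
for w in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):
    K_w = combine_kernels(kernels, w)
    print(w, gaussian_kl(K_y, K_w), gaussian_kl(K_w, K_y))
```

Because KL divergence is asymmetric, swapping the two arguments gives a different objective, which is consistent with the abstract's remark that the position of the output and input kernel matrices yields two formulations. The sketch only evaluates the divergence for fixed weights; it does not implement the DC programming or projected gradient solvers.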

Published In

BMC bioinformatics

DOI

10.1186/1471-2105-10-267

EISSN

1471-2105

ISSN

1471-2105

Publication Date

August 2009

Volume

10

Start / End Page

267

Related Subject Headings

  • Proteins
  • Protein Folding
  • Pattern Recognition, Automated
  • Computational Biology
  • Bioinformatics
  • Algorithms
  • 49 Mathematical sciences
  • 46 Information and computing sciences
  • 31 Biological sciences
  • 08 Information and Computing Sciences
 

Citation

APA: Ying, Y., Huang, K., & Campbell, C. (2009). Enhanced protein fold recognition through a novel data integration approach. BMC Bioinformatics, 10, 267. https://doi.org/10.1186/1471-2105-10-267
Chicago: Ying, Yiming, Kaizhu Huang, and Colin Campbell. “Enhanced protein fold recognition through a novel data integration approach.” BMC Bioinformatics 10 (August 2009): 267. https://doi.org/10.1186/1471-2105-10-267.
ICMJE: Ying Y, Huang K, Campbell C. Enhanced protein fold recognition through a novel data integration approach. BMC bioinformatics. 2009 Aug;10:267.
MLA: Ying, Yiming, et al. “Enhanced protein fold recognition through a novel data integration approach.” BMC Bioinformatics, vol. 10, Aug. 2009, p. 267. Epmc, doi:10.1186/1471-2105-10-267.
NLM: Ying Y, Huang K, Campbell C. Enhanced protein fold recognition through a novel data integration approach. BMC bioinformatics. 2009 Aug;10:267.