Alternate representation of distance matrices for characterization of protein structure

Published

Conference Paper

The most suitable method for the automated classification of protein structures remains an open problem in computational biology. In order to classify a protein structure with any accuracy, an effective representation must be chosen. Here we present two methods of representing protein structure. One involves representing the distances between the Cα atoms of a protein as a two-dimensional matrix and creating a model of the resulting surface with Zernike polynomials. The second uses a wavelet-based approach. We convert the distances between a protein's Cα atoms into a one-dimensional signal which is then decomposed using a discrete wavelet transformation. Using the Zernike coefficients and the approximation coefficients of the wavelet decomposition as feature vectors, we test the effectiveness of our representation with two different classifiers on a dataset of more than 600 proteins taken from the 27 most-populated SCOP folds. We find that the wavelet decomposition greatly outperforms the Zernike model. With the wavelet representation, we achieve an accuracy of approximately 56%, roughly 12% higher than results reported on a similar, but less challenging, dataset. In addition, we can couple our structure-based feature vectors with several sequence-based properties to increase accuracy another 5-7%. Finally, we use a multi-stage classification strategy on the combined features to increase performance to 78%, an improvement in accuracy of more than 15-20% and 34% over the highest reported sequence-based and structure-based classification results, respectively. © 2005 IEEE.
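The wavelet-based representation described in the abstract can be illustrated with a minimal sketch. The abstract does not say which wavelet family, decomposition depth, or matrix-to-signal ordering the authors used, so everything specific here is an assumption made for illustration: the Haar wavelet, row-by-row flattening of the strict upper triangle, padding odd-length signals by repeating the last sample, and the hypothetical function names `distance_matrix`, `to_signal`, and `haar_approximation`.

```python
import numpy as np


def distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha coordinates (n x 3 array)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def to_signal(dist):
    """Flatten the strict upper triangle, row by row, into a 1-D signal.

    Assumption: the paper's actual matrix-to-signal ordering is not given
    in the abstract; this ordering is only one plausible choice.
    """
    rows, cols = np.triu_indices(dist.shape[0], k=1)
    return dist[rows, cols]


def haar_approximation(signal, levels):
    """Approximation coefficients after `levels` steps of a Haar DWT.

    Assumption: Haar is used here only because it is the simplest wavelet;
    odd-length inputs are padded by repeating the last sample.
    """
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        if approx.size % 2:
            approx = np.append(approx, approx[-1])
        # Orthonormal Haar low-pass step: scaled sum of adjacent pairs.
        approx = (approx[0::2] + approx[1::2]) / np.sqrt(2.0)
    return approx


# Toy coordinates standing in for the C-alpha atoms of a four-residue chain.
toy_coords = np.array([
    [0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 12.0],
    [3.0, 4.0, 12.0],
])
features = haar_approximation(to_signal(distance_matrix(toy_coords)), levels=2)
```

In this sketch, `features` plays the role of the structure-based feature vector that would be passed to a classifier. A real pipeline would also need a way to bring proteins of different lengths to a common feature dimension, which the abstract does not describe.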

Cited Authors

  • Marsolo, K; Parthasarathy, S

Published Date

  • December 1, 2005

Start / End Page

  • 298 - 305

International Standard Serial Number (ISSN)

  • 1550-4786

International Standard Book Number 10 (ISBN-10)

  • 0769522785

International Standard Book Number 13 (ISBN-13)

  • 9780769522784

Digital Object Identifier (DOI)

  • 10.1109/ICDM.2005.19

Citation Source

  • Scopus