Scholars@Duke publication: Alternate representation of distance matrices for characterization of protein structure

Publication, Conference

Marsolo, K; Parthasarathy, S

Published in: Proceedings - IEEE International Conference on Data Mining, ICDM

December 1, 2005

The most suitable method for the automated classification of protein structures remains an open problem in computational biology. In order to classify a protein structure with any accuracy, an effective representation must be chosen. Here we present two methods of representing protein structure. One involves representing the distances between the Cα atoms of a protein as a two-dimensional matrix and creating a model of the resulting surface with Zernike polynomials. The second uses a wavelet-based approach. We convert the distances between a protein's Cα atoms into a one-dimensional signal which is then decomposed using a discrete wavelet transformation. Using the Zernike coefficients and the approximation coefficients of the wavelet decomposition as feature vectors, we test the effectiveness of our representation with two different classifiers on a dataset of more than 600 proteins taken from the 27 most-populated SCOP folds. We find that the wavelet decomposition greatly outperforms the Zernike model. With the wavelet representation, we achieve an accuracy of approximately 56%, roughly 12% higher than results reported on a similar, but less challenging, dataset. In addition, we can couple our structure-based feature vectors with several sequence-based properties to increase accuracy another 5-7%. Finally, we use a multi-stage classification strategy on the combined features to increase performance to 78%, an improvement in accuracy of more than 15-20% and 34% over the highest reported sequence-based and structure-based classification results, respectively. © 2005 IEEE.
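The wavelet-based representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the distances are ordered into a one-dimensional signal, which wavelet family is used, or how many decomposition levels are taken. The row-by-row upper-triangle flattening, the Haar wavelet, the padding rule, and the names `distance_matrix`, `upper_triangle_signal`, and `haar_approximation` are all assumptions made for illustration, and the coordinates are toy values rather than a real protein.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between 3-D points (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def upper_triangle_signal(matrix):
    """Flatten the strict upper triangle row by row.

    This ordering is an assumption; the abstract does not specify one.
    """
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def haar_approximation(signal, levels):
    """Approximation coefficients after `levels` rounds of a Haar DWT.

    Odd-length inputs are padded by repeating the last sample
    (an assumed boundary rule, chosen for simplicity).
    """
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        if len(approx) % 2:
            approx.append(approx[-1])
        # Scaled pairwise sums are the Haar low-pass (approximation) output.
        approx = [
            (approx[k] + approx[k + 1]) / math.sqrt(2)
            for k in range(0, len(approx), 2)
        ]
    return approx


# Toy coordinates for five pseudo-residues, not taken from any real structure.
ca_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
    (0.0, 3.8, 3.8),
]

signal = upper_triangle_signal(distance_matrix(ca_coords))
features = haar_approximation(signal, levels=2)
```

Here `features` plays the role of the feature vector that would be passed to a classifier. Each additional level roughly halves its length, so the level count trades detail for compactness. A real pipeline would also need a way to obtain vectors of equal length from proteins of different sizes, which this sketch does not address.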

Duke Scholars

Author: Keith Allen Marsolo, Population Health Sciences

Published In

Proceedings - IEEE International Conference on Data Mining, ICDM

DOI

10.1109/ICDM.2005.19

ISSN

1550-4786

ISBN

9780769522784

Publication Date

December 1, 2005

Start / End Page

298 / 305

Citation

APA

Marsolo, K., & Parthasarathy, S. (2005). Alternate representation of distance matrices for characterization of protein structure. In Proceedings - IEEE International Conference on Data Mining, ICDM (pp. 298–305). https://doi.org/10.1109/ICDM.2005.19

Chicago

Marsolo, K., and S. Parthasarathy. “Alternate representation of distance matrices for characterization of protein structure.” In Proceedings - IEEE International Conference on Data Mining, ICDM, 298–305, 2005. https://doi.org/10.1109/ICDM.2005.19.

ICMJE

Marsolo K, Parthasarathy S. Alternate representation of distance matrices for characterization of protein structure. In: Proceedings - IEEE International Conference on Data Mining, ICDM. 2005. p. 298–305.

MLA

Marsolo, K., and S. Parthasarathy. “Alternate representation of distance matrices for characterization of protein structure.” Proceedings - IEEE International Conference on Data Mining, ICDM, 2005, pp. 298–305. Scopus, doi:10.1109/ICDM.2005.19.

NLM

Marsolo K, Parthasarathy S. Alternate representation of distance matrices for characterization of protein structure. Proceedings - IEEE International Conference on Data Mining, ICDM. 2005. p. 298–305.
