
On the use of structure and sequence-based features for protein classification and retrieval

Publication, Journal Article
Marsolo, K; Parthasarathy, S
Published in: Knowledge and Information Systems
January 1, 2008

The need to retrieve or classify proteins using structure or sequence-based similarity underlies many biomedical applications. In drug discovery, researchers search for proteins that share specific chemical properties as sources for new treatment. With folding simulations, similar intermediate structures might be indicative of a common folding pathway. Here we present two normalized, stand-alone representations of proteins that enable fast and efficient object retrieval based on sequence or structure. To create our sequence-based representation, we take the profiles returned by the PSI-BLAST alignment algorithm and create a normalized summary using a discrete wavelet transform. For our structural representation, we transform each 3D structure into a normalized 2D distance matrix and apply a 2D wavelet decomposition to generate our descriptor. We also create a hybrid representation by concatenating together the above descriptors. We evaluate the generality of our models by using them as indices for database retrieval experiments as well as feature vectors for classification. We find that our methods provide excellent performance when compared with the state-of-the-art for each task. Our results show that the sequence-based representation is generally superior to the structure-based representation and that in the classification context, the hybrid strategy affords a significant improvement over sequence or structure. © Springer-Verlag London Limited 2007.
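The descriptor pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the wavelet family, the normalization scheme, the descriptor sizes, or the number of decomposition levels. The sketch assumes a Haar wavelet, nearest-neighbour resampling to a fixed size, scaling by the maximum distance, and keeping only the approximation coefficients. All function names and parameters (`size`, `length`, `levels`) are hypothetical, and the inputs in the usage lines are random placeholders, not real protein data.

```python
import numpy as np

def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates (n x 3)."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def resample_square(mat, size):
    """Nearest-neighbour resampling to size x size (assumed normalization)."""
    idx = np.round(np.linspace(0, mat.shape[0] - 1, size)).astype(int)
    return mat[np.ix_(idx, idx)]

def haar_2d_approx(mat, levels):
    """Approximation band of an orthonormal 2D Haar transform.

    Each level sums every 2x2 block and divides by 2, halving both sides.
    Side lengths must be divisible by 2**levels.
    """
    out = np.asarray(mat, dtype=float)
    for _ in range(levels):
        out = (out[0::2, 0::2] + out[0::2, 1::2]
               + out[1::2, 0::2] + out[1::2, 1::2]) / 2.0
    return out

def haar_1d_approx(columns, levels):
    """Approximation band of a 1D Haar transform applied along axis 0."""
    out = np.asarray(columns, dtype=float)
    for _ in range(levels):
        out = (out[0::2] + out[1::2]) / np.sqrt(2.0)
    return out

def structure_descriptor(coords, size=64, levels=3):
    """3D coordinates -> fixed-size distance matrix -> 2D Haar summary."""
    dist = resample_square(distance_matrix(coords), size)
    peak = dist.max()
    if peak > 0:
        dist = dist / peak  # assumed scaling to [0, 1]
    return haar_2d_approx(dist, levels).ravel()

def sequence_descriptor(profile, length=64, levels=3):
    """Profile matrix (residues x 20) -> fixed length -> per-column Haar summary."""
    idx = np.round(np.linspace(0, profile.shape[0] - 1, length)).astype(int)
    return haar_1d_approx(profile[idx], levels).ravel()

def hybrid_descriptor(profile, coords):
    """Concatenate the sequence and structure descriptors."""
    return np.concatenate([sequence_descriptor(profile),
                           structure_descriptor(coords)])

# Usage with synthetic placeholder inputs (not real protein data).
rng = np.random.default_rng(0)
coords = rng.normal(size=(120, 3))    # stand-in for residue coordinates
profile = rng.normal(size=(120, 20))  # stand-in for a PSI-BLAST profile
vector = hybrid_descriptor(profile, coords)
print(vector.shape)
```

Because every protein maps to a vector of the same length regardless of chain length, such descriptors can be compared with an ordinary distance measure for retrieval or passed directly to a classifier as feature vectors.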

Published In

Knowledge and Information Systems

DOI

10.1007/s10115-007-0088-0

EISSN

0219-3116

ISSN

0219-1377

Publication Date

January 1, 2008

Volume

14

Issue

1

Start / End Page

59 / 80

Related Subject Headings

  • Information Systems
  • 46 Information and computing sciences
  • 0806 Information Systems
  • 0801 Artificial Intelligence and Image Processing
 

Citation

APA

Marsolo, K., & Parthasarathy, S. (2008). On the use of structure and sequence-based features for protein classification and retrieval. Knowledge and Information Systems, 14(1), 59–80. https://doi.org/10.1007/s10115-007-0088-0

Chicago

Marsolo, K., and S. Parthasarathy. “On the use of structure and sequence-based features for protein classification and retrieval.” Knowledge and Information Systems 14, no. 1 (January 1, 2008): 59–80. https://doi.org/10.1007/s10115-007-0088-0.

ICMJE / NLM

Marsolo K, Parthasarathy S. On the use of structure and sequence-based features for protein classification and retrieval. Knowledge and Information Systems. 2008 Jan 1;14(1):59–80.

MLA

Marsolo, K., and S. Parthasarathy. “On the use of structure and sequence-based features for protein classification and retrieval.” Knowledge and Information Systems, vol. 14, no. 1, Jan. 2008, pp. 59–80. Scopus, doi:10.1007/s10115-007-0088-0.