HASH: a program to accurately predict protein Hα shifts from neighboring backbone shifts.

Published

Journal Article

Chemical shifts provide not only peak identities for analyzing nuclear magnetic resonance (NMR) data, but also an important source of conformational information for studying protein structures. Current structural studies requiring Hα chemical shifts suffer from the following limitations. (1) For large proteins, the Hα chemical shifts can be difficult to assign using conventional NMR triple-resonance experiments, mainly due to the fast transverse relaxation rate of Cα that restricts the signal sensitivity. (2) Previous chemical shift prediction approaches either require homologous models with high sequence similarity or rely heavily on accurate backbone and side-chain structural coordinates. When neither sequence homologues nor structural coordinates are available, we must resort to other information to predict Hα chemical shifts. Predicting accurate Hα chemical shifts from other obtainable information, such as the chemical shifts of nearby backbone atoms (i.e., adjacent atoms in the sequence), can address both limitations and hence advance NMR-based structural studies of proteins. By specifically exploiting the dependencies on chemical shifts of nearby backbone atoms, we propose a novel machine learning algorithm, called HASH, to predict Hα chemical shifts. HASH combines a new fragment-based chemical shift search approach with a non-parametric regression model, the generalized additive model, to solve the prediction problem effectively. We demonstrate that the chemical shifts of nearby backbone atoms provide a reliable source of information for predicting accurate Hα chemical shifts. Our testing results on different possible combinations of input data indicate that HASH has a wide range of potential NMR applications in structural and biological studies of proteins.
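The abstract gives no implementation details, so the following is only a minimal sketch of what a fragment-based shift search could look like, not the HASH algorithm itself. It assumes each database fragment is reduced to a short tuple of neighboring backbone shifts paired with a known Hα shift, and it predicts by averaging the Hα shifts of the closest fragments. The function name `predict_ha_shift`, the two-feature layout, and all numeric values are hypothetical toy choices; the generalized additive model stage is omitted entirely.

```python
import math


def predict_ha_shift(query, database, k=3):
    """Average the H-alpha shifts of the k fragments nearest to `query`.

    `query` is a tuple of backbone shifts (ppm); `database` is a list of
    (features, ha_shift) pairs whose feature tuples have the same length.
    This is an illustrative nearest-neighbor lookup, not the published method.
    """
    if not database:
        raise ValueError("database must contain at least one fragment")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rank fragments by Euclidean distance in shift space.
    scored = sorted((math.dist(query, features), ha) for features, ha in database)
    nearest = scored[:k]
    return sum(ha for _, ha in nearest) / len(nearest)


# Made-up toy entries: ((C-alpha shift, N shift), H-alpha shift), all in ppm.
toy_database = [
    ((52.0, 120.0), 4.30),
    ((53.0, 121.0), 4.20),
    ((58.0, 118.0), 4.00),
    ((62.0, 115.0), 3.80),
]

prediction = predict_ha_shift((52.5, 120.5), toy_database, k=2)
print(f"predicted H-alpha shift: {prediction:.2f} ppm")
```

In practice the nuclei span very different ppm ranges, so a real search would presumably rescale each shift type before computing distances; the sketch skips that step for brevity.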

Cited Authors

  • Zeng, J; Zhou, P; Donald, BR

Published Date

  • January 2013

Published In

Volume / Issue

  • 55 / 1

Start / End Page

  • 105 - 118

PubMed ID

  • 23242797

PubMed Central ID

  • 23242797

Electronic International Standard Serial Number (EISSN)

  • 1573-5001

International Standard Serial Number (ISSN)

  • 0925-2738

Digital Object Identifier (DOI)

  • 10.1007/s10858-012-9693-7

Language

  • eng