Regression analysis of multiple protein structures.

Published

Journal Article

A general framework is presented for analyzing multiple protein structures using statistical regression methods. The regression approach can superimpose protein structures rigidly or with shear. Also, this approach can superimpose multiple structures explicitly, without resorting to pairwise superpositions. The algorithm alternates between matching corresponding landmarks among the protein structures and superimposing these landmarks. Matching is performed using a robust dynamic programming technique that uses gap penalties that adapt to the given data. Superposition is performed using either orthogonal transformations, which impose the rigid-body assumption, or affine transformations, which allow shear. The resulting regression model of a protein family measures the amount of structural variability at each landmark. A variation of our algorithm permits a separate weight for each landmark, thereby allowing one to emphasize particular segments of a protein structure or to compensate for variances that differ at various positions in a structure. In addition, a method is introduced for finding an initial correspondence, by measuring the discrete curvature along each protein backbone. Discrete curvature also characterizes the secondary structure of a protein backbone, distinguishing among helical, strand, and loop regions. An example is presented involving a set of seven globin structures. Regression analysis, using both affine and orthogonal transformations, reveals that globins are most strongly conserved structurally in helical regions, particularly in the mid-regions of the E, F, and G helices.
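The two superposition modes named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the landmarks of two structures are already matched and stored as N x 3 NumPy arrays, it fits only one pair of structures rather than a whole family, and the function names `fit_orthogonal`, `fit_affine`, and `landmark_residuals` are hypothetical. The orthogonal fit uses the standard SVD solution to the least-squares rotation problem; the affine fit uses ordinary least squares.

```python
import numpy as np

def fit_orthogonal(X, Y):
    """Rigid-body fit (assumed helper): rotation R and shift t minimizing ||X @ R + t - Y||."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    # SVD of the 3x3 cross-covariance gives the optimal orthogonal matrix.
    U, _, Vt = np.linalg.svd(Xc.T @ Yc)
    # Force det(R) = +1 so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    t = my - mx @ R
    return R, t

def fit_affine(X, Y):
    """Affine fit (assumed helper): unconstrained 3x3 matrix A and shift b, so shear is allowed."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])  # append a column of ones for the shift
    B, *_ = np.linalg.lstsq(Xh, Y, rcond=None)     # B has shape (4, 3)
    return B[:3], B[3]

def landmark_residuals(X_fit, Y):
    """Distance between each superimposed landmark and its target landmark."""
    return np.linalg.norm(X_fit - Y, axis=1)
```

Calling `landmark_residuals(X @ R + t, Y)` after either fit gives one distance per landmark, a rough pairwise stand-in for the per-landmark variability that the regression model measures across a family. The affine fit can never have a larger total squared residual than the orthogonal fit on the same landmarks, since it has strictly more freedom.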

Cited Authors

  • Wu, TD; Schmidler, SC; Hastie, T; Brutlag, DL

Published Date

  • January 1998

Published In

Volume / Issue

  • 5 / 3

Start / End Page

  • 585 - 595

PubMed ID

  • 9773352

Pubmed Central ID

  • 9773352

Electronic International Standard Serial Number (EISSN)

  • 1557-8666

International Standard Serial Number (ISSN)

  • 1066-5277

Digital Object Identifier (DOI)

  • 10.1089/cmb.1998.5.585

Language

  • eng