Regression analysis of multiple protein structures.
A general framework is presented for analyzing multiple protein structures using statistical regression methods. The regression approach can superimpose protein structures rigidly or with shear. Also, this approach can superimpose multiple structures explicitly, without resorting to pairwise superpositions. The algorithm alternates between matching corresponding landmarks among the protein structures and superimposing these landmarks. Matching is performed using a robust dynamic programming technique that uses gap penalties that adapt to the given data. Superposition is performed using either orthogonal transformations, which impose the rigid-body assumption, or affine transformations, which allow shear. The resulting regression model of a protein family measures the amount of structural variability at each landmark. A variation of our algorithm permits a separate weight for each landmark, thereby allowing one to emphasize particular segments of a protein structure or to compensate for variances that differ at various positions in a structure. In addition, a method is introduced for finding an initial correspondence, by measuring the discrete curvature along each protein backbone. Discrete curvature also characterizes the secondary structure of a protein backbone, distinguishing among helical, strand, and loop regions. An example is presented involving a set of seven globin structures. Regression analysis, using both affine and orthogonal transformations, reveals that globins are most strongly conserved structurally in helical regions, particularly in the mid-regions of the E, F, and G helices.
Wu, TD; Schmidler, SC; Hastie, T; Brutlag, DL
Volume / Issue
Start / End Page
Pubmed Central ID
Electronic International Standard Serial Number (EISSN)
International Standard Serial Number (ISSN)
Digital Object Identifier (DOI)