Regression analysis of multiple protein structures
A general framework is presented for analyzing multiple protein structures. A family of related protein structures may be analyzed using statistical regression methods. The analysis alternates between two steps: finding correspondences among the protein structures, and superimposing the corresponding landmarks. The superposition step may be performed using either orthogonal or affine transformations, thereby allowing protein structures to undergo either pure rotations or rotations plus shear, respectively. Regression analysis permits a separate weight for each position, allowing one to emphasize particular segments of a protein structure or to compensate for variances that differ from position to position along a structure. In addition, a novel method is introduced for finding an initial correspondence, based on matching discrete curvatures along the protein backbone. Another novel method is introduced for obtaining gap functions that adapt to the given data, thereby making dynamic programming methods more robust. An example is presented involving a set of seven globin structures. Regression analysis, using both affine and orthogonal transformations, reveals that globins are most strongly conserved structurally in the mid-regions of the E and G helices.
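
As an illustration of the superposition step, the following is a minimal sketch of a weighted least-squares fit of one set of landmarks onto another, assuming the correspondence has already been found. It is not taken from the paper. The function name `weighted_superpose`, its arguments, and the use of NumPy are assumptions made for this example. The orthogonal case uses the standard SVD-based (Kabsch-style) solution, and the affine case solves the weighted normal equations; the paper's own formulation may differ in detail.

```python
import numpy as np


def weighted_superpose(X, Y, w, kind="orthogonal"):
    """Fit Y onto X by minimizing sum_i w_i * ||x_i - (A y_i + t)||^2.

    X, Y : (n, 3) arrays of corresponding landmarks (hypothetical input,
           e.g. C-alpha coordinates of matched residues).
    w    : (n,) non-negative weights, one per position.
    kind : "orthogonal" restricts A to a proper rotation;
           "affine" allows any 3x3 matrix (rotation plus shear and scale).
    Returns (A, t, weighted_rmsd).
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1

    # Weighted centroids; centering removes the translation from the problem.
    x_bar = w @ X
    y_bar = w @ Y
    Xc = X - x_bar
    Yc = Y - y_bar

    # Weighted cross-covariance H = sum_i w_i * y_i x_i^T (3x3).
    H = (Yc * w[:, None]).T @ Xc

    if kind == "orthogonal":
        # Rotation maximizing trace(A H); the sign fix excludes reflections.
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))
        A = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    elif kind == "affine":
        # Weighted normal equations: A^T = S^{-1} H, with S = sum_i w_i y_i y_i^T.
        S = (Yc * w[:, None]).T @ Yc
        A = np.linalg.solve(S, H).T
    else:
        raise ValueError("kind must be 'orthogonal' or 'affine'")

    t = x_bar - A @ y_bar
    residual = X - (Y @ A.T + t)
    rmsd = np.sqrt(np.sum(w * np.sum(residual**2, axis=1)))
    return A, t, rmsd


if __name__ == "__main__":
    # Synthetic landmarks only; these are not globin coordinates.
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(30, 3))
    shear = np.array([[1.0, 0.2, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    X = Y @ shear.T + np.array([1.0, 2.0, 3.0])
    w = np.ones(len(Y))
    for kind in ("orthogonal", "affine"):
        _, _, rmsd = weighted_superpose(X, Y, w, kind)
        print(kind, "weighted RMSD:", rmsd)
```

In this sketch, raising the weight of a position pulls the fit toward that position, which is one way to emphasize a particular segment. Because rotations are a subset of affine maps, the affine fit never has a larger weighted residual than the orthogonal fit on the same data.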