Maximum likelihood estimation of optimal scaling factors for expression array normalization


Journal Article

Data from expression arrays must be comparable before they can be analyzed rigorously on a large scale. Accurate normalization improves the comparability of expression data because it seeks to account for sources of variation that obscure the underlying variation of interest. Undesirable variation in reported expression levels originates in the preparation and hybridization of the sample as well as in the manufacture of the array itself, and may differ depending on the array technology employed. Published research to date has not characterized the degree of variation associated with these sources, and results are often reported without tight statistical bounds on their significance. We analyze the distributions of reported levels of exogenous control species spiked into samples applied to 1280 Affymetrix arrays. We develop a model that explains reported expression levels under the assumption that variation is primarily multiplicative. To compute the scaling factors needed for normalization, we derive maximum likelihood and maximum a posteriori estimates for the parameters characterizing the multiplicative variation in reported spiked control expression levels. We conclude that the optimal scaling factors in this context are weighted geometric means, and we determine the appropriate weights. The optimal scaling factor estimates computed in this way can be used for subsequent array normalization.
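
The link between multiplicative noise and weighted geometric means can be illustrated with a minimal sketch. It assumes, for illustration only, that each reported spike-in level satisfies log(observed_i) = log(s) + log(expected_i) + e_i with independent Gaussian log-space noise of known variance. Under that assumption, the maximum likelihood estimate of log(s) is an inverse-variance weighted mean of the log ratios, so s is a weighted geometric mean of the ratios observed_i / expected_i. The function name, its inputs, and the inverse-variance weights are hypothetical and are not the specific weights derived in the paper.

```python
import math


def weighted_geometric_mean_scale(observed, expected, variances):
    """Hypothetical sketch: ML estimate of one multiplicative scaling factor s.

    Assumes log(observed[i]) = log(s) + log(expected[i]) + e[i], where
    e[i] ~ N(0, variances[i]) independently (an illustrative assumption).
    """
    if not observed or not (len(observed) == len(expected) == len(variances)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Inverse-variance weights: noisier controls contribute less.
    weights = [1.0 / v for v in variances]
    # Work in log space, where the multiplicative model becomes additive.
    log_ratios = [math.log(o / e) for o, e in zip(observed, expected)]
    log_s = sum(w * r for w, r in zip(weights, log_ratios)) / sum(weights)
    # Back-transform: exp of a weighted mean of logs is a weighted geometric mean.
    return math.exp(log_s)


if __name__ == "__main__":
    s = weighted_geometric_mean_scale(
        observed=[200.0, 410.0, 790.0],
        expected=[100.0, 200.0, 400.0],
        variances=[0.1, 0.1, 0.1],
    )
    # Dividing an array's reported levels by s would rescale it onto a common scale.
    print(s)
```

With equal variances the estimate reduces to the ordinary geometric mean of the ratios; a control with much smaller variance pulls the estimate toward its own ratio.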

Cited Authors

  • Hartemink, AJ; Gifford, DK; Jaakkola, TS; Young, RA

Published Date

  • January 1, 2001

Published In

  • Proceedings of SPIE

Volume

  • 4266

Start / End Page

  • 132 - 140

International Standard Serial Number (ISSN)

  • 0277-786X

Digital Object Identifier (DOI)

  • 10.1117/12.427981

Citation Source

  • Scopus