Maximum likelihood estimation of optimal scaling factors for expression array normalization
Published
Journal Article
Data from expression arrays must be comparable before they can be analyzed rigorously on a large scale. Accurate normalization improves the comparability of expression data because it seeks to account for sources of variation that obscure the underlying variation of interest. Undesirable variation in reported expression levels originates in the preparation and hybridization of the sample as well as in the manufacture of the array itself, and may differ depending on the array technology being employed. Published research to date has not characterized the degree of variation associated with these sources, and results are often reported without tight statistical bounds on their significance. We analyze the distributions of reported levels of exogenous control species spiked into samples applied to 1280 Affymetrix arrays. We develop a model for explaining reported expression levels under an assumption of primarily multiplicative variation. To compute the scaling factors needed for normalization, we derive maximum likelihood and maximum a posteriori estimates for the parameters characterizing the multiplicative variation in reported spiked control expression levels. We conclude that the optimal scaling factors in this context are weighted geometric means and determine the appropriate weights. The optimal scaling factor estimates so computed can be used for subsequent array normalization.
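As a rough illustration of why a weighted geometric mean arises under multiplicative variation, the sketch below assumes a simple log-normal error model that is not taken from the paper: each spiked control's reported level equals a reference level times an array-specific scaling factor times noise whose logarithm is Gaussian with a control-specific standard deviation. Under that assumption, the maximum likelihood estimate of the scaling factor is the geometric mean of the observed-to-reference ratios, weighted by inverse log-variance. The function names, the inverse-variance weights, and the example numbers are all hypothetical; the weights actually derived in the paper may differ.

```python
import math


def weighted_geometric_mean(values, weights):
    """Return exp(sum(w * log(v)) / sum(w)) for positive values and weights."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    if any(v <= 0 for v in values) or any(w <= 0 for w in weights):
        raise ValueError("values and weights must be strictly positive")
    total_weight = sum(weights)
    log_mean = sum(w * math.log(v) for v, w in zip(values, weights)) / total_weight
    return math.exp(log_mean)


def scaling_factor(observed, reference, log_sd):
    """Estimate one array's scaling factor from its spiked controls.

    Assumed model (hypothetical, for illustration only):
        log(observed[j]) = log(scale) + log(reference[j]) + noise_j,
        noise_j ~ Normal(0, log_sd[j] ** 2).
    Maximizing the likelihood over `scale` gives an inverse-variance
    weighted mean in log space, i.e. a weighted geometric mean of ratios.
    """
    ratios = [o / r for o, r in zip(observed, reference)]
    weights = [1.0 / sd ** 2 for sd in log_sd]
    return weighted_geometric_mean(ratios, weights)


# Hypothetical example: two controls, both reported at twice their reference level.
factor = scaling_factor([200.0, 800.0], [100.0, 400.0], [0.1, 0.2])
normalized = [level / factor for level in [200.0, 800.0]]
```

Dividing an array's reported levels by its estimated factor, as in the last line, is one way such an estimate could feed into normalization; controls with smaller log-scale variance pull the estimate more strongly.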
Cited Authors
- Hartemink, AJ; Gifford, DK; Jaakkola, TS; Young, RA
Published Date
- January 1, 2001
Published In
Volume / Issue
- 4266 /
Start / End Page
- 132 - 140
International Standard Serial Number (ISSN)
- 0277-786X
Digital Object Identifier (DOI)
- 10.1117/12.427981
Citation Source
- Scopus