Maximum likelihood estimation of optimal scaling factors for expression array normalization
Data from expression arrays must be comparable before it can be analyzed rigorously on a large scale. Accurate normalization improves the comparability of expression data because it seeks to account for sources of variation obscuring the underlying variation of interest. Undesirable variation in reported expression levels originates in the preparation and hybridization of the sample as well as in the manufacture of the array itself, and may differ depending on the array technology being employed. Published research to date has not characterized the degree of variation associated with these sources, and results are often reported without tight statistical bounds on their significance. We analyze the distributions of reported levels of exogenous control species spiked into samples applied to 1280 Affymetrix arrays. We develop a model for explaining reported expression levels under an assumption of primarily multiplicative variation. To compute the scaling factors needed for normalization, we derive maximum likelihood and maximum a posteriori estimates for the parameters characterizing the multiplicative variation in reported spiked control expression levels. We conclude that the optimal scaling factors in this context are weighted geometric means and determine the appropriate weights. The optimal scaling factor estimates so computed can be used for subsequent array normalization.
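To illustrate why a weighted geometric mean arises under multiplicative variation, the following is a minimal sketch, not the authors' derivation. It assumes each spiked control's reported level equals a known reference level times an array-wide scaling factor times log-normal noise, and that the log-scale noise variance of each control is known. Under those assumptions the maximum likelihood estimate of the scaling factor is the geometric mean of the observed-to-reference ratios, weighted by inverse log-variance. The function names, the reference levels, and the inverse-variance weights are all hypothetical; the abstract does not state the weights the paper determines.

```python
import math


def weighted_geometric_mean(values, weights):
    """Return exp(sum(w * log(v)) / sum(w)) for positive values and weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    log_mean = sum(w * math.log(v) for v, w in zip(values, weights)) / total
    return math.exp(log_mean)


def ml_scaling_factor(observed, reference, log_variances):
    """Estimate one array's scaling factor from its spiked controls.

    Assumed model (hypothetical, for illustration only):
        observed[i] = factor * reference[i] * noise[i],
        log(noise[i]) ~ Normal(0, log_variances[i]).
    Maximizing the likelihood in log space gives an inverse-variance
    weighted average of the log ratios, i.e. a weighted geometric mean.
    """
    ratios = [o / r for o, r in zip(observed, reference)]
    weights = [1.0 / v for v in log_variances]  # assumed weights
    return weighted_geometric_mean(ratios, weights)


if __name__ == "__main__":
    # Three hypothetical controls, each reported at twice its reference level.
    factor = ml_scaling_factor(
        observed=[200.0, 400.0, 800.0],
        reference=[100.0, 200.0, 400.0],
        log_variances=[0.1, 0.2, 0.4],
    )
    print(factor)
```

Under this sketch, dividing every reported level on the array by the estimated factor would bring it onto the reference scale; controls with noisier log-scale readings contribute less to the estimate.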
Duke Scholars
Published In
DOI
ISSN
Publication Date
Volume
Start / End Page
Related Subject Headings
- 5102 Atomic, molecular and optical physics
- 4009 Electronics, sensors and digital hardware
- 4006 Communications engineering