A divide-and-conquer strategy to solve the out-of-memory problem of processing thousands of Affymetrix microarrays.

Journal Article

The out-of-memory problem is frequently encountered when processing thousands of CEL files using Bioconductor. We propose a divide-and-conquer strategy combined with randomised resampling to solve this problem. The approach was tested on the CAMDA 2007 META-analysis data set, which contains 5896 CEL files, by running established Bioconductor pre-processing algorithms for Affymetrix arrays on a typical commodity computer cluster. The results were validated against a gold standard obtained on a supercomputer. In addition to improving performance, the general divide-and-conquer strategy can be applied to other normalisation algorithms without modifying their underlying implementations.
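
The abstract does not spell out the algorithm. The sketch below is a hypothetical illustration of why a divide-and-conquer scheme can reproduce a whole-batch result. It is not the authors' implementation. It uses quantile normalisation, one step of RMA, as a stand-in example. Arrays are randomly partitioned into chunks, and each chunk adds its per-rank sums of sorted intensities to a shared reference distribution. Every array is then mapped onto that reference. Because per-rank sums are additive, the reference matches the one computed from all arrays at once, apart from floating-point rounding. All function names are hypothetical. The data are held in memory only for illustration; a real pipeline would read CEL data from disk one chunk at a time.

```python
import random
from typing import List

# Hypothetical representation: one array is a list of probe intensities,
# and every array has the same number of probes.
Array = List[float]


def rank_sums(chunk: List[Array]) -> List[float]:
    """Sum the sorted intensities of every array in one chunk, rank by rank."""
    totals = [0.0] * len(chunk[0])
    for array in chunk:
        for rank, value in enumerate(sorted(array)):
            totals[rank] += value
    return totals


def reference_distribution(arrays: List[Array], chunk_size: int, seed: int = 0) -> List[float]:
    """Build the quantile-normalisation target one chunk at a time.

    Arrays are randomly partitioned into chunks of at most `chunk_size`.
    Only one chunk's sorted values are processed at once. Per-rank sums are
    additive, so the mean matches processing all arrays together.
    """
    order = list(range(len(arrays)))
    random.Random(seed).shuffle(order)  # random partition into chunks
    totals = [0.0] * len(arrays[0])
    for start in range(0, len(order), chunk_size):
        chunk = [arrays[i] for i in order[start:start + chunk_size]]
        for rank, value in enumerate(rank_sums(chunk)):
            totals[rank] += value
    return [t / len(arrays) for t in totals]


def normalise_to_reference(array: Array, reference: List[float]) -> Array:
    """Replace each value by the reference value at the same rank (ties broken by position)."""
    out = [0.0] * len(array)
    for rank, probe in enumerate(sorted(range(len(array)), key=lambda i: array[i])):
        out[probe] = reference[rank]
    return out


def chunked_quantile_normalise(arrays: List[Array], chunk_size: int, seed: int = 0) -> List[Array]:
    reference = reference_distribution(arrays, chunk_size, seed)
    # Second pass: each array is normalised independently, so this pass can also run chunk by chunk.
    return [normalise_to_reference(a, reference) for a in arrays]


if __name__ == "__main__":
    arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0], [3.0, 4.0, 8.0], [1.0, 2.0, 9.0]]
    print(reference_distribution(arrays, chunk_size=2))  # [1.75, 3.25, 7.0]
    print(chunked_quantile_normalise(arrays, chunk_size=2)[0])  # [7.0, 1.75, 3.25]
```

Peak memory in the first pass is bounded by `chunk_size` arrays plus one running total per probe rank. This is the property a divide-and-conquer approach exploits. How the published method uses randomised resampling may differ from the simple random partition shown here.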

Cited Authors

  • Lee, C-J; Fu, D; Du, P; Jiang, H; Lin, SM; Kibbe, W

Published Date

  • January 2008

Published In

  • International Journal of Computational Biology and Drug Design

Volume / Issue

  • 1 / 4

Start / End Page

  • 396 - 405

PubMed ID

  • 20063464

Electronic International Standard Serial Number (EISSN)

  • 1756-0764

International Standard Serial Number (ISSN)

  • 1756-0756

Digital Object Identifier (DOI)

  • 10.1504/ijcbdd.2008.022209

Language

  • eng