Scholars@Duke publication: Systematic Error Removal Using Random Forest for Normalizing Large-Scale Untargeted Lipidomics Data.

Publication, Journal Article

Fan, S; Kind, T; Cajka, T; Hazen, SL; Tang, WHW; Kaddurah-Daouk, R; Irvin, MR; Arnett, DK; Barupal, DK; Fiehn, O

Published in: Anal Chem

March 5, 2019

Large-scale untargeted lipidomics experiments involve the measurement of hundreds to thousands of samples. Such data sets are usually acquired on one instrument over days or weeks of analysis time. Such extensive data acquisition processes introduce a variety of systematic errors, including batch differences, longitudinal drifts, or even instrument-to-instrument variation. Technical data variance can obscure the true biological signal and hinder biological discoveries. To combat this issue, we present a novel normalization approach based on using quality control pool samples (QC). This method is called systematic error removal using random forest (SERRF) for eliminating the unwanted systematic variations in large sample sets. We compared SERRF with 15 other commonly used normalization methods using six lipidomics data sets from three large cohort studies (832, 1162, and 2696 samples). SERRF reduced the average technical errors for these data sets to 5% relative standard deviation. We conclude that SERRF outperforms other existing methods and can significantly reduce the unwanted systematic variation, revealing biological variance of interest.
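The abstract reports technical error as relative standard deviation (RSD). As a minimal sketch of that metric, assuming RSD is computed per compound across repeated QC injections and then averaged (the abstract does not spell out the exact procedure), the calculation could look like the following. The function names, lipid names, and intensities are hypothetical and purely illustrative; this is not the SERRF algorithm itself.

```python
from statistics import mean, stdev


def relative_standard_deviation(values):
    """Return the RSD (sample standard deviation / mean) of values, in percent."""
    m = mean(values)
    if m == 0:
        raise ValueError("RSD is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


def average_qc_rsd(qc_intensities):
    """Average the per-compound RSD over all compounds.

    qc_intensities maps a compound name to the list of intensities
    measured for that compound in each QC injection.
    """
    return mean(relative_standard_deviation(v) for v in qc_intensities.values())


# Made-up intensities for two hypothetical lipids in four QC injections.
qc = {
    "lipid_A": [100.0, 110.0, 90.0, 100.0],
    "lipid_B": [50.0, 50.0, 50.0, 50.0],
}
print(average_qc_rsd(qc))
```

A lower average RSD on QC samples after normalization indicates less residual technical variation, which is the sense in which the abstract compares normalization methods.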

Duke Scholars

Author Rima Fathi Kaddurah-Daouk Psychiatry & Behavioral Sciences, Behavioral Medicine & Neur ...

Published In

Anal Chem

DOI

10.1021/acs.analchem.8b05592

EISSN

1520-6882

Publication Date

March 5, 2019

Volume

91

Issue

5

Start / End Page

3590 / 3596

Location

United States

Related Subject Headings

Scientific Experimental Error
Quality Control
Lipidomics
Datasets as Topic
Analytical Chemistry
4004 Chemical engineering
3401 Analytical chemistry
3205 Medical biochemistry and metabolomics
0399 Other Chemical Sciences
0301 Analytical Chemistry

Citation

APA

Fan, S., Kind, T., Cajka, T., Hazen, S. L., Tang, W. H. W., Kaddurah-Daouk, R., … Fiehn, O. (2019). Systematic Error Removal Using Random Forest for Normalizing Large-Scale Untargeted Lipidomics Data. Anal Chem, 91(5), 3590–3596. https://doi.org/10.1021/acs.analchem.8b05592

Chicago

Fan, Sili, Tobias Kind, Tomas Cajka, Stanley L. Hazen, WH Wilson Tang, Rima Kaddurah-Daouk, Marguerite R. Irvin, Donna K. Arnett, Dinesh K. Barupal, and Oliver Fiehn. “Systematic Error Removal Using Random Forest for Normalizing Large-Scale Untargeted Lipidomics Data.” Anal Chem 91, no. 5 (March 5, 2019): 3590–96. https://doi.org/10.1021/acs.analchem.8b05592.

ICMJE

Fan S, Kind T, Cajka T, Hazen SL, Tang WHW, Kaddurah-Daouk R, et al. Systematic Error Removal Using Random Forest for Normalizing Large-Scale Untargeted Lipidomics Data. Anal Chem. 2019 Mar 5;91(5):3590–6.

MLA

Fan, Sili, et al. “Systematic Error Removal Using Random Forest for Normalizing Large-Scale Untargeted Lipidomics Data.” Anal Chem, vol. 91, no. 5, Mar. 2019, pp. 3590–96. Pubmed, doi:10.1021/acs.analchem.8b05592.

NLM

Fan S, Kind T, Cajka T, Hazen SL, Tang WHW, Kaddurah-Daouk R, Irvin MR, Arnett DK, Barupal DK, Fiehn O. Systematic Error Removal Using Random Forest for Normalizing Large-Scale Untargeted Lipidomics Data. Anal Chem. 2019 Mar 5;91(5):3590–3596.
