
Transcriptome prediction performance across machine learning models and diverse ancestries.

Publication ,  Journal Article
Okoro, PC; Schubert, R; Guo, X; Johnson, WC; Rotter, JI; Hoeschele, I; Liu, Y; Im, HK; Luke, A; Dugas, LR; Wheeler, HE
Published in: HGG Adv
April 8, 2021

Transcriptome prediction methods such as PrediXcan and FUSION have become popular in complex trait mapping. Most transcriptome prediction models have been trained in European populations using methods that make parametric linear assumptions like the elastic net (EN). To potentially further optimize imputation performance of gene expression across global populations, we built transcriptome prediction models using both linear and non-linear machine learning (ML) algorithms and evaluated their performance in comparison to EN. We trained models using genotype and blood monocyte transcriptome data from the Multi-Ethnic Study of Atherosclerosis (MESA) comprising individuals of African, Hispanic, and European ancestries and tested them using genotype and whole-blood transcriptome data from the Modeling the Epidemiology Transition Study (METS) comprising individuals of African ancestries. We show that the prediction performance is highest when the training and the testing population share similar ancestries regardless of the prediction algorithm used. While EN generally outperformed random forest (RF), support vector regression (SVR), and K nearest neighbor (KNN), we found that RF outperformed EN for some genes, particularly between disparate ancestries, suggesting potential robustness and reduced variability of RF imputation performance across global populations. When applied to a high-density lipoprotein (HDL) phenotype, we show that including RF prediction models in PrediXcan revealed potential gene associations missed by EN models. Therefore, by integrating other ML modeling into PrediXcan and diversifying our training populations to include more global ancestries, we may uncover new genes associated with complex traits.
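As a rough illustration of the kind of comparison described above, the sketch below contrasts a PrediXcan-style linear predictor (predicted expression as a weighted sum of SNP dosages) with a K nearest neighbor (KNN) regressor, and scores each by the correlation between predicted and observed expression in a held-out set. Everything here is an assumption for illustration only: the toy genotypes, the hypothetical weights standing in for a fitted elastic net model, the choice of k, and the use of Pearson correlation as the performance metric. It is not the authors' pipeline.

```python
import math
from statistics import mean

# Toy data (hypothetical): genotype dosages (0/1/2) at two SNPs per individual,
# with expression simulated as 1.0*snp1 + 0.5*snp2.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 2), (0, 2), (1, 2)]
train_y = [s1 + 0.5 * s2 for s1, s2 in train_X]
test_X = [(0, 0), (1, 1), (2, 2), (2, 1)]
test_y = [s1 + 0.5 * s2 for s1, s2 in test_X]

# Hypothetical per-SNP weights standing in for a fitted elastic net model.
EN_WEIGHTS = (1.0, 0.5)


def linear_predict(weights, dosages):
    """Linear (PrediXcan-style) prediction: weighted sum of SNP dosages."""
    return sum(w * d for w, d in zip(weights, dosages))


def knn_predict(X, y, query, k=3):
    """Non-linear prediction: mean expression of the k training individuals
    whose genotype vectors are closest (Euclidean distance) to the query."""
    neighbors = sorted((math.dist(x, query), yi) for x, yi in zip(X, y))
    return mean(yi for _, yi in neighbors[:k])


def pearson_r(a, b):
    """Pearson correlation between predicted and observed expression."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


linear_preds = [linear_predict(EN_WEIGHTS, x) for x in test_X]
knn_preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]

print(f"linear r = {pearson_r(linear_preds, test_y):.3f}")
print(f"KNN r    = {pearson_r(knn_preds, test_y):.3f}")
```

Because the toy expression is itself linear in the dosages, the linear predictor fits it exactly here; the abstract's observation is that for some real genes, particularly when training and testing ancestries differ, a non-linear model such as RF can outperform EN.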

Published In

HGG Adv

DOI

10.1016/j.xhgg.2020.100019

EISSN

2666-2477

Publication Date

April 8, 2021

Volume

2

Issue

2

Location

United States

Related Subject Headings

  • 3105 Genetics
 

Citation

APA: Okoro, P. C., Schubert, R., Guo, X., Johnson, W. C., Rotter, J. I., Hoeschele, I., … Wheeler, H. E. (2021). Transcriptome prediction performance across machine learning models and diverse ancestries. HGG Adv, 2(2). https://doi.org/10.1016/j.xhgg.2020.100019

Chicago: Okoro, Paul C., Ryan Schubert, Xiuqing Guo, W Craig Johnson, Jerome I. Rotter, Ina Hoeschele, Yongmei Liu, et al. “Transcriptome prediction performance across machine learning models and diverse ancestries.” HGG Adv 2, no. 2 (April 8, 2021). https://doi.org/10.1016/j.xhgg.2020.100019.

ICMJE: Okoro PC, Schubert R, Guo X, Johnson WC, Rotter JI, Hoeschele I, et al. Transcriptome prediction performance across machine learning models and diverse ancestries. HGG Adv. 2021 Apr 8;2(2).

MLA: Okoro, Paul C., et al. “Transcriptome prediction performance across machine learning models and diverse ancestries.” HGG Adv, vol. 2, no. 2, Apr. 2021. Pubmed, doi:10.1016/j.xhgg.2020.100019.

NLM: Okoro PC, Schubert R, Guo X, Johnson WC, Rotter JI, Hoeschele I, Liu Y, Im HK, Luke A, Dugas LR, Wheeler HE. Transcriptome prediction performance across machine learning models and diverse ancestries. HGG Adv. 2021 Apr 8;2(2).
