Putative biomarkers for predicting tumor sample purity based on gene expression data.

Publication, Journal Article
Li, Y; Umbach, DM; Bingham, A; Li, Q-J; Zhuang, Y; Li, L
Published in: BMC Genomics
December 27, 2019

BACKGROUND: Tumor purity is the percentage of cancer cells present in a sample of tumor tissue. The non-cancerous cells (immune cells, fibroblasts, etc.) have an important role in tumor biology. The ability to determine tumor purity is important for understanding the roles of cancerous and non-cancerous cells in a tumor.

METHODS: We applied a supervised machine learning method, XGBoost, to data from 33 TCGA tumor types to predict tumor purity using RNA-seq gene expression data.

RESULTS: Across the 33 tumor types, the median correlation between observed and predicted tumor purity ranged from 0.75 to 0.87 with small root mean square errors, suggesting that tumor purity can be accurately predicted using expression data. We further confirmed that expression levels of a ten-gene set (CSF2RB, RHOH, C1S, CCDC69, CCL22, CYTIP, POU2AF1, FGR, CCL21, and IL7R) were predictive of tumor purity regardless of tumor type. We tested whether our set of ten genes could accurately predict tumor purity of a TCGA-independent data set. We showed that expression levels from our set of ten genes were highly correlated (ρ = 0.88) with the actual observed tumor purity.

CONCLUSIONS: Our analyses suggested that the ten-gene set may serve as a biomarker for tumor purity prediction using gene expression data.
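The abstract evaluates predictions by the correlation and root mean square error between observed and predicted purity. A minimal sketch of those two metrics follows, using only the Python standard library. It is not the authors' code: the abstract does not say which correlation coefficient was used, so a Spearman rank correlation is assumed here, and the function names and the purity values are invented purely for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy purity fractions, invented for illustration (NOT data from the paper).
observed = [0.90, 0.75, 0.60, 0.85, 0.40]
predicted = [0.88, 0.70, 0.65, 0.80, 0.45]

print("Spearman rho:", round(spearman(observed, predicted), 3))
print("RMSE:", round(rmse(observed, predicted), 4))
```

In practice the predicted values would come from a fitted regression model evaluated on held-out samples, and a library routine such as SciPy's `spearmanr` could replace the hand-written rank correlation.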

Published In

BMC Genomics

DOI

10.1186/s12864-019-6412-8

EISSN

1471-2164

Publication Date

December 27, 2019

Volume

20

Issue

1

Start / End Page

1021

Location

England

Related Subject Headings

  • Supervised Machine Learning
  • Sequence Analysis, RNA
  • Reproducibility of Results
  • Neoplasms
  • Humans
  • Gene Expression Regulation, Neoplastic
  • Gene Expression Profiling
  • Databases, Genetic
  • Computational Biology
  • Biomarkers, Tumor
 

Citation

APA: Li, Y., Umbach, D. M., Bingham, A., Li, Q.-J., Zhuang, Y., & Li, L. (2019). Putative biomarkers for predicting tumor sample purity based on gene expression data. BMC Genomics, 20(1), 1021. https://doi.org/10.1186/s12864-019-6412-8
Chicago: Li, Yuanyuan, David M. Umbach, Adrienna Bingham, Qi-Jing Li, Yuan Zhuang, and Leping Li. “Putative biomarkers for predicting tumor sample purity based on gene expression data.” BMC Genomics 20, no. 1 (December 27, 2019): 1021. https://doi.org/10.1186/s12864-019-6412-8.
ICMJE: Li Y, Umbach DM, Bingham A, Li Q-J, Zhuang Y, Li L. Putative biomarkers for predicting tumor sample purity based on gene expression data. BMC Genomics. 2019 Dec 27;20(1):1021.
MLA: Li, Yuanyuan, et al. “Putative biomarkers for predicting tumor sample purity based on gene expression data.” BMC Genomics, vol. 20, no. 1, Dec. 2019, p. 1021. PubMed, doi:10.1186/s12864-019-6412-8.
NLM: Li Y, Umbach DM, Bingham A, Li Q-J, Zhuang Y, Li L. Putative biomarkers for predicting tumor sample purity based on gene expression data. BMC Genomics. 2019 Dec 27;20(1):1021.