
Tree-based classification model for Long-COVID infection prediction with age stratification using data from the National COVID Cohort Collaborative.

Publication, Journal Article
Wang, WK; Jeong, H; Hershkovich, L; Cho, P; Singh, K; Lederer, L; Roghanizad, AR; Shandhi, MMH; Kibbe, W; Dunn, J ...
Published in: JAMIA Open
December 2024

OBJECTIVES: We propose and validate a domain knowledge-driven classification model for diagnosing post-acute sequelae of SARS-CoV-2 infection (PASC), also known as Long COVID, using electronic health record (EHR) data.

MATERIALS AND METHODS: We developed a robust model that incorporates features that are strongly indicative of PASC or associated with the severity of COVID-19 symptoms, as identified in our literature review. The XGBoost tree-based architecture was chosen for its ability to handle class-imbalanced data and its potential for high interpretability. Using the training data provided by the Long COVID Computation Challenge (L3C), which was a sample of the National COVID Cohort Collaborative (N3C), our models were fine-tuned and calibrated to optimize the area under the receiver operating characteristic curve (AUROC) and the F1 score, following best practices for the class-imbalanced N3C data.

RESULTS: Our age-stratified classification model demonstrated strong performance, with an average 5-fold cross-validated AUROC of 0.844 and F1 score of 0.539 across the young adult, mid-aged, and older-aged populations in the training data. In an independent testing dataset, which was made available after the challenge was over, we achieved an overall AUROC of 0.814 and F1 score of 0.545.

DISCUSSION: The results demonstrated the utility of knowledge-driven feature engineering on sparse EHR data and of demographic stratification in model development for diagnosing a complex and heterogeneously presenting condition like PASC. The model's architecture, mirroring natural clinician decision-making processes, contributed to its robustness and interpretability, which are crucial for clinical translatability. Further, the model's generalizability was evaluated on new cross-sectional data provided in the later stages of the L3C challenge.

CONCLUSION: The study proposed and validated the effectiveness of age-stratified, tree-based classification models to diagnose PASC. Our approach highlights the potential of machine learning in addressing the diagnostic challenges posed by the heterogeneity of Long-COVID symptoms.
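To make the reported metrics concrete, the following is a minimal sketch of how AUROC and F1 could be computed per age stratum and then averaged. It is not the authors' code and does not train any model. The age cut-offs (40 and 65), the 0.5 decision threshold, the function names, and the toy records are all assumptions chosen for illustration. The abstract names the three strata but does not give their boundaries.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels, scores, threshold=0.5):
    """F1 of the positive class after thresholding the scores."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def age_group(age):
    """Hypothetical strata; the cut-offs are assumptions, not from the paper."""
    if age < 40:
        return "young"
    if age < 65:
        return "mid"
    return "older"


def evaluate_by_age(records, threshold=0.5):
    """Score each age stratum separately, then take the unweighted mean.

    records: iterable of (age, true_label, predicted_score) tuples.
    """
    strata = defaultdict(lambda: ([], []))
    for age, label, score in records:
        labels, scores = strata[age_group(age)]
        labels.append(label)
        scores.append(score)
    per_group = {
        group: {
            "auroc": auroc(labels, scores),
            "f1": f1_score(labels, scores, threshold),
        }
        for group, (labels, scores) in strata.items()
    }
    average = {
        metric: sum(r[metric] for r in per_group.values()) / len(per_group)
        for metric in ("auroc", "f1")
    }
    return per_group, average


# Toy (age, label, score) records, invented purely for illustration.
TOY_RECORDS = [
    (25, 0, 0.2), (30, 1, 0.7),
    (45, 0, 0.6), (50, 1, 0.4), (55, 1, 0.9), (60, 0, 0.1),
    (70, 1, 0.8), (80, 0, 0.3),
]

if __name__ == "__main__":
    per_group, average = evaluate_by_age(TOY_RECORDS)
    print(per_group)
    print(average)
```

With imbalanced classes, F1 depends heavily on the chosen threshold while AUROC does not, which is one reason to report both.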

Published In

JAMIA Open

DOI

10.1093/jamiaopen/ooae111

EISSN

2574-2531

Publication Date

December 2024

Volume

7

Issue

4

Start / End Page

ooae111

Location

United States

Related Subject Headings

  • 4203 Health services and systems
 

Citation

APA: Wang, W. K., Jeong, H., Hershkovich, L., Cho, P., Singh, K., Lederer, L., … National COVID Cohort Collaborative (N3C) Consortium. (2024). Tree-based classification model for Long-COVID infection prediction with age stratification using data from the National COVID Cohort Collaborative. JAMIA Open, 7(4), ooae111. https://doi.org/10.1093/jamiaopen/ooae111
Chicago: Wang, Will Ke, Hayoung Jeong, Leeor Hershkovich, Peter Cho, Karnika Singh, Lauren Lederer, Ali R. Roghanizad, et al. “Tree-based classification model for Long-COVID infection prediction with age stratification using data from the National COVID Cohort Collaborative.” JAMIA Open 7, no. 4 (December 2024): ooae111. https://doi.org/10.1093/jamiaopen/ooae111.
ICMJE: Wang WK, Jeong H, Hershkovich L, Cho P, Singh K, Lederer L, et al. Tree-based classification model for Long-COVID infection prediction with age stratification using data from the National COVID Cohort Collaborative. JAMIA Open. 2024 Dec;7(4):ooae111.
MLA: Wang, Will Ke, et al. “Tree-based classification model for Long-COVID infection prediction with age stratification using data from the National COVID Cohort Collaborative.” JAMIA Open, vol. 7, no. 4, Dec. 2024, p. ooae111. PubMed, doi:10.1093/jamiaopen/ooae111.
NLM: Wang WK, Jeong H, Hershkovich L, Cho P, Singh K, Lederer L, Roghanizad AR, Shandhi MMH, Kibbe W, Dunn J, National COVID Cohort Collaborative (N3C) Consortium. Tree-based classification model for Long-COVID infection prediction with age stratification using data from the National COVID Cohort Collaborative. JAMIA Open. 2024 Dec;7(4):ooae111.