Scholars@Duke publication: Weakly Semi-supervised phenotyping using Electronic Health records.

Weakly Semi-supervised phenotyping using Electronic Health records.

Publication, Journal Article

Nogues, I-E; Wen, J; Lin, Y; Liu, M; Tedeschi, SK; Geva, A; Cai, T; Hong, C

Published in: J Biomed Inform

October 2022

OBJECTIVE: Electronic Health Record (EHR) based phenotyping is a crucial yet challenging problem in the biomedical field. Though clinicians typically determine patient-level diagnoses via manual chart review, the sheer volume and heterogeneity of EHR data render such tasks challenging, time-consuming, and prohibitively expensive, leading to a scarcity of clinical annotations in EHRs. Weakly supervised learning algorithms have been successfully applied to various EHR phenotyping problems, owing to their ability to leverage information from large quantities of unlabeled samples to better inform predictions based on a far smaller number of patients. However, most weakly supervised methods face the challenge of choosing the right cutoff value to generate an optimal classifier. Furthermore, since they only utilize the most informative features (i.e., main ICD and NLP counts), they may fail for episodic phenotypes that cannot be consistently detected via ICD and NLP data. In this paper, we propose a label-efficient, weakly semi-supervised deep learning algorithm for EHR phenotyping (WSS-DL), which overcomes the limitations above.

MATERIALS AND METHODS: WSS-DL classifies patient-level disease status through a series of learning stages: 1) generating silver standard labels, 2) deriving enhanced-silver-standard labels by fitting a weakly supervised deep learning model to data with silver standard labels as outcomes and high-dimensional EHR features as input, and 3) obtaining the final prediction score and classifier by fitting a supervised learning model to data with a minimal number of gold standard labels as the outcome, and the enhanced-silver-standard labels and a minimal set of the most informative EHR features as input. To assess the generalizability of WSS-DL across different phenotypes and medical institutions, we apply WSS-DL to classify a total of 17 diseases, including both acute and chronic conditions, using EHR data from three healthcare systems. Additionally, we determine the minimum quantity of training labels required by WSS-DL to outperform existing supervised and semi-supervised phenotyping methods.

RESULTS: The proposed method, by combining the strengths of deep learning and weakly semi-supervised learning, successfully leverages the crucial phenotyping information contained in EHR features from unlabeled samples. Indeed, the deep learning model's ability to handle high-dimensional EHR features allows it to generate strong phenotype status predictions from silver standard labels. These predictions, in turn, provide highly effective features in the final logistic regression stage, leading to high phenotyping accuracy in notably small subsets of labeled data (e.g., n = 40 labeled samples).

CONCLUSION: Our method's high performance in EHR datasets with very small numbers of labels indicates its potential value in helping doctors diagnose rare diseases as well as conditions susceptible to misdiagnosis.
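The three learning stages described in the abstract can be illustrated with a small sketch on simulated data. This is not the authors' implementation. The silver-label rule (a rescaled log count), the use of plain logistic regression as a stand-in for the deep learning model in stage 2, the simulated data, and all function names (`fit_logistic`, `wss_sketch`, `simulate`) are assumptions made only for illustration.

```python
import numpy as np

def _sigmoid(z):
    # Clip to avoid overflow in exp for extreme linear predictors.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_logistic(X, y, lr=0.1, n_iter=500, l2=1e-3):
    """Gradient-descent logistic regression; y may be soft labels in [0, 1]."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (_sigmoid(Xb @ w) - y) / len(y) + l2 * w
        w -= lr * grad
    return w

def predict(w, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return _sigmoid(Xb @ w)

def wss_sketch(X_full, main_counts, gold_idx, gold_y):
    # Stage 1 (assumed rule): silver labels from the main surrogate count,
    # here log(1 + count) rescaled to [0, 1].
    s = np.log1p(main_counts.astype(float))
    silver = (s - s.min()) / max(s.max() - s.min(), 1e-12)

    # Stage 2 (stand-in): the abstract uses a deep learning model; a logistic
    # model on all high-dimensional features is substituted for brevity.
    Xz = (X_full - X_full.mean(axis=0)) / (X_full.std(axis=0) + 1e-12)
    enhanced = predict(fit_logistic(Xz, silver), Xz)

    # Stage 3: supervised fit on the few gold labels, using the enhanced
    # silver score plus the main surrogate feature as inputs.
    Z = np.column_stack([enhanced, s])
    w = fit_logistic(Z[gold_idx], gold_y)
    return predict(w, Z)  # final prediction score for every patient

def simulate(n=500, p=50, n_gold=40, seed=0):
    # Hypothetical data: 5 informative features among p, Poisson main counts.
    rng = np.random.default_rng(seed)
    y = rng.binomial(1, 0.3, size=n)
    X = rng.normal(size=(n, p))
    X[:, :5] += 1.0 * y[:, None]
    main_counts = rng.poisson(np.where(y == 1, 5.0, 0.5))
    gold_idx = rng.choice(n, size=n_gold, replace=False)
    return X, main_counts, y, gold_idx

X, counts, y, gold_idx = simulate()
scores = wss_sketch(X, counts, gold_idx, y[gold_idx])
```

Only the 40 simulated gold labels enter the final stage. The remaining patients contribute through the silver labels and the stage 2 score, which is the label-efficiency idea the abstract describes.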

Duke Scholars

Author Chuan Hong Biostatistics & Bioinformatics, Division of Translational Bi ...

Published In

J Biomed Inform

DOI

10.1016/j.jbi.2022.104175

EISSN

1532-0480

Publication Date

October 2022

Volume

134

Start / End Page

104175

Location

United States

Related Subject Headings

Supervised Machine Learning
Phenotype
Medical Informatics
Logistic Models
Electronic Health Records
Biomedical Engineering
Algorithms
4601 Applied computing
4203 Health services and systems
11 Medical and Health Sciences

Citation

APA: Nogues, I.-E., Wen, J., Lin, Y., Liu, M., Tedeschi, S. K., Geva, A., … Hong, C. (2022). Weakly Semi-supervised phenotyping using Electronic Health records. J Biomed Inform, 134, 104175. https://doi.org/10.1016/j.jbi.2022.104175

Chicago: Nogues, Isabelle-Emmanuella, Jun Wen, Yucong Lin, Molei Liu, Sara K. Tedeschi, Alon Geva, Tianxi Cai, and Chuan Hong. “Weakly Semi-supervised phenotyping using Electronic Health records.” J Biomed Inform 134 (October 2022): 104175. https://doi.org/10.1016/j.jbi.2022.104175.

ICMJE: Nogues I-E, Wen J, Lin Y, Liu M, Tedeschi SK, Geva A, et al. Weakly Semi-supervised phenotyping using Electronic Health records. J Biomed Inform. 2022 Oct;134:104175.

MLA: Nogues, Isabelle-Emmanuella, et al. “Weakly Semi-supervised phenotyping using Electronic Health records.” J Biomed Inform, vol. 134, Oct. 2022, p. 104175. Pubmed, doi:10.1016/j.jbi.2022.104175.

NLM: Nogues I-E, Wen J, Lin Y, Liu M, Tedeschi SK, Geva A, Cai T, Hong C. Weakly Semi-supervised phenotyping using Electronic Health records. J Biomed Inform. 2022 Oct;134:104175.
