
Semi-supervised calibration of noisy event risk (SCANER) with electronic health records.

Publication, Journal Article
Hong, C; Liang, L; Yuan, Q; Cho, K; Liao, KP; Pencina, MJ; Christiani, DC; Cai, T
Published in: J Biomed Inform
August 2023

OBJECTIVE: Electronic health records (EHR), which contain detailed longitudinal clinical information on large numbers of patients across broad populations, open opportunities for comprehensive predictive modeling of disease progression and treatment response. However, EHRs were originally constructed for administrative rather than research purposes. As a result, EHR-linked studies often cannot capture reliable information for analytical variables. This is especially true in the survival setting, where model building requires both accurate event status and accurate event times. For example, progression-free survival (PFS), a commonly used survival outcome for cancer patients, often depends on complex information embedded in free-text clinical notes and cannot be extracted reliably. Proxies of PFS time, such as the time to the first mention of progression in the notes, are at best good approximations to the true event time. This makes it difficult to estimate event rates efficiently for an EHR patient cohort: survival rates estimated from error-prone outcome definitions can be biased and can reduce power in downstream analyses. On the other hand, extracting accurate event times through manual annotation is time- and resource-intensive. The objective of this study is to develop a calibrated survival rate estimator using noisy outcomes from EHR data.

MATERIALS AND METHODS: We propose a two-stage semi-supervised calibration of noisy event rate (SCANER) estimator that can effectively overcome censoring-induced dependency and attains more robust performance (i.e., it is not sensitive to misspecification of the imputation model). It does so by fully using both a small labeled set of gold-standard survival outcomes, annotated via manual chart review, and a set of proxy features captured automatically from the EHR in the unlabeled set. We validate the SCANER estimator by estimating PFS rates for a virtual cohort of lung cancer patients from one large tertiary care center and ICU-free survival rates for COVID patients from two large tertiary care centers.

RESULTS: The SCANER survival rate estimates had point estimates very similar to those of the complete-case Kaplan-Meier (KM) estimator. By contrast, the other benchmark methods, which fail to account for the dependency induced between event time and censoring time when conditioning on surrogate outcomes, produced biased results across all three case studies. In terms of standard errors, the SCANER estimator was more efficient than the KM estimator, with efficiency gains of up to 50%.

CONCLUSION: The SCANER estimator achieves more efficient, robust, and accurate survival rate estimates than existing approaches. This promising new approach can also improve resolution (i.e., the granularity of event time) by using labels conditional on multiple surrogates, particularly for less common or poorly coded conditions.
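The benchmark in the results is the complete-case Kaplan-Meier estimator, which uses only the chart-reviewed (labeled) patients. The following is a minimal sketch of the standard product-limit estimate of S(t) = P(T > t) in plain Python. The function name and toy data are hypothetical. This is not the SCANER estimator, whose two-stage calibration is not specified in this abstract.

```python
from typing import Sequence


def kaplan_meier(times: Sequence[float], events: Sequence[int], t: float) -> float:
    """Product-limit (Kaplan-Meier) estimate of S(t) = P(T > t).

    times  : observed follow-up time per patient (event time or censoring time)
    events : 1 if the event (e.g. progression) was observed, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")

    surv = 1.0
    # Distinct observed event times up to and including t, in increasing order.
    event_times = sorted({ti for ti, di in zip(times, events) if di == 1 and ti <= t})
    for u in event_times:
        # Risk set: patients still under observation just before u. By the usual
        # convention, patients censored exactly at u are counted as at risk.
        at_risk = sum(1 for ti in times if ti >= u)
        # Number of observed events at time u.
        deaths = sum(1 for ti, di in zip(times, events) if ti == u and di == 1)
        # Multiply by the conditional probability of surviving past u.
        surv *= 1.0 - deaths / at_risk
    return surv


if __name__ == "__main__":
    # Hypothetical labeled (chart-reviewed) subset: follow-up times and event flags.
    times = [2, 3, 3, 5, 8, 10]
    events = [1, 1, 0, 1, 0, 1]
    print(kaplan_meier(times, events, 6))
```

For the toy data, the estimate at t = 6 is (5/6)(4/5)(2/3) = 4/9, or about 0.444. Because only labeled patients enter this estimator, its standard error depends on the size of the small chart-reviewed set. That limitation is what a semi-supervised approach using proxy features from unlabeled patients aims to improve on.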


Published In: J Biomed Inform
DOI: 10.1016/j.jbi.2023.104425
EISSN: 1532-0480
Publication Date: August 2023
Volume: 144
Start / End Page: 104425
Location: United States

Related Subject Headings

  • Survival Analysis
  • Medical Informatics
  • Lung Neoplasms
  • Humans
  • Electronic Health Records
  • Calibration
  • COVID-19
  • Biomedical Engineering
  • 4601 Applied computing
  • 4203 Health services and systems
 

Citation

APA: Hong, C., Liang, L., Yuan, Q., Cho, K., Liao, K. P., Pencina, M. J., … Cai, T. (2023). Semi-supervised calibration of noisy event risk (SCANER) with electronic health records. J Biomed Inform, 144, 104425. https://doi.org/10.1016/j.jbi.2023.104425

Chicago: Hong, Chuan, Liang Liang, Qianyu Yuan, Kelly Cho, Katherine P. Liao, Michael J. Pencina, David C. Christiani, and Tianxi Cai. “Semi-supervised calibration of noisy event risk (SCANER) with electronic health records.” J Biomed Inform 144 (August 2023): 104425. https://doi.org/10.1016/j.jbi.2023.104425.

ICMJE: Hong C, Liang L, Yuan Q, Cho K, Liao KP, Pencina MJ, et al. Semi-supervised calibration of noisy event risk (SCANER) with electronic health records. J Biomed Inform. 2023 Aug;144:104425.

MLA: Hong, Chuan, et al. “Semi-supervised calibration of noisy event risk (SCANER) with electronic health records.” J Biomed Inform, vol. 144, Aug. 2023, p. 104425. PubMed, doi:10.1016/j.jbi.2023.104425.

NLM: Hong C, Liang L, Yuan Q, Cho K, Liao KP, Pencina MJ, Christiani DC, Cai T. Semi-supervised calibration of noisy event risk (SCANER) with electronic health records. J Biomed Inform. 2023 Aug;144:104425.