Incorporating informatively collected laboratory data from EHR in clinical prediction models.

Publication, Journal Article
Sun, M; Engelhard, MM; Bedoya, AD; Goldstein, BA
Published in: BMC Med Inform Decis Mak
July 24, 2024

BACKGROUND: Electronic Health Records (EHR) are widely used to develop clinical prediction models (CPMs). One challenge, however, is that EHR data often contain informatively missing values. For example, laboratory measures are typically taken only when a clinician is concerned that there is a need. When data are Not Missing at Random (NMAR), analytic strategies that assume other missingness mechanisms are inappropriate. In this work, we compare the impact of different strategies for handling missing data on CPM performance.

METHODS: We considered a predictive model for rapid inpatient deterioration as an exemplar implementation. This model incorporated twelve laboratory measures with varying levels of missingness: five labs had missingness rates around 50%, and the other seven had missingness rates around 90%. We included them based on the belief that their missingness status can be highly informative for the prediction. We explicitly compared several missing data strategies: mean imputation, normal-value imputation, conditional imputation, categorical encoding, and missingness embeddings. Some of these were also combined with last observation carried forward (LOCF). We implemented logistic LASSO regression, multilayer perceptron (MLP), and long short-term memory (LSTM) models as the downstream classifiers. We compared AUROC on the test data and used bootstrapping to construct 95% confidence intervals.

RESULTS: We had 105,198 inpatient encounters, of which 4.7% experienced the deterioration outcome of interest. LSTM models generally outperformed the cross-sectional models, with embedding approaches and categorical encoding yielding the best results. For the cross-sectional models, normal-value imputation with LOCF generated the best results.

CONCLUSION: Strategies that accounted for the possibility of NMAR missing data yielded better model performance than those that did not. The embedding method had an advantage as it did not require prior clinical knowledge. Using LOCF could enhance the performance of cross-sectional models but had counterproductive effects in LSTM models.
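
To make the simpler strategies named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the toy lab series, the `NORMAL_VALUE` constant, the `LOW`/`HIGH` cut-offs, and all function names are hypothetical and chosen only for illustration. The categorical encoding shown (a separate "missing" level next to low/normal/high bins) is one plausible reading of that strategy; conditional imputation and missingness embeddings are not covered.

```python
from statistics import mean

# Hypothetical reference values for a single lab (assumptions, not from the paper).
NORMAL_VALUE = 1.0
LOW, HIGH = 1.5, 3.5


def locf(series):
    """Last observation carried forward; gaps before the first value stay None."""
    out, last = [], None
    for value in series:
        if value is not None:
            last = value
        out.append(last)
    return out


def impute(series, fill):
    """Replace every missing entry (None) with a constant fill value."""
    return [fill if value is None else value for value in series]


def observed_mean(series):
    """Mean of the observed (non-missing) entries only."""
    return mean(value for value in series if value is not None)


def missing_indicator(series):
    """1 where the lab was not measured, 0 where it was."""
    return [1 if value is None else 0 for value in series]


def categorical(series, low, high):
    """Bin each entry, keeping 'missing' as its own informative category."""
    labels = []
    for value in series:
        if value is None:
            labels.append("missing")
        elif value < low:
            labels.append("low")
        elif value > high:
            labels.append("high")
        else:
            labels.append("normal")
    return labels


# One encounter, one lab over six time steps (None = lab not ordered).
lab = [None, 2.0, None, None, 4.0, None]

mean_imputed = impute(lab, observed_mean(lab))
normal_imputed = impute(lab, NORMAL_VALUE)
# LOCF first, then normal-value imputation for whatever is still missing.
locf_then_normal = impute(locf(lab), NORMAL_VALUE)
indicator = missing_indicator(lab)
categories = categorical(lab, LOW, HIGH)
```

In this sketch, mean and normal-value imputation alone discard the fact that a lab was not ordered, whereas the indicator and the "missing" category pass that fact on to the downstream classifier.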

Duke Scholars

Published In

BMC Med Inform Decis Mak

DOI

10.1186/s12911-024-02612-1

EISSN

1472-6947

Publication Date

July 24, 2024

Volume

24

Issue

1

Start / End Page

206

Location

England

Related Subject Headings

  • Models, Statistical
  • Medical Informatics
  • Humans
  • Electronic Health Records
  • Clinical Laboratory Techniques
  • Clinical Deterioration
  • 4203 Health services and systems
  • 1103 Clinical Sciences
  • 0806 Information Systems
 

Citation

APA: Sun, M., Engelhard, M. M., Bedoya, A. D., & Goldstein, B. A. (2024). Incorporating informatively collected laboratory data from EHR in clinical prediction models. BMC Med Inform Decis Mak, 24(1), 206. https://doi.org/10.1186/s12911-024-02612-1
Chicago: Sun, Minghui, Matthew M. Engelhard, Armando D. Bedoya, and Benjamin A. Goldstein. “Incorporating informatively collected laboratory data from EHR in clinical prediction models.” BMC Med Inform Decis Mak 24, no. 1 (July 24, 2024): 206. https://doi.org/10.1186/s12911-024-02612-1.
ICMJE: Sun M, Engelhard MM, Bedoya AD, Goldstein BA. Incorporating informatively collected laboratory data from EHR in clinical prediction models. BMC Med Inform Decis Mak. 2024 Jul 24;24(1):206.
MLA: Sun, Minghui, et al. “Incorporating informatively collected laboratory data from EHR in clinical prediction models.” BMC Med Inform Decis Mak, vol. 24, no. 1, July 2024, p. 206. Pubmed, doi:10.1186/s12911-024-02612-1.
NLM: Sun M, Engelhard MM, Bedoya AD, Goldstein BA. Incorporating informatively collected laboratory data from EHR in clinical prediction models. BMC Med Inform Decis Mak. 2024 Jul 24;24(1):206.