Predicting in-hospital length of stay: a two-stage modeling approach to account for highly skewed data.

Publication, Journal Article
Xu, Z; Zhao, C; Scales, CD; Henao, R; Goldstein, BA
Published in: BMC medical informatics and decision making
April 2022

In the early stages of the COVID-19 pandemic, our institution was interested in forecasting how long surgical patients receiving elective procedures would spend in the hospital. Initial examination of our models indicated that, due to the skewed nature of the length of stay, accurate prediction was challenging, and we instead opted for a simpler classification model. In this work we perform a deeper examination of predicting in-hospital length of stay.

We used electronic health record data on length of stay from 42,209 elective surgeries. We compare different loss functions (mean squared error, mean absolute error, mean relative error), algorithms (LASSO, Random Forests, multilayer perceptron) and data transformations (log and truncation). We also assess the performance of a two-stage hybrid classification-regression approach.

Our results show that while it is possible to accurately predict short lengths of stay, predicting longer lengths of stay is extremely challenging. As such, we opt for a two-stage model: a first stage that classifies patients into long versus short lengths of stay, and a second stage that fits a regressor among those predicted to have a short length of stay.

The results indicate both the challenges and the considerations necessary when applying machine-learning methods to skewed outcomes. Two-stage models allow those developing clinical decision support tools to explicitly acknowledge where they can and cannot make accurate predictions.
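
The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 7-day cutoff, the single "severity" feature, the toy data, and the simple threshold classifier and least-squares line are all assumptions chosen to keep the example self-contained (the paper itself compares LASSO, Random Forests and a multilayer perceptron). Stage 1 flags stays predicted to be long; stage 2 is fitted only on stays that stage 1 predicts to be short, and no numeric estimate is returned for a predicted long stay.

```python
from statistics import mean

LONG_STAY_DAYS = 7  # hypothetical cutoff between "short" and "long" stays


def fit_stump(xs, labels):
    """Stage 1: pick the threshold on x that best separates long (1) from short (0)."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        hits = [1 if (x >= t) == bool(y) else 0 for x, y in zip(xs, labels)]
        acc = sum(hits) / len(hits)
        if acc > best_acc:  # keep the first threshold reaching the best accuracy
            best_t, best_acc = t, acc
    return best_t


def fit_line(xs, ys):
    """Stage 2: ordinary least squares for y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def fit_two_stage(xs, los):
    """Fit the classifier on all stays, then the regressor on predicted-short stays."""
    labels = [1 if days > LONG_STAY_DAYS else 0 for days in los]
    threshold = fit_stump(xs, labels)
    short = [(x, days) for x, days in zip(xs, los) if x < threshold]
    a, b = fit_line([x for x, _ in short], [days for _, days in short])
    return threshold, a, b


def predict(model, x):
    """Return ("long", None) or ("short", estimated days)."""
    threshold, a, b = model
    if x >= threshold:
        return "long", None  # no numeric estimate is offered for long stays
    return "short", a + b * x


# Toy, made-up data: a severity score and a skewed length of stay in days.
severity = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
stay_days = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 12.0, 30.0, 21.0]

model = fit_two_stage(severity, stay_days)
print(predict(model, 3))
print(predict(model, 9))
```

Returning no estimate for predicted long stays mirrors the abstract's point that a two-stage model can state explicitly where it cannot make accurate predictions.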

Published In

BMC medical informatics and decision making

DOI

10.1186/s12911-022-01855-0

EISSN

1472-6947

ISSN

1472-6947

Publication Date

April 2022

Volume

22

Issue

1

Start / End Page

110

Related Subject Headings

  • Pandemics
  • Medical Informatics
  • Machine Learning
  • Length of Stay
  • Humans
  • Hospitals
  • COVID-19
  • 4203 Health services and systems
  • 1103 Clinical Sciences
  • 0806 Information Systems
 

Citation

APA: Xu, Z., Zhao, C., Scales, C. D., Henao, R., & Goldstein, B. A. (2022). Predicting in-hospital length of stay: a two-stage modeling approach to account for highly skewed data. BMC Medical Informatics and Decision Making, 22(1), 110. https://doi.org/10.1186/s12911-022-01855-0
Chicago: Xu, Zhenhui, Congwen Zhao, Charles D. Scales, Ricardo Henao, and Benjamin A. Goldstein. “Predicting in-hospital length of stay: a two-stage modeling approach to account for highly skewed data.” BMC Medical Informatics and Decision Making 22, no. 1 (April 2022): 110. https://doi.org/10.1186/s12911-022-01855-0.
ICMJE: Xu Z, Zhao C, Scales CD, Henao R, Goldstein BA. Predicting in-hospital length of stay: a two-stage modeling approach to account for highly skewed data. BMC medical informatics and decision making. 2022 Apr;22(1):110.
MLA: Xu, Zhenhui, et al. “Predicting in-hospital length of stay: a two-stage modeling approach to account for highly skewed data.” BMC Medical Informatics and Decision Making, vol. 22, no. 1, Apr. 2022, p. 110. Epmc, doi:10.1186/s12911-022-01855-0.
NLM: Xu Z, Zhao C, Scales CD, Henao R, Goldstein BA. Predicting in-hospital length of stay: a two-stage modeling approach to account for highly skewed data. BMC medical informatics and decision making. 2022 Apr;22(1):110.