Prediction of 30-Day All-Cause Readmissions in Patients Hospitalized for Heart Failure: Comparison of Machine Learning and Other Statistical Approaches.
Importance: Several attempts have been made to develop models that predict 30-day readmissions in patients with heart failure, but none has sufficient discriminatory capacity for clinical use. Machine-learning (ML) algorithms represent a novel approach and may have advantages over traditional statistical modeling.
Objective: To develop models using an ML approach to predict all-cause readmissions 30 days after discharge from a heart failure hospitalization and to compare ML model performance with that of models developed using "conventional" statistically based methods.
Design, Setting, and Participants: Models were developed using 3 ML algorithms (a tree-augmented naive Bayesian network, a random forest algorithm, and a gradient-boosted model) and compared with traditional statistical methods: 2 independently derived logistic regression models (a de novo model and an a priori model developed using electronic health records) and a least absolute shrinkage and selection operator method. The study sample was randomly divided into training (70%) and validation (30%) sets to develop and test model performance. This was a registry-based study, and the study sample was obtained by linking patients from the Get With the Guidelines Heart Failure registry with Medicare data. After applying appropriate inclusion and exclusion criteria, 56 477 patients were included in our analysis. The study was conducted between January 4, 2005, and December 1, 2010, and analysis of the data was conducted between November 25, 2014, and June 30, 2016.
Main Outcomes and Measures: C statistics were used to compare discriminatory capacity across models in the validation sample.
Results: The overall 30-day rehospitalization rate was 21.2% (11 959 of 56 477 patients). For the tree-augmented naive Bayesian network, random forest, gradient-boosted, logistic regression, and least absolute shrinkage and selection operator models, C statistics for the validation sets were similar: 0.618, 0.607, 0.614, 0.624, and 0.618, respectively. Applying the previously validated electronic health records model to our study sample yielded a C statistic of 0.589 for the validation set.
Conclusions and Relevance: Use of a number of ML algorithms did not improve prediction of 30-day heart failure readmissions compared with more traditional prediction models. Although there will likely be further applications of ML approaches in prognostic modeling, our findings are consistent with the existing literature showing limited predictive ability for heart failure readmissions.
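The evaluation design described in the abstract (a random 70%/30% split and comparison by C statistic) can be illustrated with a minimal sketch. This is not the authors' code: the function names `c_statistic` and `split_70_30`, the seed, and the rounding of the split point are assumptions made for illustration. The C statistic is computed here as the proportion of (readmitted, not readmitted) patient pairs in which the readmitted patient received the higher predicted risk, with ties counted as one half.

```python
import random


def c_statistic(labels, scores):
    """Concordance (C) statistic for a binary outcome.

    labels: 1 = readmitted within 30 days, 0 = not readmitted.
    scores: predicted risk from any model (higher = more likely readmitted).
    Ties in predicted risk count as half-concordant.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case with and one without the outcome")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def split_70_30(n_patients, seed=0):
    """Randomly partition patient indices into training (70%) and validation (30%).

    The seed and the rounding rule are illustrative assumptions.
    """
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    cut = int(round(0.7 * n_patients))
    return indices[:cut], indices[cut:]


if __name__ == "__main__":
    # Toy data only; these are not study results.
    toy_labels = [1, 1, 0, 0]
    toy_scores = [0.8, 0.3, 0.5, 0.1]
    print(c_statistic(toy_labels, toy_scores))
    train_idx, valid_idx = split_70_30(10)
    print(len(train_idx), len(valid_idx))
```

A C statistic of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the reported validation values of roughly 0.59 to 0.62 should be read. The pairwise loop is quadratic in sample size, so a rank-based implementation would be preferable for a cohort of this size.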
Frizzell, JD; Liang, L; Schulte, PJ; Yancy, CW; Heidenreich, PA; Hernandez, AF; Bhatt, DL; Fonarow, GC; Laskey, WK