A machine learning approach to identify distinct subgroups of veterans at risk for hospitalization or death using administrative and electronic health record data.

Publication, Journal Article
Parikh, RB; Linn, KA; Yan, J; Maciejewski, ML; Rosland, A-M; Volpp, KG; Groeneveld, PW; Navathe, AS
Published in: PLoS One
2021

BACKGROUND: Identifying individuals at risk for future hospitalization or death has been a major priority of population health management strategies. High-risk individuals are a heterogeneous group, and existing studies describing heterogeneity in high-risk individuals have been limited by data focused on clinical comorbidities and not socioeconomic or behavioral factors. We used machine learning clustering methods and linked comorbidity-based, sociodemographic, and psychobehavioral data to identify subgroups of high-risk Veterans and study long-term outcomes, hypothesizing that factors other than comorbidities would characterize several subgroups.

METHODS AND FINDINGS: In this cross-sectional study, we used data from the VA Corporate Data Warehouse, a national repository of VA administrative claims and electronic health data. To identify high-risk Veterans, we used the Care Assessment Needs (CAN) score, a routinely used VA model that predicts a patient's percentile risk of hospitalization or death at one year. Our study population consisted of 110,000 Veterans who were randomly sampled from 1,920,436 Veterans with a CAN score ≥75th percentile in 2014. We categorized patient-level data into 119 independent variables based on demographics, comorbidities, pharmacy, vital signs, laboratories, and prior utilization. We used a previously validated density-based clustering algorithm to identify 30 subgroups of high-risk Veterans ranging in size from 50 to 2,446 patients. Mean CAN score ranged from 72.4 to 90.3 among subgroups. Two-year mortality ranged from 0.9% to 45.6% and was highest in the home-based care and metastatic cancer subgroups. Mean inpatient days ranged from 1.4 to 30.5 and were highest in the post-surgery and blood loss anemia subgroups. Mean emergency room visits ranged from 1.0 to 4.3 and were highest in the chronic sedative use and polysubstance use with amphetamine predominance subgroups. Five subgroups were distinguished by psychobehavioral factors and four subgroups were distinguished by sociodemographic factors.

CONCLUSIONS: High-risk Veterans are a heterogeneous population consisting of multiple distinct subgroups, many of which are not defined by clinical comorbidities, with distinct utilization and outcome patterns. To our knowledge, this represents the largest application of ML clustering methods to subgroup a high-risk population. Further study is needed to determine whether distinct subgroups may benefit from individualized interventions.
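
The abstract does not name the density-based clustering algorithm that was used. As a rough illustration of the general idea only, the sketch below implements a minimal DBSCAN (a well-known density-based method, swapped in here as an assumption) in plain Python and summarizes an outcome per cluster. The data are synthetic, and the names `dbscan`, `region_query`, `summarize`, `eps`, and `min_samples` are hypothetical; none of this is the study's code, data, or parameter choice.

```python
import math
import random


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


def summarize(labels, outcome):
    """Map cluster label -> (size, mean outcome), skipping noise (-1)."""
    summary = {}
    for label in sorted(set(labels) - {-1}):
        values = [o for l, o in zip(labels, outcome) if l == label]
        summary[label] = (len(values), sum(values) / len(values))
    return summary


# Synthetic stand-in for patient feature vectors (NOT study data):
# two dense groups plus one isolated point that should be left unclustered.
rng = random.Random(0)
group_a = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(100)]
group_b = [(rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)) for _ in range(100)]
points = group_a + group_b + [(20.0, 20.0)]

# Synthetic binary outcome flags (e.g. died within follow-up), also made up.
outcome = [1 if rng.random() < 0.05 else 0 for _ in range(100)]
outcome += [1 if rng.random() < 0.40 else 0 for _ in range(100)]
outcome += [0]

labels = dbscan(points, eps=1.0, min_samples=5)
summary = summarize(labels, outcome)
for label, (size, rate) in summary.items():
    print(f"cluster {label}: n={size}, outcome rate={rate:.2f}")
```

Unlike k-means, a density-based method does not need the number of subgroups fixed in advance and can leave sparse observations unassigned, which is one plausible reason such methods suit heterogeneous populations. This loop-based version is quadratic in the number of points and is meant for reading, not for data at the scale described above.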

Published In

PLoS One

DOI

10.1371/journal.pone.0247203

EISSN

1932-6203

Publication Date

2021

Volume

16

Issue

2

Start / End Page

e0247203

Location

United States

Related Subject Headings

  • Veterans
  • United States
  • Risk Factors
  • Middle Aged
  • Male
  • Machine Learning
  • Humans
  • Hospitalization
  • Hospital Mortality
  • General Science & Technology
 

Citation

Parikh, R. B., Linn, K. A., Yan, J., Maciejewski, M. L., Rosland, A.-M., Volpp, K. G., … Navathe, A. S. (2021). A machine learning approach to identify distinct subgroups of veterans at risk for hospitalization or death using administrative and electronic health record data. PLoS One, 16(2), e0247203. https://doi.org/10.1371/journal.pone.0247203
Parikh, Ravi B., Kristin A. Linn, Jiali Yan, Matthew L. Maciejewski, Ann-Marie Rosland, Kevin G. Volpp, Peter W. Groeneveld, and Amol S. Navathe. “A machine learning approach to identify distinct subgroups of veterans at risk for hospitalization or death using administrative and electronic health record data.” PLoS One 16, no. 2 (2021): e0247203. https://doi.org/10.1371/journal.pone.0247203.
Parikh, Ravi B., et al. “A machine learning approach to identify distinct subgroups of veterans at risk for hospitalization or death using administrative and electronic health record data.” PLoS One, vol. 16, no. 2, 2021, p. e0247203. PubMed, doi:10.1371/journal.pone.0247203.
Parikh RB, Linn KA, Yan J, Maciejewski ML, Rosland A-M, Volpp KG, Groeneveld PW, Navathe AS. A machine learning approach to identify distinct subgroups of veterans at risk for hospitalization or death using administrative and electronic health record data. PLoS One. 2021;16(2):e0247203.
