Predicting mortality over different time horizons: which data elements are needed?

Journal Article

OBJECTIVE: Electronic health records (EHRs) are a resource for "big data" analytics, containing a variety of data elements. We investigate how different categories of information contribute to prediction of mortality over different time horizons among patients undergoing hemodialysis treatment.

MATERIAL AND METHODS: Using LASSO (least absolute shrinkage and selection operator) regression, we derived prediction models for mortality over 7 time horizons from EHR data on older patients from a national chain of dialysis clinics, linked with administrative data. We assessed how different categories of information relate to risk assessment and compared discrete models to time-to-event models.

RESULTS: The best predictors used all the available data (c-statistic ranged from 0.72 to 0.76), with stronger models in the near term. While different variable groups showed different utility, exclusion of any particular group did not lead to a meaningfully different risk assessment. Discrete time models performed better than time-to-event models.

CONCLUSIONS: Different variable groups were predictive over different time horizons, with vital signs most predictive for near-term mortality and demographics and comorbidities more important in long-term mortality.
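As a rough illustration of the evaluation described in the abstract, the sketch below shows how a discrete mortality outcome can be defined for a given time horizon and how a c-statistic can be computed for it. This is not the authors' code: the function names, the toy data, and the choice to ignore censoring are all assumptions made for illustration only.

```python
from itertools import product


def horizon_label(days_to_death, horizon_days):
    """Return 1 if death was observed within the horizon, else 0.

    Assumption: days_to_death is None when no death was observed.
    Censoring is ignored here, which a real analysis would need to handle.
    """
    return int(days_to_death is not None and days_to_death <= horizon_days)


def c_statistic(risks, labels):
    """Concordance for a binary outcome (equivalent to the area under the ROC curve).

    Counts the share of (event, non-event) pairs in which the event
    received the higher predicted risk; tied pairs count as one half.
    """
    events = [r for r, y in zip(risks, labels) if y == 1]
    non_events = [r for r, y in zip(risks, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for event_risk, non_event_risk in product(events, non_events):
        if event_risk > non_event_risk:
            concordant += 1.0
        elif event_risk == non_event_risk:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data: days from baseline to death (None = no death observed)
# and one predicted risk per patient from some already-fitted model.
days_to_death = [10, 200, None, 400, 40]
predicted_risk = [0.9, 0.4, 0.1, 0.3, 0.2]

# The same risk scores are evaluated against outcomes at two horizons.
labels_30 = [horizon_label(d, 30) for d in days_to_death]
labels_365 = [horizon_label(d, 365) for d in days_to_death]

c_30 = c_statistic(predicted_risk, labels_30)
c_365 = c_statistic(predicted_risk, labels_365)
```

In this toy setup one set of scores is reused across horizons purely to keep the sketch short; the abstract describes deriving a separate model for each horizon.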

Cited Authors

  • Goldstein, BA; Pencina, MJ; Montez-Rath, ME; Winkelmayer, WC

Published Date

  • January 2017

Published In

  • Journal of the American Medical Informatics Association

Volume / Issue

  • 24 / 1

Start / End Page

  • 176 - 181

PubMed ID

  • 27357832

Pubmed Central ID

  • PMC5201182

Electronic International Standard Serial Number (EISSN)

  • 1527-974X

Digital Object Identifier (DOI)

  • 10.1093/jamia/ocw057


Language

  • eng

Conference Location

  • England