A case for developing domain-specific vocabularies for extracting suicide factors from healthcare notes.

Journal Article

The onset and persistence of life events (LE) such as housing instability, job instability, and reduced social connection have been shown to increase risk of suicide. Predictive models for suicide risk have low sensitivity to many of these factors because they are under-reported in structured electronic health records (EHR) data. In this study, we show how natural language processing (NLP) can identify LE in clinical notes at higher rates than recorded medical codes. We compare domain-specific lexicons formulated from Unified Medical Language System (UMLS) selection, content analysis by subject matter experts (SME) and the Gravity Project to data-driven expansion through contextual word embedding using Word2Vec. Our analysis covers EHR from the Veterans Affairs (VA) Corporate Data Warehouse (CDW) and measures the prevalence of LE across time for patients with known underlying cause of death in the National Death Index (NDI). We found that NLP methods had higher sensitivity for detecting LE than structured EHR (S-EHR) variables. We observed that, on average, suicide cases had higher rates of LE over time when compared to patients who died of non-suicide-related causes with no previous history of diagnosed mental illness. When used to discriminate these outcomes, the inclusion of NLP-derived variables increased the concentration of LE among the top 0.1%, 0.5% and 1% of predicted risk. LE were less informative when discriminating suicide death from non-suicide-related death for patients with diagnosed mental illness.
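The two ideas in the abstract that are easiest to misread, lexicon expansion by embedding similarity and "concentration of LE among the top k% of predicted risk", can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the seed terms, the hypothetical `TOY_VECTORS` table (standing in for a trained Word2Vec model), the cosine threshold, and the helper names `expand_lexicon`, `detect_life_events` and `concentration_at_top` are all invented for illustration. "Concentration" is assumed here to mean the share of LE-flagged patients among the highest-risk fraction.

```python
import math
import re

# Hypothetical seed lexicon; illustrative terms only, not the study's vocabulary.
SEED_LEXICON = {
    "housing_instability": ["homeless", "eviction"],
    "job_instability": ["unemployed", "laid off"],
}

# Toy word vectors standing in for a trained Word2Vec model (assumption).
TOY_VECTORS = {
    "homeless": [0.9, 0.1, 0.0],
    "eviction": [0.8, 0.2, 0.1],
    "shelter": [0.85, 0.15, 0.05],
    "unemployed": [0.1, 0.9, 0.0],
    "jobless": [0.15, 0.85, 0.05],
    "diabetes": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def expand_lexicon(seeds, vectors, threshold=0.98):
    """Add vocabulary words whose similarity to any seed term meets the threshold."""
    expanded = set(seeds)
    for seed in seeds:
        if seed not in vectors:
            continue  # out-of-vocabulary (e.g. multi-word) seeds are kept but not expanded
        for word, vec in vectors.items():
            if word not in expanded and cosine(vectors[seed], vec) >= threshold:
                expanded.add(word)
    return expanded


def detect_life_events(note, lexicon):
    """Return the LE categories whose terms appear as whole words in a note."""
    text = note.lower()
    found = set()
    for category, terms in lexicon.items():
        for term in terms:
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                found.add(category)
                break  # one matching term is enough to flag the category
    return found


def concentration_at_top(risk_scores, has_le, fraction):
    """Share of patients flagged with any LE among the top `fraction` of predicted risk."""
    n_top = max(1, math.ceil(len(risk_scores) * fraction))
    ranked = sorted(range(len(risk_scores)), key=lambda i: risk_scores[i], reverse=True)
    return sum(has_le[i] for i in ranked[:n_top]) / n_top


if __name__ == "__main__":
    lexicon = {cat: expand_lexicon(terms, TOY_VECTORS) for cat, terms in SEED_LEXICON.items()}
    note = "Veteran reports staying at a shelter after being laid off in March."
    print(sorted(detect_life_events(note, lexicon)))
    print(concentration_at_top([0.91, 0.15, 0.72, 0.05, 0.66], [1, 0, 0, 0, 1], 0.4))
```

In this sketch "shelter" is matched only because expansion added it to the housing lexicon; with the seed terms alone the note would be flagged for job instability only. A real pipeline would also need negation and context handling, which this toy matcher omits.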

Cited Authors

  • Morrow, D; Zamora-Resendiz, R; Beckham, JC; Kimbrel, NA; Oslin, DW; Tamang, S; Million Veteran Program Suicide Exemplar Work Group; Crivelli, S

Published Date

  • July 2022

Published In

  • Journal of Psychiatric Research

Volume / Issue

  • 151

Start / End Page

  • 328 - 338

PubMed ID

  • 35533516

Electronic International Standard Serial Number (EISSN)

  • 1879-1379

Digital Object Identifier (DOI)

  • 10.1016/j.jpsychires.2022.04.009

Language

  • eng

Place of Publication

  • England