Scholars@Duke publication: SBDH-Reader: an LLM-powered method for extracting social and behavioral determinants of health from clinical notes.

Publication, Journal Article

Gu, Z; He, L; Naeem, A; Chan, PM; Mohamed, A; Khalil, H; Guo, Y; Huang, J; Villanueva-Miranda, I; Ding, Y; Shi, W; Dupre, ME; Xiao, G ...

Published in: medRxiv

June 26, 2025

OBJECTIVE: Social and behavioral determinants of health (SBDH) are increasingly recognized as essential for prognostication and for informing targeted interventions. Clinical notes often contain details about SBDH in unstructured format. Conventional methods for extracting these data tend to be labor intensive, inaccurate, and/or unscalable. In this study, we aim to develop and validate an LLM-powered method that extracts structured SBDH data from clinical notes through prompt engineering.

MATERIALS AND METHODS: We developed SBDH-Reader to extract six categories of granular SBDH data by prompting GPT-4o: employment, housing, marital status, and three categories of substance use (alcohol, tobacco, and drug use). SBDH-Reader was developed using 7,225 notes from 6,382 patients in the MIMIC-III database (2001-2012) and externally validated using 971 notes from 437 patients at The University of Texas Southwestern Medical Center (UTSW; 2022-2023). We evaluated SBDH-Reader's performance against human-annotated ground truth using precision, recall, F1, and confusion matrices.

RESULTS: When tested on the UTSW validation set, SBDH-Reader achieved a macro-average F1 ranging from 0.94 to 0.98 across the six SBDH categories. For clinically relevant adverse attributes, F1 ranged from 0.96 (employment; housing) to 0.99 (tobacco use). When extracting any adverse attribute across all SBDH categories, SBDH-Reader achieved an F1 of 0.97, recall of 0.97, and precision of 0.98 in the independent validation set.

CONCLUSION: A general-purpose LLM can accurately extract structured SBDH data through effective prompt engineering. SBDH-Reader has the potential to serve as a scalable and effective method for collecting real-time, patient-level SBDH data to support clinical research and care.
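The evaluation metrics named in the abstract (precision, recall, F1, and macro-average F1 derived from a confusion matrix) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `per_class_prf` and `macro_f1` and the label values ("adverse", "stable", "unknown") are hypothetical, chosen only to show how a macro-average weights each class equally regardless of how often it occurs.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen.

    y_true holds human-annotated labels and y_pred holds model output,
    aligned by position. Label names are illustrative assumptions.
    """
    labels = sorted(set(y_true) | set(y_pred))
    # Confusion matrix stored sparsely: (true, predicted) -> count.
    confusion = Counter(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = confusion[(label, label)]
        fp = sum(confusion[(t, label)] for t in labels if t != label)
        fn = sum(confusion[(label, p)] for p in labels if p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (the macro-average)."""
    scores = per_class_prf(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy data for one hypothetical SBDH category; not from the study.
    y_true = ["adverse", "adverse", "stable", "stable", "unknown", "unknown"]
    y_pred = ["adverse", "stable", "stable", "stable", "unknown", "adverse"]
    for label, (p, r, f) in per_class_prf(y_true, y_pred).items():
        print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
    print(f"macro-F1={macro_f1(y_true, y_pred):.3f}")
```

Under this reading, a macro-average F1 would be computed once per SBDH category over that category's attribute labels, which is consistent with the abstract reporting a range of values across the six categories.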

Duke Scholars

Author: Matthew E. Dupre, Population Health Sciences

Published In

medRxiv

DOI

10.1101/2025.02.19.25322576

Publication Date

June 26, 2025

Location

United States

Citation

APA: Gu, Z., He, L., Naeem, A., Chan, P. M., Mohamed, A., Khalil, H., … Yang, D. M. (2025). SBDH-Reader: an LLM-powered method for extracting social and behavioral determinants of health from clinical notes. MedRxiv. https://doi.org/10.1101/2025.02.19.25322576

Chicago: Gu, Zifan, Lesi He, Awais Naeem, Pui Man Chan, Asim Mohamed, Hafsa Khalil, Yujia Guo, et al. “SBDH-Reader: an LLM-powered method for extracting social and behavioral determinants of health from clinical notes.” MedRxiv, June 26, 2025. https://doi.org/10.1101/2025.02.19.25322576.

ICMJE: Gu Z, He L, Naeem A, Chan PM, Mohamed A, Khalil H, et al. SBDH-Reader: an LLM-powered method for extracting social and behavioral determinants of health from clinical notes. medRxiv. 2025 Jun 26;

MLA: Gu, Zifan, et al. “SBDH-Reader: an LLM-powered method for extracting social and behavioral determinants of health from clinical notes.” MedRxiv, June 2025. Pubmed, doi:10.1101/2025.02.19.25322576.

NLM: Gu Z, He L, Naeem A, Chan PM, Mohamed A, Khalil H, Guo Y, Huang J, Villanueva-Miranda I, Ding Y, Shi W, Dupre ME, Xiao G, Peterson ED, Xie Y, Navar AM, Yang DM. SBDH-Reader: an LLM-powered method for extracting social and behavioral determinants of health from clinical notes. medRxiv. 2025 Jun 26;
