
SBDH-Reader: a large language model-powered method for extracting social and behavioral determinants of health from clinical notes.

Publication, Journal Article
Gu, Z; He, L; Naeem, A; Chan, PM; Mohamed, A; Khalil, H; Guo, Y; Huang, J; Villanueva-Miranda, I; Ding, Y; Shi, W; Dupre, ME; Xiao, G ...
Published in: J Am Med Inform Assoc
October 1, 2025

OBJECTIVE: Social and behavioral determinants of health (SBDH) are increasingly recognized as essential for prognostication and informing targeted interventions. Clinical notes often contain details about SBDH in unstructured format. Conventional extraction methods for these data tend to be labor intensive, inaccurate, and/or unscalable. In this study, we aim to develop and validate a large language model (LLM)-powered method to extract structured SBDH data from clinical notes through prompt engineering.
MATERIALS AND METHODS: We developed SBDH-Reader to extract 6 categories of granular SBDH data by prompting GPT-4o: employment, housing, marital status, alcohol use, tobacco use, and drug use. SBDH-Reader was developed using 7225 notes from 6382 patients in the MIMIC-III database (2001-2012) and externally validated using 971 notes from 437 patients at The University of Texas Southwestern Medical Center (UTSW; 2022-2023). We evaluated SBDH-Reader's performance against human-annotated ground truths using precision, recall, F1, and confusion matrices.
RESULTS: When tested on the UTSW validation set, SBDH-Reader achieved a macro-average F1 ranging from 0.94 to 0.98 across the 6 SBDH categories. For clinically relevant adverse attributes, F1 ranged from 0.96 (employment; housing) to 0.99 (tobacco use). When extracting any adverse attributes across all SBDH categories, SBDH-Reader achieved an F1 of 0.97, recall of 0.97, and precision of 0.98 in the independent validation set.
DISCUSSION: SBDH-Reader demonstrated strong performance in extracting structured SBDH data through effective prompt engineering of a general-purpose LLM, without the need for task-specific fine-tuning. Its modular design and adaptability to diverse datasets and documentation patterns support its applicability in real-world clinical settings.
CONCLUSION: SBDH-Reader has the potential to serve as a scalable and effective method for collecting real-time, patient-level SBDH data to support clinical research and care.
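
The abstract reports per-attribute precision, recall, and F1, plus a macro-average F1 for each SBDH category. As a rough illustration of how such figures are typically computed from human-annotated labels and model output, here is a minimal sketch in Python. The function names, the label values ("adverse", "stable", "unknown"), and the toy data are hypothetical and are not taken from the paper; the authors' actual evaluation code and label scheme may differ.

```python
from collections import Counter


def per_class_prf(gold, pred):
    """Return {label: (precision, recall, f1)} for each label in gold or pred."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # predicted p, but the annotation says otherwise
            fn[g] += 1          # annotated g, but the prediction missed it
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1, so rare labels count as much as common ones."""
    scores = per_class_prf(gold, pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical annotations for one category (e.g. housing) on six notes.
gold = ["adverse", "stable", "stable", "unknown", "adverse", "stable"]
pred = ["adverse", "stable", "adverse", "unknown", "adverse", "stable"]

for label, (p, r, f1) in per_class_prf(gold, pred).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"macro-F1={macro_f1(gold, pred):.2f}")
```

In this toy example one "stable" note is mislabeled as "adverse", which lowers precision for "adverse" and recall for "stable" while leaving "unknown" untouched. Macro-averaging weights each label equally, which is why it is a common choice when adverse attributes are much rarer than non-adverse ones.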

Published In

J Am Med Inform Assoc

DOI

10.1093/jamia/ocaf124

EISSN

1527-974X

Publication Date

October 1, 2025

Volume

32

Issue

10

Start / End Page

1570 / 1580

Location

England

Related Subject Headings

  • Social Determinants of Health
  • Natural Language Processing
  • Medical Informatics
  • Large Language Models
  • Information Storage and Retrieval
  • Humans
  • Electronic Health Records
  • 46 Information and computing sciences
  • 42 Health sciences
  • 32 Biomedical and clinical sciences
 

Citation

APA: Gu, Z., He, L., Naeem, A., Chan, P. M., Mohamed, A., Khalil, H., … Yang, D. M. (2025). SBDH-Reader: a large language model-powered method for extracting social and behavioral determinants of health from clinical notes. J Am Med Inform Assoc, 32(10), 1570–1580. https://doi.org/10.1093/jamia/ocaf124
Chicago: Gu, Zifan, Lesi He, Awais Naeem, Pui Man Chan, Asim Mohamed, Hafsa Khalil, Yujia Guo, et al. “SBDH-Reader: a large language model-powered method for extracting social and behavioral determinants of health from clinical notes.” J Am Med Inform Assoc 32, no. 10 (October 1, 2025): 1570–80. https://doi.org/10.1093/jamia/ocaf124.
ICMJE: Gu Z, He L, Naeem A, Chan PM, Mohamed A, Khalil H, et al. SBDH-Reader: a large language model-powered method for extracting social and behavioral determinants of health from clinical notes. J Am Med Inform Assoc. 2025 Oct 1;32(10):1570–80.
MLA: Gu, Zifan, et al. “SBDH-Reader: a large language model-powered method for extracting social and behavioral determinants of health from clinical notes.” J Am Med Inform Assoc, vol. 32, no. 10, Oct. 2025, pp. 1570–80. PubMed, doi:10.1093/jamia/ocaf124.
NLM: Gu Z, He L, Naeem A, Chan PM, Mohamed A, Khalil H, Guo Y, Huang J, Villanueva-Miranda I, Ding Y, Shi W, Dupre ME, Xiao G, Peterson ED, Xie Y, Navar AM, Yang DM. SBDH-Reader: a large language model-powered method for extracting social and behavioral determinants of health from clinical notes. J Am Med Inform Assoc. 2025 Oct 1;32(10):1570–1580.