Integrating expert knowledge into large language models improves performance for psychiatric reasoning and diagnosis.

Publication, Journal Article
Sarma, KV; Hanss, KE; Halls, AJM; Krystal, A; Becker, DF; Glowinski, AL; Butte, AJ
Published in: Psychiatry Res
January 2026

BACKGROUND AND METHODS: The authors sought to evaluate the performance of common large language models (LLMs) in psychiatric diagnosis and the impact of integrating expert-derived reasoning on that performance. Clinical case vignettes and their associated diagnoses were retrieved from the DSM-5-TR Clinical Cases book. Diagnostic decision trees were retrieved from the DSM-5-TR Handbook of Differential Diagnosis and refined for LLM use. Three LLMs were prompted to provide candidate diagnoses for the vignettes, either directly or by using the decision trees. These candidates and their diagnostic categories were compared against the correct diagnoses. Performance was measured using the positive predictive value (PPV), sensitivity, and the F1 statistic.

RESULTS: When directly prompted to predict diagnoses, the best LLM by F1 statistic (gpt-4o) had a sensitivity of 76.7% and a PPV of 40.4%. With the refined decision trees, PPV increased significantly (to 65.3%) without a significant reduction in sensitivity (70.9%). Across all experiments, the decision trees significantly increased PPV in every experiment. They also significantly increased the F1 statistic in 5/6 experiments and significantly reduced sensitivity in 4/6 experiments.

DISCUSSION: When the LLMs were used to predict psychiatric diagnoses from case vignettes, direct prompting yielded the most true-positive diagnoses but produced significant overdiagnosis. Integrating expert-derived reasoning through decision trees improved LLM performance (as measured by the F1 statistic). It did so primarily by suppressing overdiagnosis, at a smaller cost in sensitivity. This suggests that integrating clinical expert-derived reasoning could improve the performance of LLM-based tools in the behavioral health setting.
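The abstract scores each model by comparing its candidate diagnoses with the correct ones using PPV, sensitivity and F1. The following Python sketch shows one common way to compute these metrics from sets of candidate and reference diagnoses.

It rests on several assumptions. Scoring is by exact match, with counts pooled across vignettes (micro-averaging). The abstract does not say how matches or averaging were handled. The helper names (`pooled_metrics`, `f1_from`) and the example diagnoses are hypothetical.

The sketch also recomputes F1 as the harmonic mean of the reported gpt-4o PPV and sensitivity. This is purely illustrative. If the paper aggregated differently, for example per vignette, its reported F1 may not match these figures.

```python
from typing import Iterable, Set, Tuple


def f1_from(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of PPV (precision) and sensitivity (recall)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def pooled_metrics(
    cases: Iterable[Tuple[Set[str], Set[str]]],
) -> Tuple[float, float, float]:
    """Return (PPV, sensitivity, F1) pooled over (predicted, correct) diagnosis sets.

    Assumption (not from the paper): a candidate is a true positive only if it
    exactly matches a reference diagnosis, and counts are summed across
    vignettes before the ratios are taken (micro-averaging).
    """
    tp = fp = fn = 0
    for predicted, correct in cases:
        tp += len(predicted & correct)  # candidates that match a correct diagnosis
        fp += len(predicted - correct)  # extra candidates, i.e. overdiagnosis
        fn += len(correct - predicted)  # correct diagnoses the model missed
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return ppv, sensitivity, f1_from(ppv, sensitivity)


if __name__ == "__main__":
    # Hypothetical vignettes: (LLM candidate diagnoses, reference diagnoses).
    cases = [
        ({"major depressive disorder", "generalized anxiety disorder"},
         {"major depressive disorder"}),
        ({"posttraumatic stress disorder"},
         {"posttraumatic stress disorder", "alcohol use disorder"}),
        ({"bipolar I disorder", "ADHD"}, {"bipolar I disorder"}),
    ]
    # 3 true positives, 2 false positives, 1 false negative:
    # PPV 0.6, sensitivity 0.75, F1 about 0.667
    print(pooled_metrics(cases))

    # F1 recomputed from the abstract's gpt-4o figures (illustrative only).
    print(round(f1_from(0.404, 0.767), 3))  # direct prompting, about 0.53
    print(round(f1_from(0.653, 0.709), 3))  # with decision trees, about 0.68
```

False positives enter only the PPV denominator, and false negatives enter only the sensitivity denominator. Pruning spurious candidates therefore raises PPV, and it can raise F1 even when a few true positives are lost. This matches the pattern the abstract reports for the decision trees.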

Duke Scholars

Published In

Psychiatry Res

DOI

10.1016/j.psychres.2025.116844

EISSN

1872-7123

Publication Date

January 2026

Volume

355

Start / End Page

116844

Location

Ireland

Related Subject Headings

  • Psychiatry
  • Mental Disorders
  • Large Language Models
  • Language
  • Humans
  • Diagnostic and Statistical Manual of Mental Disorders
  • Diagnosis, Differential
  • Decision Trees
  • Clinical Reasoning
  • 5203 Clinical and health psychology
 

Citation

APA

Sarma, K. V., Hanss, K. E., Halls, A. J. M., Krystal, A., Becker, D. F., Glowinski, A. L., & Butte, A. J. (2026). Integrating expert knowledge into large language models improves performance for psychiatric reasoning and diagnosis. Psychiatry Res, 355, 116844. https://doi.org/10.1016/j.psychres.2025.116844

Chicago

Sarma, Karthik V., Kaitlin E. Hanss, Andrew J. M. Halls, Andrew Krystal, Daniel F. Becker, Anne L. Glowinski, and Atul J. Butte. “Integrating expert knowledge into large language models improves performance for psychiatric reasoning and diagnosis.” Psychiatry Res 355 (January 2026): 116844. https://doi.org/10.1016/j.psychres.2025.116844.

ICMJE

Sarma KV, Hanss KE, Halls AJM, Krystal A, Becker DF, Glowinski AL, et al. Integrating expert knowledge into large language models improves performance for psychiatric reasoning and diagnosis. Psychiatry Res. 2026 Jan;355:116844.

MLA

Sarma, Karthik V., et al. “Integrating expert knowledge into large language models improves performance for psychiatric reasoning and diagnosis.” Psychiatry Res, vol. 355, Jan. 2026, p. 116844. Pubmed, doi:10.1016/j.psychres.2025.116844.

NLM

Sarma KV, Hanss KE, Halls AJM, Krystal A, Becker DF, Glowinski AL, Butte AJ. Integrating expert knowledge into large language models improves performance for psychiatric reasoning and diagnosis. Psychiatry Res. 2026 Jan;355:116844.