
Artificial Intelligence in Triaging Patient Questions: An Evaluation of a Large Language Model for Distal Radius Fractures.

Publication, Journal Article
Kahan, R; Shen, C; Wellborn, P; Lauder, A; Berchuck, S; Javeed, H; Pean, C; Federer, A
Published in: J Am Acad Orthop Surg
January 1, 2026

INTRODUCTION: Large language models (LLMs) are promising tools for clinical decision support but require thorough validation to ensure safety and reliability. This study assessed a knowledge and intelligence messaging interface (KIMI; RevelAi Health), an LLM enhanced with retrieval-augmented generation configured with American Academy of Orthopaedic Surgeons guidelines for distal radius fracture management and a persistent system-prompt layer. The goal was to evaluate KIMI's efficacy in acuity triage and in generating appropriate patient-facing responses for distal radius fracture management.

METHODS: We analyzed KIMI-generated responses to 100 simulated patient queries. Four clinical experts independently assessed responses for guideline concordance, safety, clarity, and acuity. The probability of adequate scoring was modeled in all domains. Bayesian mixed-effects logistic regression and ordered logistic regression models were used for binary and ordinal scoring outcomes, respectively, to account for repeated measures and within-reviewer correlations.

RESULTS: Reviewer evaluations of KIMI responses demonstrated high performance across safety and quality domains. The posterior average probability of a response being rated as safe was 94.2% (95% credible interval [CI]: 91.2 to 96.9), as concordant was 88.7% (95% CI: 85.0 to 92.0), and as clear was 93.7% (95% CI: 90.5 to 96.5). The posterior average probability of exact agreement between reviewer-assigned and LLM-assigned acuity levels was 62.9% (95% CI: 58.0 to 67.7). Surgical queries were associated with slightly higher safety ratings (95.4% versus 91.3%) and acuity agreement (63.9% versus 60.6%) than nonsurgical queries. Query category markedly influenced acuity agreement. LLM-assigned acuity was markedly associated with reviewer-assigned acuity across all models, even when adjusting for both query type and category (odds ratio = 2.66; 95% CI: 1.81 to 3.83).
DISCUSSION: KIMI generated responses that were generally safe, clinically concordant, and clearly communicated. These findings support the feasibility of deploying enhanced LLMs for asynchronous patient engagement in low-to-moderate-risk care coordination settings.
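
The abstract reports binary outcomes (safe, concordant, clear) as posterior average probabilities with 95% credible intervals from a Bayesian mixed-effects logistic regression. The following is a minimal sketch of how such a summary can be computed from posterior draws. It assumes a model with a global intercept plus one random intercept per reviewer. The function names and the simulated draws are hypothetical, and this is not the authors' code or data.

```python
import math
import random
from typing import List, Sequence, Tuple


def inv_logit(x: float) -> float:
    """Numerically stable logistic (inverse-logit) function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def posterior_average_probability(
    intercept_draws: Sequence[float],
    reviewer_effect_draws: Sequence[Sequence[float]],
) -> Tuple[float, Tuple[float, float]]:
    """Summarize P(adequate rating) from posterior draws of a random-intercept logit model.

    Hypothetical helper: for each posterior draw, the predicted probability is
    averaged over reviewers. The per-draw averages are then summarized by their
    mean and an approximate 95% equal-tailed credible interval taken from the
    sorted draws.
    """
    per_draw: List[float] = []
    for b0, effects in zip(intercept_draws, reviewer_effect_draws):
        probs = [inv_logit(b0 + u) for u in effects]
        per_draw.append(sum(probs) / len(probs))
    per_draw.sort()
    n = len(per_draw)
    mean = sum(per_draw) / n
    lower = per_draw[round(0.025 * (n - 1))]
    upper = per_draw[round(0.975 * (n - 1))]
    return mean, (lower, upper)


if __name__ == "__main__":
    # Hypothetical posterior draws (NOT the study's): 4000 draws, 4 reviewers.
    random.seed(1)
    n_draws, n_reviewers = 4000, 4
    intercepts = [random.gauss(2.8, 0.2) for _ in range(n_draws)]
    reviewer_effects = [
        [random.gauss(0.0, 0.3) for _ in range(n_reviewers)] for _ in range(n_draws)
    ]
    mean, (lo, hi) = posterior_average_probability(intercepts, reviewer_effects)
    print(f"posterior average probability: {mean:.3f} (95% CrI {lo:.3f} to {hi:.3f})")
```

In this sketch, averaging over reviewers inside each draw, before summarizing across draws, is what keeps the credible interval on the probability scale. A fitted model from a probabilistic programming tool would supply the draws instead of the simulated ones.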
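
For the ordinal acuity outcome, an ordered (proportional-odds) logistic model maps a linear predictor to cumulative category probabilities, P(Y <= k) = inv_logit(c_k - eta). The sketch below is for illustration only. It assumes that LLM-assigned acuity enters the model as a single numeric predictor, so the reported odds ratio of 2.66 would correspond to a coefficient of log(2.66) per one-level increase. It also omits the query type and category adjustments. The cutpoints, the four-level scale, and the helper names are hypothetical and are not taken from the study.

```python
import math
from typing import List, Sequence


def inv_logit(x: float) -> float:
    """Numerically stable logistic (inverse-logit) function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def ordered_logit_probs(eta: float, cutpoints: Sequence[float]) -> List[float]:
    """Category probabilities under a proportional-odds (ordered logit) model.

    Uses P(Y <= k) = inv_logit(c_k - eta) for increasing cutpoints c_1 < ... < c_{K-1}.
    """
    cumulative = [inv_logit(c - eta) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs


def odds_above(probs: Sequence[float], k: int) -> float:
    """Odds that the outcome falls above category index k (0-based)."""
    p_above = sum(probs[k + 1:])
    return p_above / (1.0 - p_above)


if __name__ == "__main__":
    beta = math.log(2.66)         # assumed log odds ratio per one-level increase in LLM acuity
    cutpoints = [-1.0, 0.5, 2.0]  # hypothetical cutpoints for a 4-level acuity scale
    p1 = ordered_logit_probs(beta * 1, cutpoints)
    p2 = ordered_logit_probs(beta * 2, cutpoints)
    print("LLM level 1:", [round(p, 3) for p in p1])
    print("LLM level 2:", [round(p, 3) for p in p2])
    # Under proportional odds, the cumulative odds ratio is identical at every cut.
    for k in range(len(cutpoints)):
        print(f"cut {k}: odds ratio = {odds_above(p2, k) / odds_above(p1, k):.3f}")
```

Each printed odds ratio equals 2.66 up to rounding, because the odds of Y > k are exp(eta - c_k) and a one-level shift in the predictor multiplies them by exp(beta) regardless of k. That single shared ratio is the proportional-odds assumption.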

Published In

J Am Acad Orthop Surg

DOI

10.5435/JAAOS-D-25-00456

EISSN

1940-5480

Publication Date

January 1, 2026

Volume

34

Issue

1

Start / End Page

e106 / e115

Location

United States

Related Subject Headings

  • Wrist Fractures
  • Triage
  • Radius Fractures
  • Orthopedics
  • Large Language Models
  • Language
  • Humans
  • Decision Support Systems, Clinical
  • Bayes Theorem
  • Artificial Intelligence

Citation

APA

Kahan, R., Shen, C., Wellborn, P., Lauder, A., Berchuck, S., Javeed, H., … Federer, A. (2026). Artificial Intelligence in Triaging Patient Questions: An Evaluation of a Large Language Model for Distal Radius Fractures. J Am Acad Orthop Surg, 34(1), e106–e115. https://doi.org/10.5435/JAAOS-D-25-00456

Chicago

Kahan, Riley, Christine Shen, Patricia Wellborn, Alexander Lauder, Samuel Berchuck, Hadi Javeed, Christian Pean, and Andrew Federer. “Artificial Intelligence in Triaging Patient Questions: An Evaluation of a Large Language Model for Distal Radius Fractures.” J Am Acad Orthop Surg 34, no. 1 (January 1, 2026): e106–15. https://doi.org/10.5435/JAAOS-D-25-00456.

ICMJE

Kahan R, Shen C, Wellborn P, Lauder A, Berchuck S, Javeed H, et al. Artificial Intelligence in Triaging Patient Questions: An Evaluation of a Large Language Model for Distal Radius Fractures. J Am Acad Orthop Surg. 2026 Jan 1;34(1):e106–15.

MLA

Kahan, Riley, et al. “Artificial Intelligence in Triaging Patient Questions: An Evaluation of a Large Language Model for Distal Radius Fractures.” J Am Acad Orthop Surg, vol. 34, no. 1, Jan. 2026, pp. e106–15. PubMed, doi:10.5435/JAAOS-D-25-00456.

NLM

Kahan R, Shen C, Wellborn P, Lauder A, Berchuck S, Javeed H, Pean C, Federer A. Artificial Intelligence in Triaging Patient Questions: An Evaluation of a Large Language Model for Distal Radius Fractures. J Am Acad Orthop Surg. 2026 Jan 1;34(1):e106–e115.