Artificial Intelligence in Triaging Patient Questions: An Evaluation of a Large Language Model for Distal Radius Fractures.
INTRODUCTION: Large language models (LLMs) are promising tools for clinical decision support but require thorough validation to ensure safety and reliability. This study assessed the knowledge and intelligence messaging interface (KIMI; RevelAi Health), an LLM enhanced with retrieval-augmented generation (configured with American Academy of Orthopaedic Surgeons guidelines for distal radius fracture management) and a persistent system-prompt layer. The goal was to evaluate KIMI's efficacy in triaging acuity and generating appropriate patient-facing responses for distal radius fracture management.
METHODS: We analyzed KIMI-generated responses to 100 simulated patient queries. Four clinical experts independently assessed each response for guideline concordance, safety, clarity, and acuity. The probability of an adequate score was modeled for each domain. Bayesian mixed-effects logistic regression and ordered logistic regression models were used for binary and ordinal scoring outcomes, respectively, to account for repeated measures and within-reviewer correlation.
RESULTS: Reviewer evaluations of KIMI responses demonstrated high performance across safety and quality domains. The posterior average probability of a response being rated as safe was 94.2% (95% credible interval [CI]: 91.2 to 96.9); as concordant, 88.7% (95% CI: 85.0 to 92.0); and as clear, 93.7% (95% CI: 90.5 to 96.5). The posterior average probability of exact agreement between reviewer-assigned and LLM-assigned acuity levels was 62.9% (95% CI: 58.0 to 67.7). Surgical queries were associated with slightly higher safety ratings (95.4% versus 91.3%) and acuity agreement (63.9% versus 60.6%) than nonsurgical queries. Query category markedly influenced acuity agreement. LLM-assigned acuity remained markedly associated with reviewer-assigned acuity across all models, even after adjusting for both query type and category (odds ratio = 2.66; 95% CI: 1.81 to 3.83).
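The abstract does not state the full model specifications. As a minimal sketch, assuming a logistic model with one random intercept per reviewer for the binary ratings and a cumulative-logit (proportional-odds) model for ordinal acuity, the Python below shows how a posterior average probability, an equal-tailed 95% credible interval, an exact-agreement probability, and an odds ratio can be derived from posterior draws. Every draw, cutpoint, coefficient, and function name here is hypothetical; this is not the authors' code, and the printed numbers are not the study's estimates.

```python
import math
import random
from statistics import mean


def inv_logit(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def posterior_average_probability(intercept_draws, reviewer_effect_draws):
    """Population-average probability of an 'adequate' rating, one value per draw.

    Within each posterior draw, the predicted probability is averaged over the
    reviewer random intercepts, so reviewer-level variation is integrated out.
    """
    return [
        mean(inv_logit(b0 + u) for u in effects)
        for b0, effects in zip(intercept_draws, reviewer_effect_draws)
    ]


def summarize(draws, level=0.95):
    """Posterior mean and a simple equal-tailed credible interval from sorted draws."""
    s = sorted(draws)
    n = len(s)
    lo = s[math.floor((1 - level) / 2 * (n - 1))]
    hi = s[math.ceil((1 + level) / 2 * (n - 1))]
    return mean(s), lo, hi


def ordered_logit_probs(eta, cutpoints):
    """Category probabilities under a cumulative-logit (proportional-odds) model.

    P(Y <= k) = inv_logit(c_k - eta) for increasing cutpoints c_1 < ... < c_{K-1}.
    """
    cum = [inv_logit(c - eta) for c in cutpoints] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


def odds_ratio_summary(beta_draws, level=0.95):
    """Move a log-odds coefficient's draws to the odds-ratio scale and summarize."""
    return summarize([math.exp(b) for b in beta_draws], level)


# Hypothetical posterior draws for illustration only; these are NOT the study's data.
rng = random.Random(42)
n_draws, n_reviewers = 2000, 4
intercepts = [rng.gauss(2.8, 0.2) for _ in range(n_draws)]
reviewer_effects = [[rng.gauss(0.0, 0.4) for _ in range(n_reviewers)] for _ in range(n_draws)]

p_adequate = posterior_average_probability(intercepts, reviewer_effects)
print("P(adequate): mean %.3f, 95%% CrI [%.3f, %.3f]" % summarize(p_adequate))

# Hypothetical 3-level acuity scale (levels 0, 1, 2) with made-up cutpoints;
# the linear predictor is eta = beta * (LLM-assigned level).
cutpoints = [-0.5, 1.5]
beta = 1.0
llm_level = 1
probs = ordered_logit_probs(beta * llm_level, cutpoints)
print("P(reviewer level = LLM level):", round(probs[llm_level], 3))

# Odds ratio per one-level increase in LLM-assigned acuity, from hypothetical draws.
beta_draws = [rng.gauss(1.0, 0.2) for _ in range(n_draws)]
print("Odds ratio: mean %.2f, 95%% CrI [%.2f, %.2f]" % odds_ratio_summary(beta_draws))
```

Averaging over reviewer effects within each draw, and only then summarizing across draws, is one common way to report population-average probabilities from a mixed-effects model; the exact averaging scheme used in the study is not described in the abstract. Because exponentiation is monotone, the odds-ratio interval is simply the exponentiated interval of the log-odds coefficient.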
DISCUSSION: KIMI generated responses that were generally safe, clinically concordant, and clearly communicated. These findings support the feasibility of deploying enhanced LLMs for asynchronous patient engagement in low- to moderate-risk care coordination settings.
Related Subject Headings
- Wrist Fractures
- Triage
- Radius Fractures
- Orthopedics
- Large Language Models
- Language
- Humans
- Decision Support Systems, Clinical
- Bayes Theorem
- Artificial Intelligence