
Performance of ChatGPT versus spine surgeons as an emergency department spine call consultant

Publication, Journal Article
Taka, TM; Sebt, S; Meng, S; Cabrera, A; Shin, D; Yacoubian, V; Chao, W; Rossie, D; Xu, Z; Erickson, M; Rocos, B; Than, K; Yu, E; Ahn, N; Bono, C; Cheng, W; Danisa, O
Published in: North American Spine Society Journal
March 1, 2026

Background: Large language models (LLMs) such as ChatGPT are increasingly recognized as credible tools across diverse healthcare settings. While artificial intelligence (AI) use has previously been evaluated in emergency medicine, its use in subspecialty care, particularly spine surgery, remains underexplored. This study evaluates the clinical accuracy, management appropriateness, completeness, helpfulness, and overall quality of ChatGPT responses compared with those of board-certified spine surgeons in response to common emergency department (ED) consultations.

Methods: A 7-part questionnaire was developed based on common ED spine consultations (e.g., cauda equina syndrome, compression fracture in elderly patients, purulent drainage from a surgical wound, acute lumbar disc herniation, incomplete spinal cord injury, epidural abscess, and metastatic spine disease). Each case included 3–4 questions pertaining to examination, diagnosis, management, and counseling. Responses from ChatGPT and 7 board-certified spine surgeons were restricted to 3–4 sentences per question. Three emergency medicine physicians rated each de-identified questionnaire response using a 5-point Likert scale. Statistical analysis was conducted using a two-sample t-test with unequal variance (Welch's t-test). Inter-rater reliability was assessed using the average pairwise weighted Cohen's kappa coefficient (κ).

Results: AI responses were rated superior to spine surgeon responses across all 5 metrics: clinical accuracy, management appropriateness, completeness, helpfulness, and overall quality (p<.05). The average pairwise weighted Cohen's kappa indicated substantial agreement among raters (κ=0.76).

Conclusions: Emergency medicine providers rated ChatGPT responses to ED spine consultations significantly higher than those of board-certified spine surgeons. Though further improvement and validation are warranted, these findings suggest that ChatGPT can be a useful clinical adjunct for spine-related ED consultations.
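As a minimal illustration of the statistical comparisons described in the Methods, the sketch below runs a Welch's two-sample t-test and an average pairwise weighted Cohen's kappa in Python with SciPy and scikit-learn. The rating arrays and the linear weighting scheme are assumptions for illustration only; they are not the study's data or necessarily its exact procedure.

# A minimal sketch of the statistical analysis described in the Methods,
# assuming Python with NumPy, SciPy, and scikit-learn. All rating values
# below are hypothetical placeholders, not the study's data.
from itertools import combinations

import numpy as np
from scipy.stats import ttest_ind
from sklearn.metrics import cohen_kappa_score

# Hypothetical 5-point Likert ratings for one metric (e.g., clinical accuracy).
ai_ratings = np.array([5, 4, 5, 5, 4, 5, 4])
surgeon_ratings = np.array([4, 3, 4, 4, 3, 4, 4])

# Two-sample t-test with unequal variance (Welch's t-test), as in the Methods.
result = ttest_ind(ai_ratings, surgeon_ratings, equal_var=False)
print(f"Welch's t-test: t = {result.statistic:.2f}, p = {result.pvalue:.4f}")

# Inter-rater reliability: average pairwise weighted Cohen's kappa across the
# three raters. Linear weights are an assumption; the abstract does not state
# the weighting scheme used.
raters = {
    "rater_1": [5, 4, 5, 3, 4, 5, 4],
    "rater_2": [5, 4, 4, 3, 4, 5, 5],
    "rater_3": [4, 4, 5, 3, 3, 5, 4],
}
pairwise_kappas = [
    cohen_kappa_score(raters[a], raters[b], weights="linear")
    for a, b in combinations(raters, 2)
]
print(f"Average pairwise weighted Cohen's kappa: {np.mean(pairwise_kappas):.2f}")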


Published In

North American Spine Society Journal

DOI

10.1016/j.xnsj.2025.100836

EISSN

2666-5484

Publication Date

March 1, 2026

Volume

25

Citation

APA: Taka, T. M., Sebt, S., Meng, S., Cabrera, A., Shin, D., Yacoubian, V., … Danisa, O. (2026). Performance of ChatGPT versus spine surgeons as an emergency department spine call consultant (Accepted). North American Spine Society Journal, 25. https://doi.org/10.1016/j.xnsj.2025.100836

Chicago: Taka, T. M., S. Sebt, S. Meng, A. Cabrera, D. Shin, V. Yacoubian, W. Chao, et al. “Performance of ChatGPT versus spine surgeons as an emergency department spine call consultant (Accepted).” North American Spine Society Journal 25 (March 1, 2026). https://doi.org/10.1016/j.xnsj.2025.100836.

ICMJE: Taka TM, Sebt S, Meng S, Cabrera A, Shin D, Yacoubian V, et al. Performance of ChatGPT versus spine surgeons as an emergency department spine call consultant (Accepted). North American Spine Society Journal. 2026 Mar 1;25.

MLA: Taka, T. M., et al. “Performance of ChatGPT versus spine surgeons as an emergency department spine call consultant (Accepted).” North American Spine Society Journal, vol. 25, Mar. 2026. Scopus, doi:10.1016/j.xnsj.2025.100836.

NLM: Taka TM, Sebt S, Meng S, Cabrera A, Shin D, Yacoubian V, Chao W, Rossie D, Xu Z, Erickson M, Rocos B, Than K, Yu E, Ahn N, Bono C, Cheng W, Danisa O. Performance of ChatGPT versus spine surgeons as an emergency department spine call consultant (Accepted). North American Spine Society Journal. 2026 Mar 1;25.
