Scholars@Duke publication: Performance of ChatGPT versus spine surgeons as an emergency department spine call consultant

Publication, Journal Article

Taka, TM; Sebt, S; Meng, S; Cabrera, A; Shin, D; Yacoubian, V; Chao, W; Rossie, D; Xu, Z; Erickson, M; Rocos, B; Than, K; Yu, E; Ahn, N ...

Published in: North American Spine Society Journal

March 1, 2026

Background: Large language models (LLMs) such as ChatGPT are increasingly recognized as credible tools across diverse healthcare settings. Although artificial intelligence (AI) has been evaluated in emergency medicine, its use in subspecialty care, particularly spine surgery, remains underexplored. This study compares the clinical accuracy, management appropriateness, completeness, helpfulness, and overall quality of ChatGPT responses with those of board-certified spine surgeons for common emergency department (ED) spine consultations.

Methods: A 7-part questionnaire was developed based on common ED spine consultations (eg, cauda equina syndrome, compression fracture in elderly patients, purulent drainage from a surgical wound, acute lumbar disc herniation, incomplete spinal cord injury, epidural abscess, and metastatic spine disease). Each case included 3–4 questions pertaining to examination, diagnosis, management, and counseling. Responses from ChatGPT and 7 board-certified spine surgeons were restricted to 3–4 sentences per question. Three emergency medicine physicians rated each de-identified questionnaire response on a 5-point Likert scale. Ratings were compared using a 2-sample t-test with unequal variance. Inter-rater reliability was assessed using the pairwise weighted Cohen’s kappa coefficient (κ).

Results: For the proposed ED consultations, AI responses were rated superior to spine surgeon responses across all 5 metrics: clinical accuracy, management appropriateness, completeness, helpfulness, and overall quality (p<.05). The average pairwise weighted Cohen’s kappa coefficient showed substantial inter-rater agreement (κ=0.76).

Conclusions: Emergency medicine providers rated ChatGPT responses to emergency department spine consultations significantly higher than those of board-certified spine surgeons. Though further improvement and validation are warranted, these findings suggest that ChatGPT can be a useful clinical adjunct for spine-related emergency department consultations.
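The two statistics named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' analysis code. The function names are hypothetical, the ratings are made-up toy values rather than study data, and quadratic weights are an assumption, since the abstract does not say which weighting scheme was used. The sketch returns only the t statistic and the Welch-Satterthwaite degrees of freedom. A p-value would additionally need the Student t distribution, which is available for example through `scipy.stats.ttest_ind(..., equal_var=False)`.

```python
from itertools import combinations
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for a 2-sample t-test with unequal variances."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def weighted_kappa(r1, r2, categories=(1, 2, 3, 4, 5), power=2):
    """Weighted Cohen's kappa for two raters.

    power=1 gives linear weights, power=2 quadratic weights
    (the choice of quadratic as default is an assumption).
    """
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    observed = [[0.0] * k for _ in range(k)]
    for x, y in zip(r1, r2):
        observed[idx[x]][idx[y]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power  # disagreement weight
            num += w * observed[i][j]            # observed disagreement
            den += w * row[i] * col[j]           # chance-expected disagreement
    if den == 0:
        return float("nan")  # undefined when both raters use one category
    return 1 - num / den


def mean_pairwise_kappa(raters, **kwargs):
    """Average weighted kappa over every pair of raters."""
    pairs = list(combinations(raters, 2))
    return sum(weighted_kappa(a, b, **kwargs) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical 5-point Likert ratings, for illustration only.
    ai_scores = [5, 4, 5, 4, 5, 4]
    surgeon_scores = [3, 4, 3, 4, 2, 4]
    print(welch_t(ai_scores, surgeon_scores))

    rater_a = [5, 4, 3, 4, 2, 5]
    rater_b = [5, 4, 4, 4, 2, 4]
    rater_c = [4, 4, 3, 5, 2, 5]
    print(mean_pairwise_kappa([rater_a, rater_b, rater_c]))
```

Averaging kappa over all rater pairs mirrors the "average pairwise" wording in the Results. With three raters this is the mean of three pairwise coefficients.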

Duke Scholars

Melissa Maria Erickson, Author, Orthopaedic Surgery

Olumide Ayodele Danisa, Author, Orthopaedic Surgery

Khoi Duc Than, Author, Neurosurgery

Brett Rocos, Author, Orthopaedic Surgery

Published In

North American Spine Society Journal

DOI

10.1016/j.xnsj.2025.100836

EISSN

2666-5484

Publication Date

March 1, 2026

Volume

25

Citation

APA: Taka, T. M., Sebt, S., Meng, S., Cabrera, A., Shin, D., Yacoubian, V., … Danisa, O. (2026). Performance of ChatGPT versus spine surgeons as an emergency department spine call consultant (Accepted). North American Spine Society Journal, 25. https://doi.org/10.1016/j.xnsj.2025.100836

Chicago: Taka, T. M., S. Sebt, S. Meng, A. Cabrera, D. Shin, V. Yacoubian, W. Chao, et al. “Performance of ChatGPT versus spine surgeons as an emergency department spine call consultant (Accepted).” North American Spine Society Journal 25 (March 1, 2026). https://doi.org/10.1016/j.xnsj.2025.100836.

ICMJE: Taka TM, Sebt S, Meng S, Cabrera A, Shin D, Yacoubian V, et al. Performance of ChatGPT versus spine surgeons as an emergency department spine call consultant (Accepted). North American Spine Society Journal. 2026 Mar 1;25.

MLA: Taka, T. M., et al. “Performance of ChatGPT versus spine surgeons as an emergency department spine call consultant (Accepted).” North American Spine Society Journal, vol. 25, Mar. 2026. Scopus, doi:10.1016/j.xnsj.2025.100836.

NLM: Taka TM, Sebt S, Meng S, Cabrera A, Shin D, Yacoubian V, Chao W, Rossie D, Xu Z, Erickson M, Rocos B, Than K, Yu E, Ahn N, Bono C, Cheng W, Danisa O. Performance of ChatGPT versus spine surgeons as an emergency department spine call consultant (Accepted). North American Spine Society Journal. 2026 Mar 1;25.
