Performance of ChatGPT Compared to Clinical Practice Guidelines in Making Informed Decisions for Lumbosacral Radicular Pain: A Cross-sectional Study.

Publication, Journal Article

Gianola, S; Bargeri, S; Castellini, G; Cook, C; Palese, A; Pillastrini, P; Salvalaggio, S; Turolla, A; Rossettini, G

Published in: J Orthop Sports Phys Ther

March 2024

OBJECTIVE: To compare the accuracy of an artificial intelligence chatbot to clinical practice guidelines (CPGs) recommendations for providing answers to complex clinical questions on lumbosacral radicular pain.

DESIGN: Cross-sectional study.

METHODS: We extracted recommendations from recent CPGs for diagnosing and treating lumbosacral radicular pain. Relative clinical questions were developed and queried to OpenAI's ChatGPT (GPT-3.5). We compared ChatGPT answers to CPGs recommendations by assessing the (1) internal consistency of ChatGPT answers by measuring the percentage of text wording similarity when a clinical question was posed 3 times, (2) reliability between 2 independent reviewers in grading ChatGPT answers, and (3) accuracy of ChatGPT answers compared to CPGs recommendations. Reliability was estimated using Fleiss' kappa (κ) coefficients, and accuracy by interobserver agreement as the frequency of the agreements among all judgments.

RESULTS: We tested 9 clinical questions. The internal consistency of text ChatGPT answers was unacceptable across all 3 trials in all clinical questions (mean percentage of 49%, standard deviation of 15). Intrareliability (reviewer 1: κ = 0.90, standard error [SE] = 0.09; reviewer 2: κ = 0.90, SE = 0.10) and interreliability (κ = 0.85, SE = 0.15) between the 2 reviewers was "almost perfect." Accuracy between ChatGPT answers and CPGs recommendations was slight, demonstrating agreement in 33% of recommendations.

CONCLUSION: ChatGPT performed poorly in internal consistency and accuracy of the indications generated compared to clinical practice guideline recommendations for lumbosacral radicular pain.

J Orthop Sports Phys Ther 2024;54(3):1-7. Epub 29 January 2024. doi:10.2519/jospt.2024.12151.
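The abstract names two statistics: Fleiss' kappa for reliability and the frequency of agreements for accuracy. The following is a minimal illustrative sketch of how such quantities can be computed, not the authors' analysis code. The function names, the label values, and the toy ratings are all hypothetical and are not taken from the study.

```python
from collections import Counter


def percent_agreement(first, second):
    """Share of items on which two equally long label sequences match."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a == b for a, b in zip(first, second)) / len(first)


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list with one label per rater."""
    n_items = len(ratings)
    n_raters = len(ratings[0]) if ratings else 0
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals = Counter()
    observed = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this item.
        observed += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    observed /= n_items

    # Agreement expected by chance from the overall category proportions.
    total = n_items * n_raters
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy gradings by two reviewers (NOT data from the study).
reviewer_1 = ["agree", "agree", "disagree", "disagree"]
reviewer_2 = ["agree", "agree", "disagree", "agree"]

agreement = percent_agreement(reviewer_1, reviewer_2)
kappa = fleiss_kappa([list(pair) for pair in zip(reviewer_1, reviewer_2)])
print(round(agreement, 2), round(kappa, 3))  # 0.75 0.467
```

In this toy case the two reviewers match on 3 of 4 items, yet kappa is well below 0.75 because part of that raw agreement is expected by chance.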

Duke Scholars

Author Chad E. Cook Orthopaedic Surgery, Physical Therapy

Published In

J Orthop Sports Phys Ther

DOI

10.2519/jospt.2024.12151

EISSN

1938-1344

Publication Date

March 2024

Volume

54

Issue

3

Start / End Page

222 / 228

Location

United States

Related Subject Headings

Reproducibility of Results
Orthopedics
Humans
Decision Making
Cross-Sectional Studies
Back Pain
Artificial Intelligence
4207 Sports science and exercise
4201 Allied health and rehabilitation science
3202 Clinical sciences

Citation

APA

Gianola, S., Bargeri, S., Castellini, G., Cook, C., Palese, A., Pillastrini, P., … Rossettini, G. (2024). Performance of ChatGPT Compared to Clinical Practice Guidelines in Making Informed Decisions for Lumbosacral Radicular Pain: A Cross-sectional Study. J Orthop Sports Phys Ther, 54(3), 222–228. https://doi.org/10.2519/jospt.2024.12151

Chicago

Gianola, Silvia, Silvia Bargeri, Greta Castellini, Chad Cook, Alvisa Palese, Paolo Pillastrini, Silvia Salvalaggio, Andrea Turolla, and Giacomo Rossettini. “Performance of ChatGPT Compared to Clinical Practice Guidelines in Making Informed Decisions for Lumbosacral Radicular Pain: A Cross-sectional Study.” J Orthop Sports Phys Ther 54, no. 3 (March 2024): 222–28. https://doi.org/10.2519/jospt.2024.12151.

ICMJE

Gianola S, Bargeri S, Castellini G, Cook C, Palese A, Pillastrini P, et al. Performance of ChatGPT Compared to Clinical Practice Guidelines in Making Informed Decisions for Lumbosacral Radicular Pain: A Cross-sectional Study. J Orthop Sports Phys Ther. 2024 Mar;54(3):222–8.

MLA

Gianola, Silvia, et al. “Performance of ChatGPT Compared to Clinical Practice Guidelines in Making Informed Decisions for Lumbosacral Radicular Pain: A Cross-sectional Study.” J Orthop Sports Phys Ther, vol. 54, no. 3, Mar. 2024, pp. 222–28. Pubmed, doi:10.2519/jospt.2024.12151.

NLM

Gianola S, Bargeri S, Castellini G, Cook C, Palese A, Pillastrini P, Salvalaggio S, Turolla A, Rossettini G. Performance of ChatGPT Compared to Clinical Practice Guidelines in Making Informed Decisions for Lumbosacral Radicular Pain: A Cross-sectional Study. J Orthop Sports Phys Ther. 2024 Mar;54(3):222–228.
