Scholars@Duke publication: Harnessing artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in generating clinician-level bariatric surgery recommendations.

Publication, Journal Article

Lee, Y; Shin, T; Tessier, L; Javidan, A; Jung, J; Hong, D; Strong, AT; McKechnie, T; Malone, S; Jin, D; Kroh, M; Dang, JT ...

Published in: Surg Obes Relat Dis

July 2024


BACKGROUND: The formulation of clinical recommendations for bariatric surgery is essential in guiding healthcare professionals. However, the extensive and continuously evolving body of bariatric surgery literature makes it considerably challenging to stay abreast of the latest developments and to acquire information efficiently. Artificial intelligence (AI) has the potential to streamline access to the salient points of clinical recommendations in bariatric surgery.
OBJECTIVES: The study aims to appraise the quality and readability of AI chat-generated answers to frequently asked clinical questions in bariatric and metabolic surgery.
SETTING: Remote.
METHODS: Question prompts were created based on pre-existing clinical practice guidelines for bariatric and metabolic surgery and were queried into 3 AI large language models (LLMs): OpenAI ChatGPT-4, Microsoft Bing, and Google Bard. The responses from each LLM were entered into a spreadsheet for randomized and blinded duplicate review. Accredited bariatric surgeons in North America independently assessed the appropriateness of each recommendation using a 5-point Likert scale. Scores of 4 and 5 were deemed appropriate, while scores of 1-3 indicated a lack of appropriateness. A Flesch Reading Ease (FRE) score was calculated to assess the readability of the responses generated by each LLM.
RESULTS: There was a significant difference among the 3 LLMs in their 5-point Likert scores, with mean values of 4.46 (SD .82), 3.89 (.80), and 3.11 (.72) for ChatGPT-4, Bard, and Bing, respectively (P < .001). There was also a significant difference among the 3 LLMs in the proportion of appropriate answers: ChatGPT-4 at 85.7%, Bard at 74.3%, and Bing at 25.7% (P < .001). The mean FRE scores for ChatGPT-4, Bard, and Bing were 21.68 (SD 2.78), 42.89 (4.03), and 14.64 (5.09), respectively, with higher scores representing easier readability.
CONCLUSIONS: LLM-based AI chat models can effectively generate appropriate responses to clinical questions related to bariatric surgery, though performance can vary greatly between models. Therefore, caution should be taken when interpreting clinical information provided by LLMs, and clinician oversight is necessary to ensure accuracy. Future investigation is warranted to explore how LLMs might enhance healthcare provision and clinical decision-making in bariatric surgery.
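The two scoring rules described in the METHODS can be sketched as follows. This is a minimal Python illustration, not the authors' analysis code. The Flesch Reading Ease formula is the standard one, FRE = 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), and the appropriateness cut-off (Likert 4 or 5 counts as appropriate) is taken from the abstract. The tokenization, the vowel-group syllable counter, and the helper names (`count_syllables`, `flesch_reading_ease`, `is_appropriate`, `proportion_appropriate`) are hypothetical assumptions; the study does not state which readability tool it used, so this sketch will not necessarily reproduce the reported scores.

```python
import re
from typing import Sequence


def count_syllables(word: str) -> int:
    """Hypothetical, crude syllable estimate: count vowel groups (an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent "e" as non-syllabic ("make" -> 1), but keep "-le" ("table" -> 2).
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Standard FRE formula; higher scores mean easier reading."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def is_appropriate(likert_score: int) -> bool:
    """Abstract's rule: scores of 4-5 are appropriate, 1-3 are not."""
    if not 1 <= likert_score <= 5:
        raise ValueError("Likert score must be between 1 and 5")
    return likert_score >= 4


def proportion_appropriate(scores: Sequence[int]) -> float:
    """Share of ratings that meet the appropriateness cut-off."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(is_appropriate(s) for s in scores) / len(scores)


# Usage (illustrative inputs, not study data):
print(round(flesch_reading_ease("The cat sat on the mat."), 1))  # about 116.1
print(proportion_appropriate([5, 4, 3, 2]))                      # 0.5
```

Keeping the Likert threshold in one function makes the dichotomization explicit, so the proportion of appropriate answers is derived from the same rule that defines "appropriate".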

Duke Scholars

Author: James Jung (Surgery, Minimally Invasive Surgery)

Published In

Surg Obes Relat Dis

DOI

10.1016/j.soard.2024.03.011

EISSN

1878-7533

Publication Date

July 2024

Volume

20

Issue

7

Start / End Page

603 / 608

Location

United States

Related Subject Headings

Surgery
Practice Guidelines as Topic
Obesity, Morbid
Humans
Comprehension
Bariatric Surgery
Artificial Intelligence
4206 Public health
3202 Clinical sciences
1117 Public Health and Health Services

Citation

APA: Lee, Y., Shin, T., Tessier, L., Javidan, A., Jung, J., Hong, D., … ASMBS Artificial Intelligence and Digital Surgery Task Force. (2024). Harnessing artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in generating clinician-level bariatric surgery recommendations. Surg Obes Relat Dis, 20(7), 603–608. https://doi.org/10.1016/j.soard.2024.03.011

Chicago: Lee, Yung, Thomas Shin, Léa Tessier, Arshia Javidan, James Jung, Dennis Hong, Andrew T. Strong, et al. “Harnessing artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in generating clinician-level bariatric surgery recommendations.” Surg Obes Relat Dis 20, no. 7 (July 2024): 603–8. https://doi.org/10.1016/j.soard.2024.03.011.

ICMJE: Lee Y, Shin T, Tessier L, Javidan A, Jung J, Hong D, et al. Harnessing artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in generating clinician-level bariatric surgery recommendations. Surg Obes Relat Dis. 2024 Jul;20(7):603–8.

MLA: Lee, Yung, et al. “Harnessing artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in generating clinician-level bariatric surgery recommendations.” Surg Obes Relat Dis, vol. 20, no. 7, July 2024, pp. 603–08. Pubmed, doi:10.1016/j.soard.2024.03.011.

NLM: Lee Y, Shin T, Tessier L, Javidan A, Jung J, Hong D, Strong AT, McKechnie T, Malone S, Jin D, Kroh M, Dang JT, ASMBS Artificial Intelligence and Digital Surgery Task Force. Harnessing artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in generating clinician-level bariatric surgery recommendations. Surg Obes Relat Dis. 2024 Jul;20(7):603–608.