Scholars@Duke publication: Identification of ChatGPT-Generated Abstracts Within Shoulder and Elbow Surgery Poses a Challenge for Reviewers.

Publication, Journal Article

Stadler, RD; Sudah, SY; Moverman, MA; Denard, PJ; Duralde, XA; Garrigues, GE; Klifto, CS; Levy, JC; Namdari, S; Sanchez-Sotelo, J; Menendez, ME

Published in: Arthroscopy

April 2025

PURPOSE: To evaluate the extent to which experienced reviewers can accurately discern between artificial intelligence (AI)-generated and original research abstracts published in the field of shoulder and elbow surgery and compare this with the performance of an AI detection tool.

METHODS: Twenty-five shoulder- and elbow-related articles published in high-impact journals in 2023 were randomly selected. ChatGPT was prompted with only the abstract title to create an AI-generated version of each abstract. The resulting 50 abstracts were randomly distributed to and evaluated by 8 blinded peer reviewers with at least 5 years of experience. Reviewers were tasked with distinguishing between original and AI-generated text. A Likert scale assessed reviewer confidence for each interpretation, and the primary reason guiding assessment of generated text was collected. AI output detector (0%-100%) and plagiarism (0%-100%) scores were evaluated using GPTZero.

RESULTS: Reviewers correctly identified 62% of AI-generated abstracts and misclassified 38% of original abstracts as being AI generated. GPTZero reported a significantly higher probability of AI output among generated abstracts (median, 56%; interquartile range [IQR], 51%-77%) compared with original abstracts (median, 10%; IQR, 4%-37%; P < .01). Generated abstracts scored significantly lower on the plagiarism detector (median, 7%; IQR, 5%-14%) relative to original abstracts (median, 82%; IQR, 72%-92%; P < .01). Correct identification of AI-generated abstracts was predominately attributed to the presence of unrealistic data/values. The primary reason for misidentifying original abstracts as AI was attributed to writing style.

CONCLUSIONS: Experienced reviewers faced difficulties in distinguishing between human and AI-generated research content within shoulder and elbow surgery. The presence of unrealistic data facilitated correct identification of AI abstracts, whereas misidentification of original abstracts was often ascribed to writing style.

CLINICAL RELEVANCE: With rapidly increasing AI advancements, it is paramount that ethical standards of scientific reporting are upheld. It is therefore helpful to understand the ability of reviewers to identify AI-generated content.
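As a reading aid, the two reviewer percentages in the RESULTS paragraph can be restated as sensitivity and specificity. The sketch below is illustrative arithmetic only, not the authors' analysis: the function name `reviewer_metrics` is hypothetical, and it assumes each reviewer judgment was a binary choice (original or AI generated), so that an original abstract not flagged as AI counts as correctly identified. The abstract itself does not report specificity or an overall accuracy figure.

```python
# Illustrative arithmetic only; inputs are the two percentages reported in
# the RESULTS paragraph. The function name is hypothetical, not from the paper.

def reviewer_metrics(sensitivity_pct, false_positive_pct):
    """Derive specificity and balanced accuracy from two reported rates.

    sensitivity_pct: share of AI-generated abstracts flagged as AI.
    false_positive_pct: share of original abstracts flagged as AI.
    Assumes a binary judgment (original vs. AI generated) per abstract.
    """
    # Originals that were not flagged as AI were, by assumption, judged original.
    specificity_pct = 100 - false_positive_pct
    # Balanced accuracy is the mean of sensitivity and specificity.
    balanced_accuracy_pct = (sensitivity_pct + specificity_pct) / 2
    return {
        "sensitivity_pct": sensitivity_pct,
        "specificity_pct": specificity_pct,
        "balanced_accuracy_pct": balanced_accuracy_pct,
    }


metrics = reviewer_metrics(sensitivity_pct=62, false_positive_pct=38)
print(metrics)
# {'sensitivity_pct': 62, 'specificity_pct': 62, 'balanced_accuracy_pct': 62.0}
```

Under these assumptions the implied balanced accuracy is 62%, which is 12 percentage points above the 50% expected from guessing on a two-way choice. That is consistent with the conclusion that reviewers found the task difficult.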

Duke Scholars

Christopher Scott Klifto, Author, Orthopaedic Surgery

Published In

Arthroscopy

DOI

10.1016/j.arthro.2024.06.045

EISSN

1526-3231

Publication Date

April 2025

Volume

41

Issue

4

Start / End Page

916 / 924.e2

Location

United States

Related Subject Headings

Shoulder
Periodicals as Topic
Orthopedics
Humans
Generative Artificial Intelligence
Elbow
Artificial Intelligence
Abstracting and Indexing
3202 Clinical sciences
1103 Clinical Sciences

Citation

APA

Stadler, R. D., Sudah, S. Y., Moverman, M. A., Denard, P. J., Duralde, X. A., Garrigues, G. E., … Menendez, M. E. (2025). Identification of ChatGPT-Generated Abstracts Within Shoulder and Elbow Surgery Poses a Challenge for Reviewers. Arthroscopy, 41(4), 916-924.e2. https://doi.org/10.1016/j.arthro.2024.06.045

Chicago

Stadler, Ryan D., Suleiman Y. Sudah, Michael A. Moverman, Patrick J. Denard, Xavier A. Duralde, Grant E. Garrigues, Christopher S. Klifto, et al. “Identification of ChatGPT-Generated Abstracts Within Shoulder and Elbow Surgery Poses a Challenge for Reviewers.” Arthroscopy 41, no. 4 (April 2025): 916-924.e2. https://doi.org/10.1016/j.arthro.2024.06.045.

ICMJE

Stadler RD, Sudah SY, Moverman MA, Denard PJ, Duralde XA, Garrigues GE, et al. Identification of ChatGPT-Generated Abstracts Within Shoulder and Elbow Surgery Poses a Challenge for Reviewers. Arthroscopy. 2025 Apr;41(4):916-924.e2.

MLA

Stadler, Ryan D., et al. “Identification of ChatGPT-Generated Abstracts Within Shoulder and Elbow Surgery Poses a Challenge for Reviewers.” Arthroscopy, vol. 41, no. 4, Apr. 2025, pp. 916-924.e2. PubMed, doi:10.1016/j.arthro.2024.06.045.

NLM

Stadler RD, Sudah SY, Moverman MA, Denard PJ, Duralde XA, Garrigues GE, Klifto CS, Levy JC, Namdari S, Sanchez-Sotelo J, Menendez ME. Identification of ChatGPT-Generated Abstracts Within Shoulder and Elbow Surgery Poses a Challenge for Reviewers. Arthroscopy. 2025 Apr;41(4):916-924.e2.
