Can artificial intelligence pass the test? Evaluating chatbot scores on pediatric gastroenterology board-style questions.
OBJECTIVES: The American Academy of Pediatrics (AAP) Pediatrics Review and Education Program (PREP)® Gastroenterology (GI) Self-Assessments help pediatric gastroenterologists and trainees prepare for subspecialty board exams by providing peer-reviewed questions and critiques based on American Board of Pediatrics content specifications. These assessments test knowledge of material aligned with the pediatric GI board exams. While artificial intelligence (AI) chatbots have passed various medical board exams, their ability to pass the pediatric GI boards remains untested. This study assesses the performance of Microsoft Copilot, OpenAI ChatGPT-3.5, and ChatGPT-4o on the 2022-2024 AAP PREP® GI Self-Assessments.

METHODS: A total of 216 AAP PREP® GI Self-Assessment questions from 2022 to 2024 were entered into three AI chatbots (Microsoft Copilot, OpenAI ChatGPT-3.5, and ChatGPT-4o). Chatbot scores were compared with the passing score (> 65%) and with first-time test takers' scores reported by the AAP for 2022-2024.

RESULTS: OpenAI ChatGPT-4o and Microsoft Copilot scored above 65% (pass) on all three PREP® GI Self-Assessments from 2022 to 2024. OpenAI ChatGPT-3.5 passed the 2023 and 2024 assessments but not the 2022 assessment. Collectively, the chatbots scored best in anatomy, motility, and mouth and esophageal disorders, and poorest in physiology, pharmacology, and liver, stomach, and duodenum disorders.

CONCLUSIONS: OpenAI ChatGPT-4o and Microsoft Copilot consistently passed the PREP® GI Self-Assessments from 2022 to 2024, suggesting potential for strong performance on the pediatric GI boards. OpenAI ChatGPT-3.5 was more limited, passing only the 2023 and 2024 assessments. Overall, advanced AI chatbots show potential to pass the pediatric GI board exam.