Can artificial intelligence pass the test? Evaluating chatbot scores on pediatric gastroenterology board-style questions.

Publication, Journal Article
Roberts, AG; Patel, R; Babu, S; Engelhard, MM; Greenberg, RG; Ajmera, A
Published in: JPGN Rep
February 2026

OBJECTIVES: The American Academy of Pediatrics (AAP) Pediatrics Review and Education Program (PREP)® Gastroenterology (GI) Self-Assessments help pediatric gastroenterologists and trainees prepare for subspecialty board exams by providing peer-reviewed questions and critiques based on American Board of Pediatrics content specifications. These assessments test knowledge of material aligned with the pediatric gastroenterology board exams. While artificial intelligence (AI) chatbots have passed various medical board exams, their ability to pass the pediatric GI boards remains untested. This study assesses the performance of Microsoft Copilot and OpenAI ChatGPT-3.5 and ChatGPT-4o on the 2022-2024 AAP PREP® GI Self-Assessments.

METHODS: A total of 216 AAP PREP® GI Self-Assessment questions from 2022 to 2024 were entered into three AI chatbots (Microsoft Copilot, OpenAI ChatGPT-3.5, and ChatGPT-4o). Scores were compared with the passing score (> 65%) and with first-time test takers' scores reported by the AAP for 2022-2024.

RESULTS: OpenAI ChatGPT-4o and Microsoft Copilot scored above 65% (passing) on all three PREP® GI Self-Assessments from 2022 to 2024. OpenAI ChatGPT-3.5 passed the 2023 and 2024 assessments but did not pass the 2022 assessment. The chatbots collectively scored best in anatomy, motility, and mouth and esophageal disorders, and scored poorly in physiology, pharmacology, and liver, stomach, and duodenum disorders.

CONCLUSIONS: OpenAI ChatGPT-4o and Microsoft Copilot consistently passed the PREP® GI Self-Assessments from 2022 to 2024, suggesting they could perform well on the pediatric GI boards. OpenAI ChatGPT-3.5 showed limitations, passing only the 2023 and 2024 assessments. Overall, advanced AI chatbots show potential to pass the pediatric GI board exam.

Published In

JPGN Rep

DOI

10.1002/jpr3.70121

EISSN

2691-171X

Publication Date

February 2026

Volume

7

Issue

1

Start / End Page

28 / 35

Location

United States

Citation

APA: Roberts, A. G., Patel, R., Babu, S., Engelhard, M. M., Greenberg, R. G., & Ajmera, A. (2026). Can artificial intelligence pass the test? Evaluating chatbot scores on pediatric gastroenterology board-style questions. JPGN Rep, 7(1), 28–35. https://doi.org/10.1002/jpr3.70121

Chicago: Roberts, Annette G., Reshma Patel, Sharmilaa Babu, Matthew M. Engelhard, Rachel G. Greenberg, and Arun Ajmera. “Can artificial intelligence pass the test? Evaluating chatbot scores on pediatric gastroenterology board-style questions.” JPGN Rep 7, no. 1 (February 2026): 28–35. https://doi.org/10.1002/jpr3.70121.

ICMJE/NLM: Roberts AG, Patel R, Babu S, Engelhard MM, Greenberg RG, Ajmera A. Can artificial intelligence pass the test? Evaluating chatbot scores on pediatric gastroenterology board-style questions. JPGN Rep. 2026 Feb;7(1):28–35.

MLA: Roberts, Annette G., et al. “Can artificial intelligence pass the test? Evaluating chatbot scores on pediatric gastroenterology board-style questions.” JPGN Rep, vol. 7, no. 1, Feb. 2026, pp. 28–35. PubMed, doi:10.1002/jpr3.70121.
