Can artificial intelligence pass the test? Evaluating chatbot scores on pediatric gastroenterology board-style questions.
OBJECTIVES: The American Academy of Pediatrics (AAP) Pediatrics Review and Education Program (PREP)® Gastroenterology (GI) Self-Assessments help pediatric gastroenterologists and trainees prepare for subspecialty board exams by providing peer-reviewed questions and critiques based on American Board of Pediatrics content specifications. These assessments test knowledge of material aligned with the pediatric GI board exams. While artificial intelligence (AI) chatbots have passed various medical board exams, their ability to pass the pediatric GI boards remains untested. This study assesses the performance of Microsoft Copilot, OpenAI ChatGPT-3.5, and ChatGPT-4o on the 2022-2024 AAP PREP® GI Self-Assessments.

METHODS: A total of 216 AAP PREP® GI Self-Assessment questions from 2022 to 2024 were entered into three AI chatbots (Microsoft Copilot, OpenAI ChatGPT-3.5, and ChatGPT-4o). Chatbot scores were compared with the passing score (> 65%) and with first-time test takers' scores reported by the AAP for 2022-2024.

RESULTS: OpenAI ChatGPT-4o and Microsoft Copilot scored above 65% (pass) on all three PREP® GI Self-Assessments from 2022 to 2024. OpenAI ChatGPT-3.5 passed the 2023 and 2024 assessments but not the 2022 assessment. Collectively, the chatbots scored best in anatomy, motility, and mouth and esophageal disorders, and poorest in physiology, pharmacology, and liver, stomach, and duodenum disorders.

CONCLUSIONS: OpenAI ChatGPT-4o and Microsoft Copilot consistently passed the PREP® GI Self-Assessments from 2022 to 2024, suggesting potential for strong performance on the pediatric GI boards. OpenAI ChatGPT-3.5 was more limited, passing only the 2023 and 2024 assessments. Overall, advanced AI chatbots show potential to pass the pediatric GI board exam.