Evaluating ChatGPT’s Triage and Diagnostic Capabilities in Patients Presenting With Common Causes of Foot and Ankle Pain
Background: ChatGPT-4 has demonstrated potential in offering treatment recommendations for orthopaedic conditions that are consistent with American Academy of Orthopaedic Surgeons (AAOS) clinical practice guidelines, including those pertaining to foot and ankle pathology. Although prior studies have explored its performance in triaging causes of knee pain, ChatGPT-4o’s ability to triage patients with foot and ankle complaints into appropriate health care settings remains largely unexamined. This study evaluated ChatGPT-4o’s ability to generate differential diagnoses, recommend appropriate triage destinations, and formulate treatment plans when provided with expanded clinical information.
Methods: In this exploratory, hypothesis-generating, vignette-based study, 24 standardized foot and ankle complaints were input into ChatGPT-4o, with memory reset between entries to minimize bias. Twelve cases assessed ChatGPT-4o’s ability to generate differential diagnoses and triage decisions (Primary Care Physician, Foot and Ankle Specialist, or Emergency Department/Urgent Care), which were compared against evaluations by 2 fellowship-trained orthopaedic foot and ankle surgeons. An additional 12 expanded clinical vignettes were used to prompt a primary diagnosis and treatment recommendations, which were then graded for accuracy and suitability.
Results: ChatGPT-4o generated differentials that were considered clinically appropriate for all 12 triage conditions. The top diagnosis matched that of the surgeons in 9 of 12 cases (75%) and appeared within the first or second position of the differential list in 10 of 12 cases (83.3%). Across all differential lists, 26 of 36 diagnoses (72.2%) were identical to those listed by the surgeons. ChatGPT-4o’s triage recommendations matched the surgeons’ decisions in 6 of 12 cases (50%). With expanded clinical information, ChatGPT-4o maintained diagnostic accuracy (75%) and generated appropriate management plans in 11 of 12 cases (91.7%).
Conclusion: ChatGPT-4o was able to generate clinically reasonable differentials for foot and ankle conditions. Although triage decision making showed variability, these findings support a limited role for ChatGPT-4o as an adjunct to central scheduling workflows, helping streamline patient triage and health care delivery.
Level of Evidence: Level V, expert opinion, vignette-based study.