Large Language Models and the Analyses of Adherence to Reporting Guidelines in Systematic Reviews and Overviews of Reviews (PRISMA 2020 and PRIOR).

Publication, Journal Article
Forero, DA; Abreu, SE; Tovar, BE; Oermann, MH
Published in: Journal of medical systems
June 2025

In the context of Evidence-Based Practice (EBP), Systematic Reviews (SRs), Meta-Analyses (MAs) and overviews of reviews have become cornerstones for the synthesis of research findings. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 and Preferred Reporting Items for Overviews of Reviews (PRIOR) statements have become major reporting guidelines for SRs/MAs and for overviews of reviews, respectively. In recent years, advances in Generative Artificial Intelligence (genAI) have been proposed as a potential major paradigm shift in scientific research. The main aim of this research was to examine the performance of four Large Language Models (LLMs) in analyzing adherence to PRISMA 2020 and PRIOR in a sample of 20 SRs and 20 overviews of reviews. We tested the free versions of four commonly used LLMs: ChatGPT (GPT-4o), DeepSeek (V3), Gemini (2.0 Flash) and Qwen (2.5 Max). The adherence estimated by each LLM was compared with scores previously defined by human experts, using several statistical tests. All four LLMs performed poorly in the analysis of adherence to PRISMA 2020, overestimating the percentage of adherence (from 23 to 30%). For PRIOR, the LLMs showed smaller differences in the estimation of adherence (from 6 to 14%), and ChatGPT showed a performance similar to that of human experts. This is the first report of the performance of four commonly used LLMs for the analysis of adherence to PRISMA 2020 and PRIOR. Future studies of adherence to other reporting guidelines will be helpful in health sciences research.
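
The comparison described in the abstract can be illustrated with a minimal sketch. It assumes that adherence is scored as the percentage of checklist items judged as reported, and that an LLM's overestimation is the mean paired difference from the human expert score. The abstract does not specify the scoring procedure or the statistical tests used, so the function names (`adherence_pct`, `mean_overestimation`) and the toy ratings below are hypothetical and serve only as an illustration; they are not the authors' method or data.

```python
from statistics import mean


def adherence_pct(items):
    """Percentage of checklist items judged as reported (True)."""
    return 100.0 * sum(items) / len(items)


def mean_overestimation(llm_scores, human_scores):
    """Mean paired difference (LLM minus human), in percentage points."""
    return mean(l - h for l, h in zip(llm_scores, human_scores))


# Toy, made-up ratings for two reviews on a 4-item checklist.
# These values are for illustration only and are NOT taken from the study.
human = [[True, False, False, True], [True, True, False, False]]
llm = [[True, True, False, True], [True, True, True, False]]

human_pct = [adherence_pct(review) for review in human]
llm_pct = [adherence_pct(review) for review in llm]
diff = mean_overestimation(llm_pct, human_pct)

print(f"Human: {human_pct}, LLM: {llm_pct}, mean difference: {diff}")
```

A positive mean difference indicates that the LLM rated more items as reported than the human experts did, which is the direction of the discrepancy the abstract reports for PRISMA 2020.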

Duke Scholars

Published In

Journal of medical systems

DOI

10.1007/s10916-025-02212-0

EISSN

1573-689X

ISSN

0148-5598

Publication Date

June 2025

Volume

49

Issue

1

Start / End Page

80

Related Subject Headings

  • Systematic Reviews as Topic
  • Meta-Analysis as Topic
  • Medical Informatics
  • Large Language Models
  • Humans
  • Guidelines as Topic
  • Guideline Adherence
  • Artificial Intelligence
  • 4203 Health services and systems
  • 1117 Public Health and Health Services
 

Citation

Forero, D. A., Abreu, S. E., Tovar, B. E., & Oermann, M. H. (2025). Large Language Models and the Analyses of Adherence to Reporting Guidelines in Systematic Reviews and Overviews of Reviews (PRISMA 2020 and PRIOR). Journal of Medical Systems, 49(1), 80. https://doi.org/10.1007/s10916-025-02212-0
Forero, Diego A., Sandra E. Abreu, Blanca E. Tovar, and Marilyn H. Oermann. “Large Language Models and the Analyses of Adherence to Reporting Guidelines in Systematic Reviews and Overviews of Reviews (PRISMA 2020 and PRIOR).” Journal of Medical Systems 49, no. 1 (June 2025): 80. https://doi.org/10.1007/s10916-025-02212-0.
Forero, Diego A., et al. “Large Language Models and the Analyses of Adherence to Reporting Guidelines in Systematic Reviews and Overviews of Reviews (PRISMA 2020 and PRIOR).” Journal of Medical Systems, vol. 49, no. 1, June 2025, p. 80. Epmc, doi:10.1007/s10916-025-02212-0.