Enabling inclusive systematic reviews: incorporating preprint articles with large language model-driven evaluations.
OBJECTIVES: Systematic reviews in comparative effectiveness research require timely evidence synthesis. With the rapid advancement of medical research, preprint articles play an increasingly important role in accelerating knowledge dissemination. However, because preprints are posted without peer review, their quality varies widely, which makes it difficult to decide which of them to include as evidence in systematic reviews.

MATERIALS AND METHODS: We developed AutoConfidenceScore (automated confidence score assessment), a framework for predicting whether and when a preprint will be published. It reduces reliance on manual curation and expands the range of predictors through three key advancements: (1) automated data extraction using natural language processing techniques, (2) semantic embeddings of titles and abstracts, and (3) large language model (LLM)-driven evaluation scores. We employed two prediction models: a random forest classifier for the binary outcome and a survival cure model that predicts both the binary outcome and publication risk over time.

RESULTS: The random forest classifier achieved an area under the receiver operating characteristic curve (AUROC) of 0.747 using all features. The survival cure model achieved an AUROC of 0.731 for binary outcome prediction and a concordance index of 0.667 for time-to-publication risk.

DISCUSSION: Our study advances the framework for preprint publication prediction through automated data extraction and integration of multiple feature types. By combining semantic embeddings with LLM-driven evaluations, AutoConfidenceScore improves predictive performance while reducing the manual annotation burden.

CONCLUSION: AutoConfidenceScore has the potential to facilitate the incorporation of preprint articles during the appraisal phase of systematic reviews, helping researchers use preprint resources more effectively.
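The two metrics reported in the results, AUROC and the concordance index, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy data, and the convention that a higher risk score means earlier expected publication are all assumptions made for illustration, and the concordance index shown is the plain Harrell form with tied times skipped.

```python
from itertools import combinations


def auroc(labels, scores):
    """Probability that a random positive scores above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def concordance_index(times, events, risks):
    """Harrell-style concordance index for right-censored data.

    A pair is comparable when the item with the shorter time had an
    observed event. It is concordant when that item also has the higher
    risk score. Pairs with equal times are skipped (a simplification).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # a is the item with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
published = [1, 1, 0, 1, 0, 0]                 # 1 = preprint was later published
scores = [0.9, 0.6, 0.7, 0.8, 0.2, 0.4]        # model score per preprint
days = [100, 300, 400, 150, 400, 250]          # days to publication or censoring

toy_auroc = auroc(published, scores)
toy_cindex = concordance_index(days, published, scores)
print(f"AUROC: {toy_auroc:.3f}, concordance index: {toy_cindex:.3f}")
```

Both metrics are rank-based: a value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a scale for reading the reported 0.747, 0.731, and 0.667.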
Related Subject Headings
- Medical Informatics
- 46 Information and computing sciences
- 42 Health sciences
- 32 Biomedical and clinical sciences
- 11 Medical and Health Sciences
- 09 Engineering
- 08 Information and Computing Sciences