Retrieval-augmented generation for interpreting clinical laboratory regulations using large language models

Publication, Journal Article
Nanua, S; Steward, R; Neely, B; Datto, M; Youens, K
Published in: Journal of Pathology Informatics
November 1, 2025

Large language models (LLMs) have demonstrated strong performance on general knowledge tasks, but they have important limitations as standalone tools for question answering in specialized domains where accuracy and consistency are critical. Retrieval-augmented generation (RAG) is a strategy in which LLM outputs are grounded in dynamically retrieved source documents, offering advantages in accuracy, explainability, and maintainability. We developed and evaluated a custom RAG system called Raven, designed to answer laboratory regulatory questions using the part of the Code of Federal Regulations (CFR) pertaining to laboratories (42 CFR Part 493) as an authoritative source. Raven employed a vector search pipeline and an LLM to generate grounded responses via a chatbot-style interface. The system was tested using 103 synthetic laboratory regulatory questions, 88 of which were explicitly addressed in the CFR. Compared to answers generated manually by a board-certified pathologist, Raven's responses were judged to be totally complete and correct in 92.0% of those 88 cases, with little irrelevant content and a low potential for regulatory or medical error. Performance declined significantly on questions not addressed in the CFR, confirming the system's grounding in the source documents. Most suboptimal responses were attributable to faulty source document retrieval rather than model hallucination or misinterpretation. These findings demonstrate that a basic RAG system can produce useful, accurate, and verifiable answers to complex regulatory questions. With appropriate safeguards and thoughtful integration into user workflows, tools like Raven may serve as valuable decision-support systems in laboratory medicine and other knowledge-intensive healthcare domains.
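As a rough illustration of the retrieve-then-ground pattern the abstract describes, the sketch below ranks passages against a question and assembles a prompt that restricts the model to the retrieved text. It is not Raven's implementation: the abstract does not specify the embedding model, vector store, or prompt. The bag-of-words `embed` function is a toy stand-in for a learned embedding, and the passage IDs, placeholder passage texts, and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy stand-in for a learned embedding: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=2):
    """Return IDs of up to k passages with non-zero similarity, best first."""
    q = embed(question)
    scored = [(cosine(q, embed(text)), pid) for pid, text in passages.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [pid for score, pid in scored[:k] if score > 0]


def build_prompt(question, passages, k=2):
    """Assemble a grounded prompt, or return None if nothing was retrieved."""
    ids = retrieve(question, passages, k)
    if not ids:
        return None  # no supporting passage: the caller can decline to answer
    context = "\n".join(f"[{pid}] {passages[pid]}" for pid in ids)
    return (
        "Answer using only the passages below and cite their IDs.\n"
        f"{context}\nQuestion: {question}"
    )


# Hypothetical placeholder passages, not quoted from any regulation.
PASSAGES = {
    "passage-1": "placeholder text about proficiency testing enrollment",
    "passage-2": "placeholder text about personnel qualifications for laboratory directors",
    "passage-3": "placeholder text about record retention periods",
}

if __name__ == "__main__":
    print(build_prompt("What are the qualifications for a laboratory director?", PASSAGES))
```

Returning `None` when no passage matches is one simple way to keep answers tied to the source documents. Because the prompt carries passage IDs, a reader can trace each answer back to the text it was drawn from.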

Published In

Journal of Pathology Informatics

DOI

10.1016/j.jpi.2025.100520

EISSN

2153-3539

ISSN

2229-5089

Publication Date

November 1, 2025

Volume

19

Related Subject Headings

  • 4609 Information systems
  • 3102 Bioinformatics and computational biology
  • 0601 Biochemistry and Cell Biology
 

Citation

APA: Nanua, S., Steward, R., Neely, B., Datto, M., & Youens, K. (2025). Retrieval-augmented generation for interpreting clinical laboratory regulations using large language models. Journal of Pathology Informatics, 19. https://doi.org/10.1016/j.jpi.2025.100520
Chicago: Nanua, S., R. Steward, B. Neely, M. Datto, and K. Youens. “Retrieval-augmented generation for interpreting clinical laboratory regulations using large language models.” Journal of Pathology Informatics 19 (November 1, 2025). https://doi.org/10.1016/j.jpi.2025.100520.
ICMJE: Nanua S, Steward R, Neely B, Datto M, Youens K. Retrieval-augmented generation for interpreting clinical laboratory regulations using large language models. Journal of Pathology Informatics. 2025 Nov 1;19.
MLA: Nanua, S., et al. “Retrieval-augmented generation for interpreting clinical laboratory regulations using large language models.” Journal of Pathology Informatics, vol. 19, Nov. 2025. Scopus, doi:10.1016/j.jpi.2025.100520.
NLM: Nanua S, Steward R, Neely B, Datto M, Youens K. Retrieval-augmented generation for interpreting clinical laboratory regulations using large language models. Journal of Pathology Informatics. 2025 Nov 1;19.
