A natural language processing algorithm to extract MRI and biopsy-related phenotypes for prostate cancer patients.
Culnan, J; Goryachev, S; Chen, D; Soloviev, O; Lee, G; Bihn, J; Corrigan, J; Dulberger, KN; La, J; Swinnerton, K; Dorff, TB; Garraway, I ...
Published in: Journal of Clinical Oncology
Extraction of data from unstructured medical records is a powerful but challenging means of deriving novel information to drive research, improve clinical care, and better inform guidelines. Here, we describe the creation and validation of a novel natural language processing (NLP) algorithm for extracting components of interest from biopsy and MRI reports for prostate cancer patients. Specifically, the algorithm extracts the Gleason score from biopsy pathology reports, and the maximum Prostate Imaging-Reporting and Data System (PI-RADS) score, prostate-specific antigen (PSA) density, prostate volume, and prostate dimensions from MRI reports.
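As a rough illustration of what rule-based extraction of these five components can look like, the following minimal sketch uses a handful of regular expressions. The patterns, the function names extract_gleason and extract_mri, and the sample report wording are hypothetical simplifications, not the rules of the algorithm described here. Real reports use far more varied phrasing (for example, "Gleason score 7 (3+4)"), which these patterns would miss.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical, deliberately simplified patterns for illustration only;
# they are NOT the rules used by the algorithm described in this abstract.
GLEASON_RE = re.compile(
    r"gleason\s*(?:score|grade|pattern)?\s*[:=]?\s*([1-5])\s*\+\s*([1-5])",
    re.IGNORECASE,
)
PIRADS_RE = re.compile(
    r"pi-?rads\s*(?:v2(?:\.1)?\s*)?(?:score|category|assessment)?\s*[:=]?\s*([1-5])",
    re.IGNORECASE,
)
PSA_DENSITY_RE = re.compile(r"psa\s*density\s*[:=]?\s*(\d*\.\d+|\d+)", re.IGNORECASE)
VOLUME_RE = re.compile(
    r"(?:prostate|gland)\s*volume\s*[:=]?\s*(\d+(?:\.\d+)?)\s*(?:ml|cc)",
    re.IGNORECASE,
)
DIMENSIONS_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*x\s*(\d+(?:\.\d+)?)\s*x\s*(\d+(?:\.\d+)?)\s*cm",
    re.IGNORECASE,
)


def extract_gleason(text: str) -> List[Tuple[int, int, int]]:
    """Return every (primary, secondary, total) Gleason mention in a biopsy report."""
    return [(int(p), int(s), int(p) + int(s)) for p, s in GLEASON_RE.findall(text)]


def extract_mri(text: str) -> Dict[str, Any]:
    """Extract the four MRI fields; a field is None when no pattern matches."""
    pirads = [int(score) for score in PIRADS_RE.findall(text)]
    density = PSA_DENSITY_RE.search(text)
    volume = VOLUME_RE.search(text)
    dims = DIMENSIONS_RE.search(text)
    return {
        # The maximum PI-RADS score is taken across all lesions mentioned.
        "max_pirads": max(pirads) if pirads else None,
        "psa_density": float(density.group(1)) if density else None,
        "volume_ml": float(volume.group(1)) if volume else None,
        "dimensions_cm": tuple(float(d) for d in dims.groups()) if dims else None,
    }


mri_note = (
    "Lesion 1: PI-RADS 3. Lesion 2: PI-RADS v2.1 category 4. "
    "Prostate volume: 48 mL. Prostate measures 4.2 x 3.9 x 5.0 cm. "
    "PSA density: 0.13 ng/mL/cc."
)
print(extract_mri(mri_note))
print(extract_gleason("Core 1: Gleason score 3+4=7. Core 2: Gleason 4 + 3 = 7."))
```

Taking the maximum over all PI-RADS matches, rather than the first match, reflects the abstract's target of the maximum score per report when several lesions are described.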
MRI and biopsy pathology reports were extracted for a cohort of 155,570 patients diagnosed with prostate cancer between 1999 and 2024, identified either in the VA Cancer Registry System (VACRS) with prostate as the primary tumor site or in the VA Corporate Data Warehouse by a relevant procedure or diagnosis code. Reports were annotated by a physician and a trained research scientist to generate data for the development and validation of our algorithm, and disagreements between annotators were adjudicated by a urologist. Our rule-based NLP algorithm was iteratively developed on 600 annotated and unannotated notes. The algorithm was validated on a manually annotated set of 250 MRI reports and 250 biopsy pathology notes from 378 patients at 78 VA centers, for procedures performed between 2004 and 2024. Algorithm performance was evaluated using precision (true positives / (true positives + false positives)), recall (true positives / (true positives + false negatives)), and F1 score, the harmonic mean of the two (2 * (precision * recall) / (precision + recall)); all three range from 0 to 1, with higher scores indicating better performance.
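The three metrics defined above can be computed from per-note gold and predicted values as in the following minimal sketch. The counting rule is an assumption, since the abstract does not say how mismatches were scored: an extracted value equal to the gold value is a true positive; a wrong extracted value counts as both a false positive and a false negative; a missed gold value is a false negative. The names Counts, count, and score are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Counts:
    tp: int
    fp: int
    fn: int


def count(gold: Dict[str, Optional[int]], pred: Dict[str, Optional[int]]) -> Counts:
    """Tally TP/FP/FN for one field across notes (assumed scoring rule)."""
    tp = fp = fn = 0
    for note_id in gold.keys() | pred.keys():
        g, p = gold.get(note_id), pred.get(note_id)
        if p is not None and p == g:
            tp += 1
        else:
            if p is not None:
                fp += 1  # extracted a value that is absent from, or differs from, gold
            if g is not None:
                fn += 1  # gold value was missed or replaced by a wrong value
    return Counts(tp, fp, fn)


def score(c: Counts) -> Tuple[float, float, float]:
    """Precision, recall, and F1; each is defined as 0.0 when its denominator is 0."""
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative, made-up PI-RADS values for four notes (not study data).
gold = {"n1": 4, "n2": 3, "n3": None, "n4": 5}
pred = {"n1": 4, "n2": 2, "n3": None, "n4": None}
print(count(gold, pred), score(count(gold, pred)))
```

Notes where both gold and prediction are empty (n3) contribute nothing, so precision, recall, and F1 as defined here ignore true negatives.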
Our algorithm successfully extracted all five components from text reports with high precision and recall. Item-level performance metrics for our algorithm on unseen data are reported in the table below. Post-hoc spot checking of algorithm performance on an additional sample of notes selected by procedure year showed consistent results across time for all components except prostate dimensions, for which a greater proportion of recent notes with no extracted value were false negatives. Further error analysis showed that many Gleason scores the algorithm did not extract were repetitions of scores it had successfully extracted elsewhere in the same note.
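The Gleason error analysis above distinguishes missed mentions whose value was still captured elsewhere in the same note from values lost entirely. The following minimal sketch shows one way to make that split, assuming gold and extracted mentions are compared by value (for example "3+4") within a single note; the function name missed_mentions is hypothetical and not part of the described algorithm.

```python
from collections import Counter
from typing import List, Tuple


def missed_mentions(gold: List[str], extracted: List[str]) -> Tuple[int, int]:
    """Return (missed mentions, missed mentions whose value was extracted elsewhere).

    Matching by value within one note is an assumption made for illustration.
    """
    remaining = Counter(extracted)
    missed = redundant = 0
    for value in gold:
        if remaining[value] > 0:
            remaining[value] -= 1  # this gold mention is covered by one extraction
        else:
            missed += 1
            if value in extracted:
                redundant += 1  # same score was captured at another mention
    return missed, redundant


# Illustrative note: "3+4" appears twice but is extracted once; "3+3" is never found.
print(missed_mentions(["3+4", "3+4", "4+3", "3+3"], ["3+4", "4+3"]))
```

A high share of "redundant" misses means mention-level recall understates how often the note-level Gleason information was actually recovered.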
Our NLP algorithm derives structured data from medical reports with highly diverse formats, with excellent accuracy and reliability. This approach has great potential to facilitate data extraction in a range of settings and to support future research and address clinical questions.