Scholars@Duke publication: Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning.

Publication, Journal Article

D'Anniballe, VM; Tushar, FI; Faryna, K; Han, S; Mazurowski, MA; Rubin, GD; Lo, JY

Published in: BMC Med Inform Decis Mak

April 15, 2022

BACKGROUND: There is progress to be made in building artificially intelligent systems to detect abnormalities that are not only accurate but can handle the true breadth of findings that radiologists encounter in body (chest, abdomen, and pelvis) computed tomography (CT). Currently, the major bottleneck for developing multi-disease classifiers is a lack of manually annotated data. The purpose of this work was to develop high throughput multi-label annotators for body CT reports that can be applied across a variety of abnormalities, organs, and disease states thereby mitigating the need for human annotation.

METHODS: We used a dictionary approach to develop rule-based algorithms (RBA) for extraction of disease labels from radiology text reports. We targeted three organ systems (lungs/pleura, liver/gallbladder, kidneys/ureters) with four diseases per system based on their prevalence in our dataset. To expand the algorithms beyond pre-defined keywords, attention-guided recurrent neural networks (RNN) were trained using the RBA-extracted labels to classify reports as being positive for one or more diseases or normal for each organ system. Alternative effects on disease classification performance were evaluated using random initialization or pre-trained embedding as well as different sizes of training datasets. The RBA was tested on a subset of 2158 manually labeled reports and performance was reported as accuracy and F-score. The RNN was tested against a test set of 48,758 reports labeled by RBA and performance was reported as area under the receiver operating characteristic curve (AUC), with 95% CIs calculated using the DeLong method.

RESULTS: Manual validation of the RBA confirmed 91-99% accuracy across the 15 different labels. Our models extracted disease labels from 261,229 radiology reports of 112,501 unique subjects. Pre-trained models outperformed random initialization across all diseases. As the training dataset size was reduced, performance was robust except for a few diseases with a relatively small number of cases. Pre-trained classification AUCs reached > 0.95 for all four disease outcomes and normality across all three organ systems.

CONCLUSIONS: Our label-extracting pipeline was able to encompass a variety of cases and diseases in body CT reports by generalizing beyond strict rules with exceptional accuracy. The method described can be easily adapted to enable automated labeling of hospital-scale medical data sets for training image-based disease classifiers.
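The dictionary-based labeling and the accuracy/F-score validation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The keyword lists, the negation cues, the sentence-level negation rule, the function names `label_report` and `accuracy_and_f1`, and the choice of F1 as the F-score are all assumptions made for illustration. Treating an organ system with no keyword hit as "normal" is also a simplification.

```python
import re

# Hypothetical dictionary: organ system -> disease label -> trigger terms.
# The terms are illustrative and are not the paper's actual dictionary.
DICTIONARY = {
    "lungs/pleura": {
        "nodule": ["nodule", "nodules"],
        "effusion": ["pleural effusion"],
    },
    "liver/gallbladder": {
        "stone": ["gallstone", "gallstones", "cholelithiasis"],
    },
}

# Assumed negation cues; a sentence containing one is treated as negative.
NEGATION = re.compile(r"\b(no|without|negative for)\b")


def label_report(text):
    """Return {organ system: sorted disease labels, or ["normal"]}."""
    # Naive sentence split on periods and newlines (assumption).
    sentences = [s for s in re.split(r"[.\n]+", text.lower()) if s.strip()]
    labels = {}
    for organ, diseases in DICTIONARY.items():
        found = set()
        for sentence in sentences:
            if NEGATION.search(sentence):
                continue  # skip negated sentences entirely
            for disease, terms in diseases.items():
                if any(re.search(r"\b" + re.escape(t) + r"\b", sentence)
                       for t in terms):
                    found.add(disease)
        labels[organ] = sorted(found) if found else ["normal"]
    return labels


def accuracy_and_f1(predicted, reference):
    """Accuracy and F1 for one binary label, given parallel lists of bools."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Usage on a made-up report and made-up label vectors.
report = ("There is a 4 mm nodule in the right upper lobe. "
          "No pleural effusion. The liver is unremarkable.")
print(label_report(report))
print(accuracy_and_f1([True, True, False, False, True],
                      [True, False, False, False, True]))
```

In this sketch each organ system is labeled independently, which mirrors the per-system "one or more diseases or normal" framing in the abstract. A production rule set would likely need finer negation scoping than whole-sentence skipping.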

Duke Scholars

Author Maciej A Mazurowski Biostatistics & Bioinformatics, Division of Translational Bi ...

Author Fakrul Islam Tushar

Author Geoffrey D Rubin Radiology

Author Joseph Yuan-Chieh Lo Radiology

Published In

BMC Med Inform Decis Mak

DOI

10.1186/s12911-022-01843-4

EISSN

1472-6947

Publication Date

April 15, 2022

Volume

22

Issue

1

Start / End Page

102

Location

England

Related Subject Headings

Tomography, X-Ray Computed
Pelvis
Neural Networks, Computer
Medical Informatics
Humans
Deep Learning
Abdomen
4203 Health services and systems
1103 Clinical Sciences
0806 Information Systems

Citation

APA

D’Anniballe, V. M., Tushar, F. I., Faryna, K., Han, S., Mazurowski, M. A., Rubin, G. D., & Lo, J. Y. (2022). Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning. BMC Med Inform Decis Mak, 22(1), 102. https://doi.org/10.1186/s12911-022-01843-4

Chicago

D’Anniballe, Vincent M., Fakrul Islam Tushar, Khrystyna Faryna, Songyue Han, Maciej A. Mazurowski, Geoffrey D. Rubin, and Joseph Y. Lo. “Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning.” BMC Med Inform Decis Mak 22, no. 1 (April 15, 2022): 102. https://doi.org/10.1186/s12911-022-01843-4.

ICMJE

D’Anniballe VM, Tushar FI, Faryna K, Han S, Mazurowski MA, Rubin GD, et al. Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning. BMC Med Inform Decis Mak. 2022 Apr 15;22(1):102.

MLA

D’Anniballe, Vincent M., et al. “Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning.” BMC Med Inform Decis Mak, vol. 22, no. 1, Apr. 2022, p. 102. PubMed, doi:10.1186/s12911-022-01843-4.

NLM

D’Anniballe VM, Tushar FI, Faryna K, Han S, Mazurowski MA, Rubin GD, Lo JY. Multi-label annotation of text reports from computed tomography of the chest, abdomen, and pelvis using deep learning. BMC Med Inform Decis Mak. 2022 Apr 15;22(1):102.
