Attention-guided classification of abnormalities in semi-structured computed tomography reports
Lack of annotated data is a major challenge for machine learning algorithms, particularly in radiology. Algorithms that can extract labels quickly and precisely are in high demand. Weak supervision is a compromise solution, particularly for imaging modalities such as computed tomography (CT), where the number of slices can reach 1000 per case. Radiology reports store crucial information about clinicians' findings and observations in CT slices. Automatically generating labels from CT reports is not trivial because of the complexity of sentences and the diversity of expression in free-text narration. In this study, we focus on abnormality classification in the lungs, liver and kidneys. First, a rule-based model is used to extract weak labels at the case level. Next, an attention-guided recurrent neural network (RNN) is trained to perform binary classification of radiology reports, that is, to decide whether the organ is normal or abnormal. Additionally, a multi-label RNN with an attention mechanism is trained to perform the same binary classification by aggregating its outputs for four representative diseases per organ (lungs: emphysema, mass-nodule, effusion and atelectasis-pneumonia; liver: dilatation, fatty infiltration-steatosis, calcification-stone-gallstone and lesion-mass; kidneys: atrophy, cyst, stone-calculi and lesion) into a single abnormal class. Performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC) on 274, 306 and 278 reports for lungs, liver and kidneys, respectively, manually annotated by radiology experts. For lungs, the change in performance was also evaluated for different training dataset sizes. The AUCs were as follows. Multi-label pretrained models: lungs 0.929, liver 0.840, kidneys 0.844; multi-label models: lungs 0.903, liver 0.848, kidneys 0.906; binary pretrained models: lungs 0.922, liver 0.826, kidneys 0.928.
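As an illustration of the evaluation described above, the following minimal sketch collapses per-disease outputs of a multi-label model into a single abnormal score and computes the ROC AUC against expert labels. It is not the authors' implementation: the abstract does not state the aggregation rule, so taking the maximum over the four disease probabilities is an assumption, and the function names (`abnormal_score`, `roc_auc`) and the toy probabilities are hypothetical.

```python
from typing import Dict, List, Sequence

# Disease labels for lungs, as listed in the abstract.
LUNG_DISEASES = ["emphysema", "mass-nodule", "effusion", "atelectasis-pneumonia"]


def abnormal_score(disease_probs: Dict[str, float]) -> float:
    """Collapse per-disease probabilities into one 'abnormal' score.

    Assumption: max-pooling over diseases. The abstract does not specify
    how the multi-label outputs are aggregated.
    """
    return max(disease_probs[d] for d in LUNG_DISEASES)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC computed as the probability that a random abnormal report
    scores higher than a random normal one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy multi-label outputs for four reports (made-up numbers, not study data).
toy_outputs: List[Dict[str, float]] = [
    {"emphysema": 0.05, "mass-nodule": 0.10, "effusion": 0.02, "atelectasis-pneumonia": 0.08},
    {"emphysema": 0.70, "mass-nodule": 0.20, "effusion": 0.10, "atelectasis-pneumonia": 0.05},
    {"emphysema": 0.10, "mass-nodule": 0.15, "effusion": 0.60, "atelectasis-pneumonia": 0.30},
    {"emphysema": 0.20, "mass-nodule": 0.65, "effusion": 0.05, "atelectasis-pneumonia": 0.10},
]
# Hypothetical expert annotations: 1 = abnormal, 0 = normal.
expert_labels = [0, 1, 1, 0]

scores = [abnormal_score(o) for o in toy_outputs]
auc = roc_auc(expert_labels, scores)
print(f"abnormal scores: {scores}")
print(f"ROC AUC: {auc:.3f}")  # 0.750 on this toy data
```

The pairwise formulation is quadratic in the number of reports, which is acceptable for test sets of a few hundred reports such as those described here. Other aggregation rules (for example, a noisy-OR over the disease probabilities) would fit the same interface.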