2.5D CNN model for detecting lung disease using weak supervision
Our goal is to develop a 2.5D CNN model to detect multiple diseases in multiple organs in CT scans. In this study, we investigated the detection of four common lung diseases: atelectasis, edema, pneumonia, and nodule. Most existing algorithms for computer-aided diagnosis (CAD) of CT use 2D models applied to axial slices. Our hypothesis is that using information from all three views (coronal, sagittal, and axial) may improve classification, because some diseases may be more apparent in a different view or in a combination of views. Our data consisted of 1,089 CT scans: 288 normal cases, 224 atelectasis cases, 156 edema cases, 225 pneumonia cases, and 196 nodule cases. The cases were selected from approximately 5,000 chest CTs from Duke University Health System, and case-level labels were automatically extracted by simple rule-based filtering of the unstructured text of the radiology reports. The five categories were mutually exclusive, so each case had either exactly one of the four diseases or no disease. To create 2.5D volume patches, we took three parallel slices in each of the three intersecting, orthogonal directions and stacked them as three channels, resulting in sparsely sampled cubes of 20.2 x 20.2 x 20.2 mm. For each CT scan, the volume containing the lungs was identified by thresholding, and 30 patches were randomly sampled within that volume. The three 3-channel images from each patch, one per direction, were fed into three independent CNN paths, whose outputs were fused by a fully connected layer. We used 4-fold cross-validation and evaluated our results using the area under the receiver operating characteristic (ROC) curve (AUC). We achieved an average AUC of 0.891 for classifying normal vs. atelectasis, 0.940 for edema, 0.869 for pneumonia, and 0.784 for nodule. 
We also ran a train-validation-test experiment for each disease to evaluate the generalization of our model and obtained comparable test results: 0.818 for atelectasis, 0.963 for edema, 0.878 for pneumonia, and 0.784 for nodule. Despite the small size of the dataset, these results suggest that the 2.5D CNN model generalizes for the detection of multiple lung diseases.
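
The 2.5D patch construction described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in the study: the function names, the patch size in voxels (`half_size`), the spacing between the parallel slices (`slice_offset`), the assumption of an isotropically resampled volume indexed as (z, y, x), and the use of a precomputed boolean lung mask are all hypothetical choices made for illustration.

```python
import numpy as np


def extract_25d_patch(volume, center, half_size=16, slice_offset=8):
    """Return axial, coronal, and sagittal stacks, each of shape (3, S, S).

    volume: 3-D array indexed (z, y, x), assumed isotropically resampled.
    center: (z, y, x) voxel index of the cube center.
    half_size, slice_offset: hypothetical values, not taken from the study.
    """
    z, y, x = center
    h = half_size
    # Cube of side S = 2 * h + 1 centered on the chosen voxel.
    cube = volume[z - h:z + h + 1, y - h:y + h + 1, x - h:x + h + 1]
    # Positions of the three parallel slices inside the cube.
    idx = [h - slice_offset, h, h + slice_offset]
    axial = cube[idx, :, :]                        # planes perpendicular to z
    coronal = cube[:, idx, :].transpose(1, 0, 2)   # planes perpendicular to y
    sagittal = cube[:, :, idx].transpose(2, 0, 1)  # planes perpendicular to x
    return axial, coronal, sagittal


def sample_patches(volume, lung_mask, n_patches=30,
                   half_size=16, slice_offset=8, seed=0):
    """Randomly sample 2.5D patches whose centers lie inside lung_mask."""
    rng = np.random.default_rng(seed)
    h = half_size
    # Keep only centers whose full cube fits inside the volume.
    valid = np.zeros_like(lung_mask, dtype=bool)
    valid[h:-h, h:-h, h:-h] = lung_mask[h:-h, h:-h, h:-h]
    candidates = np.argwhere(valid)
    # Sample with replacement only if there are too few candidate centers.
    picks = rng.choice(len(candidates), size=n_patches,
                       replace=len(candidates) < n_patches)
    return [extract_25d_patch(volume, tuple(candidates[i]),
                              half_size, slice_offset)
            for i in picks]
```

Each returned triple holds one 3-channel image per view, which is the form of input that the three independent CNN paths would receive before their outputs are fused. Only nine slices of each cube are retained, which is what makes the cube sparsely sampled compared with a full 3D patch.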