Database decomposition of a knowledge-based CAD system in mammography: an ensemble approach to improve detection
Although ensemble techniques have been investigated in supervised machine learning, their potential with knowledge-based systems has not been explored. The purpose of this study is to investigate the ensemble approach with a knowledge-based (KB) CAD system for the detection of masses in screening mammograms. The system is designed to determine the presence of a mass in a query mammographic region of interest (ROI) based on its similarity to previously acquired examples of mass and normal cases. Similarity between images is assessed using normalized mutual information. Two approaches to decomposing the knowledge database were investigated to create the ensemble. The first approach was random division of the knowledge database into a pre-specified number of equal-size, disjoint groups. The second approach was based on k-means clustering of the knowledge cases according to common texture features extracted from the ROIs. The outputs of the ensemble components were fused using a linear classifier. Based on a database of 1820 ROIs (901 masses and 919 normal) and a leave-one-out cross-validation scheme, the ensemble techniques improved the performance of the original KB-CAD system (Az = 0.86 ± 0.01). Specifically, random division resulted in an ROC area index of Az = 0.90 ± 0.01, while k-means clustering provided further improvement (Az = 0.91 ± 0.01). Although the additional gain from k-means clustering was marginal, it was statistically significant. The superiority of the k-means clustering scheme was robust regardless of the number of clusters. This study supports the incorporation of ensemble techniques into knowledge-based systems in mammography.
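The random-division pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the normalization NMI = (H(A) + H(B)) / H(A, B) estimated from a 16-bin joint histogram, a per-group decision index defined as the mean similarity to mass cases minus the mean similarity to normal cases, and the hypothetical helper names `normalized_mutual_information`, `random_partition`, `component_scores`, and `fuse`. The fusion weights are assumed to be fitted elsewhere.

```python
import numpy as np


def normalized_mutual_information(a, b, bins=16):
    """NMI = (H(A) + H(B)) / H(A, B) from a joint gray-level histogram.

    Assumption: this particular normalization and the bin count are
    illustrative choices; the abstract does not specify them.
    """
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1)  # marginal distribution of image a
    py = pxy.sum(axis=0)  # marginal distribution of image b

    def entropy(p):
        p = p[p > 0]  # skip empty bins to avoid log(0)
        return -np.sum(p * np.log2(p))

    h_joint = entropy(pxy.ravel())
    if h_joint == 0:
        # Degenerate case (both images constant); convention chosen here.
        return 1.0
    return (entropy(px) + entropy(py)) / h_joint


def random_partition(n_cases, n_groups, rng):
    """Randomly split case indices into disjoint, near-equal-size groups."""
    return np.array_split(rng.permutation(n_cases), n_groups)


def component_scores(query, kb_rois, kb_labels, groups):
    """One decision index per group of knowledge cases.

    Assumption: index = mean NMI to mass cases (label 1) minus
    mean NMI to normal cases (label 0) within the group.
    """
    kb_labels = np.asarray(kb_labels)
    sims = np.array(
        [normalized_mutual_information(query, roi) for roi in kb_rois]
    )
    scores = []
    for idx in groups:
        s, y = sims[idx], kb_labels[idx]
        mass, normal = s[y == 1], s[y == 0]
        mass_mean = mass.mean() if mass.size else 0.0
        normal_mean = normal.mean() if normal.size else 0.0
        scores.append(mass_mean - normal_mean)
    return np.array(scores)


def fuse(scores, weights, bias=0.0):
    """Linear fusion of component scores; weights are assumed pre-fitted."""
    return float(np.dot(weights, scores) + bias)
```

Under the same assumptions, the k-means variant would differ only in how `groups` is built: instead of calling `random_partition`, each group would hold the indices of the knowledge cases assigned to one cluster of the texture-feature vectors.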