Issues in assessing multi-institutional performance of BI-RADS-based CAD systems
The purpose of this study was to investigate factors that affect the generalization of breast cancer computer-aided diagnosis (CAD) systems that use the Breast Imaging Reporting and Data System (BI-RADS™). Data sets from four institutions were analyzed: Duke University Medical Center, University of Pennsylvania Medical Center, Massachusetts General Hospital, and Wake Forest University. The latter two data sets are subsets of the Digital Database for Screening Mammography. Each data set consisted of descriptions of mammographic lesions according to the BI-RADS lexicon, patient age, and pathology status (benign/malignant). Models were developed to predict pathology status from the BI-RADS descriptors and patient age. Models built on data from the different institutions were compared using empirical (non-parametric) receiver operating characteristic (ROC) curves. Results suggest that BI-RADS-based CAD systems focused on specific classes of lesions may be more generally applicable than models that cover several lesion types. However, the models generalized better in terms of the area under the ROC curve than in terms of the partial area index (>90% sensitivity). Previous studies have illustrated the challenges in translating a BI-RADS-based CAD system from one institution to another. This study provides new insights into possible approaches to improve the generalization of BI-RADS-based CAD systems.
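The two figures of merit named above can be illustrated with a minimal sketch. It is not the study's implementation: the function names and the toy scores are hypothetical, and the partial area index is assumed to follow the common definition, namely the area between the ROC curve and the horizontal line at the sensitivity threshold, divided by one minus that threshold. The study's exact computation is not given here.

```python
import numpy as np


def empirical_roc(scores, labels):
    """Empirical (non-parametric) ROC points from model outputs.

    scores: higher means more suspicious for malignancy (assumption).
    labels: 1 for malignant, 0 for benign.
    Returns false-positive and true-positive fractions, from (0, 0) to (1, 1).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    thresholds = np.unique(scores)[::-1]  # distinct scores, descending
    tpf = np.array([0.0] + [(pos >= t).mean() for t in thresholds])
    fpf = np.array([0.0] + [(neg >= t).mean() for t in thresholds])
    return fpf, tpf


def auc(fpf, tpf):
    """Area under the empirical ROC curve by the trapezoidal rule."""
    fpf, tpf = np.asarray(fpf, dtype=float), np.asarray(tpf, dtype=float)
    return float(np.sum(np.diff(fpf) * (tpf[1:] + tpf[:-1]) / 2.0))


def partial_area_index(fpf, tpf, tpf0=0.90):
    """Normalized area between the ROC curve and the line TPF = tpf0.

    Assumed definition: area above the sensitivity threshold tpf0,
    divided by (1 - tpf0), so a perfect curve scores 1.
    """
    area = 0.0
    for i in range(len(fpf) - 1):
        x0, x1 = float(fpf[i]), float(fpf[i + 1])
        y0, y1 = float(tpf[i]), float(tpf[i + 1])
        if y1 <= tpf0:
            continue  # segment lies entirely at or below the threshold
        if y0 < tpf0:
            # Segment crosses the threshold: start at the crossing point.
            frac = (tpf0 - y0) / (y1 - y0)
            x0 = x0 + frac * (x1 - x0)
            y0 = tpf0
        area += (x1 - x0) * ((y0 - tpf0) + (y1 - tpf0)) / 2.0
    return area / (1.0 - tpf0)


if __name__ == "__main__":
    # Toy data for illustration only; not taken from any of the four data sets.
    toy_scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20, 0.10]
    toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
    f, t = empirical_roc(toy_scores, toy_labels)
    print("AUC:", auc(f, t))
    print("Partial area index (TPF > 0.90):", partial_area_index(f, t))
```

Under this definition a chance-level (diagonal) curve has an area of 0.5 but a partial area index of only (1 - 0.90)/2 = 0.05. The index depends only on the high-sensitivity portion of the curve, which is one way two models can have similar overall areas and still differ in the partial area index.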