The effect of data set size on computer-aided diagnosis of breast cancer: Comparing decision fusion to a linear discriminant
Data sets with relatively few observations (cases) in medical research are common, especially if the data are expensive or difficult to collect. Such small sample sizes usually do not provide enough information for computer models to learn data patterns well enough for good prediction and generalization. As a model that may be able to maintain good classification performance in the presence of limited data, we used decision fusion. In this study, we investigated the effect of sample size on the generalization ability of both linear discriminant analysis (LDA) and decision fusion. Subsets of large data sets were selected by a bootstrap sampling method, which allowed us to estimate the mean and standard deviation of the classification performance as a function of data set size. We applied the models to two breast cancer data sets and compared the models using receiver operating characteristic (ROC) analysis. For the more challenging calcification data set, decision fusion reached its maximum classification performance of AUC = 0.80±0.04 at 50 samples and pAUC = 0.34±0.05 at 100 samples. The LDA reached a lower performance and required many more cases, with a maximum of AUC = 0.68±0.04 and pAUC = 0.12±0.05 at 450 samples. For the mass data set, the two classifiers had more similar performance, with AUC = 0.92±0.02 and pAUC = 0.48±0.02 at 50 samples for decision fusion and AUC = 0.92±0.03 and pAUC = 0.55±0.04 at 500 samples for the LDA.