
The effect of data set size on computer-aided diagnosis of breast cancer: Comparing decision fusion to a linear discriminant

Publication, Journal Article
Jesneck, JL; Nolte, LW; Baker, JA; Lo, JY
Published in: Progress in Biomedical Optics and Imaging - Proceedings of SPIE
June 23, 2006

Data sets with relatively few observations (cases) in medical research are common, especially if the data are expensive or difficult to collect. Such small sample sizes usually do not provide enough information for computer models to learn data patterns well enough for good prediction and generalization. As a model that may be able to maintain good classification performance in the presence of limited data, we used decision fusion. In this study, we investigated the effect of sample size on the generalization ability of both linear discriminant analysis (LDA) and decision fusion. Subsets of large data sets were selected by a bootstrap sampling method, which allowed us to estimate the mean and standard deviation of the classification performance as a function of data set size. We applied the models to two breast cancer data sets and compared the models using receiver operating characteristic (ROC) analysis. For the more challenging calcification data set, decision fusion reached its maximum classification performance of AUC = 0.80±0.04 at 50 samples and pAUC = 0.34±0.05 at 100 samples. The LDA reached a lower performance and required many more cases, with a maximum of AUC = 0.68±0.04 and pAUC = 0.12±0.05 at 450 samples. For the mass data set, the two classifiers had more similar performance, with AUC = 0.92±0.02 and pAUC = 0.48±0.02 at 50 samples for decision fusion and AUC = 0.92±0.03 and pAUC = 0.55±0.04 at 500 samples for the LDA.
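The bootstrap procedure described in the abstract (resampling subsets of a given size and summarizing classification performance by its mean and standard deviation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not say how subsets were drawn or evaluated, so sampling with replacement and scoring on the left-out cases are assumptions. The classifier `fit_mean_difference` is a hypothetical stand-in for the LDA and decision fusion models, the data are synthetic, and all function names are invented for this example.

```python
import random
import statistics


def auc(pos_scores, neg_scores):
    """Area under the ROC curve as the Mann-Whitney statistic: the fraction of
    (positive, negative) pairs ranked correctly, with ties counting one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def fit_mean_difference(X, y):
    """Placeholder linear classifier (NOT the paper's LDA or decision fusion):
    the weight vector is the difference between the two class means."""
    n_features = len(X[0])
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return [
        sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
        for j in range(n_features)
    ]


def bootstrap_learning_curve(X, y, sizes, n_boot=50, seed=0):
    """Return {size: (mean AUC, standard deviation of AUC)} over n_boot resamples.
    n_boot must be at least 2 for the standard deviation to be defined."""
    rng = random.Random(seed)
    curve = {}
    for size in sizes:
        aucs = []
        while len(aucs) < n_boot:
            # Assumption: draw the training subset with replacement.
            train = [rng.randrange(len(X)) for _ in range(size)]
            if len({y[i] for i in train}) < 2:
                continue  # a resample needs both classes to fit the classifier
            # Assumption: evaluate on the cases left out of this resample.
            drawn = set(train)
            test = [i for i in range(len(X)) if i not in drawn]
            w = fit_mean_difference([X[i] for i in train], [y[i] for i in train])
            scores = {i: sum(wj * xj for wj, xj in zip(w, X[i])) for i in test}
            pos = [scores[i] for i in test if y[i] == 1]
            neg = [scores[i] for i in test if y[i] == 0]
            if not pos or not neg:
                continue  # AUC is undefined without both classes in the test set
            aucs.append(auc(pos, neg))
        curve[size] = (statistics.mean(aucs), statistics.stdev(aucs))
    return curve


def make_synthetic_cases(n_cases, n_features=3, shift=1.0, seed=0):
    """Synthetic two-class Gaussian data, used only to exercise the sketch."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_cases):
        label = i % 2
        X.append([rng.gauss(shift * label, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_cases(400)
    curve = bootstrap_learning_curve(X, y, [10, 50, 200])
    for size, (mean_auc, sd_auc) in curve.items():
        print(f"n={size:4d}  AUC = {mean_auc:.2f} ± {sd_auc:.2f}")
```

Plotting the mean with error bars against the subset size gives a learning curve of the kind the abstract summarizes. Swapping in a different `fit` function would allow two classifiers to be compared on the same resamples.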


Published In

Progress in Biomedical Optics and Imaging - Proceedings of SPIE

DOI

10.1117/12.655235

ISSN

1605-7422

Publication Date

June 23, 2006

Volume

6146

Citation

APA: Jesneck, J. L., Nolte, L. W., Baker, J. A., & Lo, J. Y. (2006). The effect of data set size on computer-aided diagnosis of breast cancer: Comparing decision fusion to a linear discriminant. Progress in Biomedical Optics and Imaging - Proceedings of SPIE, 6146. https://doi.org/10.1117/12.655235
Chicago: Jesneck, J. L., L. W. Nolte, J. A. Baker, and J. Y. Lo. “The effect of data set size on computer-aided diagnosis of breast cancer: Comparing decision fusion to a linear discriminant.” Progress in Biomedical Optics and Imaging - Proceedings of SPIE 6146 (June 23, 2006). https://doi.org/10.1117/12.655235.
ICMJE: Jesneck JL, Nolte LW, Baker JA, Lo JY. The effect of data set size on computer-aided diagnosis of breast cancer: Comparing decision fusion to a linear discriminant. Progress in Biomedical Optics and Imaging - Proceedings of SPIE. 2006 Jun 23;6146.
MLA: Jesneck, J. L., et al. “The effect of data set size on computer-aided diagnosis of breast cancer: Comparing decision fusion to a linear discriminant.” Progress in Biomedical Optics and Imaging - Proceedings of SPIE, vol. 6146, June 2006. Scopus, doi:10.1117/12.655235.
NLM: Jesneck JL, Nolte LW, Baker JA, Lo JY. The effect of data set size on computer-aided diagnosis of breast cancer: Comparing decision fusion to a linear discriminant. Progress in Biomedical Optics and Imaging - Proceedings of SPIE. 2006 Jun 23;6146.
