Stacked generalization in computer-assisted decision systems: Empirical comparison of data handling schemes
Computer-assisted decision (CAD) systems are becoming increasingly popular for the diagnostic interpretation of radiologic images. These CAD systems often involve the stacked generalization of several different decision models. Combining decision models is a common meta-analysis strategy for improving on the diagnostic performance of each individual model. This study investigates how different data handling schemes may affect the performance evaluation of CAD systems that rely on stacked generalization. The study is based on a multistage CAD system for the detection of masses in screening mammograms. The CAD system consists of a series of knowledge-based modules that operate at Level 0, capturing morphological as well as multiscale textural information. The knowledge-based predictions are then combined by a Level 1 classifier. The study shows that a leave-one-out sampling scheme appears to be an effective and relatively unbiased strategy for estimating the overall performance of a CAD system that is based on stacked generalization. However, the complexity of the Level 1 combiner requires extra caution. When the available dataset is small, a simple learning system, such as a backpropagation neural network with very few hidden nodes, is preferable to avoid optimistically biased estimates of diagnostic performance. ©2007 IEEE.
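To make the data handling question concrete, the following is a minimal sketch of one way a leave-one-out scheme could be applied to a two-level stacked system. It is not the system evaluated in the study, and the abstract does not specify these details. Everything in it is an assumption made for illustration: the data are synthetic, the Level 0 modules are hypothetical nearest-neighbor scorers (`knn_score`) standing in for the knowledge-based modules, and the Level 1 combiner is a plain logistic regression (`train_logistic`) swapped in for the small backpropagation network. The point it illustrates is that the held-out case is excluded both from the Level 0 knowledge base and from the Level 1 training set.

```python
import math
import random

def knn_score(query, cases, labels, feature, k=3):
    """Hypothetical Level 0 module: fraction of positive labels among the
    k stored cases closest to the query on a single feature."""
    order = sorted(range(len(cases)),
                   key=lambda j: abs(cases[j][feature] - query[feature]))
    nearest = order[:k]
    return sum(labels[j] for j in nearest) / len(nearest)

def level0_outputs(query, cases, labels, n_features):
    """One Level 0 score per feature, each in [0, 1]."""
    return [knn_score(query, cases, labels, f) for f in range(n_features)]

def predict_logistic(w, x):
    """Logistic output for inputs x; the last weight is the bias."""
    z = w[-1] + sum(wd * v for wd, v in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, epochs=200, lr=0.5):
    """Deliberately simple Level 1 combiner (assumption: logistic regression
    fitted by batch gradient descent, not the network used in the study)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_logistic(w, xi) - yi
            for d, v in enumerate(xi):
                grad[d] += err * v
            grad[-1] += err
        w = [wd - lr * g / len(X) for wd, g in zip(w, grad)]
    return w

def loo_stacked_scores(cases, labels):
    """Leave-one-out over the whole stack: case i never contributes to the
    Level 0 knowledge base or to the Level 1 training set used to score it."""
    n, n_features = len(cases), len(cases[0])
    scores = []
    for i in range(n):
        train_idx = [j for j in range(n) if j != i]
        # Inner leave-one-out: each training case is scored by Level 0
        # modules built without that case and without the held-out case i.
        meta_X = []
        for j in train_idx:
            kb = [m for m in train_idx if m != j]
            meta_X.append(level0_outputs(cases[j],
                                         [cases[m] for m in kb],
                                         [labels[m] for m in kb],
                                         n_features))
        meta_y = [labels[j] for j in train_idx]
        w = train_logistic(meta_X, meta_y)
        # Score the held-out case against all remaining cases.
        x_test = level0_outputs(cases[i],
                                [cases[j] for j in train_idx],
                                [labels[j] for j in train_idx],
                                n_features)
        scores.append(predict_logistic(w, x_test))
    return scores

def auc(scores, labels):
    """Area under the ROC curve as the fraction of correctly ordered
    positive/negative pairs, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic two-feature data (assumption): positives are shifted upward.
rng = random.Random(0)
labels = [i % 2 for i in range(30)]
cases = [[rng.gauss(1.5 * y, 1.0), rng.gauss(1.0 * y, 1.0)] for y in labels]

scores = loo_stacked_scores(cases, labels)
estimated_auc = auc(scores, labels)
print(f"Leave-one-out AUC estimate: {estimated_auc:.3f}")
```

A scheme that instead reused the held-out case when building the Level 0 knowledge base or when fitting the combiner would let information about that case leak into its own score. That kind of leakage is one plausible source of the optimistic bias the abstract warns about, particularly when the combiner has many free parameters relative to the number of cases.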