Scholars@Duke publication: Box drawings for learning with imbalanced data

Publication, Conference

Goh, ST; Rudin, C

Published in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

January 1, 2014

The vast majority of real-world classification problems are imbalanced, meaning there are far fewer data from the class of interest (the positive class) than from other classes. We propose two machine learning algorithms to handle highly imbalanced classification problems. The classifiers are disjunctions of conjunctions, created as unions of axis-parallel rectangles around the positive examples, and thus have the benefit of being interpretable. The first algorithm uses mixed integer programming to optimize a weighted balance between positive and negative class accuracies. Regularization is introduced to improve generalization performance. The second method uses an approximation to assist with scalability. Specifically, it follows a "characterize then discriminate" approach, where the positive class is first characterized by boxes, and then each box boundary becomes a separate discriminative classifier. This method has the computational advantages that it can be easily parallelized and that it considers only the relevant regions of feature space. © 2014 ACM.
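The classifier form described in the abstract, a disjunction over boxes of conjunctions over per-feature intervals, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `lower`/`upper` array layout, and the weight `c` are assumptions made for illustration, and the scoring function is only one plausible reading of "a weighted balance between positive and negative class accuracies" (it omits the regularization term).

```python
import numpy as np


def predict_union_of_boxes(X, lower, upper):
    """Label a point 1 if it lies inside at least one axis-parallel box.

    X:     (n, d) array of points.
    lower: (k, d) array of per-box lower bounds (hypothetical layout).
    upper: (k, d) array of per-box upper bounds (hypothetical layout).
    """
    X = np.asarray(X, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # inside[i, j] is True when point i satisfies every interval of box j,
    # i.e. a conjunction over the d features.
    inside = np.all(
        (X[:, None, :] >= lower[None, :, :]) & (X[:, None, :] <= upper[None, :, :]),
        axis=2,
    )
    # The disjunction over the k boxes gives the final label.
    return inside.any(axis=1).astype(int)


def weighted_accuracy(y_true, y_pred, c=1.0):
    """Correct positives plus c times correct negatives.

    c is an assumed trade-off weight, not a value taken from the paper.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    true_pos = np.sum((y_true == 1) & (y_pred == 1))
    true_neg = np.sum((y_true == 0) & (y_pred == 0))
    return float(true_pos + c * true_neg)


# Two boxes in two dimensions and four example points.
lower = np.array([[0.0, 0.0], [2.0, 2.0]])
upper = np.array([[1.0, 1.0], [3.0, 3.0]])
X = np.array([[0.5, 0.5], [2.5, 2.5], [1.5, 1.5], [4.0, 4.0]])
y = np.array([1, 1, 0, 0])

pred = predict_union_of_boxes(X, lower, upper)
score = weighted_accuracy(y, pred, c=0.5)
```

Here the first two points each fall in one box and are labeled positive, while the last two fall in neither. Because each box is a set of per-feature intervals, the resulting rule can be read directly as "feature 1 in [a, b] and feature 2 in [c, d], or ...", which is the interpretability benefit the abstract refers to.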

Duke Scholars

Author: Cynthia D. Rudin, Computer Science

Published In

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

DOI

10.1145/2623330.2623648

ISBN

9781450329569

Publication Date

January 1, 2014

Start / End Page

333 / 342

Citation

APA

Goh, S. T., & Rudin, C. (2014). Box drawings for learning with imbalanced data. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 333–342). https://doi.org/10.1145/2623330.2623648

Chicago

Goh, S. T., and C. Rudin. “Box drawings for learning with imbalanced data.” In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 333–42, 2014. https://doi.org/10.1145/2623330.2623648.

ICMJE

Goh ST, Rudin C. Box drawings for learning with imbalanced data. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2014. p. 333–42.

MLA

Goh, S. T., and C. Rudin. “Box drawings for learning with imbalanced data.” Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2014, pp. 333–42. Scopus, doi:10.1145/2623330.2623648.

NLM

Goh ST, Rudin C. Box drawings for learning with imbalanced data. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2014. p. 333–342.
