Classification Trees for Imbalanced Data: Surface-to-Volume Regularization

Publication, Journal Article
Zhu, Y; Li, C; Dunson, DB
Published in: Journal of the American Statistical Association
January 1, 2023

Classification algorithms face difficulties when one or more classes have limited training data. We are particularly interested in classification trees, due to their interpretability and flexibility. When data are limited in one or more of the classes, the estimated decision boundaries are often irregularly shaped due to the limited sample size, leading to poor generalization error. We propose a novel approach that penalizes the Surface-to-Volume Ratio (SVR) of the decision set, obtaining a new class of SVR-Tree algorithms. We develop a simple and computationally efficient implementation while proving estimation consistency for SVR-Tree and rate of convergence for an idealized empirical risk minimizer of SVR-Tree. SVR-Tree is compared with multiple algorithms that are designed to deal with imbalance through real data applications. Supplementary materials for this article are available online.
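To make the surface-to-volume idea concrete, here is a toy sketch rather than the authors' implementation. It assumes a two-dimensional feature space discretized into a grid and treats the decision set as the cells marked `True`. The function name `surface_to_volume_ratio`, the grid discretization, and the choice to count edges on the domain border as surface are all assumptions made for illustration.

```python
import numpy as np

def surface_to_volume_ratio(mask, cell=1.0):
    """Perimeter / area of the region where mask is True (toy 2D grid approximation)."""
    mask = np.asarray(mask, dtype=bool)
    volume = mask.sum() * cell ** 2
    if volume == 0:
        raise ValueError("empty decision set")
    # Pad with False so edges on the grid border are counted as boundary (an assumption).
    padded = np.pad(mask, 1, constant_values=False)
    # Count cell edges across which set membership changes.
    across_columns = np.sum(padded[:, 1:] != padded[:, :-1])
    across_rows = np.sum(padded[1:, :] != padded[:-1, :])
    surface = (across_columns + across_rows) * cell
    return surface / volume

# A compact decision set: one 2x2 block (area 4, perimeter 8).
compact = np.zeros((6, 6), dtype=bool)
compact[1:3, 1:3] = True

# An irregular decision set: four isolated cells (area 4, perimeter 16).
scattered = np.zeros((6, 6), dtype=bool)
scattered[[0, 2, 4, 5], [0, 3, 1, 5]] = True

print(surface_to_volume_ratio(compact))    # 2.0
print(surface_to_volume_ratio(scattered))  # 4.0
```

Both sets cover the same volume, but the scattered one has twice the ratio. A penalty on this quantity would therefore favor the compact set, which is the kind of regular decision boundary the abstract describes.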

Published In

Journal of the American Statistical Association

DOI

10.1080/01621459.2021.2005609

EISSN

1537-274X

ISSN

0162-1459

Publication Date

January 1, 2023

Volume

118

Issue

543

Start / End Page

1707 / 1717

Related Subject Headings

  • Statistics & Probability
  • 4905 Statistics
  • 3802 Econometrics
  • 1603 Demography
  • 1403 Econometrics
  • 0104 Statistics
 

Citation

APA
Zhu, Y., Li, C., & Dunson, D. B. (2023). Classification Trees for Imbalanced Data: Surface-to-Volume Regularization. Journal of the American Statistical Association, 118(543), 1707–1717. https://doi.org/10.1080/01621459.2021.2005609

Chicago
Zhu, Y., C. Li, and D. B. Dunson. “Classification Trees for Imbalanced Data: Surface-to-Volume Regularization.” Journal of the American Statistical Association 118, no. 543 (January 1, 2023): 1707–17. https://doi.org/10.1080/01621459.2021.2005609.

ICMJE
Zhu Y, Li C, Dunson DB. Classification Trees for Imbalanced Data: Surface-to-Volume Regularization. Journal of the American Statistical Association. 2023 Jan 1;118(543):1707–17.

MLA
Zhu, Y., et al. “Classification Trees for Imbalanced Data: Surface-to-Volume Regularization.” Journal of the American Statistical Association, vol. 118, no. 543, Jan. 2023, pp. 1707–17. Scopus, doi:10.1080/01621459.2021.2005609.

NLM
Zhu Y, Li C, Dunson DB. Classification Trees for Imbalanced Data: Surface-to-Volume Regularization. Journal of the American Statistical Association. 2023 Jan 1;118(543):1707–1717.
