Text categorization with diversity random forests

Publication, Chapter
Yang, C; Yin, XC; Huang, K
January 1, 2014

Text categorization (TC) has several characteristic difficulties, such as large and complex category taxonomies, noisy data, and incrementally arriving data. Random Forests, one of the most important yet simple state-of-the-art ensemble methods, has been applied to such problems with good performance. Most current Random Forests approaches that address diversity focus on maximizing tree diversity while generating and training the component trees. In TC, component trees trained on noisy data with very many categories and features exhibit widely varying characteristics. Consequently, given numerous component trees from the original Random Forests, we propose a novel method, Diversity Random Forests, which diversely and adaptively selects and combines tree classifiers using diversity learning and sample weighting. Diversity Random Forests has two key components. First, by designing a matrix that represents the data distribution, we formulate a unified optimization model for learning and selecting diverse trees, in which tree weights are learned by solving a convex quadratic programming problem with given sample weights. Second, we propose a new self-training algorithm that iteratively runs the convex optimization and automatically learns the sample weights. Extensive experiments on a variety of text categorization benchmark data sets show that the proposed approach consistently outperforms state-of-the-art methods.
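
The alternation described in the abstract (solve a convex quadratic problem for tree weights under fixed sample weights, then update the sample weights and repeat) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' formulation: the abstract does not give the objective, the data-distribution matrix, or the diversity term, so the sketch substitutes a simplex-constrained weighted least-squares objective solved by projected gradient descent, and a hypothetical exponential re-weighting rule. All function names, the `margins` input (+1 where a tree classifies a sample correctly, -1 otherwise), and the parameters are assumptions made for this example.

```python
import math


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold of the largest valid k
    return [max(x - theta, 0.0) for x in v]


def ensemble_margin(row, w):
    """Weighted vote of the trees on one sample."""
    return sum(w_t * m for w_t, m in zip(w, row))


def learn_tree_weights(margins, sample_weights, steps=200):
    """Minimize sum_i s_i * (w . m_i - 1)^2 over the simplex.

    Assumed stand-in for the paper's convex QP; sample_weights must sum to 1
    so that the fixed step size below stays within the Lipschitz bound.
    """
    n_trees = len(margins[0])
    lr = 1.0 / (2.0 * n_trees)
    w = [1.0 / n_trees] * n_trees
    for _ in range(steps):
        grad = [0.0] * n_trees
        for row, s in zip(margins, sample_weights):
            residual = ensemble_margin(row, w) - 1.0
            for t, m in enumerate(row):
                grad[t] += 2.0 * s * residual * m
        w = project_to_simplex([w_t - lr * g for w_t, g in zip(w, grad)])
    return w


def update_sample_weights(margins, w):
    """Hypothetical rule: emphasize samples the weighted ensemble gets wrong."""
    raw = [math.exp(-ensemble_margin(row, w)) for row in margins]
    total = sum(raw)
    return [r / total for r in raw]


def alternate_weights(margins, rounds=5):
    """Alternate between tree-weight learning and sample re-weighting."""
    sample_weights = [1.0 / len(margins)] * len(margins)
    tree_weights = None
    for _ in range(rounds):
        tree_weights = learn_tree_weights(margins, sample_weights)
        sample_weights = update_sample_weights(margins, tree_weights)
    return tree_weights, sample_weights
```

In this sketch, trees that receive zero weight are effectively dropped from the ensemble, which loosely mirrors the "select and combine" idea; a diversity-promoting term would have to be added to the objective to reflect the method as described.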


DOI

10.1007/978-3-319-12643-2_39

ISBN

9783319126425

Publication Date

January 1, 2014

Volume

8836

Start / End Page

317 / 324

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 46 Information and computing sciences
 

Citation

APA: Yang, C., Yin, X. C., & Huang, K. (2014). Text categorization with diversity random forests (Vol. 8836, pp. 317–324). https://doi.org/10.1007/978-3-319-12643-2_39
Chicago: Yang, C., X. C. Yin, and K. Huang. “Text categorization with diversity random forests,” 8836:317–24, 2014. https://doi.org/10.1007/978-3-319-12643-2_39.
ICMJE: Yang C, Yin XC, Huang K. Text categorization with diversity random forests. In 2014. p. 317–24.
MLA: Yang, C., et al. Text categorization with diversity random forests. Vol. 8836, 2014, pp. 317–24. Scopus, doi:10.1007/978-3-319-12643-2_39.
NLM: Yang C, Yin XC, Huang K. Text categorization with diversity random forests. 2014. p. 317–324.