Suppressing model overfitting in mining concept-drifting data streams

Publication, Conference

Wang, H; Yin, J; Pei, J; Yu, PS; Yu, JX

Published in: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

January 1, 2006

Mining data streams of changing class distributions is important for real-time business decision support. The stream classifier must evolve to reflect the current class distribution. This poses a serious challenge. On the one hand, relying on historical data may increase the chances of learning obsolete models. On the other hand, learning only from the latest data may lead to biased classifiers, as the latest data is often an unrepresentative sample of the current class distribution. The problem is particularly acute in classifying rare events, when, for example, instances of the rare class do not even show up in the most recent training data. In this paper, we use a stochastic model to describe the concept shifting patterns and formulate this problem as an optimization one: from the historical and the current training data that we have observed, find the most-likely current distribution, and learn a classifier based on the most-likely distribution. We derive an analytic solution and approximate this solution with an efficient algorithm, which calibrates the influence of historical data carefully to create an accurate classifier. We evaluate our algorithm with both synthetic and real-world datasets. Our results show that our algorithm produces accurate and efficient classification. Copyright 2006 ACM.
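The tension the abstract describes, between obsolete history and an unrepresentative latest sample, can be illustrated with a small sketch. This is not the paper's stochastic model or its analytic solution: the exponential `decay` weighting, the function name `blended_class_prior`, and the toy "normal"/"fraud" labels are all assumptions chosen only to show why a rare class that is absent from the newest chunk should not be given zero probability.

```python
from collections import Counter


def blended_class_prior(chunks, decay=0.5):
    """Estimate current class probabilities from label chunks (oldest first).

    Assumption for illustration only: a chunk that is `a` steps older than
    the newest chunk contributes with weight decay**a. The paper derives
    its own calibration of historical data; this fixed decay is a stand-in.
    """
    weighted = Counter()
    total = 0.0
    newest = len(chunks) - 1
    for i, labels in enumerate(chunks):
        w = decay ** (newest - i)  # newest chunk has weight 1.0
        for label in labels:
            weighted[label] += w
            total += w
    return {label: count / total for label, count in weighted.items()}


# Toy stream: the rare class "fraud" is missing from the most recent chunk.
history = [
    ["normal"] * 98 + ["fraud"] * 2,   # oldest chunk
    ["normal"] * 97 + ["fraud"] * 3,
    ["normal"] * 100,                  # newest chunk, no rare instances
]

latest_only = blended_class_prior(history[-1:])
blended = blended_class_prior(history, decay=0.5)

print("latest only:", latest_only)
print("blended:    ", blended)
```

With only the newest chunk, the rare class does not appear in the estimate at all. With decayed history included, it keeps a small nonzero probability. How strongly to weight the history is exactly the calibration question the paper addresses, and the fixed `decay` here does not answer it.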

Duke Scholars

Author: Jian Pei, Computer Science

Published In

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

DOI

10.1145/1150402.1150496

Publication Date

January 1, 2006

Volume

2006

Start / End Page

736 / 741

Citation

APA

Wang, H., Yin, J., Pei, J., Yu, P. S., & Yu, J. X. (2006). Suppressing model overfitting in mining concept-drifting data streams. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Vol. 2006, pp. 736–741). https://doi.org/10.1145/1150402.1150496

Chicago

Wang, H., J. Yin, J. Pei, P. S. Yu, and J. X. Yu. “Suppressing model overfitting in mining concept-drifting data streams.” In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006:736–41, 2006. https://doi.org/10.1145/1150402.1150496.

ICMJE

Wang H, Yin J, Pei J, Yu PS, Yu JX. Suppressing model overfitting in mining concept-drifting data streams. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006. p. 736–41.

MLA

Wang, H., et al. “Suppressing model overfitting in mining concept-drifting data streams.” Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 2006, 2006, pp. 736–41. Scopus, doi:10.1145/1150402.1150496.

NLM

Wang H, Yin J, Pei J, Yu PS, Yu JX. Suppressing model overfitting in mining concept-drifting data streams. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006. p. 736–741.
