Mining the most general multidimensional summarization of "probable groups" in data warehouses

Publication, Conference

Yu, H; Pei, J; Tang, S; Yang, D

Published in: Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM

December 1, 2005

Data summarization is an important data analysis task in data warehousing and online analytic processing. In this paper, we consider a novel type of summarization query, the probable group query, such as "What are the groups of patients whose chance of getting lung cancer is 50% or more above the average?" An aggregate cell satisfying the requirement is called a probable group. To make the answer succinct and effective, we propose that only the most general probable groups should be mined. For example, if both groups (smoking, drinking) and (smoking, *) are probable, then the former group should not be returned. The problem of mining the most general probable groups is challenging since the probable groups can be widely scattered in the cube lattice and do not exhibit any monotonicity in the group containment order. We extend the state-of-the-art BUC algorithm to tackle the problem, and develop techniques and heuristics to speed up the search. An extensive performance study is reported to illustrate the effect of our approach.
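The definition in the abstract can be illustrated with a small brute-force sketch. It is not the paper's BUC-based method; it simply enumerates every aggregate cell of a toy table and keeps the probable cells that no other probable cell generalizes. The function names, the toy data, and the reading of "50% or more" as a rate of at least 1.5 times the overall rate are all assumptions made for illustration.

```python
from itertools import product

STAR = "*"  # wildcard for "any value" in a dimension (illustrative convention)


def generalizes(a, b):
    """True if cell a is equal to or more general than cell b."""
    return all(x == STAR or x == y for x, y in zip(a, b))


def cell_stats(rows):
    """Map every non-empty aggregate cell to (positives, count)."""
    stats = {}
    for *dims, outcome in rows:
        # Each row contributes to 2^d cells: every dimension kept or starred.
        for mask in product((False, True), repeat=len(dims)):
            cell = tuple(STAR if m else v for m, v in zip(mask, dims))
            pos, cnt = stats.get(cell, (0, 0))
            stats[cell] = (pos + int(outcome), cnt + 1)
    return stats


def most_general_probable_groups(rows, lift=0.5):
    """Brute-force sketch: probable cells that no other probable cell generalizes.

    Assumption: a cell is "probable" when its positive rate is at least
    (1 + lift) times the overall rate.
    """
    stats = cell_stats(rows)
    apex = (STAR,) * (len(rows[0]) - 1)
    pos, cnt = stats[apex]
    overall = pos / cnt
    if overall == 0:
        return []  # no positives at all, so no group can exceed the average
    threshold = (1 + lift) * overall
    probable = [c for c, (p, n) in stats.items() if p / n >= threshold]
    return sorted(
        c for c in probable
        if not any(o != c and generalizes(o, c) for o in probable)
    )


# Hypothetical toy data: (smoking, drinking, has_lung_cancer)
rows = [
    ("smoker", "drinker", True),
    ("smoker", "drinker", True),
    ("smoker", "nondrinker", True),
    ("smoker", "nondrinker", False),
    ("nonsmoker", "drinker", False),
    ("nonsmoker", "drinker", False),
    ("nonsmoker", "nondrinker", False),
    ("nonsmoker", "nondrinker", False),
]

print(most_general_probable_groups(rows))  # [('smoker', '*')]
```

In this toy table both (smoker, drinker) and (smoker, *) exceed the threshold, and only the more general (smoker, *) is returned, mirroring the example in the abstract. Enumerating all cells is exponential in the number of dimensions, which is the cost a pruned cube search aims to avoid.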

Duke Scholars

Jian Pei (Author), Computer Science

Published In

Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM

ISSN

1099-3371

Publication Date

December 1, 2005

Start / End Page

185 / 194

Citation

APA

Yu, H., Pei, J., Tang, S., & Yang, D. (2005). Mining the most general multidimensional summarization of "probable groups" in data warehouses. In Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM (pp. 185–194).

Chicago

Yu, H., J. Pei, S. Tang, and D. Yang. “Mining the most general multidimensional summarization of ‘probable groups’ in data warehouses.” In Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM, 185–94, 2005.

ICMJE

Yu H, Pei J, Tang S, Yang D. Mining the most general multidimensional summarization of "probable groups" in data warehouses. In: Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM. 2005. p. 185–94.

MLA

Yu, H., et al. “Mining the most general multidimensional summarization of ‘probable groups’ in data warehouses.” Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM, 2005, pp. 185–94.

NLM

Yu H, Pei J, Tang S, Yang D. Mining the most general multidimensional summarization of "probable groups" in data warehouses. Proceedings of the International Conference on Scientific and Statistical Database Management, SSDBM. 2005. p. 185–194.
