Efficiently answering top-k typicality queries on large databases

Publication, Conference

Hua, M; Pei, J; Fu, AWC; Lin, X; Leung, HF

Published in: 33rd International Conference on Very Large Data Bases VLDB 2007 Conference Proceedings

January 1, 2007

Finding typical instances is an effective way to understand and analyze large data sets. In this paper, we apply the idea of typicality analysis from psychology and cognitive science to database query answering, and study the novel problem of answering top-k typicality queries. We model typicality in large data sets systematically. To answer questions like "Who are the top-k most typical NBA players?", the measure of simple typicality is developed. To answer questions like "Who are the top-k most typical guards distinguishing guards from other players?", the notion of discriminative typicality is proposed. Computing the exact answer to a top-k typicality query requires quadratic time, which is often too costly for online query answering on large databases. We develop a series of approximation methods for various situations. (1) The randomized tournament algorithm has linear complexity, though it does not provide a theoretical guarantee on the quality of the answers. (2) The direct local typicality approximation using VP-trees provides an approximation quality guarantee. (3) A VP-tree can be exploited to index a large set of objects. Then, typicality queries can be answered efficiently with quality guarantees by a tournament method based on a Local Typicality Tree data structure. An extensive performance study using two real data sets and a series of synthetic data sets clearly shows that top-k typicality queries are meaningful and our methods are practical.
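The contrast between the quadratic exact computation and the linear randomized tournament can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one-dimensional numeric objects and scores typicality with a Gaussian kernel density estimate of bandwidth h, a choice the abstract does not spell out. The names simple_typicality, exact_top_k, and tournament_winner are hypothetical, and the VP-tree and Local Typicality Tree methods are not shown.

```python
import math
import random


def simple_typicality(o, group, h=1.0):
    """Assumed score: Gaussian-kernel density estimate of o within group."""
    total = sum(math.exp(-((o - s) ** 2) / (2 * h * h)) for s in group)
    return total / (len(group) * h * math.sqrt(2 * math.pi))


def exact_top_k(data, k, h=1.0):
    """Quadratic baseline: score every object against the whole data set."""
    ranked = sorted(data, key=lambda o: simple_typicality(o, data, h), reverse=True)
    return ranked[:k]


def tournament_winner(data, group_size=4, h=1.0, rng=random):
    """One randomized tournament: the most typical object of each small
    random group advances, and rounds repeat until one object remains.
    Scores are computed only within a group, so no quality guarantee holds."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    players = list(data)
    while len(players) > 1:
        rng.shuffle(players)
        groups = [players[i:i + group_size]
                  for i in range(0, len(players), group_size)]
        players = [max(g, key=lambda o: simple_typicality(o, g, h))
                   for g in groups]
    return players[0]


if __name__ == "__main__":
    sample = [-0.2, -0.1, 0.0, 0.1, 0.2, 10.0]
    print(exact_top_k(sample, 2))
    print(tournament_winner(sample, group_size=3, rng=random.Random(0)))
```

Each tournament round costs roughly the number of surviving objects times group_size, and the number of survivors shrinks geometrically, so the total work is linear in the data size for a fixed group_size. Running several tournaments and keeping the best winner is one plausible way to reduce the effect of unlucky groupings.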

Duke Scholars

Author: Jian Pei, Computer Science

Published In

33rd International Conference on Very Large Data Bases VLDB 2007 Conference Proceedings

Publication Date

January 1, 2007

Start / End Page

890 / 901

Citation

APA: Hua, M., Pei, J., Fu, A. W. C., Lin, X., & Leung, H. F. (2007). Efficiently answering top-k typicality queries on large databases. In 33rd International Conference on Very Large Data Bases VLDB 2007 Conference Proceedings (pp. 890–901).

Chicago: Hua, M., J. Pei, A. W. C. Fu, X. Lin, and H. F. Leung. “Efficiently answering top-k typicality queries on large databases.” In 33rd International Conference on Very Large Data Bases VLDB 2007 Conference Proceedings, 890–901, 2007.

ICMJE: Hua M, Pei J, Fu AWC, Lin X, Leung HF. Efficiently answering top-k typicality queries on large databases. In: 33rd International Conference on Very Large Data Bases VLDB 2007 Conference Proceedings. 2007. p. 890–901.

MLA: Hua, M., et al. “Efficiently answering top-k typicality queries on large databases.” 33rd International Conference on Very Large Data Bases VLDB 2007 Conference Proceedings, 2007, pp. 890–901.

NLM: Hua M, Pei J, Fu AWC, Lin X, Leung HF. Efficiently answering top-k typicality queries on large databases. 33rd International Conference on Very Large Data Bases VLDB 2007 Conference Proceedings. 2007. p. 890–901.
