Indexing uncertain data

Published

Journal Article

Querying uncertain data has emerged as an important problem in data management because many kinds of measurement data are inherently imprecise. In this paper we study answering range queries over uncertain data. Specifically, we are given a collection P of n points in ℝ, each represented by its one-dimensional probability density function (pdf). The goal is to build an index on P such that, given a query interval I and a probability threshold t, we can quickly report all points of P that lie in I with probability at least t. We present various indexing schemes with linear or near-linear space and logarithmic query time. Our schemes support pdfs that are either histograms or more complex ones such as Gaussian or piecewise algebraic. They also extend to the external memory model, in which the goal is to minimize the number of disk accesses when querying the index. Copyright 2009 ACM.
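To make the query semantics concrete: a point with pdf f lies in I = [lo, hi] with probability F(hi) − F(lo), where F is the cumulative distribution function of f, and it is reported when that value is at least t. The sketch below is only the brute-force O(n)-per-query baseline that the indexing schemes are meant to improve on; it is not the paper's index. The class and function names (`HistogramPdf`, `GaussianPdf`, `threshold_range_query`) are hypothetical, and histogram buckets are assumed to have uniform density within each bucket.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass
class HistogramPdf:
    # Hypothetical representation: buckets (left, right, mass), masses sum to 1,
    # with density assumed uniform inside each bucket.
    buckets: Sequence[Tuple[float, float, float]]

    def cdf(self, x: float) -> float:
        total = 0.0
        for left, right, mass in self.buckets:
            if x >= right:
                total += mass  # whole bucket lies at or left of x
            elif x > left:
                total += mass * (x - left) / (right - left)  # partial bucket
        return total


@dataclass
class GaussianPdf:
    mu: float
    sigma: float

    def cdf(self, x: float) -> float:
        # Normal CDF expressed through the error function.
        return 0.5 * (1.0 + math.erf((x - self.mu) / (self.sigma * math.sqrt(2.0))))


Pdf = Union[HistogramPdf, GaussianPdf]


def prob_in_interval(pdf: Pdf, lo: float, hi: float) -> float:
    # Pr[X in [lo, hi]] = F(hi) - F(lo) for a continuous distribution.
    return pdf.cdf(hi) - pdf.cdf(lo)


def threshold_range_query(points: List[Pdf], lo: float, hi: float, t: float) -> List[int]:
    # Linear scan: report the index of every point whose probability of
    # lying in [lo, hi] is at least t.
    return [i for i, pdf in enumerate(points) if prob_in_interval(pdf, lo, hi) >= t]


points: List[Pdf] = [
    HistogramPdf([(0.0, 2.0, 0.5), (2.0, 4.0, 0.5)]),  # Pr[in [1, 3]] = 0.5
    GaussianPdf(10.0, 1.0),                             # Pr[in [1, 3]] is about 0
    HistogramPdf([(1.0, 3.0, 1.0)]),                    # Pr[in [1, 3]] = 1.0
]

print(threshold_range_query(points, 1.0, 3.0, 0.4))  # → [0, 2]
```

Raising the threshold to t = 0.6 drops the first point, leaving only index 2. Each query here evaluates every pdf, which is exactly the cost that a linear or near-linear space index with logarithmic query time avoids.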

Cited Authors

  • Agarwal, PK; Cheng, SW; Tao, Y; Yi, K

Published Date

  • November 9, 2009

Published In

  • Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

Start / End Page

  • 137 - 146

Digital Object Identifier (DOI)

  • 10.1145/1559795.1559816

Citation Source

  • Scopus