Exceeding expectations and clustering uncertain data

Conference Paper

Database technology is playing an increasingly important role in understanding and solving large-scale and complex scientific and societal problems and phenomena, for instance, understanding biological networks, climate modeling, electronic markets, etc. In these settings, uncertainty or imprecise information is a pervasive issue that becomes a serious impediment to understanding and effiectively utilizing such systems. Clustering is one of the key problems in this context. In this paper we focus on the problem of clustering, specifically the k-center problem. Since the problem is NP-Hard in deterministic setting, a natural avenue is to consider approximation algorithms with a bounded performance ratio. In an earlier paper Cormode and McGregor had considered certain variants of this problem, but failed to provide approximations that preserved the number of centers. In this paper we remedy the situation and provide true approximation algorithms for a wider class of these problems. However, the key aspect of this paper is to devise general techniques for optimization under uncertainty. We show that a particular formulation which uses the contribution of a random variable above its expectation is useful in this context. We believe these techniques will find wider applications in optimization under uncertainty. Copyright 2009 ACM.

Full Text

Duke Authors

Cited Authors

  • Guha, S; Munagala, K

Published Date

  • November 9, 2009

Published In

  • Proceedings of the Acm Sigact Sigmod Sigart Symposium on Principles of Database Systems

Start / End Page

  • 269 - 277

International Standard Book Number 13 (ISBN-13)

  • 9781605585536

Digital Object Identifier (DOI)

  • 10.1145/1559795.1559836

Citation Source

  • Scopus