The pipelined set cover problem

Journal Article

A classical problem in query optimization is to find the optimal ordering of a set of possibly correlated selections. We provide an abstraction of this problem as a generalization of set cover called pipelined set cover, where the sets are applied sequentially to the elements to be covered and the elements covered at each stage are discarded. We show that several natural heuristics for this NP-hard problem, such as the greedy set-cover heuristic and a local-search heuristic, can be analyzed using a linear-programming framework. These heuristics lead to efficient algorithms for pipelined set cover that can be applied to order possibly correlated selections in conventional database systems as well as data-stream processing systems. We use our linear-programming framework to show that the greedy and local-search algorithms are 4-approximations for pipelined set cover. We extend our analysis to minimize the lp-norm of the costs paid by the sets, where p ≥ 2 is an integer, to examine the improvement in performance when the total cost has increasing contribution from initial sets in the pipeline. Finally, we consider the online version of pipelined set cover and present a competitive algorithm with a logarithmic performance guarantee. Our analysis framework may be applicable to other problems in query optimization where it is important to account for correlations. © 2005 Springer-Verlag.

Full Text

Duke Authors

Cited Authors

  • Munagala, K; Babu, S; Motwani, R; Widom, J

Published Date

  • December 1, 2005

Published In

Volume / Issue

  • 3363 LNCS /

Start / End Page

  • 83 - 98

Electronic International Standard Serial Number (EISSN)

  • 1611-3349

International Standard Serial Number (ISSN)

  • 0302-9743

Digital Object Identifier (DOI)

  • 10.1007/978-3-540-30570-5_6

Citation Source

  • Scopus