Operator scheduling in data stream systems

Publication, Journal Article
Babcock, B; Babu, S; Datar, M; Motwani, R; Thomas, D
Published in: VLDB Journal
December 1, 2004

In many applications involving continuous data streams, data arrival is bursty and the data rate fluctuates over time. Systems that seek to give rapid or real-time query responses in such an environment must be prepared to deal gracefully with bursts in data arrival without compromising system performance. We discuss one strategy for processing bursty streams: adaptive, load-aware scheduling of query operators to minimize resource consumption during times of peak load. We show that the choice of an operator scheduling strategy can have a significant impact on runtime system memory usage as well as output latency. Our aim is to design a scheduling strategy that minimizes the maximum runtime system memory while maintaining the output latency within prespecified bounds. We first present Chain scheduling, an operator scheduling strategy for data stream systems that is near-optimal in minimizing runtime memory usage for any collection of single-stream queries involving selections, projections, and foreign-key joins with stored relations. Chain scheduling also performs well for queries with sliding-window joins over multiple streams and for multiple queries of the above types. However, during bursts in input streams, when there is a buildup of unprocessed tuples, Chain scheduling may lead to high output latency. We study the online problem of minimizing maximum runtime memory, subject to a constraint on maximum latency. We present preliminary observations, negative results, and heuristics for this problem. We provide a thorough experimental evaluation in which we demonstrate the potential benefits of Chain scheduling and its variants, compare it with competing scheduling strategies, and validate our analytical conclusions.

Published In

VLDB Journal

DOI

10.1007/s00778-004-0132-6

ISSN

1066-8888

Publication Date

December 1, 2004

Volume

13

Issue

4

Start / End Page

333 / 353

Related Subject Headings

  • Information Systems
  • 4605 Data management and data science
  • 0806 Information Systems
  • 0805 Distributed Computing
  • 0804 Data Format
 

Citation

APA

Babcock, B., Babu, S., Datar, M., Motwani, R., & Thomas, D. (2004). Operator scheduling in data stream systems. VLDB Journal, 13(4), 333–353. https://doi.org/10.1007/s00778-004-0132-6

Chicago

Babcock, B., S. Babu, M. Datar, R. Motwani, and D. Thomas. “Operator scheduling in data stream systems.” VLDB Journal 13, no. 4 (December 1, 2004): 333–53. https://doi.org/10.1007/s00778-004-0132-6.

ICMJE

Babcock B, Babu S, Datar M, Motwani R, Thomas D. Operator scheduling in data stream systems. VLDB Journal. 2004 Dec 1;13(4):333–53.

MLA

Babcock, B., et al. “Operator scheduling in data stream systems.” VLDB Journal, vol. 13, no. 4, Dec. 2004, pp. 333–53. Scopus, doi:10.1007/s00778-004-0132-6.

NLM

Babcock B, Babu S, Datar M, Motwani R, Thomas D. Operator scheduling in data stream systems. VLDB Journal. 2004 Dec 1;13(4):333–353.