Shivnath Babu

Adjunct Associate Professor of Computer Science
Computer Science
Box 90129, Durham, NC 27708-0129
D338 LSRC, Durham, NC 27708

Selected Publications


Black or White? How to Develop an AutoTuner for Memory-based Analytics

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · June 14, 2020 There is a lot of interest today in building autonomous (or, self-driving) data processing systems. An emerging school of thought is to leverage AI-driven "black box" algorithms for this purpose. In this paper, we present a contrarian view. We study the pr ... Full text Cite

Reflective control for an elastic cloud application: An automated experiment workbench

Conference Workshop on Hot Topics in Cloud Computing, HotCloud 2009 · January 1, 2020 This paper addresses “reflective” control for applications that use server resources from a shared cloud infrastructure opportunistically. In this approach, an external reflecti ... Cite

MIFO: A query-semantic aware resource allocation policy

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · June 25, 2019 Data Analytics Frameworks encourage sharing of clusters for execution of mixed workloads by promising fairness and isolation along with high performance and resource utilization. However, concurrent query executions on such shared clusters result in increa ... Full text Cite

Cost-effective, workload-adaptive migration of big data applications to the cloud

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · June 25, 2019 More than 10,000 enterprises worldwide use the big data stack composed of multiple distributed systems. At Unravel, we build the next-generation APM platform for the big data stack, and we have worked with a representative sample of these enterprises that ... Full text Cite

iQCAR: inter-Query Contention Analyzer for Data Analytics Frameworks

Conference Proceedings. ACM-SIGMOD International Conference on Management of Data · June 2019 Resource interferences caused by concurrent queries is one of the key reasons for unpredictable performance and missed workload SLAs in cluster computing systems. Analyzing these inter-query resource interactions is critical in order to answer time-sensiti ... Full text Cite

Automated performance management for the big data stack

Conference CIDR 2019 - 9th Biennial Conference on Innovative Data Systems Research · January 1, 2019 More than 10,000 enterprises worldwide today use the big data stack that is composed of multiple distributed systems. At Unravel, we have worked with a representative sample of these enterprises that covers most industry verticals. This sample also covers ... Cite

iQCAR: A Demonstration of an Inter-Query Contention Analyzer for Cluster Computing Frameworks

Conference Proceedings. ACM-SIGMOD International Conference on Management of Data · June 2018 Unpredictability in query runtimes can arise in a shared cluster as a result of resource contentions caused by inter-query interactions. iQCAR - inter Query Contention AnalyzeR is a system that formally models these inter ... Full text Cite

Speedup your analytics: Automatic parameter tuning for databases and big data systems

Conference Proceedings of the VLDB Endowment · January 1, 2018 Database and big data analytics systems such as Hadoop and Spark have a large number of configuration parameters that control memory distribution, I/O optimization, parallelism, and compression. Improper parameter settings can cause significant performance ... Full text Cite

Thoth in action: Memory management in modern data analytics

Conference Proceedings of the VLDB Endowment · August 1, 2017 Allocation and usage of memory in modern data-processing platforms is based on an interplay of algorithms at multiple levels: (i) at the resource-management level across containers allocated by resource managers like Mesos and Yarn, (ii) at the container l ... Full text Cite

ROBUS: Fair cache allocation for data-parallel workloads

Conference Proceedings of the ACM SIGMOD International Conference on Management of Data · May 9, 2017 Systems for processing big data (e.g., Hadoop, Spark, and massively parallel databases) need to run workloads on behalf of multiple tenants simultaneously. The abundant disk-based storage in these systems is usually complemented by a smaller, but much faster ... Full text Cite

Cümülön: Matrix-based data analytics in the cloud with spot instances

Chapter · January 1, 2016 We describe Cümülön, a system aimed at helping users develop and deploy matrix-based data analysis programs in a public cloud. A key feature of Cümülön is its end-to-end support for the so-called spot instances-machines whose market price fluctuates over t ... Cite

Tempo: Robust and self-tuning resource management in multi-tenant parallel databases

Conference Proceedings of the VLDB Endowment · January 1, 2016 Multi-tenant database systems have a component called the Resource Manager, or RM that is responsible for allocating resources to tenants. RMs today do not provide direct support for performance objectives such as: "Average job response time of tenant A mu ... Full text Cite

Tutorial: SQL-on-hadoop systems

Chapter · January 1, 2015 Full text Cite

Execution and optimization of continuous windowed aggregation queries

Journal Article Proceedings - International Conference on Data Engineering · January 1, 2014 The desire of companies to analyze web-site activity data quickly in order to show personalized content and advertisements to users has led to renewed interest in continuous query processing. One important query class here is windowed aggregation which doe ... Full text Cite

Proceedings of the 3rd Workshop on Data Analytics in the Cloud, DanaC 2014 - In Conjunction with ACM SIGMOD/PODS Conference: Foreword

Journal Article Proceedings of the 3rd Workshop on Data Analytics in the Cloud, DanaC 2014 - In Conjunction with ACM SIGMOD/PODS Conference · January 1, 2014 Cite

Thoth: Towards managing a multi-system cluster

Journal Article Proceedings of the VLDB Endowment · January 1, 2014 Following the 'no one size fits all' philosophy, active research in big data platforms is focusing on creating an environment for multiple 'one-size' systems to co-exist and cooperate in the same cluster. Consequently, it has now become imperative to provi ... Full text Cite

PStorM: Profile storage and matching for feedback-based tuning of MapReduce jobs

Conference Advances in Database Technology - EDBT 2014: 17th International Conference on Extending Database Technology, Proceedings · January 1, 2014 The MapReduce programming model has become widely adopted for large scale analytics on big data. MapReduce systems such as Hadoop have many tuning parameters, many of which have a significant impact on performance. The map and reduce functions that make up ... Full text Cite

Workload management for big data analytics

Journal Article Proceedings - International Conference on Data Engineering · August 15, 2013 Parallel database systems and MapReduce systems (most notably Hadoop) are essential components of today's infrastructure for Big Data analytics. These systems process multiple concurrent workloads consisting of complex user requests, where each request is ... Full text Cite

Workload management for big data analytics

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · July 29, 2013 It is our great pleasure to welcome you to the 2013 ACM SIGMOD Conference on Management of Data, SIGMOD'13. This year the conference is being held in New York City, at the Millennium Broadway Hotel in the Times Square theater district. New York City provid ... Full text Cite

Execution and optimization of continuous queries with cyclops

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · July 29, 2013 As the data collected by enterprises grows in scale, there is a growing trend of performing data analytics on large datasets. Batch processing systems that can handle petabyte scale of data, such as Hadoop, have flourished and gained traction in the indust ... Full text Cite

Cumulon: Optimizing statistical data analysis in the cloud

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · July 29, 2013 We present Cumulon, a system designed to help users rapidly develop and intelligently deploy matrix-based big-data analysis programs in the cloud. Cumulon features a flexible execution model and new operators especially suited for such workloads. We show h ... Full text Cite

Rapid experimentation for testing and tuning a production database deployment

Conference ACM International Conference Proceeding Series · May 2, 2013 The need to perform testing and tuning of database instances with production-like workloads (W), configurations (C), data (D), and resources (R) arises routinely. The further W, C, D, and R used in testing and tuning deviate from what is observed on the pr ... Full text Cite

How to fit when no one size fits

Conference CIDR 2013 - 6th Biennial Conference on Innovative Data Systems Research · January 1, 2013 While “no one size fits all” is a sound philosophy for system designers to follow, it poses multiple challenges for application developers and system administrators. It can be hard for an application developer to pick one system when the needs of her appli ... Cite

A practical concurrent index for solid-state drives

Journal Article ACM International Conference Proceeding Series · December 19, 2012 Solid-state drives are becoming a viable alternative to magnetic disks in database systems, but their performance characteristics, particularly those caused by their erase-before-write behavior, make conventional database indexes a poor fit. There have bee ... Full text Cite

Massively parallel databases and MapReduce systems

Journal Article Foundations and Trends in Databases · December 1, 2012 Timely and cost-effective analytics over "big data" has emerged as a key ingredient for success in many businesses, scientific and engineering disciplines, and government endeavors. Web clicks, social media, scientific experiments, and datacenter monitorin ... Full text Cite

Stubby: A transformation-based optimizer for MapReduce Workflows

Journal Article Proceedings of the VLDB Endowment · January 1, 2012 There is a growing trend of performing analysis on large datasets using workflows composed of MapReduce jobs connected through producer-consumer relationships based on data. This trend has spurred the development of a number of interfaces-ranging from prog ... Full text Cite

No one (cluster) size fits all: Automatic cluster sizing for data-intensive analytics

Journal Article Proceedings of the 2nd ACM Symposium on Cloud Computing, SOCC 2011 · November 30, 2011 Infrastructure-as-a-Service (IaaS) cloud platforms have brought two unprecedented changes to cluster provisioning practices. First, any (nonexpert) user can provision a cluster of any size on the cloud within minutes to run her data-processing jobs. The us ... Full text Cite

Starfish: A self-tuning system for big data analytics

Journal Article CIDR 2011 - 5th Biennial Conference on Innovative Data Systems Research, Conference Proceedings · October 11, 2011 Timely and cost-effective analytics over "Big Data" is now a key ingredient for success in many businesses, scientific and engineering disciplines, and government endeavors. The Hadoop software stack-which consists of an extensible MapReduce execution engi ... Cite

Proactive detection and repair of data corruption: Towards a hassle-free declarative approach with Amulet

Journal Article Proceedings of the VLDB Endowment · August 1, 2011 Occasional corruption of stored data is an unfortunate byproduct of the complexity of modern systems. Hardware errors, software bugs, and mistakes by human administrators can corrupt important sources of data. The dominant practice to deal with data corrup ... Cite

MapReduce programming and cost-based optimization? Crossing this chasm with Starfish

Journal Article Proceedings of the VLDB Endowment · August 1, 2011 MapReduce has emerged as a viable competitor to database systems in big data analytics. MapReduce programs are being written for a wide variety of application domains including business data processing, text analysis, natural language processing, Web graph ... Cite

Interaction-aware scheduling of report-generation workloads

Journal Article VLDB Journal · August 1, 2011 The typical workload in a database system consists of a mix of multiple queries of different types that run concurrently. Interactions among the different queries in a query mix can have a significant impact on database performance. Hence, optimizing datab ... Full text Cite

Dealing proactively with data corruption: Challenges and opportunities

Journal Article Proceedings - International Conference on Data Engineering · June 10, 2011 The danger of production or backup data becoming corrupted is a problem that database administrators dread. This position paper aims to bring this problem to the attention of the database research community, which, surprisingly, has by and large overlooked ... Full text Cite

Profiling, what-if analysis, and cost-based optimization of MapReduce programs

Journal Article Proceedings of the VLDB Endowment · January 1, 2011 MapReduce has emerged as a viable competitor to database systems in big data analytics. MapReduce programs are being written for a wide variety of application domains including business data processing, text analysis, natural language processing, Web graph ... Full text Cite

Query optimization techniques for partitioned tables

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · January 1, 2011 Table partitioning splits a table into smaller parts that can be accessed, stored, and maintained independent of one another. From their traditional use in improving query performance, partitioning strategies have evolved into a powerful mechanism to impro ... Full text Cite

Warding off the dangers of data corruption with Amulet

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · January 1, 2011 Occasional corruption of stored data is an unfortunate byproduct of the complexity of modern systems. Hardware errors, software bugs, and mistakes by human administrators can corrupt important sources of data. The dominant practice to deal with data corrup ... Full text Cite

Predicting completion times of batch query workloads using interaction-aware models and simulation

Journal Article ACM International Conference Proceeding Series · January 1, 2011 A question that database administrators (DBAs) routinely need to answer is how long a batch query workload will take to complete. This question arises, for example, while planning the execution of different report-generation workloads to fit within availab ... Full text Cite

Towards automatic optimization of MapReduce programs

Journal Article Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC '10 · July 30, 2010 Timely and cost-effective processing of large datasets has become a critical ingredient for the success of many academic, government, and industrial organizations. The combination of MapReduce frameworks and cloud computing is an attractive proposition for ... Full text Cite

iTuned: A tool for configuring and visualizing database parameters

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · July 23, 2010 iTuned is a tool that takes a SQL workload as input and recommends good settings for database configuration parameters such as buffer pool sizes, multi-programming level, and number of I/O daemons. iTuned also provides response-surface and sensitivity-anal ... Full text Cite

Automated control for elastic storage

Journal Article Proceeding of the 7th International Conference on Autonomic Computing, ICAC '10 and Co-located Workshops · July 23, 2010 Elasticity - where systems acquire and release resources in response to dynamic workloads, while paying only for what they need - is a driving property of cloud computing. At the core of any elastic system is an automated controller. This paper addresses e ... Full text Cite

Interaction-aware prediction of business intelligence workload completion times

Journal Article Proceedings - International Conference on Data Engineering · June 1, 2010 While planning the execution of report-generation workloads, database administrators often need to know how long different query workloads will take to run. Database systems run mixes of multiple queries of different types concurrently. Hence, estimating t ... Full text Cite

Message from the SMDB'10 workshop organizers

Journal Article Proceedings - International Conference on Data Engineering · May 28, 2010 Full text Cite

Xplus: A SQL-Tuning-Aware Query Optimizer

Journal Article Proceedings of the VLDB Endowment · January 1, 2010 The need to improve a suboptimal execution plan picked by the query optimizer for a repeatedly run SQL query arises routinely. Complex expressions, skewed or correlated data, and changing conditions can cause the optimizer to make mistakes. For example, th ... Full text Cite

Automated control in cloud computing: Challenges and opportunities

Journal Article Proceedings of the 1st Workshop on Automated Control for Datacenters and Clouds, ACDC '09 · November 30, 2009 With advances in virtualization technology, virtual machine services offered by cloud utility providers are becoming increasingly powerful, anchoring the ecosystem of cloud services. Virtual computing services are attractive in part because they enable cus ... Cite

Automated SQL tuning through trial and (sometimes) error

Journal Article Proceedings of the 2nd International Workshop on Testing Database Systems, DBTest '09 · November 23, 2009 SQL tuning - the attempt to improve a poorly-performing execution plan produced by the database query optimizer - is a critical aspect of database performance tuning. Ironically, as commercial databases strive to improve on the manageability front, SQL tun ... Full text Cite

Query interactions in database workloads

Journal Article Proceedings of the 2nd International Workshop on Testing Database Systems, DBTest '09 · November 23, 2009 Database workloads consist of mixes of queries that run concurrently and interact with each other. In this paper, we demonstrate that query interactions can have a significant impact on database system performance. Hence, we argue that it is important to t ... Full text Cite

Automated diagnosis of system failures with Fa

Journal Article Proceedings - International Conference on Data Engineering · July 8, 2009 Full text Cite

Fa: A system for automating failure diagnosis

Conference Proceedings - International Conference on Data Engineering · July 8, 2009 Failures of Internet services and enterprise systems lead to user dissatisfaction and considerable loss of revenue. Since manual diagnosis is often laborious and slow, there is considerable interest in tools that can diagnose the cause of failures quickly ... Full text Cite

Shaman: A self-healing database system

Journal Article Proceedings - International Conference on Data Engineering · July 8, 2009 Full text Cite

DIADS: Addressing the “my-problem-or-yours” syndrome with integrated SAN and database diagnosis

Conference Proceedings of the 7th USENIX Conference on File and Storage Technologies, FAST 2009 · January 1, 2009 We present DIADS, an integrated DIAgnosis tool for Databases and Storage area networks (SANs). Existing diagnosis tools in this domain have a database-only (e.g., [11]) or SAN-only (e.g., [28]) focus. DIADS is a first-of-a-kind framework based on a careful ... Cite

Reflective control for an elastic cloud application: An automated experiment workbench

Conference Workshop on Hot Topics in Cloud Computing, HotCloud 2009 · January 1, 2009 This paper addresses “reflective” control for applications that use server resources from a shared cloud infrastructure opportunistically. In this approach, an external reflective controller launches application functions based on knowledge of what resourc ... Cite

Automated experiment-driven management of (database) systems

Conference Proceedings of HotOS 2009 - 12th Workshop on Hot Topics in Operating Systems · January 1, 2009 In this position paper, we argue that an important piece of the system administration puzzle has largely been left untouched by researchers. This piece involves mechanisms and policies to identify as well as collect missing instrumentation data; the missin ... Cite

DIADS: A problem diagnosis tool for databases and storage area networks

Journal Article Proceedings of the VLDB Endowment · January 1, 2009 Many enterprise environments have databases running on network-attached storage infrastructure (referred to as Storage Area Networks or SANs). Both the database and the SAN are complex subsystems that are managed by separate teams of administrators. As ofte ... Full text Cite

Large-scale uncertainty management systems: Learning and exploiting your data

Conference SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems · January 1, 2009 The database community has made rapid strides in capturing, representing, and querying uncertain data. Probabilistic databases capture the inherent uncertainty in derived tuples as probability estimates. Data acquisition and stream systems can produce succ ... Full text Cite

Tuning database configuration parameters with iTuned

Journal Article Proceedings of the VLDB Endowment · January 1, 2009 Database systems have a large number of configuration parameters that control memory distribution, I/O optimization, costing of query plans, parallelism, many aspects of logging, recovery, and other behavior. Regular users and even expert database administ ... Full text Cite

Modeling and exploiting query interactions in database systems

Conference International Conference on Information and Knowledge Management, Proceedings · December 1, 2008 The typical workload in a database system consists of a mixture of multiple queries of different types, running concurrently and interacting with each other. Hence, optimizing performance requires reasoning about query mixes and their interactions, rather ... Full text Cite

Finding good configurations in high-dimensional spaces: Doing more with less

Journal Article 2008 IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS · December 1, 2008 Manually tuning tens to hundreds of configuration parameters in a complex software system like a database or an application server is an arduous task. Recent work has looked into automated approaches for recommending good configuration settings that adopti ... Full text Cite

QShuffler: Getting the query mix right

Journal Article Proceedings - International Conference on Data Engineering · October 1, 2008 The typical workload in a database system consists of a mixture of multiple queries of different types, running concurrently and interacting with each other. Hence, optimizing performance requires reasoning about query mixes and their interactions, rather ... Full text Cite

Processing diagnosis queries: A principled and scalable approach

Conference Proceedings - International Conference on Data Engineering · October 1, 2008 Many popular Web sites suffer occasional user-visible problems such as slow responses, blank pages or error messages being displayed, items not being added to shopping carts, database slowdowns, and others. Such deviations of systems from desired behavior, ... Full text Cite

Guided problem diagnosis through active learning

Journal Article 5th International Conference on Autonomic Computing, ICAC 2008 · September 18, 2008 There is widespread interest today in developing tools that can diagnose the cause of a system failure accurately and efficiently based on monitoring data collected from the system. Over time, the system monitoring data will contain two types of failure da ... Full text Cite

Cutting corners: Workbench automation for server benchmarking

Conference Proceedings of the 2008 USENIX Annual Technical Conference, USENIX 2008 · January 1, 2008 A common approach to benchmarking a server is to measure its behavior under load from a workload generator. Often a set of such experiments is required—perhaps with different server configurations or workload parameters—to obtain a statistically sound resu ... Cite

Empirical comparison of techniques for automated failure diagnosis

Conference 3rd Workshop on Tackling Computer Systems Problems with Machine Learning Techniques, SysML 2008 · January 1, 2008 Automated techniques to diagnose the cause of system failures based on monitoring data is an active area of research at the intersection of systems and machine learning. In this paper, we identify three tasks that form key building blocks in automated diag ... Cite

Toward self-healing multitier services

Journal Article Proceedings - International Conference on Data Engineering · December 1, 2007 Are self-healing database-centric multitier services utopia or just a hard puzzle? We argue for the latter and aim to identify the missing pieces of this puzzle. We advocate robust and scalable learning-based approaches to self-healing that we expect to wo ... Full text Cite

Query suspend and resume

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · October 30, 2007 Suppose a long-running analytical query is executing on a database server and has been allocated a large amount of physical memory. A high-priority task comes in and we need to run it immediately with all available resources. We have several choices. We co ... Full text Cite

Automated and on-demand provisioning of virtual machines for database applications

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · October 30, 2007 Utility computing delivers compute and storage resources to applications as an 'on-demand utility', much like electricity, from a distributed collection of computing resources. There is great interest in running database applications on utility resources ( ... Full text Cite

On suspending and resuming dataflows

Journal Article Proceedings - International Conference on Data Engineering · September 24, 2007 Full text Cite

Towards an autonomic computing testbed

Conference Proceedings of the 2nd International Workshop on Hot Topics in Autonomic Computing, HotAC 2007, Held in conjunction with ICAC 2007 · January 1, 2007 This paper introduces Automat, a testbed architecture and prototype for research in autonomic services and hosting centers. Automat is an interactive web-based laboratory in which users allocate resources from an ondemand server cluster to experiment with ... Cite

Processing forecasting queries

Conference 33rd International Conference on Very Large Data Bases, VLDB 2007 - Conference Proceedings · January 1, 2007 Forecasting future events based on historic data is useful in many domains like system management, adaptive query processing, environmental monitoring, and financial planning. We describe the Fa system where users and applications can pose declarative fore ... Cite

Proactive identification of performance problems

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · December 1, 2006 We propose to demonstrate Fa, an automated tool for timely and accurate prediction of Service-Level-Agreement (SLA) violations caused by performance problems in database systems. Fa periodically collects performance data at three levels: applications, data ... Full text Cite

Learning application models for utility resource planning

Journal Article Proceedings - 3rd International Conference on Autonomic Computing, ICAC 2006 · December 1, 2006 Shared computing utilities allocate compute, network, and storage resources to competing applications on demand. An awareness of the demands and behaviors of the hosted applications can help the system to manage its resources more effectively. This paper p ... Cite

The CQL continuous query language: Semantic foundations and query execution

Journal Article VLDB Journal · June 1, 2006 CQL, a continuous query language, is supported by the STREAM prototype data stream management system (DSMS) at Stanford. CQL is an expressive SQL-based declarative language for registering continuous queries against streams and stored relations. We begin b ... Full text Cite

Active and accelerated learning of cost models for optimizing scientific applications

Conference VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases · January 1, 2006 We present the NIMO system that automatically learns cost models for predicting the execution time of computational-science applications running on large-scale networked utilities such as computational grids. Accurate cost models are important for selectin ... Cite

Adaptive caching for continuous queries

Journal Article Proceedings - International Conference on Data Engineering · December 12, 2005 We address the problem of executing continuous multiway join queries in unpredictable and volatile environments. Our query class captures windowed join queries in data stream systems as well as conventional maintenance of materialized join views. Our adapt ... Full text Cite

The pipelined set cover problem

Journal Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · December 1, 2005 A classical problem in query optimization is to find the optimal ordering of a set of possibly correlated selections. We provide an abstraction of this problem as a generalization of set cover called pipelined set cover, where the sets are applied sequenti ... Full text Cite

Content-based routing: Different plans for different data

Journal Article VLDB 2005 - Proceedings of 31st International Conference on Very Large Data Bases · December 1, 2005 Query optimizers in current database systems are designed to pick a single efficient plan for a given query based on current statistical properties of the data. However, different subsets of the data can sometimes have very different statistical properties ... Cite

Adaptive query processing in the looking glass

Journal Article 2nd Biennial Conference on Innovative Data Systems Research, CIDR 2005 · December 1, 2005 A great deal of work on adaptive query processing has been done over the last few years: Adaptive query processing has been used to detect and correct optimizer errors due to incorrect statistics or simplified cost metrics; it has been applied to long-runn ... Cite

Proactive re-optimization with Rio

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · December 1, 2005 Traditional query optimizers rely on the accuracy of estimated statistics of intermediate subexpressions to choose good query execution plans. This design often leads to suboptimal plan choices for complex queries since errors in estimates grow exponenti ... Full text Cite

Proactive re-optimization

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · December 1, 2005 Traditional query optimizers rely on the accuracy of estimated statistics to choose good execution plans. This design often leads to suboptimal plan choices for complex queries, since errors in estimates for intermediate subexpressions grow exponentially i ... Full text Cite

Operator scheduling in data stream systems

Journal Article VLDB Journal · December 1, 2004 In many applications involving continuous data streams, data arrival is bursty and data rate fluctuates over time. Systems that seek to give rapid or real-time query responses in such an environment must be prepared to deal gracefully with bursts in data a ... Full text Cite

Exploiting k-constraints to reduce memory overhead in continuous queries over data streams

Journal Article ACM Transactions on Database Systems · September 1, 2004 Continuous queries often require significant run-time state over arbitrary data streams. However, streams may exhibit certain data or arrival patterns, or constraints, that can be detected and exploited to reduce state considerably without compromising cor ... Full text Cite

StreaMon: An adaptive engine for stream query processing

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · July 27, 2004 StreaMon is the adaptive query processing engine of the STREAM prototype Data Stream Management System (DSMS). A fundamental challenge in many DSMS applications (e.g., network monitoring, financial monitoring over stock tickers, sensor processing) is that ... Cite

Characterizing memory requirements for queries over continuous data streams

Journal Article ACM Transactions on Database Systems · March 1, 2004 This article deals with continuous conjunctive queries with arithmetic comparisons and optional aggregation over multiple data streams. An algorithm is presented for determining whether or not any given query can be evaluated using a bounded amount of memo ... Full text Cite

CQL: A language for continuous queries over streams and relations

Journal Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2004 Despite the recent surge of research in query processing over data streams, little attention has been devoted to defining precise semantics for continuous queries over streams. We first present an abstract semantics based on several building blocks: formal ... Full text Cite

Adaptive ordering of pipelined stream filters

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · January 1, 2004 We consider the problem of pipelined filters, where a continuous stream of tuples is processed by a set of commutative filters. Pipelined filters are common in stream applications and capture a large class of multiway stream joins. We focus on the problem ... Full text Cite

STREAM: The Stanford Stream Data Manager (Demonstration Description)

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · December 1, 2003 Cite

Chain: Operator Scheduling for Memory Minimization in Data Stream Systems

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · December 1, 2003 In many applications involving continuous data streams, data arrival is bursty and data rate fluctuates over time. Systems that seek to give rapid or real-time query responses in such an environment must be prepared to deal gracefully with bursts in data a ... Cite

Models and issues in data stream systems

Journal Article Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems · January 1, 2002 In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data stream ... Full text Cite

Characterizing memory requirements for queries over continuous data streams

Journal Article Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems · January 1, 2002 We consider conjunctive queries with arithmetic comparisons over multiple continuous data streams. We specify an algorithm for determining whether or not a query can be evaluated using a bounded amount of memory for all possible instances of the data strea ... Full text Cite

SPARTAN: A model-based semantic compression system for massive data tables

Journal Article Proceedings of the ACM SIGMOD International Conference on Management of Data · September 29, 2001 While a variety of lossy compression schemes have been developed for certain forms of digital data (e.g., images, audio, video), the area of lossy compression techniques for arbitrary data tables has been left relatively unexplored. Nevertheless, such tec ... Cite

A distributed real-time MAC protocol for WDM-based LANs

Journal Article Computer Communications · April 1, 2001 In this paper, we consider the problem of developing a MAC layer protocol to support real-time as well as non-real-time packet streams in Broadcast and Select WDM optical networks having rapidly tunable transmitters. We propose a MAC layer protocol which i ... Full text Cite

Continuous queries over data streams

Journal Article SIGMOD Record (ACM Special Interest Group on Management of Data) · January 1, 2001 In many recent applications, data may take the form of continuous data streams, rather than finite stored data sets. Several aspects of data management need to be reconsidered in the presence of data streams, offering a new research direction for the datab ... Full text Cite

SPARTAN: A model-based semantic compression system for massive data tables

Journal Article SIGMOD Record (ACM Special Interest Group on Management of Data) · January 1, 2001 While a variety of lossy compression schemes have been developed for certain forms of digital data (e.g., images, audio, video), the area of lossy compression techniques for arbitrary data tables has been left relatively unexplored. Nevertheless, such tech ... Full text Cite

Foreword

Journal Article Respiratory Care Clinics of North America · January 1, 2001 Full text Cite

Foreword

Journal Article Social Science and Medicine · January 1, 1986 Full text Cite