Conference · ACM International Conference Proceeding Series · June 9, 2024
Declarative querying is a cornerstone of the success and longevity of database systems, yet it is challenging for novice learners accustomed to different coding paradigms. The transition is further hampered by a lack of query debugging tools compared to th ...
Journal Article · Proceedings of the ACM on Management of Data · May 29, 2024
We describe a system called Qr-Hint that, given a (correct) target query Q* and a (wrong) working query Q, both expressed in SQL, provides actionable hints for the user to fix the working query so that it becomes semantically equivalent to the targ ...
Journal Article · Vaccine · April 2024
We present VaxConcerns, a taxonomy for vaccine concerns and misinformation. VaxConcerns is an easy-to-teach taxonomy of concerns and misinformation commonly found among online anti-vaccination media and is evaluated to produce high-quality data annotations ...
Conference · Leibniz International Proceedings in Informatics, LIPIcs · March 1, 2024
We are given a set Z = {(R_1, s_1), ..., (R_n, s_n)}, where each R_i is a range in ℝ^d, such as a rectangle or ball, and s_i ∈ [0, 1] denotes its selectivity. The goal is to compute a small-size discrete data distribution D = {(q_1, w_1), ..., (q_m, w_m)}, where q_j ∈ R ...
Conference · Leibniz International Proceedings in Informatics, LIPIcs · March 1, 2024
Data analytics skills have become an indispensable part of any education that seeks to prepare its students for the modern workforce. Essential in this skill set is the ability to work with structured relational data. Relational queries are based on logic ...
Conference · Findings of the Association for Computational Linguistics: NAACL 2024 - Findings · January 1, 2024
One way to personalize chatbot interactions is by establishing common ground with the intended reader. A domain where establishing mutual understanding could be particularly impactful is vaccine concerns and misinformation. Vaccine interventions are forms ...
Conference · Proceedings of the ACM SIGMOD International Conference on Management of Data · June 4, 2023
Example database instances can be very helpful in understanding complex queries. Different examples may illustrate alternative situations in which answers emerge in the query results and can be useful for testing. Examples can also help reveal semantic dif ...
Conference · Conference on Human Factors in Computing Systems - Proceedings · April 19, 2023
Human data labeling is an important and expensive task at the heart of supervised learning systems. Hierarchies help humans understand and organize concepts. We ask whether and how concept hierarchies can inform the design of annotation interfaces to impro ...
Conference · Proceedings of the ACM SIGMOD International Conference on Management of Data · June 10, 2022
This paper explores the use of machine learning for estimating the selectivity of range queries in database systems. Using classic learning theory for real-valued functions based on shattering dimension, we show that the selectivity function of a range spa ...
Conference · Proceedings of the ACM SIGMOD International Conference on Management of Data · June 10, 2022
This paper studies multi-way join queries over temporal data, where each tuple is associated with a valid time interval indicating when the tuple is valid. A temporal join requires that joining tuples' valid intervals intersect. Previous work on temporal j ...
Conference · Proceedings of the ACM SIGMOD International Conference on Management of Data · June 10, 2022
A powerful way to understand a complex query is by observing how it operates on data instances. However, specific database instances are not ideal for such observations: they often include large amounts of superfluous details that are not only irrelevant t ...
Journal Article · Leibniz International Proceedings in Informatics, LIPIcs · July 1, 2021
This paper considers enumerating answers to similarity-join queries under dynamic updates: Given two sets of n points A, B in ℝ^d, a metric φ(·), and a distance threshold r > 0, report all pairs of points (a, b) ∈ A × B with φ(a, b) ≤ r. Our goal is to store ...
Journal Article · Proceedings - International Conference on Data Engineering · April 1, 2021
A way of finding interesting or exceptional records from instant-stamped temporal data is to consider their "durability," or, intuitively speaking, how well they compare with other records that arrived earlier or later, and how long they retain their supre ...
Journal Article · Proceedings of the ACM SIGMOD International Conference on Management of Data · January 1, 2021
We consider a class of queries called durability prediction queries that arise commonly in predictive analytics, where we use a given predictive model to answer questions about possible futures to inform our decisions. Examples of durability prediction que ...
Conference · SenSys 2020 - Proceedings of the 2020 18th ACM Conference on Embedded Networked Sensor Systems · November 16, 2020
Physical distancing between individuals is key to preventing the spread of a disease such as COVID-19. On the one hand, having access to information about physical interactions is critical for decision makers; on the other, this information is sensitive an ...
Journal Article · Proceedings of the VLDB Endowment · January 1, 2020
We study the optimization problem of selecting numerical quantities to clean in order to fact-check claims based on such data. Oftentimes, such claims are technically correct, but they can still mislead for two reasons. First, data may contain uncertainty ...
Conference · Proceedings of the VLDB Endowment · January 1, 2020
We study the problem of efficiently estimating counts for queries involving complex filters, such as user-defined functions, or predicates involving self-joins and correlated subqueries. For such queries, traditional sampling techniques may not be applicab ...
Journal Article · Proceedings of the VLDB Endowment · January 1, 2020
We demonstrate I-REX, a system designed to help users understand SQL query evaluation and debug SQL queries. I-REX lets users interactively “trace” the evaluation of complex SQL queries, including those with correlated subqueries. I-REX also explains why ...
Conference · Proceedings. ACM-SIGMOD International Conference on Management of Data · June 2019
We present a system called RATEST, designed to help debug relational queries against reference queries and test database instances. In many applications, e.g., classroom learning and regression testing, we test the correctness of a user query Q by e ...
Journal Article · Proceedings. ACM-SIGMOD International Conference on Management of Data · June 2019
For testing the correctness of SQL queries, e.g., evaluating student submissions in a database course, a standard practice is to execute the query in question on some test database instance and compare its result with that of the correct query. Given two q ...
Conference · Proceedings. ACM-SIGMOD International Conference on Management of Data · June 2018
Methods for summarizing and diversifying query results have drawn significant attention recently, because they help present query results with lots of tuples to users in more informative ways. We present QAGView (Quick AGgregate View), which provides a hol ...
Journal Article · Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems · June 2018
We investigate the complexity of computing an optimal repair of an inconsistent database, in the case where integrity constraints are Functional Dependencies (FDs). We focus on two types of repairs: an optimal subset repair (optimal S-repair) that is obtai ...
Conference · Proceedings of the VLDB Endowment · January 1, 2018
We present a system for summarization and interactive exploration of high-valued aggregate query answers to make a large set of possible answers more informative to the user. Our system outputs a set of clusters on the high-valued query answers showing the ...
Conference · Proceedings of the VLDB Endowment · January 1, 2018
Many datasets have a temporal dimension and contain a wealth of historical information. When using such data to make decisions, we often want to examine not only the current snapshot of the data but also its history. For example, given a result object of a ...
Journal Article · Proceedings of the VLDB Endowment · January 1, 2018
Estimation of the accuracy of a large-scale knowledge graph (KG) often requires humans to annotate samples from the graph. How to obtain statistically meaningful estimates for accuracy evaluation while keeping human annotation costs low is a problem critic ...
Conference · Proceedings - International Conference on Data Engineering · May 16, 2017
Log-structured merge (LSM) is an increasingly prevalent approach to indexing, especially for modern write-heavy workloads. LSM organizes data in levels with geometrically increasing sizes. Records enter the top level; whenever a level fills up, it is merged ...
Conference · Proceedings of the ACM SIGMOD International Conference on Management of Data · May 9, 2017
Iceberg queries, commonly used for decision support, find groups whose aggregate values are above or below a threshold. In practice, iceberg queries are often posed over complex joins that are expensive to evaluate. This paper proposes a framework for comb ...
Conference · Proceedings of the ACM SIGMOD International Conference on Management of Data · May 9, 2017
Large-scale data analytics using statistical machine learning (ML), popularly called advanced analytics, underpins many modern data-driven applications. The data management community has been working for over a decade on tackling data management-related ch ...
Journal Article · ACM Transactions on Database Systems · January 1, 2017
Our media is saturated with claims of "facts" made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, for example, is a claim "cherry- ...
Conference · Proceedings of the VLDB Endowment · January 1, 2017
In many applications, the system needs to selectively present a small subset of answers to users. The set of all possible answers can be seen as an elevation surface over a domain, where the elevation measures the quality of each answer, and the dimensions ...
Conference · Proceedings of the VLDB Endowment · January 1, 2017
We present a system called Cümülön-D for matrix-based data analysis in a spot market of a public cloud. Prices in such markets fluctuate over time: while users can acquire machines usually at a very low bid price, the cloud can terminate these machines as ...
Journal Article · IEEE Transactions on Knowledge and Data Engineering · February 1, 2016
Given a set of objects O, each with d numeric attributes, a top-k preference scores these objects using a linear combination of their attribute values, where the weight on each attribute reflects the interest in this attribute. Given a query preference q, ...
Chapter · January 1, 2016
We describe Cümülön, a system aimed at helping users develop and deploy matrix-based data analysis programs in a public cloud. A key feature of Cümülön is its end-to-end support for the so-called spot instances: machines whose market price fluctuates over t ...
Chapter · January 1, 2015
The most effective way to explore data is through visualizing the results of exploration queries. For example, an exploration query could be an aggregate of some measures over time intervals, and a pattern or abnormality can be discovered through a time se ...
Conference · Proceedings of the VLDB Endowment · January 1, 2015
We present a system, Perada, for parallel perturbation analysis of database queries. Perturbation analysis considers the results of a query evaluated with (a typically large number of) different parameter settings, to help discover leads and evaluate claim ...
Journal Article · Proceedings of the VLDB Endowment · January 1, 2014
Our news is saturated with claims of "facts" made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, e.g., is a claim "cherry-picking ...
Journal Article · Proceedings - International Conference on Data Engineering · January 1, 2014
Given a set of objects O, each with d numeric attributes, a top-k preference scores these objects using a linear combination of their attribute values, where the weight on each attribute reflects the interest in this attribute. Given a query preference q, ...
Journal Article · Proceedings - International Conference on Data Engineering · January 1, 2014
We study the novel problem of finding new, prominent situational facts, which are emerging statements about objects that stand out within certain contexts. Many such facts are newsworthy - e.g., an athlete's outstanding performance in a game, or a viral vi ...
Journal Article · Proceedings of the ACM SIGMOD International Conference on Management of Data · January 1, 2014
Are you fed up with "lies, d-ned lies, and statistics" made up from data in our media? For claims based on structured data, we present a system to automatically assess the quality of claims (beyond their correctness) and counter misleading claims that ch ...
Journal Article · Proceedings of the VLDB Endowment · January 1, 2014
Towards computational journalism, we present FactWatcher, a system that helps journalists identify data-backed, attention-seizing facts which serve as leads to news stories. FactWatcher discovers three types of facts, including situational facts, one-of-th ...
Journal Article · Proceedings of the ACM SIGMOD International Conference on Management of Data · July 29, 2013
We present Cumulon, a system designed to help users rapidly develop and intelligently deploy matrix-based big-data analysis programs in the cloud. Cumulon features a flexible execution model and new operators especially suited for such workloads. We show h ...
Journal Article · IEEE Transactions on Knowledge and Data Engineering · April 8, 2013
Wireless sensor networks are widely used to continuously collect data from the environment. Because of energy constraints on battery-powered nodes, it is critical to minimize communication. Suppression has been proposed as a way to reduce communication by ...
Journal Article · Computational Geometry: Theory and Applications · April 1, 2013
We present external memory data structures for efficiently answering range-aggregate queries. The range-aggregate problem is defined as follows: Given a set of weighted points in ℝ^d, compute the aggregate of the weights of the points that lie inside a d-di ...
Journal Article · Proceedings of the VLDB Endowment · January 1, 2013
Permutation is a fundamental operator for array data, with applications in, for example, changing matrix layouts and reorganizing data cubes. We consider the problem of permuting large quantities of data stored on secondary storage that supports fast rando ...
Journal Article · ACM International Conference Proceeding Series · December 19, 2012
Solid-state drives are becoming a viable alternative to magnetic disks in database systems, but their performance characteristics, particularly those caused by their erase-before-write behavior, make conventional database indexes a poor fit. There have bee ...
Journal Article · Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining · September 14, 2012
Objects with multiple numeric attributes can be compared within any "subspace" (subset of attributes). In applications such as computational journalism, users are interested in claims of the form: Karl Malone is one of the only two players in NBA history w ...
Journal Article · IEEE Transactions on Knowledge and Data Engineering · August 29, 2012
We study the problem of assigning subscribers to brokers in a wide-area content-based publish/subscribe system. A good assignment should consider both subscriber interests in the event space and subscriber locations in the network space, and balance multip ...
Journal Article · Proceedings - International Conference on Data Engineering · July 30, 2012
We consider how to support a large number of users over a wide-area network whose interests are characterised by range top-k continuous queries. Given an object update, we need to notify users whose top-k results are affected. Simple solutions include usin ...
Journal Article · Proceedings of the ACM SIGMOD International Conference on Management of Data · June 28, 2012
Given a set of objects, each with multiple numeric attributes, a (preference) top-k query retrieves the k objects with the highest scores according to a user preference, defined as a linear combination of attribute values. We consider the problem of proces ...
Journal Article · Proceedings of the VLDB Endowment · January 1, 2012
Big array analytics is becoming indispensable in answering important scientific and business questions. Most analysis tasks consist of multiple steps, each making one or multiple passes over the arrays to be analyzed and generating intermediate results. In ...
Journal Article · Foundations and Trends in Databases · December 1, 2011
Materialized views are queries whose results are stored and maintained in order to facilitate access to data in their underlying base tables. In the SQL setting, they are now considered a mature technology implemented by most commercial database systems an ...
Journal Article · Ecological Applications: a publication of the Ecological Society of America · July 2011
Recent developments suggest that predictive modeling could begin to play a larger role not only for data analysis, but also for data collection. We address the example of efficient wireless sensor networks, where inferential ecosystem models can be used to ...
Journal Article · Proceedings - International Conference on Data Engineering · June 6, 2011
We study the problem of assigning subscribers to brokers in a wide-area content-based publish/subscribe system. A good assignment should consider both subscriber interests in the event space and subscriber locations in the network space, and balance multip ...
Conference · Proceedings of the VLDB Endowment · January 1, 2011
We consider the problem of storing arrays on disk to support scalable data analysis involving linear algebra. We propose Linearized Array B-tree, or LAB-tree, which supports flexible array layouts and automatically adapts to varying sparsity across parts o ...
Journal Article · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · August 3, 2010
Journal Article · Proceedings - International Conference on Data Engineering · June 1, 2010
Statistical analysis of massive data is becoming indispensable to science, commerce, and society today. Such analysis requires efficient, flexible storage support and special optimization techniques. In this demo, we present RIOT (R with I/O Transparency), ...
Journal Article · SIGMOD-PODS'09 - Proceedings of the International Conference on Management of Data and 28th Symposium on Principles of Database Systems · December 4, 2009
Most information extraction (IE) approaches have considered only static text corpora, over which we apply IE only once. Many real-world text corpora however are dynamic. They evolve over time, and so to keep extracted information up to date we often must a ...
Journal Article · CIDR 2009 - 4th Biennial Conference on Innovative Data Systems Research · December 1, 2009
R is a numerical computing environment that is widely popular for statistical data analysis. Like many such environments, R performs poorly for large datasets whose sizes exceed that of physical memory. We present our vision of RIOT (R with I/O Transparenc ...
Journal Article · ACM Transactions on Database Systems · August 1, 2009
This article considers the problem of scalably processing a large number of continuous queries. Our approach, consisting of novel data structures and algorithms and a flexible processing framework, advances the state-of-the-art in several ways. First, our ...
Journal Article · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · July 15, 2009
Journal Article · Proceedings - International Conference on Data Engineering · July 8, 2009
We consider the problem of efficiently computing weighted proximity best-joins over multiple lists, with applications in information retrieval and extraction. We are given a multi-term query, and for each query term, a list of all its matches with scores, ...
Journal Article · Proceedings of the ACM SIGMOD International Conference on Management of Data · December 10, 2008
We demonstrate ProSem, a scalable wide-area publish/subscribe system that supports complex, stateful subscriptions as well as simple ones. One unique feature of ProSem is its cost-based joint optimization of both subscription processing and notification di ...
Journal Article · 5th International Workshop on Data Management for Sensor Networks, DMSN'08, In Conjunction with the 34th International Conference on Very Large Data Bases · December 1, 2008
Journal Article · Proceedings - International Conference on Data Engineering · October 1, 2008
There has been a recent resurgence of interest in research on noisy and incomplete data. Many applications require information to be recovered from such data. Ideally, an approach for information recovery should have the following features. First, it shoul ...
Journal Article · Proceedings - International Conference on Data Engineering · October 1, 2008
Most current information extraction (IE) approaches have considered only static text corpora, over which we typically have to apply IE only once. Many real-world text corpora however are dynamic. They evolve over time, and to keep extracted information up ...
Journal Article · Proceedings of the VLDB Endowment · January 1, 2008
We address the problem of supporting a large number of select-join subscriptions for wide-area publish/subscribe. Subscriptions are joins over different tables, with varying interests expressed as range selection conditions over table attributes. Naive sch ...
Conference · CIDR 2007 - 3rd Biennial Conference on Innovative Data Systems Research · December 1, 2007
Wireless sensor networks are poised to enable continuous data collection on unprecedented scales, in terms of area location and size, and frequency. This is a great boon to fields such as ecological modeling. We are collaborating with researchers to build ...
Journal Article · SIGMOD Record · December 1, 2007
A report on the Fourth International Workshop on Data Management for Sensor Networks (DMSN), which was held on September 24, 2007, is presented. The topics presented include a keynote address, three research sessions, panel discussion on the present and t ...
Journal Article · Proceedings of the ACM SIGMOD International Conference on Management of Data · October 30, 2007
Suppose a long-running analytical query is executing on a database server and has been allocated a large amount of physical memory. A high-priority task comes in and we need to run it immediately with all available resources. We have several choices. We co ...
Journal Article · Proceedings of the ACM SIGMOD International Conference on Management of Data · October 30, 2007
Query processing over graph-structured data is enjoying a growing number of applications. A top-k keyword search query on a graph finds the top k answers according to some ranking criteria, where each answer is a substructure of the graph containing all qu ...
Journal Article · Proceedings - International Conference on Data Engineering · September 24, 2007
Wireless sensor networks have enormous potential to aid data collection in a number of areas, such as environmental and wildlife research. In this paper, we address the challenges of supporting many-to-many aggregation in a sensor network. An application o ...
Conference · 33rd International Conference on Very Large Data Bases, VLDB 2007 - Conference Proceedings · January 1, 2007
Sensor networks allow continuous data collection on unprecedented scales. The primary limiting factor of such networks is energy, of which communication is the dominant consumer. The default strategy of nodes continually reporting their data to the root re ...
Conference · 33rd International Conference on Very Large Data Bases, VLDB 2007 - Conference Proceedings · January 1, 2007
We address the problem of providing scalable support for subscriptions with personalized value-based notification conditions in wide-area publish/subscribe systems. Notification conditions can be fine-tuned by subscribers, allowing precise and flexible con ...
Conference · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2007
Wireless sensor networks can be viewed as the integration of three subsystems: a low-impact in situ data acquisition and collection system, a system for inference of process models from observed data and a priori information, and a system that controls the ...
Journal Article · Proceedings of the ACM SIGMOD International Conference on Management of Data · December 1, 2006
Monitoring extreme values (MAX or MIN) is a fundamental problem in wireless sensor networks (and in general, complex dynamic systems). This problem presents very different algorithmic challenges from aggregate and selection queries, in the sense that an in ...
Journal Article · Proceedings of the ACM SIGMOD International Conference on Management of Data · December 1, 2006
Wireless sensor networks have created new opportunities for data collection in a variety of scenarios, such as environmental and industrial, where we expect data to be temporally and spatially correlated. Researchers may want to continuously collect all se ...
Journal Article · Proceedings of the ACM SIGMOD International Conference on Management of Data · December 1, 2006
The work performed by a publish/subscribe system can conceptually be divided into subscription processing and notification dissemination. Traditionally, research in the database and networking communities has focused on these aspects in isolation. The inte ...
Journal Article · Proceedings - International Conference on Data Engineering · October 17, 2006
Wireless sensor networks generate a vast amount of data. This data, however, must be sparingly extracted to conserve energy, usually the most precious resource in battery-powered sensors. When approximation is acceptable, a model-driven approach to query p ...
Journal Article · Proceedings - International Conference on Data Engineering · October 17, 2006
Graph reachability is fundamental to a wide range of applications, including XML indexing, geographic navigation, Internet routing, ontology queries based on RDF/OWL, etc. Many applications involve huge graphs and require fast answering of reachability que ...
Journal Article · Proceedings - International Conference on Data Engineering · October 17, 2006
Environmental monitoring is a promising application for sensor networks. Many scenarios produce geographically correlated readings, making them visually interesting and good targets for the isoline query. This query depicts boundaries showing how values ch ...
Journal Article · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · July 7, 2006
As networks continue to grow in size and complexity, distributed network monitoring and resource querying are becoming increasingly difficult. Our aim is to design, build, and evaluate a scalable infrastructure for answering queries over distributed measur ...
Journal Article · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2006
Next-generation wireless sensor networks may revolutionize understanding of environmental change by assimilating heterogeneous data, assessing the relative value and costs of data collection, and scheduling activities accordingly. Thus, they are dynamic, d ...
Conference · VLDB 2006 - Proceedings of the 32nd International Conference on Very Large Data Bases · January 1, 2006
This paper considers the problem of scalably processing a large number of continuous queries. We propose a flexible framework with novel data structures and algorithms for group-processing and indexing continuous queries by exploiting potential overlaps in ...
Journal Article · Proceedings - International Conference on Data Engineering · December 12, 2005
Incremental view maintenance has found a growing number of applications recently, including data warehousing, continuous query processing, publish/subscribe systems, etc. Batch processing of base table modifications, when applicable, can be much more effic ...
Journal Article · Proceedings - International Conference on Data Engineering · December 12, 2005
Order-based element labeling for tree-structured XML data is an important technique in XML processing. It lies at the core of many fundamental XML operations such as containment join and twig matching. While labeling for static XML documents is well unders ...
Journal Article · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · December 1, 2005
A continuous query is a standing query over a dynamic data set whose query result needs to be constantly updated as new data arrive. We consider the problem of constructing a data structure on a set of continuous band-join queries over two data sets R and ...
Journal Article · Proceedings of the ACM SIGMOD International Conference on Management of Data · December 1, 2005
We consider the problem of joining data streams using limited cache memory, with the goal of producing as many result tuples as possible from the cache. Many cache replacement heuristics have been proposed in the past. Their performance often relies on imp ...
Journal Article · Lecture Notes in Computer Science · January 1, 2005
A materialized view is a certain synopsis structure precomputed from one or more data sets (called base tables) in order to facilitate various queries on the data. When the underlying base tables change, the materialized view also needs to be updated accor ...
Journal Article · International Conference on Information and Knowledge Management, Proceedings · January 1, 2005
Testing reachability between nodes in a graph is a well-known problem with many important applications, including knowledge representation, program analysis, and more recently, biological and ontology databases inferencing as well as XML query processing. ...
Journal Article · Proceedings of the International Database Engineering and Applications Symposium, IDEAS · October 25, 2004
The Web has greatly facilitated access to information. However, information presented in HTML is mainly intended to be browsed by humans, and the problem of automatically extracting such information remains an important and challenging task. In this work, ...
Journal Article · Proceedings - International Conference on Data Engineering · June 1, 2004
XML and other types of semi-structured data are typically represented by a labeled directed graph. To speed up path expression queries over the graph, a variety of structural indexes have been proposed. They usually work by partitioning nodes in the data g ...
Journal Article · Proceedings - International Conference on Data Engineering · June 1, 2004
XML plays an important role in delivering data over the Internet, and the need to store and manipulate XML in its native format has become increasingly relevant. This growing need necessitates work on developing native XML operators, especially for one as ...
Journal Article · Proceedings of the ACM SIGMOD International Conference on Management of Data · January 1, 2004
Increasing popularity of XML in recent years has generated much interest in query processing over graph-structured data. To support efficient evaluation of path expressions, many structural indexes have been proposed. The most popular ones are the 1-index, ...
Journal Article · Proceedings - International Conference on Data Engineering · December 2, 2003
We tackle the problem of maintaining materialized top-k views in this paper. Top-k queries, including MIN and MAX as important special cases, occur frequently in common database workloads. A top-k view can be materialized to improve query performance, but ...
Journal Article · VLDB Journal · October 1, 2003
We consider the problems of computing aggregation queries in temporal databases and of maintaining materialized temporal aggregate views efficiently. The latter problem is particularly challenging since a single data update can cause aggregate results to c ...
Journal Article · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2003
We develop several linear or near-linear space and I/O-efficient dynamic data structures for orthogonal range-max queries and stabbing-max queries. Given a set of N weighted points in ℝ^d, the range-max problem asks for the maximum-weight point in a query h ...
Journal Article · Journal of Computer Science and Technology · January 1, 2003
The study on database technologies, or more generally, the technologies of data and information management, is an important and active research field. Recently, many exciting results have been reported. In this fast growing field, Chinese researchers play ...
Journal Article · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2003
Google's successful PageRank brings to the Web an order that well reflects the relative importance of Web pages. Inspired by PageRank, we propose a similar scheme called TupleRank for ranking tuples in a relational database. Database tuples naturally relat ...
Journal Article · Proceedings - International Conference on Data Engineering · January 1, 2001
We consider the problems of computing aggregation queries in temporal databases, and of maintaining materialized temporal aggregate views efficiently. The latter problem is particularly challenging since a single data update can cause aggregate results to ...
Conference · Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00 · December 1, 2000
A well-known challenge in data warehousing is the efficient incremental maintenance of warehouse data in the presence of source data updates. In this paper, we identify several critical data representation and algorithmic choices that must be made when dev ...
Journal Article · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2000
View self-maintenance refers to maintaining materialized views without accessing base data. Self-maintenance is particularly useful in data warehousing settings, where base data comes from sources that may be inaccessible. Self-maintenance has been studied ...
Journal Article · SIGMOD Record · January 1, 2000
Commercial relational database systems today provide only limited temporal support. To address the needs of applications requiring rich temporal data and queries, we have built TIP (Temporal Information Processor), a temporal extension to the Informix data ...
Journal Article · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 1998
An important use of data warehousing is to provide temporal views over the history of source data that may itself be non-temporal. While recent work in view maintenance is applicable to data warehousing, only non-temporal views have been considered. In thi ...