Explaining Wrong Queries Using Small Examples.

Journal Article (Journal Article)

For testing the correctness of SQL queries, e.g., evaluating student submissions in a database course, a standard practice is to execute the query in question on some test database instance and compare its result with that of the correct query. Given two queries Q 1 and Q 2 , we say that a database instance D is a counterexample (for Q 1 and Q 2 ) if Q 1 (D ) differs from Q 2 (D ); such a counterexample can serve as an explanation of why Q 1 and Q 2 are not equivalent. While the test database instance may serve as a counterexample, it may be too large or complex to read and understand where the inequivalence comes from. Therefore, in this paper, given a known counterexample D for Q 1 and Q 2 , we aim to find the smallest counterexample D'D where Q 1 (D' ) ≠ Q 2 (D' ). The problem in general is NP-hard. We give a suite of algorithms for finding the smallest counterexample for different classes of queries, some more tractable than others. We also present an efficient provenance-based algorithm for SPJUD queries that uses a constraint solver, and extend it to more complex queries with aggregation, group-by, and nested queries. We perform extensive experiments indicating the effectiveness and scalability of our solution on student queries from an undergraduate database course and on queries from the TPC-H benchmark. We also report a user study from the course where we deployed our tool to help students with an assignment on relational algebra.

Full Text

Duke Authors

Cited Authors

  • Miao, Z; Roy, S; Yang, J

Published Date

  • June 2019

Published In

Volume / Issue

  • 2019 /

Start / End Page

  • 503 - 520

PubMed ID

  • 31439982

Pubmed Central ID

  • PMC6705612

International Standard Serial Number (ISSN)

  • 0730-8078

Digital Object Identifier (DOI)

  • 10.1145/3299869.3319866

Language

  • eng