Selectivity Functions of Range Queries are Learnable

Conference Paper

This paper explores the use of machine learning for estimating the selectivity of range queries in database systems. Using classic learning theory for real-valued functions based on shattering dimension, we show that the selectivity function of a range space with bounded VC-dimension is learnable. As many popular classes of queries (e.g., orthogonal range search, inequalities involving linear combination of attributes, distance-based search, etc.) represent range spaces with finite VC-dimension, our result immediately implies that their selectivity functions are also learnable. To the best of our knowledge, this is the first attempt at formally explaining the role of machine learning techniques in selectivity estimation, and complements the growing literature in empirical studies in this direction. Supplementing these theoretical results, our experimental results demonstrate that, empirically, even a basic learning algorithm with generic models is able to produce accurate predictions across settings, matching state-of-art methods designed for specific queries, and using training sample sizes commensurate with our theory.

Full Text

Duke Authors

Cited Authors

  • Hu, X; Liu, Y; Xiu, H; Agarwal, PK; Panigrahi, D; Roy, S; Yang, J

Published Date

  • June 10, 2022

Published In

Start / End Page

  • 959 - 972

International Standard Serial Number (ISSN)

  • 0730-8078

International Standard Book Number 13 (ISBN-13)

  • 9781450392495

Digital Object Identifier (DOI)

  • 10.1145/3514221.3517896

Citation Source

  • Scopus