Scholars@Duke publication: Selectivity Functions of Range Queries are Learnable

Selectivity Functions of Range Queries are Learnable

Publication , Conference

Hu, X; Liu, Y; Xiu, H; Agarwal, PK; Panigrahi, D; Roy, S; Yang, J

Published in: Proceedings of the ACM SIGMOD International Conference on Management of Data

June 11, 2022

This paper explores the use of machine learning for estimating the selectivity of range queries in database systems. Using classic learning theory for real-valued functions based on shattering dimension, we show that the selectivity function of a range space with bounded VC-dimension is learnable. As many popular classes of queries (e.g., orthogonal range search, inequalities involving linear combination of attributes, distance-based search, etc.) represent range spaces with finite VC-dimension, our result immediately implies that their selectivity functions are also learnable. To the best of our knowledge, this is the first attempt at formally explaining the role of machine learning techniques in selectivity estimation, and complements the growing literature in empirical studies in this direction. Supplementing these theoretical results, our experimental results demonstrate that, empirically, even a basic learning algorithm with generic models is able to produce accurate predictions across settings, matching state-of-art methods designed for specific queries, and using training sample sizes commensurate with our theory.

Duke Scholars

Author Haibo Xiu

Author Debmalya Panigrahi Computer Science

Author Sudeepa Roy Computer Science

Author Jun Yang Computer Science

Published In

Proceedings of the ACM SIGMOD International Conference on Management of Data

DOI

10.1145/3514221.3517896

ISSN

0730-8078

Publication Date

June 11, 2022

Start / End Page

959 / 972

Citation

APA

Chicago

ICMJE

MLA

NLM

Hu, X., Liu, Y., Xiu, H., Agarwal, P. K., Panigrahi, D., Roy, S., & Yang, J. (2022). Selectivity Functions of Range Queries are Learnable. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 959–972). https://doi.org/10.1145/3514221.3517896

Hu, X., Y. Liu, H. Xiu, P. K. Agarwal, D. Panigrahi, S. Roy, and J. Yang. “Selectivity Functions of Range Queries are Learnable.” In Proceedings of the ACM SIGMOD International Conference on Management of Data, 959–72, 2022. https://doi.org/10.1145/3514221.3517896.

Hu X, Liu Y, Xiu H, Agarwal PK, Panigrahi D, Roy S, et al. Selectivity Functions of Range Queries are Learnable. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2022. p. 959–72.

Hu, X., et al. “Selectivity Functions of Range Queries are Learnable.” Proceedings of the ACM SIGMOD International Conference on Management of Data, 2022, pp. 959–72. Scopus, doi:10.1145/3514221.3517896.

Hu X, Liu Y, Xiu H, Agarwal PK, Panigrahi D, Roy S, Yang J. Selectivity Functions of Range Queries are Learnable. Proceedings of the ACM SIGMOD International Conference on Management of Data. 2022. p. 959–972.

Published In

Proceedings of the ACM SIGMOD International Conference on Management of Data

DOI

10.1145/3514221.3517896

ISSN

0730-8078

Publication Date

June 11, 2022

Start / End Page

959 / 972