
Data management in machine learning: Challenges, techniques, and systems

Publication, Conference
Kumar, A; Boehm, M; Yang, J
Published in: Proceedings of the ACM SIGMOD International Conference on Management of Data
May 9, 2017

Large-scale data analytics using statistical machine learning (ML), popularly called advanced analytics, underpins many modern data-driven applications. The data management community has been working for over a decade on tackling data management-related challenges that arise in ML workloads, and has built several systems for advanced analytics. This tutorial provides a comprehensive review of such systems and analyzes key data management challenges and techniques. We focus on three complementary lines of work: (1) integrating ML algorithms and languages with existing data systems such as RDBMSs, (2) adapting data management-inspired techniques such as query optimization, partitioning, and compression to new systems that target ML workloads, and (3) combining data management and ML ideas to build systems that improve ML lifecycle-related tasks. Finally, we identify key open data management challenges for future research in this important area.


Published In

Proceedings of the ACM SIGMOD International Conference on Management of Data

DOI

10.1145/3035918.3054775

ISSN

0730-8078

ISBN

9781450341974

Publication Date

May 9, 2017

Volume

Part F127746

Start / End Page

1717 / 1722

Citation

APA

Kumar, A., Boehm, M., & Yang, J. (2017). Data management in machine learning: Challenges, techniques, and systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data (Vol. Part F127746, pp. 1717–1722). https://doi.org/10.1145/3035918.3054775

Chicago

Kumar, A., M. Boehm, and J. Yang. “Data management in machine learning: Challenges, techniques, and systems.” In Proceedings of the ACM SIGMOD International Conference on Management of Data, Part F127746:1717–22, 2017. https://doi.org/10.1145/3035918.3054775.

ICMJE

Kumar A, Boehm M, Yang J. Data management in machine learning: Challenges, techniques, and systems. In: Proceedings of the ACM SIGMOD International Conference on Management of Data. 2017. p. 1717–22.

MLA

Kumar, A., et al. “Data management in machine learning: Challenges, techniques, and systems.” Proceedings of the ACM SIGMOD International Conference on Management of Data, vol. Part F127746, 2017, pp. 1717–22. Scopus, doi:10.1145/3035918.3054775.

NLM

Kumar A, Boehm M, Yang J. Data management in machine learning: Challenges, techniques, and systems. Proceedings of the ACM SIGMOD International Conference on Management of Data. 2017. p. 1717–1722.
