Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures

Publication, Journal Article

Wu, Y; Lentz, M; Zhuo, D; Lu, Y

Published in: Proceedings of the VLDB Endowment

November 1, 2022

With the advent of ubiquitous deployment of smart devices and the Internet of Things, data sources for machine learning inference have increasingly moved to the edge of the network. Existing machine learning inference platforms typically assume a homogeneous infrastructure and do not take into account the more complex and tiered computing infrastructure that includes edge devices, local hubs, edge datacenters, and cloud datacenters. On the other hand, recent AutoML efforts have provided viable solutions for model compression, pruning and quantization for heterogeneous environments; for a machine learning model, now we may easily find or even generate a series of model variants with different tradeoffs between accuracy and efficiency. We design and implement JellyBean, a system for serving and optimizing machine learning inference workflows on heterogeneous infrastructures. Given service-level objectives (e.g., throughput, accuracy), JellyBean picks the most cost-efficient models that meet the accuracy target and decides how to deploy them across different tiers of infrastructures. Evaluations show that JellyBean reduces the total serving cost of visual question answering by up to 58% and vehicle tracking from the NVIDIA AI City Challenge by up to 36%, compared with state-of-the-art model selection and worker assignment solutions. JellyBean also outperforms prior ML serving systems (e.g., Spark on the cloud) up to 5x in serving costs.
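The selection step the abstract describes (choosing the most cost-efficient model variant that still meets an accuracy target) can be illustrated with a minimal sketch. This is not JellyBean's algorithm: the actual system optimizes multi-stage workflows and also decides worker placement across infrastructure tiers, which this toy omits. The `ModelVariant` class, the `pick_cheapest` function, and all variant names, accuracies, and costs below are hypothetical values chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelVariant:
    # Hypothetical illustration fields; none of these values come from the paper.
    name: str
    accuracy: float          # fraction in [0, 1]
    cost_per_request: float  # arbitrary cost units


def pick_cheapest(
    variants: Sequence[ModelVariant], accuracy_target: float
) -> Optional[ModelVariant]:
    """Return the lowest-cost variant whose accuracy meets the target, or None."""
    feasible = [v for v in variants if v.accuracy >= accuracy_target]
    if not feasible:
        return None
    return min(feasible, key=lambda v: v.cost_per_request)


# Made-up variants with different accuracy/cost tradeoffs.
variants = [
    ModelVariant("small-quantized", accuracy=0.71, cost_per_request=1.0),
    ModelVariant("medium-pruned", accuracy=0.78, cost_per_request=2.5),
    ModelVariant("large-full", accuracy=0.83, cost_per_request=6.0),
]

choice = pick_cheapest(variants, accuracy_target=0.75)
print(choice.name if choice else "no feasible variant")
```

With these made-up numbers, a 0.75 accuracy target rules out the cheapest variant, so the sketch picks the next-cheapest one that qualifies; a target no variant can reach yields `None`.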

Duke Scholars

Author Matthew Lentz Computer Science

Author Danyang Zhuo Computer Science

Published In

Proceedings of the VLDB Endowment

DOI

10.14778/3570690.3570692

EISSN

2150-8097

Publication Date

November 1, 2022

Volume

16

Issue

3

Start / End Page

406 / 419

Related Subject Headings

4605 Data management and data science
0807 Library and Information Studies
0806 Information Systems
0802 Computation Theory and Mathematics

Citation

APA

Wu, Y., Lentz, M., Zhuo, D., & Lu, Y. (2022). Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures. Proceedings of the VLDB Endowment, 16(3), 406–419. https://doi.org/10.14778/3570690.3570692

Chicago

Wu, Y., M. Lentz, D. Zhuo, and Y. Lu. “Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures.” Proceedings of the VLDB Endowment 16, no. 3 (November 1, 2022): 406–19. https://doi.org/10.14778/3570690.3570692.

ICMJE

Wu Y, Lentz M, Zhuo D, Lu Y. Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures. Proceedings of the VLDB Endowment. 2022 Nov 1;16(3):406–19.

MLA

Wu, Y., et al. “Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures.” Proceedings of the VLDB Endowment, vol. 16, no. 3, Nov. 2022, pp. 406–19. Scopus, doi:10.14778/3570690.3570692.

NLM

Wu Y, Lentz M, Zhuo D, Lu Y. Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures. Proceedings of the VLDB Endowment. 2022 Nov 1;16(3):406–419.
