Job Recommendation Service for GPU Sharing in Kubernetes
Cloud infrastructures encourage multi-tenancy of hardware resources, and user-defined Machine Learning (ML) training jobs are offloaded to the cloud for efficient training. State-of-the-art resource schedulers compromise user privacy by accessing sensitive metadata of the user-defined training workloads. We present the design of a fine-grained, online, privacy-preserving job scheduler built on top of the Kubernetes platform in combination with Argo Workflows. We characterize ML training workloads on standard benchmark architectures and datasets across sixty-six features, cluster them based on exploratory data analysis, and analyze inter- and intra-cluster task interference. We assume only black-box access to the user-defined ML training jobs and refrain from accessing sensitive metadata. We define three scheduler-level objectives that maximize gains from both the users' and the cloud provider's perspectives. Our scheduler promotes multi-tenancy by intelligently selecting competitor jobs for concurrent execution in each pod while abiding by these objectives.
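To make the feature-based clustering step concrete, the sketch below groups profiled jobs by their feature vectors with k-means. This is an illustrative assumption rather than the paper's implementation: the feature matrix is placeholder data, and the cluster count and choice of scikit-learn are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical per-job feature matrix: one row per profiled training job,
# one column per collected feature (sixty-six in the paper's setup),
# e.g., GPU utilization, memory footprint, batch size, iteration time.
job_features = np.random.rand(120, 66)  # placeholder data for illustration

# Normalize features so no single metric dominates the distance measure.
scaled = StandardScaler().fit_transform(job_features)

# Group jobs into workload categories; the cluster count here is illustrative.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

# Co-location candidates for a new job can then be drawn from clusters whose
# members exhibited low mutual interference during profiling.
print({c: int((labels == c).sum()) for c in range(5)})
```

A scheduler built along these lines would only consume such externally observable profiling features, keeping the training job itself a black box.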