Job Recommendation Service for GPU Sharing in Kubernetes

Publication, Conference
Ray, A; Lafata, K; Zhang, Z; Xiong, Y; Chakrabarty, K
Published in: Proceedings 2023 IEEE Cloud Summit Cloud Summit 2023
January 1, 2023

Cloud infrastructures encourage multi-tenancy of hardware resources, and user-defined machine learning (ML) training jobs are offloaded to the cloud for efficient training. State-of-the-art resource schedulers fail to preserve user privacy because they access sensitive metadata of the user-defined training workload. We present the design of a fine-grained, online, privacy-preserving job scheduler built on top of the Kubernetes platform in combination with Argo Workflows. We categorize ML training workloads on standard benchmark architectures and datasets over sixty-six different features, cluster them based on exploratory data analysis, and analyze inter- and intra-cluster task interference. We assume black-box access to the user-defined ML training jobs and refrain from accessing sensitive metadata. We define three scheduler-level objectives to maximize gains from both the users' and the cloud providers' perspectives. Our scheduler promotes multi-tenancy by intelligently selecting competitor jobs for concurrent execution in every pod while abiding by the scheduler-level objectives.

Published In

Proceedings 2023 IEEE Cloud Summit Cloud Summit 2023

DOI

10.1109/CloudSummit57601.2023.00008

Publication Date

January 1, 2023

Start / End Page

7 / 14
 

Citation

APA: Ray, A., Lafata, K., Zhang, Z., Xiong, Y., & Chakrabarty, K. (2023). Job Recommendation Service for GPU Sharing in Kubernetes. In Proceedings 2023 IEEE Cloud Summit Cloud Summit 2023 (pp. 7–14). https://doi.org/10.1109/CloudSummit57601.2023.00008

Chicago: Ray, A., K. Lafata, Z. Zhang, Y. Xiong, and K. Chakrabarty. “Job Recommendation Service for GPU Sharing in Kubernetes.” In Proceedings 2023 IEEE Cloud Summit Cloud Summit 2023, 7–14, 2023. https://doi.org/10.1109/CloudSummit57601.2023.00008.

ICMJE: Ray A, Lafata K, Zhang Z, Xiong Y, Chakrabarty K. Job Recommendation Service for GPU Sharing in Kubernetes. In: Proceedings 2023 IEEE Cloud Summit Cloud Summit 2023. 2023. p. 7–14.

MLA: Ray, A., et al. “Job Recommendation Service for GPU Sharing in Kubernetes.” Proceedings 2023 IEEE Cloud Summit Cloud Summit 2023, 2023, pp. 7–14. Scopus, doi:10.1109/CloudSummit57601.2023.00008.

NLM: Ray A, Lafata K, Zhang Z, Xiong Y, Chakrabarty K. Job Recommendation Service for GPU Sharing in Kubernetes. Proceedings 2023 IEEE Cloud Summit Cloud Summit 2023. 2023. p. 7–14.
