Adaptive simultaneous multi-tenancy for GPUs

Conference Paper

Graphics Processing Units (GPUs) are energy-efficient massively parallel accelerators that are increasingly deployed in multi-tenant environments such as data-centers for general-purpose computing as well as graphics applications. Using GPUs in multi-tenant setups requires an efficient and low-overhead method for sharing the device among multiple users that improves system throughput while adapting to the changes in workload. This requires mechanisms to control the resources allocated to each kernel, and an efficient policy to make decisions about this allocation. In this paper, we propose adaptive simultaneous multi-tenancy to address these issues. Adaptive simultaneous multi-tenancy allows for sharing the GPU among multiple kernels, as opposed to single kernel multi-tenancy that only runs one kernel on the GPU at any given time and static simultaneous multi-tenancy that does not adapt to events in the system. Our proposed system dynamically adjusts the kernels’ parameters at run-time when a new kernel arrives or a running kernel ends. Evaluations using our prototype implementation show that, compared to sequentially executing the kernels, system throughput is improved by an average of 9.8% (and up to 22.4%) for combinations of kernels that include at least one low-utilization kernel.

Full Text

Duke Authors

Cited Authors

  • Bashizade, R; Li, Y; Lebeck, AR

Published Date

  • January 1, 2019

Published In

Volume / Issue

  • 11332 LNCS /

Start / End Page

  • 83 - 106

Electronic International Standard Serial Number (EISSN)

  • 1611-3349

International Standard Serial Number (ISSN)

  • 0302-9743

International Standard Book Number 13 (ISBN-13)

  • 9783030106317

Digital Object Identifier (DOI)

  • 10.1007/978-3-030-10632-4_5

Citation Source

  • Scopus