
Thread batching for high-performance energy-efficient GPU memory design

Publication, Journal Article
Li, B; Mao, M; Liu, X; Liu, T; Liu, Z; Wen, W; Chen, Y; Li, HH
Published in: ACM Journal on Emerging Technologies in Computing Systems
December 1, 2019

Massive multithreading in GPUs imposes tremendous pressure on memory subsystems. Because thread-level parallelism in GPUs has grown rapidly while peak memory bandwidth has improved only slowly, memory has become a bottleneck for GPU performance and energy efficiency. In this article, we propose an integrated architectural scheme that optimizes memory accesses and thereby boosts the performance and energy efficiency of GPUs. First, we propose thread batch enabled memory partitioning (TEMP) to improve GPU memory access parallelism. In particular, TEMP groups multiple thread blocks that share the same set of pages into a thread batch and applies a page coloring mechanism to bind each streaming multiprocessor (SM) to dedicated memory banks. TEMP then dispatches the thread batch to an SM to ensure highly parallel memory-access streams from the different thread blocks. Second, a thread batch-aware scheduling (TBAS) scheme is introduced to improve GPU memory access locality and to reduce contention on memory controllers and interconnection networks. Experimental results show that integrating TEMP and TBAS achieves up to 10.3% performance improvement and 11.3% DRAM energy reduction across diverse GPU applications. We also evaluate the performance interference of mixed CPU+GPU workloads running on a heterogeneous system that employs our proposed schemes. Our results show that a simple solution can effectively ensure the efficient execution of both GPU and CPU applications.
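
The grouping and binding steps that the abstract attributes to TEMP can be illustrated with a small sketch. This is not the authors' implementation: the function names (form_thread_batches, place_batches), the round-robin assignment of batches to SMs, the contiguous bank ranges per SM, and the reading of "share the same set of pages" as exact set equality are all assumptions made for illustration only.

```python
from collections import defaultdict


def form_thread_batches(block_pages):
    """Group thread blocks whose accessed page sets are identical.

    block_pages maps a (hypothetical) thread block id to the set of page
    numbers it touches. Returns a dict: frozenset of pages -> list of block ids.
    """
    batches = defaultdict(list)
    for block_id, pages in block_pages.items():
        batches[frozenset(pages)].append(block_id)
    return dict(batches)


def place_batches(batches, num_sms, banks_per_sm):
    """Assign each batch to an SM and color its pages with that SM's banks.

    Assumption: SM k owns banks [k * banks_per_sm, (k + 1) * banks_per_sm),
    and batches are handed out round-robin in order of their lowest block id.
    """
    placements = []
    ordered = sorted(batches.items(), key=lambda kv: min(kv[1]))
    for i, (pages, blocks) in enumerate(ordered):
        sm = i % num_sms
        banks = range(sm * banks_per_sm, (sm + 1) * banks_per_sm)
        # Spread the batch's pages over the banks dedicated to this SM.
        page_to_bank = {
            page: banks[j % banks_per_sm]
            for j, page in enumerate(sorted(pages))
        }
        placements.append(
            {"sm": sm, "blocks": sorted(blocks), "page_to_bank": page_to_bank}
        )
    return placements


# Made-up example: five thread blocks touching three distinct page sets.
block_pages = {0: {10, 11}, 1: {10, 11}, 2: {20}, 3: {20}, 4: {30, 31}}
batches = form_thread_batches(block_pages)
placements = place_batches(batches, num_sms=2, banks_per_sm=2)
for p in placements:
    print(p)
```

In this toy setup, blocks 0 and 1 form one batch and are mapped to SM 0, whose pages land only in banks 0 and 1, so memory traffic from different SMs does not collide in the same banks. A real design would also have to handle partially overlapping page sets and load balance across SMs, which this sketch ignores.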


Published In

ACM Journal on Emerging Technologies in Computing Systems

DOI

10.1145/3330152

EISSN

1550-4840

ISSN

1550-4832

Publication Date

December 1, 2019

Volume

15

Issue

4

Related Subject Headings

  • Computer Hardware & Architecture
  • 4606 Distributed computing and systems software
  • 1007 Nanotechnology
  • 1006 Computer Hardware
  • 0906 Electrical and Electronic Engineering
 

Citation

APA: Li, B., Mao, M., Liu, X., Liu, T., Liu, Z., Wen, W., … Li, H. H. (2019). Thread batching for high-performance energy-efficient GPU memory design. ACM Journal on Emerging Technologies in Computing Systems, 15(4). https://doi.org/10.1145/3330152

Chicago: Li, B., M. Mao, X. Liu, T. Liu, Z. Liu, W. Wen, Y. Chen, and H. H. Li. “Thread batching for high-performance energy-efficient GPU memory design.” ACM Journal on Emerging Technologies in Computing Systems 15, no. 4 (December 1, 2019). https://doi.org/10.1145/3330152.

ICMJE: Li B, Mao M, Liu X, Liu T, Liu Z, Wen W, et al. Thread batching for high-performance energy-efficient GPU memory design. ACM Journal on Emerging Technologies in Computing Systems. 2019 Dec 1;15(4).

MLA: Li, B., et al. “Thread batching for high-performance energy-efficient GPU memory design.” ACM Journal on Emerging Technologies in Computing Systems, vol. 15, no. 4, Dec. 2019. Scopus, doi:10.1145/3330152.

NLM: Li B, Mao M, Liu X, Liu T, Liu Z, Wen W, Chen Y, Li HH. Thread batching for high-performance energy-efficient GPU memory design. ACM Journal on Emerging Technologies in Computing Systems. 2019 Dec 1;15(4).
