Scholars@Duke publication: AccPar: Tensor partitioning for heterogeneous deep learning accelerators

Publication, Conference

Song, L; Chen, F; Zhuo, Y; Qian, X; Li, H; Chen, Y

Published in: Proceedings 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020

February 1, 2020

Deep neural network (DNN) accelerators, as an example of domain-specific architecture, have demonstrated great success in DNN inference. However, architectural acceleration of the equally important DNN training has not yet been fully studied. With data forward propagation, error backward propagation, and gradient calculation, DNN training is a more complicated process with higher computation and communication intensity. Because recent research demonstrates diminishing returns from specialization, the so-called 'accelerator wall', we believe that a promising approach is to explore coarse-grained parallelism among multiple performance-bounded accelerators to support DNN training. Distributing computations across multiple heterogeneous accelerators to achieve high throughput and balanced execution, however, remains challenging. We present AccPar, a principled and systematic method of determining the tensor partition among heterogeneous accelerator arrays. Compared to prior empirical or unsystematic methods, AccPar considers the complete tensor partition space and can reveal previously unknown parallelism configurations. AccPar optimizes performance based on a cost model that takes into account both the computation and communication costs of a heterogeneous execution environment. Hence, our method avoids the drawbacks of existing approaches that use communication as a proxy for performance. The enhanced flexibility of tensor partitioning in AccPar allows computation to be distributed in a flexible ratio among accelerators with different performance. The proposed search algorithm is also applicable to the emerging multi-path patterns in modern DNNs such as ResNet. We simulate AccPar on a heterogeneous accelerator array composed of both TPU-v2 and TPU-v3 accelerators for the training of large-scale DNN models such as AlexNet, the VGG series, and the ResNet series. Normalized to the baseline data parallelism scheme, in which each accelerator replicates the model and processes different input data in parallel, the average performance improvements of the state-of-the-art 'one weird trick' (OWT) and HYPAR approaches and of AccPar are 2.98×, 3.78×, and 6.30×, respectively.
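The abstract's point about distributing computation in a flexible ratio can be illustrated with a toy calculation. The sketch below is not AccPar's cost model or search algorithm; it is a minimal, hypothetical two-accelerator example in which all names (`balanced_ratio`, `step_time`) and all numbers are assumptions chosen for illustration. It only shows why an even split is a poor fit when accelerators differ in speed, and how a communication term enters the estimated step time alongside computation.

```python
def balanced_ratio(perf_a: float, perf_b: float) -> float:
    """Fraction of the work given to accelerator A so that both
    accelerators finish their compute at the same time (toy model)."""
    return perf_a / (perf_a + perf_b)


def step_time(work: float, ratio: float, perf_a: float, perf_b: float,
              comm_volume: float = 0.0, bandwidth: float = 1.0) -> float:
    """Estimated time for one step: the slower accelerator's compute time
    plus a simple communication term (volume / bandwidth).
    This is an illustrative assumption, not the paper's cost model."""
    compute_a = work * ratio / perf_a
    compute_b = work * (1.0 - ratio) / perf_b
    return max(compute_a, compute_b) + comm_volume / bandwidth


# Hypothetical units: accelerator B is twice as fast as accelerator A.
perf_a, perf_b, work = 1.0, 2.0, 6.0

even = step_time(work, 0.5, perf_a, perf_b)
tuned = step_time(work, balanced_ratio(perf_a, perf_b), perf_a, perf_b)
print(f"even split: {even:.2f}, performance-aware split: {tuned:.2f}")
```

In this toy setting the even split is limited by the slower accelerator, while the performance-aware ratio lets both finish together. A partitioning that lowers compute time may still raise `comm_volume`, which is why the two terms are combined rather than using either one alone.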

Duke Scholars

Author: Hai "Helen" Li, Electrical and Computer Engineering

Author: Yiran Chen, Electrical and Computer Engineering

Published In

Proceedings 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020

DOI

10.1109/HPCA47549.2020.00036

Publication Date

February 1, 2020

Start / End Page

342 / 355

Citation

APA: Song, L., Chen, F., Zhuo, Y., Qian, X., Li, H., & Chen, Y. (2020). AccPar: Tensor partitioning for heterogeneous deep learning accelerators. In Proceedings 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020 (pp. 342–355). https://doi.org/10.1109/HPCA47549.2020.00036

Chicago: Song, L., F. Chen, Y. Zhuo, X. Qian, H. Li, and Y. Chen. “AccPar: Tensor partitioning for heterogeneous deep learning accelerators.” In Proceedings 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020, 342–55, 2020. https://doi.org/10.1109/HPCA47549.2020.00036.

ICMJE: Song L, Chen F, Zhuo Y, Qian X, Li H, Chen Y. AccPar: Tensor partitioning for heterogeneous deep learning accelerators. In: Proceedings 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020. 2020. p. 342–55.

MLA: Song, L., et al. “AccPar: Tensor partitioning for heterogeneous deep learning accelerators.” Proceedings 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020, 2020, pp. 342–55. Scopus, doi:10.1109/HPCA47549.2020.00036.

NLM: Song L, Chen F, Zhuo Y, Qian X, Li H, Chen Y. AccPar: Tensor partitioning for heterogeneous deep learning accelerators. Proceedings 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020. 2020. p. 342–355.
