INCA: Input-stationary Dataflow at Outside-the-box Thinking about Deep Learning Accelerators
This paper presents INCA, an input-stationary (IS) crossbar accelerator that supports both inference and training for deep neural networks (DNNs). Processing-in-memory (PIM) accelerators for DNNs have been actively researched, particularly with resistive random-access memory (RRAM), owing to RRAM's combined computing and storage capabilities and its device merits. To the best of our knowledge, all previous PIM accelerators store weights in RRAMs and inputs (activations) in conventional memories, which naturally forms a weight-stationary (WS) dataflow. WS has generally been considered the optimal choice for high parallelism and data reuse. However, WS-based PIM accelerators exhibit fundamental limitations: first, a persistent dependence on DRAM and buffers for fetching and storing inputs (activations); second, a substantial number of extra RRAMs for transposed weights and additional computational intermediates during training; third, coarse-grained arrays that demand high-bit analog-to-digital converters (ADCs) and suffer poor utilization in depthwise and pointwise convolution; finally, degraded accuracy due to sensitivity of the weights to RRAM's nonideality. On the other hand, we observe that an IS dataflow, in which RRAMs retain inputs (activations), can effectively address these limitations: only weights need to be loaded, no extra RRAMs are required, fine-grained accelerator design becomes feasible, and accuracy is less affected by input (activation) variance. However, IS dataflow is hard to realize on the existing crossbar structure, since kernel sliding is difficult to implement while preserving high parallelism. To support kernel movement, we devise a two-transistor-one-RRAM (2T1R) cell structure. Based on the 2T1R cell, we design a novel three-dimensional (3D) architecture for high parallelism in batch training. Our experimental results demonstrate the potential of INCA.
Compared to the WS accelerator, INCA achieves up to 20.6× and 260× energy-efficiency improvement in inference and training, respectively, as well as 4.8× (inference) and 18.6× (training) speedup. While accuracy under WS drops to 15% in our high-noise simulation, INCA remains robust at 86% accuracy.
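The WS/IS contrast at the heart of the abstract can be sketched numerically: a convolution lowered (via im2col) to a matmul can be computed either by keeping the weight matrix stationary in the crossbar and streaming input columns, or by keeping the input-patch matrix stationary and streaming weight rows. The sketch below is illustrative only; the matrix sizes and the `crossbar_mac` helper are assumptions for demonstration, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Convolution lowered to a matmul: Y = W @ X
# W: [C_out, K] weight matrix; X: [K, P] im2col input patches (hypothetical sizes)
C_out, K, P = 4, 9, 16
W = rng.standard_normal((C_out, K))
X = rng.standard_normal((K, P))

def crossbar_mac(stationary, streamed_vec):
    """One crossbar step: the stationary matrix multiplies one streamed vector."""
    return stationary @ streamed_vec

# Weight-stationary (WS): W is programmed into the crossbar once;
# input columns are streamed in, one per cycle.
Y_ws = np.stack([crossbar_mac(W, X[:, p]) for p in range(P)], axis=1)

# Input-stationary (IS): the input-patch matrix is held in the crossbar;
# weight rows are streamed in instead (note the transposed orientation).
Y_is = np.stack([crossbar_mac(X.T, W[c, :]) for c in range(C_out)], axis=0)

# Both dataflows compute the same lowered convolution.
assert np.allclose(Y_ws, W @ X)
assert np.allclose(Y_is, W @ X)
```

The equivalence shows why the choice between WS and IS is purely a question of what data stays resident in the crossbar: in IS, the weights become the streamed operand, which is what eliminates the DRAM/buffer traffic for activations that the abstract identifies as a WS limitation.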