IVQ: In-Memory Acceleration of DNN Inference Exploiting Varied Quantization

Publication, Journal Article

Liu, F; Zhao, W; Wang, Z; Zhao, Y; Yang, T; Chen, Y; Jiang, L

Published in: IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems

December 1, 2022

Weight quantization is well suited to coping with the ever-growing complexity of deep neural network (DNN) models. Different quantization schemes produce weights with different bit widths and formats, and these in turn call for different hardware implementations. This variety prevents a general-purpose NPU from leveraging different quantization schemes to gain performance and energy efficiency. More importantly, a trend toward quantization diversity has emerged, in which multiple quantization schemes are applied to different fine-grained structures of a DNN (e.g., a layer or a channel of weights). A general architecture that can exploit varied quantization schemes is therefore desirable. The crossbar-based processing-in-memory (PIM) architecture, a promising DNN accelerator, is well known for its highly efficient matrix-vector multiplication. However, PIM suffers from an inflexible intracrossbar data path, because the weights are stationary on the crossbar and bound to the 'add' operation along the bitline. As a result, many nonuniform quantization methods must roll back the quantization before mapping the weights onto the crossbar. Counterintuitively, this article identifies a unique opportunity for the PIM architecture to exploit varied quantization schemes. We first transform the quantization diversity problem into a consistency problem by aligning bits of the same magnitude along the same bitline of the crossbar. Consequently, this naive weight mapping leaves many square hollows of idle PIM cells. We then propose a novel spatial mapping that exempts these 'hollow' crossbars from the intercrossbar data path. To squeeze the weights onto even fewer crossbars, we decouple the intracrossbar data path from the hardware bitline through a novel temporal scheduling, so that bits with different magnitudes can be placed on cells along the same bitline. 
Finally, the proposed IVQ includes a temporal pipeline to avoid the resulting stalling cycles, and a data flow with carefully designed control mechanisms for the new intra- and intercrossbar data paths. Putting it all together, IVQ achieves speedups of 19.7× and 10.7× over two PIM accelerators (ISAAC and CASCADE), 4.7×–63.4× over two customized quantization accelerators (based on ASIC and FPGA), and 91.7× over an NVIDIA RTX 2080 GPU. The corresponding energy savings are 17.7×, 5.1×, 5.7×–68.1×, and 541×, respectively.
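The abstract's first step, aligning bits of the same magnitude along the same bitline, can be illustrated with a small Python sketch. This is not IVQ's implementation. It is a toy model that assumes the following:

- unsigned integer weights, one weight per crossbar row;
- one binary cell per weight bit;
- integer inputs applied to the rows;
- an idealized bitline that simply sums input × cell.

The function names (`bit_aligned_mapping`, `crossbar_mvm`, `count_idle_cells`) are hypothetical. The sketch shows why lower-precision weights leave idle "hollow" cells under this naive mapping, and how shift-and-add across bitlines still recovers the exact dot product.

```python
from typing import List, Optional

Cell = Optional[int]  # 0/1 for a programmed cell, None for an idle cell


def bit_aligned_mapping(weights: List[int], bit_widths: List[int],
                        max_bits: int) -> List[List[Cell]]:
    """Hypothetical toy mapping: one unsigned weight per crossbar row.

    Column b (a bitline) holds bit b, i.e. significance 2**b, of every
    weight. A weight narrower than max_bits leaves its high-order cells
    idle (None); in this toy model those idle cells are the 'hollows'.
    """
    crossbar = []
    for w, n in zip(weights, bit_widths):
        if not (0 <= w < (1 << n) and n <= max_bits):
            raise ValueError(f"weight {w} does not fit in {n} bits")
        crossbar.append([(w >> b) & 1 if b < n else None
                         for b in range(max_bits)])
    return crossbar


def crossbar_mvm(crossbar: List[List[Cell]], inputs: List[int]) -> int:
    """Idealized bitline accumulation followed by shift-and-add.

    Each bitline sums input * cell over its rows (idle cells contribute
    nothing); the partial sum of bitline b is then shifted left by b.
    """
    total = 0
    for b in range(len(crossbar[0])):
        bitline_sum = sum(x * row[b] for x, row in zip(inputs, crossbar)
                          if row[b] is not None)
        total += bitline_sum << b
    return total


def count_idle_cells(crossbar: List[List[Cell]]) -> int:
    """Number of cells left unused by the bit-aligned mapping."""
    return sum(cell is None for row in crossbar for cell in row)


if __name__ == "__main__":
    weights = [5, 2, 13, 1]      # hypothetical mixed-precision weights
    bit_widths = [4, 2, 4, 1]    # hypothetical per-weight bit widths
    inputs = [1, 2, 3, 4]
    xbar = bit_aligned_mapping(weights, bit_widths, max_bits=4)
    reference = sum(w * x for w, x in zip(weights, inputs))
    assert crossbar_mvm(xbar, inputs) == reference
    print(crossbar_mvm(xbar, inputs), count_idle_cells(xbar))
```

In this example the 2-bit and 1-bit weights leave 5 of the 16 cells idle, while the shift-and-add result (52) still matches the direct dot product. The sketch does not model the spatial mapping or temporal scheduling that the abstract proposes for reclaiming such cells.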

Duke Scholars

Author: Yiran Chen (Electrical and Computer Engineering)

Published In

IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems

DOI

10.1109/TCAD.2022.3156017

EISSN

1937-4151

ISSN

0278-0070

Publication Date

December 1, 2022

Volume

41

Issue

12

Start / End Page

5313 / 5326

Related Subject Headings

Computer Hardware & Architecture
4607 Graphics, augmented reality and games
4009 Electronics, sensors and digital hardware
1006 Computer Hardware
0906 Electrical and Electronic Engineering

Citation

APA

Liu, F., Zhao, W., Wang, Z., Zhao, Y., Yang, T., Chen, Y., & Jiang, L. (2022). IVQ: In-Memory Acceleration of DNN Inference Exploiting Varied Quantization. IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, 41(12), 5313–5326. https://doi.org/10.1109/TCAD.2022.3156017

Chicago

Liu, F., W. Zhao, Z. Wang, Y. Zhao, T. Yang, Y. Chen, and L. Jiang. “IVQ: In-Memory Acceleration of DNN Inference Exploiting Varied Quantization.” IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems 41, no. 12 (December 1, 2022): 5313–26. https://doi.org/10.1109/TCAD.2022.3156017.

ICMJE

Liu F, Zhao W, Wang Z, Zhao Y, Yang T, Chen Y, et al. IVQ: In-Memory Acceleration of DNN Inference Exploiting Varied Quantization. IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems. 2022 Dec 1;41(12):5313–26.

MLA

Liu, F., et al. “IVQ: In-Memory Acceleration of DNN Inference Exploiting Varied Quantization.” IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, vol. 41, no. 12, Dec. 2022, pp. 5313–26. Scopus, doi:10.1109/TCAD.2022.3156017.

NLM

Liu F, Zhao W, Wang Z, Zhao Y, Yang T, Chen Y, Jiang L. IVQ: In-Memory Acceleration of DNN Inference Exploiting Varied Quantization. IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems. 2022 Dec 1;41(12):5313–5326.