BitSystolic: A 26.7 TOPS/W 2b~8b NPU with Configurable Data Flows for Edge Devices
The exploding demand for artificial intelligence on edge devices calls for efficient deployment of deep neural networks (DNNs). Mixed-precision inference, which combines model compression with reduced computation cost, offers a path to accurate and efficient DNN deployment. However, although mixed-precision DNN models can be obtained at the algorithmic level, hardware support for them remains insufficient. In this work, we propose BitSystolic, a neural processing unit based on a systolic array structure. In BitSystolic, the numerical precision of both weights and activations can be configured in the range of 2-8 bits, fulfilling different requirements across mixed-precision models and tasks. Moreover, BitSystolic supports the various data flows found in different types of neural layers (e.g., convolutional, fully connected, and recurrent layers) and adaptively optimizes data reuse by switching between a matrix-matrix mode and a vector-matrix mode. We designed and fabricated a BitSystolic prototype built around a 16×16 systolic array. Measurement results show that BitSystolic achieves a unified power efficiency of up to 26.7 TOPS/W across the various layer types, with a peak power consumption of 17.8 mW.
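To make the configurable 2-8 bit precision concrete, the sketch below emulates, in NumPy, how a matrix multiplication can be decomposed into 2-bit digit slices whose shifted partial products are accumulated back into the full-precision result. This is a minimal illustration of 2-bit-granular precision composition under assumed unsigned operands; the function names are hypothetical and it does not reproduce the paper's actual processing-element design or data flows.

```python
import numpy as np

def to_2bit_slices(x, bits):
    """Split an unsigned integer array into ceil(bits/2) little-endian 2-bit digits."""
    n_slices = (bits + 1) // 2
    return [(x >> (2 * s)) & 0b11 for s in range(n_slices)]

def mixed_precision_matmul(A, W, a_bits, w_bits):
    """Rebuild the full-precision product by accumulating shifted products of
    2-bit activation and weight digits, illustrating (not reproducing) how a
    configurable-precision MAC could reuse a small fixed-width multiplier."""
    acc = np.zeros((A.shape[0], W.shape[1]), dtype=np.int64)
    for i, a_slice in enumerate(to_2bit_slices(A, a_bits)):
        for j, w_slice in enumerate(to_2bit_slices(W, w_bits)):
            acc += (a_slice.astype(np.int64) @ w_slice.astype(np.int64)) << (2 * (i + j))
    return acc

# Sanity check against a direct integer matmul for random 8-bit inputs.
rng = np.random.default_rng(0)
A = rng.integers(0, 256, size=(4, 16), dtype=np.int64)   # 8-bit activations
W = rng.integers(0, 256, size=(16, 4), dtype=np.int64)   # 8-bit weights
assert np.array_equal(mixed_precision_matmul(A, W, 8, 8), A @ W)
```

Lowering `a_bits` or `w_bits` shrinks the number of digit slices, which is the sense in which reduced precision cuts computation cost in such a scheme.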