A high utilization FPGA-based accelerator for variable-scale convolutional neural network
Convolutional neural networks (CNNs) play an essential role in computer vision applications because of their high classification accuracy and robust generalization capability. In recent years, various GPU-based and application-specific hardware approaches have been proposed to accelerate CNN computations. However, for variable-scale CNNs, the utilization of on-chip DSPs cannot reach a high level because of the image boundary. In this paper, we propose an optimization framework to solve the boundary problem and connect our accelerator with ARM processors and DDR4 memory through a dual Advanced eXtensible Interface (AXI) bus. Each port is capable of a peak throughput of 1.6 GB/s in full duplex. The accelerator can perform 160 G-op/s at peak and achieves 96% computing resource utilization.
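To illustrate the boundary problem mentioned above, the following minimal sketch estimates how much of a fixed-size compute array does useful work when feature maps of different scales are processed in fixed tiles. It is not the optimization framework proposed in the paper: the function name `tile_utilization`, the 16x16 tile size, and the list of feature-map sizes are all hypothetical, chosen only to show why layers whose dimensions are not a multiple of the tile size leave compute slots idle.

```python
import math


def tile_utilization(width: int, height: int, tile_w: int, tile_h: int) -> float:
    """Fraction of compute slots doing useful work when a width x height
    feature map is processed in fixed tile_w x tile_h tiles.

    Assumption (not from the paper): tiles at the image boundary are
    padded up to the full tile size, so their extra slots are wasted.
    """
    tiles_x = math.ceil(width / tile_w)   # tiles needed along the width
    tiles_y = math.ceil(height / tile_h)  # tiles needed along the height
    useful = width * height               # slots that map to real pixels
    issued = tiles_x * tile_w * tiles_y * tile_h  # slots actually scheduled
    return useful / issued


if __name__ == "__main__":
    # Hypothetical feature-map sizes typical of a CNN that downsamples by 2.
    for size in (224, 112, 56, 28, 14, 7):
        util = tile_utilization(size, size, tile_w=16, tile_h=16)
        print(f"{size:>3} x {size:<3} -> utilization {util:.3f}")
```

Under these assumed parameters, a 224x224 map fills every tile exactly, while a 14x14 map occupies only 196 of the 256 slots in a single 16x16 tile. This is the kind of scale-dependent loss that a boundary-aware mapping would need to avoid.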