Heterogeneous systems with reconfigurable neuromorphic computing accelerators
Developing heterogeneous systems with hardware accelerators is a promising approach to implementing high-performance applications where explicitly programmed, rule-based algorithms are either infeasible or inefficient. However, mapping a neural network model to a hardware representation is a complex process in which balancing computation resources against memory accesses is crucial. In this work, we present a systematic approach to optimizing a heterogeneous system with an FPGA-based neuromorphic computing accelerator (NCA). For a given application, the neural network topology and computation flow of the accelerator can be configured through an NCA-aware compiler. The FPGA-based NCA contains a generic multi-layer neural network composed of a set of parallel neural processing elements. This scheme imitates the human cognition process and follows the hierarchy of the neocortex. At the architectural level, we decrease the computing resource requirement to enhance computation efficiency. The hardware implementation primarily targets reducing the data communication load: a multi-threaded computation engine is used to mask long memory latencies. This combined solution accommodates the ever-increasing complexity and scalability of machine learning applications and improves system performance and efficiency. Across eight representative benchmarks, we observe an average 12.1× speedup and 45.8× energy reduction, with marginal accuracy loss compared with CPU-only computation.