FPGA acceleration of recurrent neural network based language model
Recurrent neural network (RNN) based language model (RNNLM) is a biologically inspired model for natural language processing. It records the historical information through additional recurrent connections and therefore is very effective in capturing semantics of sentences. However, the use of RNNLM has been greatly hindered for the high computation cost in training. This work presents an FPGA implementation framework for RNNLM training acceleration. At architectural level, we improve the parallelism of RNN training scheme and reduce the computing resource requirement for computation efficiency enhancement. The hardware implementation primarily targets at reducing data communication load. A multi-thread based computation engine is utilized which can successfully mask the long memory latency and reuse frequent accessed data. The evaluation based on the Microsoft Research Sentence Completion Challenge shows that the proposed FPGA implementation outperforms traditional class-based modest-size recurrent networks and obtains 46.2% in training accuracy. Moreover, experiments at different network sizes demonstrate a great scalability of the proposed framework.