A Scalable Pipelined Dataflow Accelerator for Object Region Proposals on FPGA Platform
Region proposal is critical for object detection while it usually poses a bottleneck in improving the computation efficiency on traditional control-flow architectures. We have observed region proposal tasks are potentially suitable for performing pipelined parallelism by exploiting dataflow driven acceleration. In this paper, a scalable pipelined dataflow accelerator is proposed for efficient region proposals on FPGA platform. The accelerator processes image data by a streaming manner with three sequential stages: resizing, kernel computing and sorting. First, Ping-Pong cache strategy is adopted for rotation loading in resize module to guarantee continuous output streaming. Then, a multiple pipelines architecture with tiered memory is utilized in kernel computing module to complete the main computation tasks. Finally, a bubble-pushing heap sort method is exploited in sorting module to find the top-k largest candidates efficiently. Our design is implemented with high level synthesis on FPGA platforms, and experimental re-sults on VOC2007 datasets show that it could achieve about 3.67X speedups than traditional desktop CPU platform and >250X energy efficiency improvement than embedded ARM platform.