BigLittleMCA: A Spatially-Optimal Tiled Hardware Accelerator for MCMC Image Processing
Publication
, Journal Article
Kjellqvist, C; Wills, L; Lebeck, A
Published in: ACM Transactions on Architecture and Code Optimization
Markov-Chain Monte-Carlo (MCMC) algorithms offer a general framework for performing interpretable inference but have high overheads due to the computational complexity of the sampling process and the large number of samples required to produce an accurate result. Computer Vision is a common class of workloads that can be performed using MCMC methods. As computer vision workloads trend toward high-resolution real-time inference, it becomes challenging to perform inference in contexts such as edge computing, which operates under strict power and area budgets. Previous work explores hardware techniques for efficient sampling; however, MCMC algorithms still require many samples. We reduce the overheads of Gibbs Sampling, an MCMC algorithm, using an approach we call mixed-resolution sampling. This approach uses low-resolution inference to provide a starting point for full-resolution sampling. We evaluate this approach on three important computer vision tasks: stereo matching, optical flow, and blind source separation. Mixed-resolution sampling reduces root mean square error (RMSE) by an average of 19.6% for stereo-matching tasks, 13% for optical flow tasks, and 6.3% for blind source separation relative to traditional Gibbs Sampling. To enable real-time, explainable MCMC inference under edge power constraints, we exploit the structure of mixed-resolution sampling to architect and implement a hardware-software co-designed accelerator architecture, BigLittleMCA (
-
MC
ccelerator). BigLittleMCA is a tiled MCMC accelerator architecture that uses a small sampler for low-resolution sampling and a large sampler for full-resolution sampling. Our results show that the architecture sustains real-time 720p inference at 30 FPS (frames per second) using 48.5% less power than prior work.