Robin: RWKV Accelerator using Block Circulant Matrices based on FPGA

Publication, Conference
Li, Z; Li, S; Dai, C; Ma, C; Liang, J; Li, X; Zhanh, W
Published in: IEEE ACM International Conference on Computer Aided Design Digest of Technical Papers Iccad
January 1, 2025

Recent advancements in linear-attention models, such as RWKV, have opened up new possibilities for efficient sequence processing by reducing the computational overhead of traditional Transformer architectures. Field-programmable gate arrays (FPGAs) offer a compelling solution for deep learning applications by providing customizable hardware architectures that enhance computational efficiency and flexibility. However, deploying these models on FPGAs introduces several challenges. Previous FPGA deployment workflows tend to focus on general machine learning tasks and lack sufficient integration between software and hardware optimizations. In addition, FPGAs are constrained by limited on-chip and off-chip memory, which makes weight storage difficult. Moreover, linear operations dominate GPU runtime, creating a significant computational bottleneck. These obstacles call for new solutions that bridge the performance gap between FPGAs and GPUs while preserving model accuracy.

To overcome these challenges, we introduce Robin, a fine-grained FPGA accelerator workflow that integrates both algorithm-level and hardware-level optimization. Robin leverages a weight compression technique based on Partial Block Circulant Matrices (PBCM), which effectively reduces storage demands while maintaining accuracy. Based on PBCM, our design employs a configurable circulant computing core that fully exploits the bit-width efficiency of DSP48E resources through two DSP packaging strategies to support both circulant and standard matrix operations. The combined end-to-end software-hardware co-design enables Robin to achieve up to a 3.09× increase in throughput and a 7.31× boost in energy efficiency compared to high-end Tesla A100 GPU implementations, making it a compelling solution for deploying RWKV models on FPGAs.
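The abstract does not describe how PBCM partitions or selects the weights, so the following is only a minimal sketch of generic block-circulant compression, not the paper's PBCM scheme or its hardware kernel. It assumes each b × b block is circulant and stored as its first column; the function names, the block size, and the sample values are hypothetical and chosen for illustration.

```python
from typing import List

Vector = List[float]


def circulant_matvec(c: Vector, x: Vector) -> Vector:
    """Multiply the circulant matrix whose first column is c by x.

    Entry (i, j) of the implied matrix is c[(i - j) mod b], so the product
    is a circular convolution and only the b values in c are stored.
    """
    b = len(c)
    return [sum(c[(i - j) % b] * x[j] for j in range(b)) for i in range(b)]


def block_circulant_matvec(blocks: List[List[Vector]], x: Vector) -> Vector:
    """Compute y = W x where W is a grid of circulant blocks.

    blocks[p][q] is the defining vector (first column) of block (p, q).
    """
    b = len(blocks[0][0])
    y: Vector = []
    for block_row in blocks:
        acc = [0.0] * b
        for q, c in enumerate(block_row):
            part = circulant_matvec(c, x[q * b:(q + 1) * b])
            acc = [a + p for a, p in zip(acc, part)]
        y.extend(acc)
    return y


def expand_to_dense(blocks: List[List[Vector]]) -> List[Vector]:
    """Rebuild the full dense matrix, used here only to check the result."""
    b = len(blocks[0][0])
    dense: List[Vector] = []
    for block_row in blocks:
        for i in range(b):
            row: Vector = []
            for c in block_row:
                row.extend(c[(i - j) % b] for j in range(b))
            dense.append(row)
    return dense


def dense_matvec(w: List[Vector], x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


# Hypothetical 4 x 8 weight matrix built from a 1 x 2 grid of 4 x 4 blocks.
blocks = [[[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 0.0]]]
x = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]

# The compressed product matches the dense product.
assert block_circulant_matvec(blocks, x) == dense_matvec(expand_to_dense(blocks), x)

stored = sum(len(c) for block_row in blocks for c in block_row)
dense_values = sum(len(row) for row in expand_to_dense(blocks))
print(f"stored values: {stored}, dense values: {dense_values}")
```

In this sketch the stored weight count drops by a factor of b (8 values instead of 32 for b = 4), which is the general storage saving that block-circulant structure provides. The direct convolution loop is used for clarity; an FFT-based product is a common alternative in software, and neither is claimed to match Robin's circulant computing core.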

Published In

IEEE ACM International Conference on Computer Aided Design Digest of Technical Papers Iccad

DOI

10.1109/ICCAD66269.2025.11240845

ISSN

1092-3152

Publication Date

January 1, 2025
 

Citation

APA
Li, Z., Li, S., Dai, C., Ma, C., Liang, J., Li, X., & Zhanh, W. (2025). Robin: RWKV Accelerator using Block Circulant Matrices based on FPGA. In IEEE ACM International Conference on Computer Aided Design Digest of Technical Papers Iccad. https://doi.org/10.1109/ICCAD66269.2025.11240845

Chicago
Li, Z., S. Li, C. Dai, C. Ma, J. Liang, X. Li, and W. Zhanh. “Robin: RWKV Accelerator using Block Circulant Matrices based on FPGA.” In IEEE ACM International Conference on Computer Aided Design Digest of Technical Papers Iccad, 2025. https://doi.org/10.1109/ICCAD66269.2025.11240845.

ICMJE
Li Z, Li S, Dai C, Ma C, Liang J, Li X, et al. Robin: RWKV Accelerator using Block Circulant Matrices based on FPGA. In: IEEE ACM International Conference on Computer Aided Design Digest of Technical Papers Iccad. 2025.

MLA
Li, Z., et al. “Robin: RWKV Accelerator using Block Circulant Matrices based on FPGA.” IEEE ACM International Conference on Computer Aided Design Digest of Technical Papers Iccad, 2025. Scopus, doi:10.1109/ICCAD66269.2025.11240845.

NLM
Li Z, Li S, Dai C, Ma C, Liang J, Li X, Zhanh W. Robin: RWKV Accelerator using Block Circulant Matrices based on FPGA. IEEE ACM International Conference on Computer Aided Design Digest of Technical Papers Iccad. 2025.
