Robin: An FPGA-Based RWKV Accelerator Using Block Circulant Matrices
Recent advancements in linear-attention models such as RWKV have opened up new possibilities for efficient sequence processing by reducing the computational overhead of traditional Transformer architectures. Field-programmable gate arrays (FPGAs) offer a compelling platform for deep learning applications, providing customizable hardware architectures that enhance computational efficiency and flexibility. However, deploying these models on FPGAs introduces several challenges. Previous FPGA deployment workflows tend to target general machine learning tasks and lack sufficient integration between software and hardware optimizations. In addition, FPGAs are constrained by limited on-chip and off-chip memory, which poses significant challenges for weight storage. Moreover, linear operations dominate the GPU runtime, creating significant computational bottlenecks. These obstacles call for innovative solutions that bridge the performance gap between FPGAs and GPUs while preserving model accuracy.

To overcome these challenges, we introduce Robin, a fine-grained FPGA accelerator workflow that integrates algorithm-level and hardware-level optimizations. Robin leverages a weight compression technique based on Partial Block Circulant Matrices (PBCM), which effectively reduces storage demands while maintaining accuracy. Building on PBCM, our design employs a configurable circulant computing core that fully exploits the bit-width efficiency of DSP48E resources through two DSP packing strategies, supporting both circulant and standard matrix operations. This end-to-end software-hardware co-design enables Robin to achieve up to a 3.09× increase in throughput and a 7.31× boost in energy efficiency compared to a high-end Tesla A100 GPU implementation, making it a compelling solution for deploying RWKV models on FPGAs.
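To illustrate the basic idea behind block-circulant weight compression that PBCM builds on (this is a minimal sketch, not the paper's implementation; the block size, helper names, and random weights are assumptions for illustration), each b×b block of a weight matrix is replaced by a circulant block defined by a single length-b vector, cutting per-block storage from b² to b, and the block matrix-vector product reduces to circular convolutions that can be computed with FFTs:

```python
# Minimal block-circulant matrix-vector sketch (illustrative only).
import numpy as np

def circulant_matvec(c, x):
    """y = C @ x, where C is the circulant matrix whose first column is c.
    Computed as a circular convolution via FFT."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(blocks, x, b):
    """blocks[i][j] is the length-b defining vector of the (i, j) circulant block."""
    rows, cols = len(blocks), len(blocks[0])
    y = np.zeros(rows * b)
    for i in range(rows):
        acc = np.zeros(b)
        for j in range(cols):
            acc += circulant_matvec(blocks[i][j], x[j * b:(j + 1) * b])
        y[i * b:(i + 1) * b] = acc
    return y

# Example: a 4x8 weight matrix compressed with block size b = 4
# stores only 2 * b values instead of 4 * 8.
b = 4
blocks = [[np.random.randn(b) for _ in range(2)]]  # 1x2 grid of circulant blocks
x = np.random.randn(2 * b)
y = block_circulant_matvec(blocks, x, b)
```

In a "partial" scheme like PBCM, only a subset of the weight matrices or blocks would be constrained to this circulant structure so that accuracy-sensitive weights remain unrestricted; the exact partitioning and the hardware mapping onto DSP48E resources are specific to the paper's design.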