CAMformer: Binary Associative Memory Is All You Need
Transformer attention mechanisms pose significant scalability challenges due to quadratic complexity in sequence length, and existing accelerators remain bottlenecked by dense arithmetic and data movement. This paper proposes CAMformer, a hardware accelerator that reinterprets attention as an associative memory operation, contributing at three levels. At the circuit level, a voltage-domain Binary Attention CAM (BA-CAM) computes Hamming similarity through analog charge sharing, achieving 1.12% mean error under PVT variation, 7x lower than time-domain approaches. At the architecture level, a three-stage pipeline with hierarchical two-stage top-k filtering reduces score storage by 8x while hiding DRAM latency. At the algorithm level, this top-k mechanism, co-designed with Hamming Attention Distillation (HAD), maintains <0.4% accuracy degradation on GLUE benchmarks. Implemented in 65 nm CMOS and evaluated on BERT-Large, Vision Transformer, and GPT-2 decoder workloads via HSPICE simulation and Design Compiler synthesis, CAMformer achieves 9,045 queries/mJ (10x), 191 queries/ms (4x), and 0.26 mm² (6–8x reduction) for attention computation compared to state-of-the-art accelerators. These results demonstrate that reconceptualizing attention as associative memory retrieval enables order-of-magnitude efficiency gains for edge Transformer inference.
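To make the idea of attention as associative retrieval concrete, the following is a minimal software sketch, not the paper's circuit or pipeline. It assumes queries and keys are packed as d-bit integers, scores each key by Hamming similarity (the number of agreeing bits), and keeps only a few candidates per tile before a final global selection. The function names, the tile size, the k values, and the softmax scaling by sqrt(d) are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math

def hamming_similarity(q_bits: int, k_bits: int, d: int) -> int:
    """Count bit positions where q and k agree (d minus Hamming distance)."""
    return d - bin(q_bits ^ k_bits).count("1")

def two_stage_top_k(scores, tile_size, k_local, k_final):
    """Stage 1 keeps the k_local best scores in each tile;
    stage 2 keeps the k_final best among those survivors.
    Ties are broken by lower index so the result is deterministic."""
    candidates = []
    for start in range(0, len(scores), tile_size):
        tile = [(s, start + i)
                for i, s in enumerate(scores[start:start + tile_size])]
        tile.sort(key=lambda p: (-p[0], p[1]))
        candidates.extend(tile[:k_local])  # only these need to be stored
    candidates.sort(key=lambda p: (-p[0], p[1]))
    return candidates[:k_final]

def binary_attention(q_bits, key_bits, values, d,
                     tile_size=4, k_local=2, k_final=2):
    """Hamming-similarity attention over the top-k keys (illustrative only)."""
    scores = [hamming_similarity(q_bits, k, d) for k in key_bits]
    top = two_stage_top_k(scores, tile_size, k_local, k_final)
    # Softmax over surviving scores; the sqrt(d) scaling is an assumption.
    best = max(s for s, _ in top)
    weights = [math.exp((s - best) / math.sqrt(d)) for s, _ in top]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, (_, idx) in zip(weights, top):
        for j, v in enumerate(values[idx]):
            out[j] += (w / total) * v
    return out, [idx for _, idx in top]

# Hypothetical 8-bit example: key 0 matches the query exactly.
query = 0b10110010
keys = [0b10110010, 0b01001101, 0b10110011, 0b00000000,
        0b11111111, 0b10110000, 0b10100010, 0b00110010]
vals = [[float(i), 1.0] for i in range(len(keys))]
output, selected = binary_attention(query, keys, vals, d=8)
```

In this sketch, stage 1 bounds the number of stored scores at k_local per tile instead of one per key, which is the kind of storage saving the abstract attributes to hierarchical filtering. When k_local is at least k_final, the two-stage result equals an exact global top-k under the same tie-breaking; smaller k_local values trade exactness for less storage.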
Related Subject Headings
- Electrical & Electronic Engineering
- 4009 Electronics, sensors and digital hardware