Cascading Structured Pruning: Enabling High Data Reuse for Sparse DNN Accelerators

Publication, Conference
Hanson, E; Li, S; Li, HH; Chen, Y
Published in: Proceedings - International Symposium on Computer Architecture
June 18, 2022

Performance and efficiency of running modern Deep Neural Networks (DNNs) are heavily bounded by data movement. To mitigate the data movement bottlenecks, recent DNN inference accelerator designs widely adopt aggressive compression techniques and sparse-skipping mechanisms. These mechanisms avoid transferring or computing with zero-valued weights or activations to save time and energy. However, such sparse-skipping logic involves large input buffers and irregular data access patterns, thus precluding many energy-efficient data reuse opportunities and dataflows. In this work, we propose Cascading Structured Pruning (CSP), a technique that preserves significantly more data reuse opportunities for higher energy efficiency while maintaining comparable performance relative to recent sparse architectures such as SparTen. CSP includes the following two components: At the algorithm level, CSP-A induces a predictable sparsity pattern that allows for low-overhead compression of weight data and sequential access to both activation and weight data. At the architecture level, CSP-H leverages CSP-A's induced sparsity pattern with a novel dataflow to access unique activation data only once, thus removing the demand for large input buffers. Each CSP-H processing element (PE) employs a novel accumulation buffer design and a counter-based sparse-skipping mechanism to support the dataflow with minimum controller overhead. We verify our approach on several representative models. Our simulated results show that CSP achieves on average 15× energy efficiency improvement over SparTen with comparable or superior speedup under most evaluations.
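To make the idea of a predictable, structured sparsity pattern concrete, the sketch below shows generic column-wise structured pruning in Python with NumPy. It is a simplified illustration only: the function name, the L2-norm column criterion, and the keep_ratio parameter are assumptions for this example, not the paper's CSP-A procedure, which induces a more specific cascading pattern to enable sequential weight and activation access.

    # Hypothetical sketch of structured (column-wise) pruning in general;
    # CSP-A's actual cascading pattern is not reproduced here.
    import numpy as np

    def structured_prune(weights: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
        """Zero out entire columns of a 2-D weight matrix by L2 norm.

        Because whole columns are removed, the surviving weights can be
        stored densely alongside a short list of kept column indices,
        allowing sequential access -- the general motivation behind
        structured (as opposed to unstructured) pruning.
        """
        col_norms = np.linalg.norm(weights, axis=0)          # importance score per column
        n_keep = max(1, int(keep_ratio * weights.shape[1]))  # number of columns to keep
        keep_cols = np.argsort(col_norms)[-n_keep:]          # indices of the strongest columns
        mask = np.zeros_like(weights)
        mask[:, keep_cols] = 1.0
        return weights * mask                                # pruned weight matrix

    # Example: prune half the columns of a random 4x8 layer
    w = np.random.randn(4, 8)
    w_pruned = structured_prune(w, keep_ratio=0.5)

Because the zeroed entries fall in whole columns rather than at arbitrary positions, a hardware dataflow can skip them with a simple counter instead of per-element index matching, which is the kind of regularity the abstract attributes to CSP-A.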

Published In

Proceedings - International Symposium on Computer Architecture

DOI

10.1145/3470496.3527419

ISSN

1063-6897

Publication Date

June 18, 2022

Start / End Page

522 / 535

Citation

APA: Hanson, E., Li, S., Li, H. H., & Chen, Y. (2022). Cascading Structured Pruning: Enabling High Data Reuse for Sparse DNN Accelerators. In Proceedings - International Symposium on Computer Architecture (pp. 522–535). https://doi.org/10.1145/3470496.3527419
Chicago: Hanson, E., S. Li, H. H. Li, and Y. Chen. “Cascading Structured Pruning: Enabling High Data Reuse for Sparse DNN Accelerators.” In Proceedings - International Symposium on Computer Architecture, 522–35, 2022. https://doi.org/10.1145/3470496.3527419.
ICMJE: Hanson E, Li S, Li HH, Chen Y. Cascading Structured Pruning: Enabling High Data Reuse for Sparse DNN Accelerators. In: Proceedings - International Symposium on Computer Architecture. 2022. p. 522–35.
MLA: Hanson, E., et al. “Cascading Structured Pruning: Enabling High Data Reuse for Sparse DNN Accelerators.” Proceedings - International Symposium on Computer Architecture, 2022, pp. 522–35. Scopus, doi:10.1145/3470496.3527419.
NLM: Hanson E, Li S, Li HH, Chen Y. Cascading Structured Pruning: Enabling High Data Reuse for Sparse DNN Accelerators. Proceedings - International Symposium on Computer Architecture. 2022. p. 522–535.
