
Cascading Structured Pruning: Enabling High Data Reuse for Sparse DNN Accelerators

Publication, Conference
Hanson, E; Li, S; Li, HH; Chen, Y
Published in: Proceedings - International Symposium on Computer Architecture
June 18, 2022

Performance and efficiency of running modern Deep Neural Networks (DNNs) are heavily bounded by data movement. To mitigate the data movement bottlenecks, recent DNN inference accelerator designs widely adopt aggressive compression techniques and sparse-skipping mechanisms. These mechanisms avoid transferring or computing with zero-valued weights or activations to save time and energy. However, such sparse-skipping logic involves large input buffers and irregular data access patterns, thus precluding many energy-efficient data reuse opportunities and dataflows. In this work, we propose Cascading Structured Pruning (CSP), a technique that preserves significantly more data reuse opportunities for higher energy efficiency while maintaining comparable performance relative to recent sparse architectures such as SparTen. CSP includes the following two components: At the algorithm level, CSP-A induces a predictable sparsity pattern that allows for low-overhead compression of weight data and sequential access to both activation and weight data. At the architecture level, CSP-H leverages CSP-A's induced sparsity pattern with a novel dataflow to access unique activation data only once, thus removing the demand for large input buffers. Each CSP-H processing element (PE) employs a novel accumulation buffer design and a counter-based sparse-skipping mechanism to support the dataflow with minimum controller overhead. We verify our approach on several representative models. Our simulated results show that CSP achieves on average a 15× energy efficiency improvement over SparTen with comparable or superior speedup under most evaluations.
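
The abstract describes CSP-A only at a high level; the full pruning algorithm is detailed in the paper. As a rough illustration of the general idea, the sketch below (plain Python with NumPy; the function name prune_groups, the group size, and the keep count are illustrative assumptions, not the paper's actual method) shows how a group-structured "keep k of every m weights" rule yields a predictable sparsity pattern: every group stores exactly k values plus a few small in-group indices, so the compressed weights can be laid out and read sequentially with low metadata overhead.

import numpy as np

def prune_groups(weights, group_size=8, keep=2):
    """Keep the `keep` largest-magnitude weights in each contiguous group.

    Returns (values, indices): one row per group, each holding exactly `keep`
    retained values and their in-group positions. `weights` must have a length
    that is a multiple of `group_size`.
    """
    groups = weights.reshape(-1, group_size)
    # In-group positions of the `keep` largest-magnitude entries, sorted so the
    # retained values within a group stay in their original (sequential) order.
    top = np.sort(np.argsort(-np.abs(groups), axis=1)[:, :keep], axis=1)
    values = np.take_along_axis(groups, top, axis=1)
    return values, top.astype(np.uint8)  # compact per-group index metadata

# Example: 16 weights, groups of 8, keep 2 per group -> a fixed 4:1 reduction of
# stored values plus two small indices per group.
w = np.random.randn(16).astype(np.float32)
vals, idx = prune_groups(w)
print(vals.shape, idx.shape)  # (2, 2) (2, 2)

Because every group retains a fixed number of weights at known, compactly encoded positions, a simple counter over the groups suffices to skip pruned positions, which is in the spirit of the counter-based sparse-skipping mechanism the abstract mentions.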

Published In

Proceedings - International Symposium on Computer Architecture

DOI

10.1145/3470496.3527419

ISSN

1063-6897

ISBN

9781450386104

Publication Date

June 18, 2022

Start / End Page

522 / 535

Citation

APA
Hanson, E., Li, S., Li, H. H., & Chen, Y. (2022). Cascading Structured Pruning: Enabling High Data Reuse for Sparse DNN Accelerators. In Proceedings - International Symposium on Computer Architecture (pp. 522–535). https://doi.org/10.1145/3470496.3527419

Chicago
Hanson, E., S. Li, H. H. Li, and Y. Chen. “Cascading Structured Pruning: Enabling High Data Reuse for Sparse DNN Accelerators.” In Proceedings - International Symposium on Computer Architecture, 522–35, 2022. https://doi.org/10.1145/3470496.3527419.

ICMJE
Hanson E, Li S, Li HH, Chen Y. Cascading Structured Pruning: Enabling High Data Reuse for Sparse DNN Accelerators. In: Proceedings - International Symposium on Computer Architecture. 2022. p. 522–35.

MLA
Hanson, E., et al. “Cascading Structured Pruning: Enabling High Data Reuse for Sparse DNN Accelerators.” Proceedings - International Symposium on Computer Architecture, 2022, pp. 522–35. Scopus, doi:10.1145/3470496.3527419.

NLM
Hanson E, Li S, Li HH, Chen Y. Cascading Structured Pruning: Enabling High Data Reuse for Sparse DNN Accelerators. Proceedings - International Symposium on Computer Architecture. 2022. p. 522–535.
