PENNI: Pruned Kernel Sharing for Efficient CNN Inference
Although state-of-the-art (SOTA) CNNs achieve outstanding performance on various tasks, their high computation demand and massive number of parameters make it difficult to deploy these SOTA CNNs onto resource-constrained devices. Previous works on CNN acceleration utilize low-rank approximation of the original convolution layers to reduce computation cost. However, these methods are difficult to apply to sparse models, which limits execution speedup since redundancies within the CNN model are not fully exploited. We argue that kernel-granularity decomposition can be conducted under a low-rank assumption while still exploiting the redundancy within the remaining compact coefficients. Based on this observation, we propose PENNI, a CNN model compression framework that achieves model compactness and hardware efficiency simultaneously by (1) implementing kernel sharing in convolution layers via a small number of basis kernels and (2) alternately adjusting bases and coefficients with sparsity constraints. Experiments show that we can prune 97% of the parameters and 92% of the FLOPs on ResNet-18 with CIFAR-10 with no accuracy loss, and achieve a 44% reduction in run-time memory consumption and a 53% reduction in inference latency.
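For concreteness, the sketch below is an illustrative example (not the paper's released implementation) of the kind of kernel-granularity decomposition the abstract describes: every k x k kernel in a convolution layer is expressed as a linear combination of a small number of shared basis kernels, obtained here with a truncated SVD. The use of PyTorch, the function names, and the choice of 5 bases are assumptions for illustration; the alternating retraining of bases and coefficients under sparsity constraints is not shown.

```python
import torch

def decompose_kernels(weight, num_bases):
    """Approximate a conv weight (Cout, Cin, k, k) as linear combinations
    of a small set of shared k x k basis kernels via truncated SVD."""
    c_out, c_in, k, _ = weight.shape
    # Flatten every kernel into a row: (Cout*Cin, k*k)
    mat = weight.reshape(c_out * c_in, k * k)
    # Truncated SVD: keep only the top `num_bases` components
    u, s, vh = torch.linalg.svd(mat, full_matrices=False)
    coeffs = u[:, :num_bases] * s[:num_bases]   # (Cout*Cin, r) mixing coefficients
    bases = vh[:num_bases]                       # (r, k*k) shared basis kernels
    return coeffs.reshape(c_out, c_in, num_bases), bases.reshape(num_bases, k, k)

def reconstruct(coeffs, bases):
    """Rebuild the approximate full kernels from coefficients and bases."""
    c_out, c_in, r = coeffs.shape
    k = bases.shape[-1]
    return (coeffs.reshape(-1, r) @ bases.reshape(r, -1)).reshape(c_out, c_in, k, k)

# Example: a 3x3 conv layer with 64 input and 128 output channels,
# represented with a (hypothetical) budget of 5 shared basis kernels.
w = torch.randn(128, 64, 3, 3)
coeffs, bases = decompose_kernels(w, num_bases=5)
w_approx = reconstruct(coeffs, bases)
print(w_approx.shape)  # torch.Size([128, 64, 3, 3])
```

Under this representation the layer stores only the r basis kernels plus a coefficient tensor, so sparsifying the coefficients directly shrinks both the parameter count and the reconstruction cost.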