DefT: Boosting Scalability of Deformable Convolution Operations on GPUs

Publication, Conference

Hanson, E; Horton, M; Li, HH; Chen, Y

Published in: International Conference on Architectural Support for Programming Languages and Operating Systems ASPLOS

March 25, 2023

Deformable Convolutional Networks (DCN) have been proposed as a powerful tool to boost the representation power of Convolutional Neural Networks (CNN) in computer vision tasks via adaptive sampling of the input feature map. Much like vision transformers, DCNs utilize a more flexible inductive bias than standard CNNs and have also been shown to improve the performance of particular models. For example, drop-in DCN layers were shown to increase the AP score of Mask RCNN by 10.6 points while introducing only 1% additional parameters and FLOPs, improving the state-of-the-art model at the time of publication. However, despite evidence that more DCN layers placed earlier in the network can further improve performance, we have not seen this trend continue with further scaling of deformations in CNNs, unlike for vision transformers. Benchmarking experiments show that a realistically sized DCN layer (64H×64W, 64 input and output channels) incurs a 4× slowdown on a GPU platform, discouraging the more ubiquitous use of deformations in CNNs. These slowdowns are caused by the irregular, input-dependent access patterns of the bilinear interpolation operator, which has a disproportionately low arithmetic intensity (AI) compared to the rest of the DCN. To address the disproportionate slowdown of DCNs and enable their expanded use in CNNs, we propose DefT, a series of workload-aware optimizations for DCN kernels. DefT identifies performance bottlenecks in DCNs and fuses specific operators that are observed to limit DCN AI. Our approach also uses statistical information about DCN workloads to adapt the workload tiling to the DCN layer dimensions, minimizing costly out-of-boundary input accesses. Experimental results show that DefT mitigates up to half of the DCN slowdown over the current-art PyTorch implementation. This translates to a layerwise speedup of up to 134% and a 46% reduction in normalized training time on a fully DCN-enabled ResNet model.
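To make the bilinear interpolation bottleneck named in the abstract concrete, here is a minimal pure-Python sketch of sampling a feature map at a learned fractional offset. It is not the DefT kernel or the PyTorch implementation. The function names `bilinear_sample` and `deformable_tap` are hypothetical, and treating out-of-boundary neighbours as zero is an assumption made for illustration.

```python
import math


def bilinear_sample(feature, y, x):
    """Sample a 2D feature map (list of rows) at fractional position (y, x).

    Assumption: neighbours that fall outside the map contribute zero.
    """
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    value = 0.0
    # Four neighbouring reads whose addresses depend on (y, x), i.e. on data.
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:  # out-of-boundary check per read
                value += wy * wx * feature[yy][xx]
    return value


def deformable_tap(feature, y, x, offset):
    """Read one kernel tap at its regular grid position plus a learned offset."""
    off_y, off_x = offset
    return bilinear_sample(feature, y + off_y, x + off_x)


feature = [[0.0, 1.0],
           [2.0, 3.0]]
centre = deformable_tap(feature, 0, 0, (0.5, 0.5))   # average of all four pixels
edge = deformable_tap(feature, 1, 1, (0.5, 0.0))     # lower neighbours are out of bounds
```

Each sampled value costs four gathered reads and only a handful of multiply-adds, and the read addresses are known only once the offsets have been computed. This is consistent with the low arithmetic intensity and irregular access pattern the abstract describes.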

Duke Scholars

Author: Hai "Helen" Li, Electrical and Computer Engineering

Author: Yiran Chen, Electrical and Computer Engineering

Published In

International Conference on Architectural Support for Programming Languages and Operating Systems ASPLOS

DOI

10.1145/3582016.3582017

Publication Date

March 25, 2023

Start / End Page

134 / 146

Citation

APA

Hanson, E., Horton, M., Li, H. H., & Chen, Y. (2023). DefT: Boosting Scalability of Deformable Convolution Operations on GPUs. In International Conference on Architectural Support for Programming Languages and Operating Systems ASPLOS (Vol. 3, pp. 134–146). https://doi.org/10.1145/3582016.3582017

Chicago

Hanson, E., M. Horton, H. H. Li, and Y. Chen. “DefT: Boosting Scalability of Deformable Convolution Operations on GPUs.” In International Conference on Architectural Support for Programming Languages and Operating Systems ASPLOS, 3:134–46, 2023. https://doi.org/10.1145/3582016.3582017.

ICMJE

Hanson E, Horton M, Li HH, Chen Y. DefT: Boosting Scalability of Deformable Convolution Operations on GPUs. In: International Conference on Architectural Support for Programming Languages and Operating Systems ASPLOS. 2023. p. 134–46.

MLA

Hanson, E., et al. “DefT: Boosting Scalability of Deformable Convolution Operations on GPUs.” International Conference on Architectural Support for Programming Languages and Operating Systems ASPLOS, vol. 3, 2023, pp. 134–46. Scopus, doi:10.1145/3582016.3582017.

NLM

Hanson E, Horton M, Li HH, Chen Y. DefT: Boosting Scalability of Deformable Convolution Operations on GPUs. International Conference on Architectural Support for Programming Languages and Operating Systems ASPLOS. 2023. p. 134–146.
