DefT: Boosting Scalability of Deformable Convolution Operations on GPUs

Publication, Conference
Hanson, E; Horton, M; Li, HH; Chen, Y
Published in: International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
March 25, 2023

Deformable Convolutional Networks (DCNs) have been proposed as a powerful tool for boosting the representation power of Convolutional Neural Networks (CNNs) in computer vision tasks via adaptive sampling of the input feature map. Much like vision transformers, DCNs utilize a more flexible inductive bias than standard CNNs and have been shown to improve the performance of particular models. For example, drop-in DCN layers were shown to increase the AP score of Mask R-CNN by 10.6 points while introducing only 1% additional parameters and FLOPs, improving the state-of-the-art model at the time of publication. However, despite evidence that adding more DCN layers earlier in the network can further improve performance, this trend has not continued with further scaling of deformations in CNNs, unlike for vision transformers. Benchmarking experiments show that a realistically sized DCN layer (64H × 64W, 64 input/output channels) incurs a 4× slowdown on a GPU platform, discouraging wider use of deformations in CNNs. These slowdowns are caused by the irregular, input-dependent access patterns of the bilinear interpolation operator, which has a disproportionately low arithmetic intensity (AI) compared to the rest of the DCN. To address this disproportionate slowdown and enable the expanded use of DCNs in CNNs, we propose DefT, a series of workload-aware optimizations for DCN kernels. DefT identifies performance bottlenecks in DCNs and fuses specific operators that are observed to limit DCN AI. Our approach also uses statistical information about DCN workloads to adapt the workload tiling to the DCN layer dimensions, minimizing costly out-of-boundary input accesses. Experimental results show that DefT mitigates up to half of the DCN slowdown relative to the state-of-the-art PyTorch implementation. This translates to a layerwise speedup of up to 134% and a 46% reduction in normalized training time on a fully DCN-enabled ResNet model.
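As a point of reference for the benchmark described in the abstract, the sketch below reproduces this kind of layerwise comparison using torchvision's stock deform_conv2d operator (the baseline implementation, not the paper's DefT kernels). The spatial and channel dimensions follow the abstract (64 × 64 spatial, 64 input/output channels); the 3×3 kernel, batch size, and timing harness are assumptions added for illustration.

    import torch
    import torch.nn as nn
    from torchvision.ops import deform_conv2d

    # Sketch of a layerwise DCN-vs-standard-conv benchmark; requires a GPU.
    assert torch.cuda.is_available(), "benchmark targets a GPU platform"
    device = "cuda"

    B, C, H, W, K = 8, 64, 64, 64, 3  # batch size B and kernel size K are assumptions

    x = torch.randn(B, C, H, W, device=device)
    weight = torch.randn(C, C, K, K, device=device)
    # One (dy, dx) offset per kernel tap per output position. These
    # input-dependent offsets drive the irregular bilinear-interpolation
    # gathers that the abstract identifies as the bottleneck.
    offset = torch.randn(B, 2 * K * K, H, W, device=device)

    conv = nn.Conv2d(C, C, K, padding=1, bias=False).to(device)

    def time_ms(fn, iters=50):
        # CUDA-event timing with warm-up so launch overheads don't skew results.
        for _ in range(5):
            fn()
        torch.cuda.synchronize()
        start, end = (torch.cuda.Event(enable_timing=True) for _ in range(2))
        start.record()
        for _ in range(iters):
            fn()
        end.record()
        torch.cuda.synchronize()
        return start.elapsed_time(end) / iters

    t_std = time_ms(lambda: conv(x))
    t_dcn = time_ms(lambda: deform_conv2d(x, offset, weight, padding=1))
    print(f"standard conv: {t_std:.3f} ms/iter, "
          f"deformable conv: {t_dcn:.3f} ms/iter, "
          f"slowdown: {t_dcn / t_std:.1f}x")

The printed slowdown illustrates the gap the paper attributes to the low-arithmetic-intensity bilinear-interpolation gathers; exact figures will vary with the GPU and the PyTorch/torchvision versions in use.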

Published In

International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS

DOI

10.1145/3582016.3582017

Publication Date

March 25, 2023

Volume

3

Start / End Page

134 / 146

Citation

Hanson, E., Horton, M., Li, H. H., & Chen, Y. (2023). DefT: Boosting Scalability of Deformable Convolution Operations on GPUs. In International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS (Vol. 3, pp. 134–146). https://doi.org/10.1145/3582016.3582017
