
Spatiotemporal Joint Filter Decomposition in 3D Convolutional Neural Networks

Publication, Conference
Miao, Z; Wang, Z; Cheng, X; Qiu, Q
Published in: Advances in Neural Information Processing Systems
January 1, 2021

In this paper, we introduce spatiotemporal joint filter decomposition to decouple spatial and temporal learning while preserving spatiotemporal dependency in a video. A 3D convolutional filter is jointly decomposed over a set of spatial filter atoms and a set of temporal filter atoms. In this way, a 3D convolutional layer becomes three layers: a temporal atom layer, a spatial atom layer, and a joint coefficient layer, all of which remain convolutional. One arithmetic manipulation that our joint decomposition allows is to swap the spatial or temporal atoms for a set of atoms of the same number but different sizes, while keeping the rest unchanged. For example, as shown later, we can achieve tempo-invariance simply by dilating the temporal atoms. To illustrate this atom-swapping property, we further demonstrate how such a decomposition permits the direct learning of 3D CNNs with full-size videos through iterations of two consecutive sub-stages of learning. In the temporal stage, full-temporal, spatially downsampled data are used to learn the temporal atoms and joint coefficients while the spatial atoms are fixed. In the spatial stage, full-spatial, temporally downsampled data are used to learn the spatial atoms and joint coefficients while the temporal atoms are fixed. We show empirically on multiple action recognition datasets that the decoupled spatiotemporal learning significantly reduces the model memory footprint and allows deep 3D CNNs to model high-spatial, long-temporal dependency with limited computational resources while delivering comparable performance.
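
The decomposition and the atom-swapping property described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes one plausible reading of the abstract, in which a single 3D filter is the coefficient-weighted sum of outer products of temporal and spatial atoms, W[t][h][w] = sum over i, j of coeffs[i][j] * T_i[t] * S_j[h][w]. The function names `compose_filter` and `dilate_temporal_atom`, and all numeric values, are hypothetical and chosen only for illustration.

```python
# Minimal sketch under the assumption stated above; plain Python, no dependencies.

def compose_filter(temporal_atoms, spatial_atoms, coeffs):
    """Build one 3D filter W[t][h][w] from 1D temporal atoms, 2D spatial
    atoms, and a joint coefficient matrix coeffs[i][j] (assumed form)."""
    kt = len(temporal_atoms[0])
    kh = len(spatial_atoms[0])
    kw = len(spatial_atoms[0][0])
    return [
        [
            [
                sum(
                    coeffs[i][j] * t_atom[t] * s_atom[h][w]
                    for i, t_atom in enumerate(temporal_atoms)
                    for j, s_atom in enumerate(spatial_atoms)
                )
                for w in range(kw)
            ]
            for h in range(kh)
        ]
        for t in range(kt)
    ]


def dilate_temporal_atom(atom, rate):
    """Insert (rate - 1) zeros between taps: the atom gets longer,
    but the number of atoms (and hence the coefficients) is unchanged."""
    out = [0.0] * ((len(atom) - 1) * rate + 1)
    for k, value in enumerate(atom):
        out[k * rate] = value
    return out


# Two temporal atoms of size 3 and two spatial atoms of size 2x2 (made-up values).
temporal_atoms = [[1.0, 0.0, -1.0], [1.0, 2.0, 1.0]]
spatial_atoms = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
coeffs = [[1.0, 0.0], [0.0, 0.5]]  # joint coefficients, shape (2, 2)

W = compose_filter(temporal_atoms, spatial_atoms, coeffs)  # 3 x 2 x 2

# Atom swap: replace only the temporal atoms with dilated versions.
dilated_atoms = [dilate_temporal_atom(a, 2) for a in temporal_atoms]
W_dilated = compose_filter(dilated_atoms, spatial_atoms, coeffs)  # 5 x 2 x 2
```

In this sketch the dilated filter reuses the same spatial atoms and joint coefficients; only the temporal extent changes from 3 to 5 taps, with zero slices between the original ones. That is the sense in which atoms can be swapped for a set of the same number but a different size.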

Published In

Advances in Neural Information Processing Systems

ISSN

1049-5258

Publication Date

January 1, 2021

Volume

5

Start / End Page

3376 / 3388

Related Subject Headings

  • 4611 Machine learning
  • 1702 Cognitive Sciences
  • 1701 Psychology
 

Citation

APA: Miao, Z., Wang, Z., Cheng, X., & Qiu, Q. (2021). Spatiotemporal Joint Filter Decomposition in 3D Convolutional Neural Networks. In Advances in Neural Information Processing Systems (Vol. 5, pp. 3376–3388).
Chicago: Miao, Z., Z. Wang, X. Cheng, and Q. Qiu. “Spatiotemporal Joint Filter Decomposition in 3D Convolutional Neural Networks.” In Advances in Neural Information Processing Systems, 5:3376–88, 2021.
ICMJE: Miao Z, Wang Z, Cheng X, Qiu Q. Spatiotemporal Joint Filter Decomposition in 3D Convolutional Neural Networks. In: Advances in Neural Information Processing Systems. 2021. p. 3376–88.
MLA: Miao, Z., et al. “Spatiotemporal Joint Filter Decomposition in 3D Convolutional Neural Networks.” Advances in Neural Information Processing Systems, vol. 5, 2021, pp. 3376–88.
NLM: Miao Z, Wang Z, Cheng X, Qiu Q. Spatiotemporal Joint Filter Decomposition in 3D Convolutional Neural Networks. Advances in Neural Information Processing Systems. 2021. p. 3376–3388.
