STCAM: Spatial-Temporal and Channel Attention Module for Dynamic Facial Expression Recognition
Capturing the dynamics of facial expression progression in video is an essential and challenging task for facial expression recognition (FER). In this article, we propose an effective framework to address this challenge. We develop a C3D-based network architecture, 3D-Inception-ResNet, to extract spatial-temporal features from dynamic facial expression image sequences. We further propose a Spatial-Temporal and Channel Attention Module (STCAM) to explicitly exploit the holistic spatial-temporal and channel-wise correlations among the extracted features. Specifically, STCAM computes a channel-wise attention map and a spatial-temporal attention map, which enhance the features along the corresponding dimensions to yield more representative representations. We evaluate our method on three popular dynamic facial expression recognition datasets: CK+, Oulu-CASIA, and MMI. Experimental results show that our method achieves performance better than or comparable to state-of-the-art approaches.
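The two-stage attention described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the abstract does not give the exact gating functions, so simple sigmoid gates over pooled statistics are assumed here, applied first along the channel dimension and then along the spatial-temporal dimensions of a (C, T, H, W) feature map.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def stcam_sketch(features):
    """Illustrative STCAM-style attention over features of shape (C, T, H, W).

    Assumption: channel attention is derived from a global average pool over
    (T, H, W), and spatial-temporal attention from an average pool over
    channels; both are passed through a sigmoid gate. The paper's actual
    formulation may differ.
    """
    # Channel-wise attention: one scalar gate per channel.
    ch_desc = features.mean(axis=(1, 2, 3))          # (C,)
    ch_att = sigmoid(ch_desc)                        # (C,)
    out = features * ch_att[:, None, None, None]     # rescale each channel

    # Spatial-temporal attention: one gate per (t, h, w) location.
    st_desc = out.mean(axis=0)                       # (T, H, W)
    st_att = sigmoid(st_desc)                        # (T, H, W)
    return out * st_att[None, ...]                   # rescale each location

# Usage: an 8-channel feature map over 4 frames of 6x6 spatial resolution.
x = np.random.randn(8, 4, 6, 6)
y = stcam_sketch(x)
```

The output keeps the input shape; only the relative magnitudes of channels and spatial-temporal locations are reweighted, which is what lets the module be dropped into a 3D-Inception-ResNet backbone without changing downstream layer sizes.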
Related Subject Headings
- 4608 Human-centred computing
- 4603 Computer vision and multimedia computation
- 4602 Artificial intelligence
- 1702 Cognitive Sciences
- 0806 Information Systems
- 0801 Artificial Intelligence and Image Processing