Action recognition in videos with temporal segments fusions

Publication, Conference
Fang, Y; Zhang, R; Wang, QF; Huang, K
Published in: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
January 1, 2020

Deep Convolutional Neural Networks (CNNs) have achieved great success in object recognition. However, they struggle to capture long-range temporal information, which plays an important role in action recognition in videos. To overcome this issue, two-stream architectures built on spatial and temporal segment-based CNNs have recently been widely used. However, the relationship among the segments has not been sufficiently investigated. In this paper, we propose to combine multiple segments through a fully connected layer in a deep CNN model that covers the whole action video. Moreover, four streams (i.e., RGB, RGB differences, optical flow, and warped optical flow) are integrated by a linear combination whose weights are optimized on the validation datasets. We evaluate the recognition accuracy of the proposed method on two benchmark datasets, UCF101 and HMDB51. Extensive experiments show encouraging results: compared with the baseline, accuracy improves from 94.20% to 97.30% on UCF101 and from 69.40% to 77.99% on HMDB51. Furthermore, the proposed method achieves accuracy competitive with the state-of-the-art method based on 3D convolution, with far fewer parameters.
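The linear combination of stream scores described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are optimized, so the exhaustive grid search below is an assumption, and the function names, the weight grid, and the toy validation scores are all hypothetical rather than taken from the paper.

```python
from itertools import product

# Stream names follow the abstract; everything else here is illustrative.
STREAMS = ("rgb", "rgb_diff", "flow", "warped_flow")


def fuse(scores, weights):
    """Return the weighted sum of per-class scores across streams for one video."""
    n_classes = len(scores[STREAMS[0]])
    return [
        sum(weights[s] * scores[s][c] for s in STREAMS)
        for c in range(n_classes)
    ]


def accuracy(videos, labels, weights):
    """Fraction of videos whose highest fused score matches the label."""
    correct = 0
    for scores, label in zip(videos, labels):
        fused = fuse(scores, weights)
        predicted = max(range(len(fused)), key=fused.__getitem__)
        correct += predicted == label
    return correct / len(labels)


def search_weights(videos, labels, grid=(0.0, 0.5, 1.0, 1.5, 2.0)):
    """Assumed procedure: try every weight combination on the validation set."""
    best_weights, best_acc = None, -1.0
    for combo in product(grid, repeat=len(STREAMS)):
        if not any(combo):  # skip the all-zero combination
            continue
        weights = dict(zip(STREAMS, combo))
        acc = accuracy(videos, labels, weights)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


# Toy validation set: per-stream class scores for three videos, two classes.
val_videos = [
    {"rgb": [0.6, 0.4], "rgb_diff": [0.5, 0.5], "flow": [0.2, 0.8], "warped_flow": [0.3, 0.7]},
    {"rgb": [0.7, 0.3], "rgb_diff": [0.6, 0.4], "flow": [0.4, 0.6], "warped_flow": [0.5, 0.5]},
    {"rgb": [0.55, 0.45], "rgb_diff": [0.4, 0.6], "flow": [0.1, 0.9], "warped_flow": [0.2, 0.8]},
]
val_labels = [1, 0, 1]

best_weights, best_acc = search_weights(val_videos, val_labels)
```

The selected weights would then be held fixed when fusing stream scores on the test set. A grid search is only practical for a handful of streams; with four streams and five grid values it evaluates 624 combinations.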

Published In

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

DOI

10.1007/978-3-030-39431-8_23

EISSN

1611-3349

ISSN

0302-9743

ISBN

9783030394301

Publication Date

January 1, 2020

Volume

11691 LNAI

Start / End Page

244 / 253

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 46 Information and computing sciences
 

Citation

APA
Fang, Y., Zhang, R., Wang, Q. F., & Huang, K. (2020). Action recognition in videos with temporal segments fusions. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11691 LNAI, pp. 244–253). https://doi.org/10.1007/978-3-030-39431-8_23

Chicago
Fang, Y., R. Zhang, Q. F. Wang, and K. Huang. “Action recognition in videos with temporal segments fusions.” In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11691 LNAI:244–53, 2020. https://doi.org/10.1007/978-3-030-39431-8_23.

ICMJE
Fang Y, Zhang R, Wang QF, Huang K. Action recognition in videos with temporal segments fusions. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2020. p. 244–53.

MLA
Fang, Y., et al. “Action recognition in videos with temporal segments fusions.” Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11691 LNAI, 2020, pp. 244–53. Scopus, doi:10.1007/978-3-030-39431-8_23.

NLM
Fang Y, Zhang R, Wang QF, Huang K. Action recognition in videos with temporal segments fusions. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2020. p. 244–253.