Adaptive feature abstraction for translating video to text

Publication, Conference
Pu, Y; Min, MR; Gan, Z; Carin, L
Published in: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
January 1, 2018

Previous models for video captioning often use the output from a specific layer of a Convolutional Neural Network (CNN) as video features. However, the variable, context-dependent semantics in a video may make it more appropriate to adaptively select features from multiple CNN layers. We propose a new approach to generating adaptive spatiotemporal representations of videos for the captioning task. A novel attention mechanism is developed, which adaptively and sequentially focuses on different layers of CNN features (levels of feature “abstraction”), as well as local spatiotemporal regions of the feature maps at each layer. The proposed approach is evaluated on three benchmark datasets: YouTube2Text, M-VAD, and MSR-VTT. We visualize the results and how the model works, and the experiments quantitatively demonstrate the effectiveness of the proposed adaptive spatiotemporal feature abstraction for translating videos to sentences with rich semantics.
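The two-level attention described in the abstract (over regions within each layer, then over layers) can be illustrated with a small sketch. This is not the paper's formulation: it assumes plain dot-product scoring and assumes every layer's region features have already been projected to the same dimension as the decoder state. The names `attend`, `adaptive_context`, `query`, and `layers` are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def attend(query, candidates):
    """Soft attention: score each candidate against the query (dot product,
    an assumption), normalize with softmax, return (context, weights)."""
    weights = softmax([dot(query, c) for c in candidates])
    dim = len(candidates[0])
    context = [sum(w * c[i] for w, c in zip(weights, candidates))
               for i in range(dim)]
    return context, weights

def adaptive_context(query, layers):
    """layers: one entry per CNN layer, each a list of region vectors that
    are assumed to share the query's dimension.
    Step 1 attends over regions inside each layer; step 2 attends over the
    resulting per-layer summaries."""
    per_layer = [attend(query, regions)[0] for regions in layers]
    return attend(query, per_layer)

# Toy input: a decoder state and two layers with two regions each.
query = [1.0, 0.0]
layers = [
    [[1.0, 0.0], [0.0, 1.0]],  # stand-in for a lower layer
    [[2.0, 0.0], [0.0, 2.0]],  # stand-in for a higher layer
]
context, layer_weights = adaptive_context(query, layers)
```

In a captioning decoder, a step like this would be repeated for each generated word with an updated `query`, which is what lets the choice of layer and region change over the course of the sentence.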

Published In

32nd AAAI Conference on Artificial Intelligence, AAAI 2018

Publication Date

January 1, 2018

Start / End Page

7284 / 7291
 

Citation

APA: Pu, Y., Min, M. R., Gan, Z., & Carin, L. (2018). Adaptive feature abstraction for translating video to text. In 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 (pp. 7284–7291).
Chicago: Pu, Y., M. R. Min, Z. Gan, and L. Carin. “Adaptive feature abstraction for translating video to text.” In 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, 7284–91, 2018.
ICMJE: Pu Y, Min MR, Gan Z, Carin L. Adaptive feature abstraction for translating video to text. In: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018. 2018. p. 7284–91.
MLA: Pu, Y., et al. “Adaptive feature abstraction for translating video to text.” 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, 2018, pp. 7284–91.
NLM: Pu Y, Min MR, Gan Z, Carin L. Adaptive feature abstraction for translating video to text. 32nd AAAI Conference on Artificial Intelligence, AAAI 2018. 2018. p. 7284–7291.
