Scholars@Duke publication: Adaptive feature abstraction for translating video to language

Adaptive feature abstraction for translating video to language

Publication, Conference

Pu, Y; Gan, Z; Carin, L; Min, MR

Published in: 5th International Conference on Learning Representations, ICLR 2017 - Workshop Track Proceedings

January 1, 2019

A new model for video captioning is developed, using a deep three-dimensional Convolutional Neural Network (C3D) as an encoder for videos and a Recurrent Neural Network (RNN) as a decoder for captions. A novel attention mechanism with spatiotemporal alignment is employed to adaptively and sequentially focus on different layers of CNN features (levels of feature “abstraction”), as well as local spatiotemporal regions of the feature maps at each layer. The proposed approach is evaluated on the YouTube2Text benchmark. Experimental results demonstrate quantitatively the effectiveness of our proposed adaptive spatiotemporal feature abstraction for translating videos to sentences with rich semantic structures.
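The two-level attention described in the abstract (over local regions within each layer, then over the layers themselves) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the bilinear scoring function, the names `attend`, `layer_and_region_attention`, `W_region`, and `W_layer`, and the assumption that every layer's features have already been projected to a shared dimension are all hypothetical choices made for illustration. The spatiotemporal alignment step mentioned in the abstract is not modeled here.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(features, query, W):
    """Soft attention over the rows of `features`.

    features: (n, d) array, one row per candidate (region or layer).
    query:    (h,) array, e.g. the decoder RNN hidden state.
    W:        (d, h) bilinear scoring matrix (an assumed scoring form).
    Returns the weighted average of the rows and the attention weights.
    """
    scores = features @ W @ query        # (n,)
    weights = softmax(scores)            # non-negative, sums to 1
    return weights @ features, weights   # (d,), (n,)

def layer_and_region_attention(layer_feats, hidden, W_region, W_layer):
    """Attend over regions within each layer, then over layers.

    layer_feats: list of (n_regions_l, d) arrays, one per CNN layer,
                 assumed to be pre-projected to a common dimension d.
    """
    # Step 1: one context vector per layer, pooled over its local regions.
    per_layer = np.stack(
        [attend(feats, hidden, W_region)[0] for feats in layer_feats]
    )                                    # (n_layers, d)
    # Step 2: weight the layers (levels of abstraction) against each other.
    return attend(per_layer, hidden, W_layer)

# Toy usage: three layers with 16, 4, and 1 regions, feature size 8.
rng = np.random.default_rng(0)
layer_feats = [rng.standard_normal((n, 8)) for n in (16, 4, 1)]
hidden = rng.standard_normal(6)
W_region = rng.standard_normal((8, 6))
W_layer = rng.standard_normal((8, 6))
context, layer_weights = layer_and_region_attention(
    layer_feats, hidden, W_region, W_layer
)
```

In a captioning decoder, a context vector of this kind would be recomputed at every generated word, so the weighting over layers and regions can change as the sentence unfolds.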

Duke Scholars

Lawrence Carin, Author, Electrical and Computer Engineering

Published In

5th International Conference on Learning Representations, ICLR 2017 - Workshop Track Proceedings

Publication Date

January 1, 2019

Citation

APA: Pu, Y., Gan, Z., Carin, L., & Min, M. R. (2019). Adaptive feature abstraction for translating video to language. In 5th International Conference on Learning Representations, ICLR 2017 - Workshop Track Proceedings.

Chicago: Pu, Y., Z. Gan, L. Carin, and M. R. Min. “Adaptive feature abstraction for translating video to language.” In 5th International Conference on Learning Representations, ICLR 2017 - Workshop Track Proceedings, 2019.

ICMJE: Pu Y, Gan Z, Carin L, Min MR. Adaptive feature abstraction for translating video to language. In: 5th International Conference on Learning Representations, ICLR 2017 - Workshop Track Proceedings. 2019.

MLA: Pu, Y., et al. “Adaptive feature abstraction for translating video to language.” 5th International Conference on Learning Representations, ICLR 2017 - Workshop Track Proceedings, 2019.

NLM: Pu Y, Gan Z, Carin L, Min MR. Adaptive feature abstraction for translating video to language. 5th International Conference on Learning Representations, ICLR 2017 - Workshop Track Proceedings. 2019.
