A joint attention model for automated editing
Conference Publication
Wu, HY; Jhala, A
Published in: CEUR Workshop Proceedings
January 1, 2018
We introduce a model of joint attention for the task of automatically editing video recordings of corporate meetings. In a multi-camera setting, we extract pose data of participants from each frame and audio amplitude from individual headsets. These are used as features to train a system to predict the importance of each camera; editing decisions and rhythm are learned from a corpus of videos edited by human experts. A Long Short-Term Memory (LSTM) neural network is trained on the video and audio features from the expert-edited videos to predict joint attention, and editing predictions are then made on unseen test data. The output of the system is an editing plan for the meeting in Edit Description Language (EDL) format.
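The abstract describes the pipeline only at a high level. Below is a minimal sketch of what a per-frame camera-importance predictor of this kind could look like, assuming PyTorch (the paper does not specify a framework) and hypothetical feature dimensions: flattened pose keypoints per participant concatenated with one audio-amplitude value per headset, supervised by the camera chosen in the expert edit. It is an illustration under these assumptions, not the authors' implementation.

```python
# Sketch of an LSTM that scores cameras per frame from pose + audio features.
# All dimensions and names here are hypothetical, chosen for illustration.
import torch
import torch.nn as nn

class JointAttentionEditor(nn.Module):
    def __init__(self, feature_dim, num_cameras, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_cameras)

    def forward(self, features):
        # features: (batch, time, feature_dim) per-frame pose + audio features
        hidden, _ = self.lstm(features)
        return self.head(hidden)  # (batch, time, num_cameras) camera scores

if __name__ == "__main__":
    # Hypothetical setup: 4 participants x 17 keypoints x (x, y) pose
    # coordinates, plus 4 headset amplitudes, and 3 candidate cameras.
    feature_dim, num_cameras = 4 * 17 * 2 + 4, 3
    model = JointAttentionEditor(feature_dim, num_cameras)

    frames = torch.randn(2, 250, feature_dim)                 # 2 clips, 250 frames each
    expert_choice = torch.randint(0, num_cameras, (2, 250))   # expert-edited camera labels

    logits = model(frames)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, num_cameras), expert_choice.reshape(-1))
    loss.backward()                           # one training step (optimizer omitted)
    cut_plan = logits.argmax(dim=-1)          # per-frame camera picks
```

The per-frame picks would then be turned into an edit plan by merging consecutive frames that select the same camera into shots; how those shots are serialized to EDL is not covered by this sketch.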
Published In: CEUR Workshop Proceedings
ISSN: 1613-0073
Publication Date: January 1, 2018
Volume: 2321
Related Subject Headings: 4609 Information systems
Citation
APA: Wu, H. Y., & Jhala, A. (2018). A joint attention model for automated editing. In CEUR Workshop Proceedings (Vol. 2321).
Chicago: Wu, H. Y., and A. Jhala. “A joint attention model for automated editing.” In CEUR Workshop Proceedings, Vol. 2321, 2018.
ICMJE: Wu HY, Jhala A. A joint attention model for automated editing. In: CEUR Workshop Proceedings. 2018.
MLA: Wu, H. Y., and A. Jhala. “A joint attention model for automated editing.” CEUR Workshop Proceedings, vol. 2321, 2018.
NLM: Wu HY, Jhala A. A joint attention model for automated editing. CEUR Workshop Proceedings. 2018.