Cross-Attention Transformer for Video Interpolation

Publication · Conference
Kim, HH; Yu, S; Yuan, S; Tomasi, C
Published in: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
January 1, 2023

We propose TAIN (Transformers and Attention for video INterpolation), a residual neural network for video interpolation that predicts an intermediate frame from the two consecutive frames around it. We first present a novel vision transformer module, named Cross-Similarity (CS), to globally aggregate input image features whose appearance is similar to that of the predicted interpolated frame. These CS features are then used to refine the interpolated prediction. To account for occlusions in the CS features, we propose an Image Attention (IA) module that lets the network favor CS features from one frame over those of the other. On the Vimeo90k, UCF101, and SNU-FILM benchmarks, TAIN outperforms existing methods that do not require flow estimation and performs comparably to flow-based methods, while being computationally efficient in terms of inference time.
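The two modules described in the abstract can be illustrated with a minimal numpy sketch. This is not the authors' implementation; function and parameter names here (e.g., `cross_similarity`, `image_attention`, the confidence logits) are hypothetical, and the sketch only shows the general idea: scaled dot-product cross-attention that aggregates input-frame features by similarity to the predicted frame's features, followed by a per-position softmax blend of the two frames' CS features.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_similarity(query_feats, frame_feats):
    """Aggregate one input frame's features by appearance similarity to
    the predicted frame's features (scaled dot-product cross-attention).

    query_feats: (N, C) features of the interpolated prediction
    frame_feats: (M, C) features of one input frame
    returns:     (N, C) globally aggregated CS features
    """
    scale = 1.0 / np.sqrt(query_feats.shape[-1])
    sim = query_feats @ frame_feats.T * scale   # (N, M) similarity scores
    attn = softmax(sim, axis=-1)                # attend over all frame positions
    return attn @ frame_feats                   # similarity-weighted aggregation

def image_attention(cs1, cs2, score1, score2):
    """Blend CS features from the two input frames with per-position
    softmax weights, letting the network favor the unoccluded frame.

    cs1, cs2:       (N, C) CS features from frames 1 and 2
    score1, score2: (N,) confidence logits per position (hypothetical)
    """
    w = softmax(np.stack([score1, score2], axis=-1), axis=-1)  # (N, 2)
    return w[:, :1] * cs1 + w[:, 1:] * cs2
```

Because the attention weights sum to one at every position, the IA blend is a convex combination of the two frames' CS features; equal logits reduce it to a plain average.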

Published In

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

DOI

10.1007/978-3-031-27066-6_23

EISSN

1611-3349

ISSN

0302-9743

ISBN

9783031270659

Publication Date

January 1, 2023

Volume

13848 LNCS

Start / End Page

325 / 342

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 46 Information and computing sciences
 

Citation

APA
Kim, H. H., Yu, S., Yuan, S., & Tomasi, C. (2023). Cross-Attention Transformer for Video Interpolation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13848 LNCS, pp. 325–342). https://doi.org/10.1007/978-3-031-27066-6_23

Chicago
Kim, H. H., S. Yu, S. Yuan, and C. Tomasi. “Cross-Attention Transformer for Video Interpolation.” In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 13848 LNCS:325–42, 2023. https://doi.org/10.1007/978-3-031-27066-6_23.

ICMJE
Kim HH, Yu S, Yuan S, Tomasi C. Cross-Attention Transformer for Video Interpolation. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2023. p. 325–42.

MLA
Kim, H. H., et al. “Cross-Attention Transformer for Video Interpolation.” Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13848 LNCS, 2023, pp. 325–42. Scopus, doi:10.1007/978-3-031-27066-6_23.

NLM
Kim HH, Yu S, Yuan S, Tomasi C. Cross-Attention Transformer for Video Interpolation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2023. p. 325–342.