
Few-Shot Composition Learning for Image Retrieval with Prompt Tuning

Publication, Journal Article
Wu, J; Wang, R; Zhao, H; Zhang, R; Lu, C; Li, S; Henao, R
Published in: Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
June 27, 2023

We study composition learning for image retrieval, in which target images are retrieved with search queries composed of a reference image and a modification text describing the desired changes to that image. Existing models for this task are generally built with large-scale datasets and demand extensive training samples, i.e., query-target pairs, as supervision, which restricts their application in few-shot settings where only a few query-target pairs are available. Recently, prompt tuning with frozen pretrained language models has shown remarkable performance when training data is limited. Inspired by this, we propose a prompt tuning mechanism with the pretrained CLIP model for few-shot composition learning for image retrieval. Specifically, we regard the representation of the reference image as a trainable visual prompt, prefixed to the embedding of the text sequence. One challenge is to train the visual prompt efficiently with few-shot samples. To address this, we further propose a self-supervised auxiliary task that ensures the reference image can retrieve itself when the text provides no modification information; this facilitates training of the visual prompt without requiring additional annotations for query-target pairs. Experiments on multiple benchmarks show that our proposed model yields superior performance when trained with only a few query-target pairs.
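
To make the data flow in the abstract concrete, here is a minimal toy sketch in pure Python. It is not the authors' implementation and does not use CLIP: the 3-dimensional embeddings, the gallery names, and the helpers `compose_query` and `retrieve` are all hypothetical, and mean pooling stands in for the frozen text encoder. Nothing is trained here; the sketch only illustrates prefixing an image-derived prompt to the text token embeddings, and the self-retrieval condition that the auxiliary task is described as enforcing.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def compose_query(image_embedding, text_token_embeddings):
    """Prefix the image-derived prompt to the text tokens, then pool.

    Assumption: mean pooling is a toy stand-in for a frozen text encoder.
    """
    sequence = [image_embedding] + text_token_embeddings
    dim = len(image_embedding)
    return [sum(tok[i] for tok in sequence) / len(sequence) for i in range(dim)]


def retrieve(query, gallery):
    """Return the name of the gallery image most similar to the query."""
    return max(gallery, key=lambda name: cosine(query, gallery[name]))


# Hypothetical gallery of image embeddings.
gallery = {
    "red_dress": [1.0, 0.0, 0.0],
    "blue_dress": [0.0, 1.0, 0.0],
    "green_shirt": [0.0, 0.0, 1.0],
}

reference = gallery["red_dress"]

# Hypothetical token embedding for a modification text such as "make it blue".
modification_tokens = [[-1.0, 3.0, 0.0]]

# Composed query: reference image plus modification text.
composed = compose_query(reference, modification_tokens)
print(retrieve(composed, gallery))

# Self-retrieval check: with no modification tokens, the reference image
# should retrieve itself.
self_query = compose_query(reference, [])
print(retrieve(self_query, gallery))
```

In this toy setup the composed query lands on `blue_dress` and the empty-text query lands on `red_dress`. In a real system the prompt mapping would be learned, and the self-retrieval condition would enter as an additional loss term rather than a check after the fact.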

Published In: Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
DOI: 10.1609/aaai.v37i4.25597
Publication Date: June 27, 2023
Volume: 37
Start / End Page: 4729 / 4737
 

Citation

APA: Wu, J., Wang, R., Zhao, H., Zhang, R., Lu, C., Li, S., & Henao, R. (2023). Few-Shot Composition Learning for Image Retrieval with Prompt Tuning. Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023, 37, 4729–4737. https://doi.org/10.1609/aaai.v37i4.25597

Chicago: Wu, J., R. Wang, H. Zhao, R. Zhang, C. Lu, S. Li, and R. Henao. “Few-Shot Composition Learning for Image Retrieval with Prompt Tuning.” Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023 37 (June 27, 2023): 4729–37. https://doi.org/10.1609/aaai.v37i4.25597.

ICMJE: Wu J, Wang R, Zhao H, Zhang R, Lu C, Li S, et al. Few-Shot Composition Learning for Image Retrieval with Prompt Tuning. Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023. 2023 Jun 27;37:4729–37.

MLA: Wu, J., et al. “Few-Shot Composition Learning for Image Retrieval with Prompt Tuning.” Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023, vol. 37, June 2023, pp. 4729–37. Scopus, doi:10.1609/aaai.v37i4.25597.

NLM: Wu J, Wang R, Zhao H, Zhang R, Lu C, Li S, Henao R. Few-Shot Composition Learning for Image Retrieval with Prompt Tuning. Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023. 2023 Jun 27;37:4729–4737.
