IDEA: Image description enhanced CLIP-adapter for image classification
CLIP (Contrastive Language-Image Pre-training) has achieved great success in pattern recognition and computer vision. Transferring CLIP to downstream tasks (e.g., zero- or few-shot classification) is an active research topic in multimodal learning. However, current studies focus on single-modality adaptation and fail to capture target-relevant fine-grained features. In this paper, we propose the Image Description Enhanced CLIP-Adapter (IDEA), a multimodal adapter that effectively boosts CLIP's performance on few-shot classification tasks. The adapter leverages textual descriptions of the training images to enhance the model's ability to capture fine-grained features. Moreover, IDEA is training-free, and it is comparable to, and on some tasks exceeds, state-of-the-art models. Furthermore, we introduce Trainable-IDEA (T-IDEA), which extends IDEA with two lightweight learnable components (i.e., a projector and a learnable latent space), further improving performance and achieving state-of-the-art (SOTA) results on 11 datasets. As an important contribution, we use the LLaMA model and design a comprehensive pipeline to generate textual descriptions for images from 11 datasets, yielding a total of 1,637,795 image-text pairs, named “IMD-11”. Our code and data are released at https://github.com/FourierAI/IDEA.
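The abstract does not state IDEA's scoring rule. As a rough illustration only, the sketch below shows one way a training-free, cache-style adapter (in the spirit of Tip-Adapter) could combine CLIP zero-shot logits with affinities to both the few-shot training images and their text descriptions. The function names, the exponential affinity kernel, and the weights `alpha`, `beta`, and `gamma` are hypothetical choices for illustration, not the authors' formulation. Features are assumed to be precomputed CLIP embeddings.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale a vector (or each row of a matrix) to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def training_free_logits(
    test_feat: np.ndarray,     # (d,)   CLIP image embedding of the query image
    class_text: np.ndarray,    # (C, d) CLIP text embeddings of class prompts
    train_img: np.ndarray,     # (N, d) CLIP image embeddings of few-shot images
    train_desc: np.ndarray,    # (N, d) CLIP text embeddings of their descriptions
    train_labels: np.ndarray,  # (N,)   integer class labels in [0, C)
    alpha: float = 1.0,        # hypothetical weight of the few-shot cache term
    beta: float = 5.5,         # hypothetical sharpness of the affinity kernel
    gamma: float = 1.0,        # hypothetical weight of the description branch
) -> np.ndarray:
    """Return (C,) class logits without training any parameters (illustrative)."""
    num_classes = class_text.shape[0]
    onehot = np.eye(num_classes)[train_labels]  # (N, C)

    f = l2_normalize(test_feat)
    # Zero-shot CLIP logits: scaled cosine similarity to each class prompt.
    zero_shot = 100.0 * (l2_normalize(class_text) @ f)  # (C,)

    # Affinity of the query to each training image and to each description,
    # mapping cosine similarity in [-1, 1] to a positive weight.
    img_aff = np.exp(-beta * (1.0 - l2_normalize(train_img) @ f))    # (N,)
    txt_aff = np.exp(-beta * (1.0 - l2_normalize(train_desc) @ f))   # (N,)

    # Each training sample votes for its class with its combined affinity.
    cache = (img_aff + gamma * txt_aff) @ onehot  # (C,)
    return zero_shot + alpha * cache


if __name__ == "__main__":
    # Toy example with orthogonal "embeddings" so the result is easy to check.
    train_img = np.eye(6)                   # six training images
    train_desc = np.eye(6)                  # their (toy) description embeddings
    labels = np.array([0, 0, 1, 1, 2, 2])   # two shots per class
    class_text = np.eye(6)[[0, 2, 4]]       # three class prompts
    query = np.eye(6)[3]                    # identical to training image 3

    logits = training_free_logits(query, class_text, train_img, train_desc, labels)
    print(int(np.argmax(logits)))  # → 1
```

Because nothing is optimized here, adapting to a new dataset only requires encoding the training images and their descriptions once. A trainable extension such as T-IDEA instead learns extra parameters (per the abstract, a projector and a learnable latent space), which this sketch does not model.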
Published In
DOI
ISSN
Publication Date
Volume
Related Subject Headings
- Artificial Intelligence & Image Processing
- 4611 Machine learning
- 4605 Data management and data science
- 4603 Computer vision and multimedia computation
- 0906 Electrical and Electronic Engineering
- 0806 Information Systems
- 0801 Artificial Intelligence and Image Processing