
IDEA: Image description enhanced CLIP-adapter for image classification

Publication, Journal Article
Ye, Z; Jiang, F; Wang, Q; Huang, K; Huang, J
Published in: Pattern Recognition
March 1, 2026

CLIP (Contrastive Language-Image Pre-training) has achieved great success in pattern recognition and computer vision. Transferring CLIP to downstream tasks (e.g., zero- or few-shot classification) is an active research topic in multimodal learning. However, current studies focus on single-modality adaptation and fail to capture target-relevant fine-grained features. In this paper, we propose an Image Description Enhanced CLIP-Adapter (IDEA), a multimodal adapter that effectively boosts CLIP's performance on few-shot classification tasks. This adapter leverages the textual descriptions in the training set to enhance the model's ability to capture fine-grained features. In addition, IDEA is a training-free method for CLIP, and it is comparable to, or even exceeds, state-of-the-art models on multiple tasks. Furthermore, we introduce Trainable-IDEA (T-IDEA), which extends IDEA by adding two lightweight learnable components (i.e., a projector and a learnable latent space), further enhancing the model's performance and achieving state-of-the-art (SOTA) results on 11 datasets. As one important contribution, we employ the LLaMA model and design a comprehensive pipeline to generate textual descriptions for the images of 11 datasets, resulting in a total of 1,637,795 image-text pairs, named “IMD-11”. Our code and data are released at https://github.com/FourierAI/IDEA.
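The abstract describes IDEA only at a high level. As a rough illustration of what a training-free CLIP adapter can look like, the sketch below follows the cache-model idea of Tip-Adapter (a separate, earlier method): embeddings of the few-shot training images are stored as keys, a test image is scored by its affinity to each key, and these votes are added to CLIP's zero-shot logits without any gradient updates. The second cache built from embeddings of per-image textual descriptions, and every name and hyperparameter here (`classify`, `alpha`, `beta`, `sharpness`), are hypothetical assumptions for illustration, not the authors' formulation; the code released at the GitHub link above is the authoritative reference. Embeddings are assumed to be precomputed and L2-normalized, and toy 3-D vectors stand in for real CLIP features.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm (CLIP features are usually compared this way)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    """Cosine similarity when both inputs are unit-norm."""
    return sum(x * y for x, y in zip(a, b))


def cache_logits(query: Vector, keys: Sequence[Vector], labels: Sequence[int],
                 num_classes: int, sharpness: float) -> List[float]:
    """Tip-Adapter-style cache: each key votes for its label with weight exp(-sharpness * (1 - cos))."""
    scores = [0.0] * num_classes
    for key, label in zip(keys, labels):
        scores[label] += math.exp(-sharpness * (1.0 - dot(query, key)))
    return scores


def classify(query: Vector, class_text: Sequence[Vector],
             img_keys: Sequence[Vector], desc_keys: Sequence[Vector],
             labels: Sequence[int], alpha: float = 1.0, beta: float = 1.0,
             sharpness: float = 5.0, logit_scale: float = 100.0) -> List[float]:
    """Zero-shot CLIP logits plus two training-free cache terms.

    The description cache (weighted by the hypothetical `beta`) is an
    illustrative assumption, not the published IDEA formula.
    """
    n = len(class_text)
    zero_shot = [logit_scale * dot(query, t) for t in class_text]
    img_term = cache_logits(query, img_keys, labels, n, sharpness)
    desc_term = cache_logits(query, desc_keys, labels, n, sharpness)
    return [z + alpha * i + beta * d for z, i, d in zip(zero_shot, img_term, desc_term)]


# Toy 3-D stand-ins for precomputed CLIP embeddings (two classes, one shot each).
class_text = [normalize([1, 0, 0]), normalize([0, 1, 0])]         # class prompt embeddings
img_keys = [normalize([0.9, 0.1, 0]), normalize([0.1, 0.9, 0])]   # training images
desc_keys = [normalize([1, 0.2, 0.1]), normalize([0.2, 1, 0.1])]  # their descriptions
labels = [0, 1]

query = normalize([0.8, 0.3, 0.1])
logits = classify(query, class_text, img_keys, desc_keys, labels)
pred = max(range(len(logits)), key=logits.__getitem__)
print(pred)
```

Setting `beta = 0` reduces the sketch to a plain Tip-Adapter-style image cache, which makes it easy to compare, on toy data, what the hypothetical description term adds.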


Published In

Pattern Recognition

DOI

10.1016/j.patcog.2025.112224

ISSN

0031-3203

Publication Date

March 1, 2026

Volume

171

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 4611 Machine learning
  • 4605 Data management and data science
  • 4603 Computer vision and multimedia computation
  • 0906 Electrical and Electronic Engineering
  • 0806 Information Systems
  • 0801 Artificial Intelligence and Image Processing

Citation

APA: Ye, Z., Jiang, F., Wang, Q., Huang, K., & Huang, J. (2026). IDEA: Image description enhanced CLIP-adapter for image classification (Accepted). Pattern Recognition, 171. https://doi.org/10.1016/j.patcog.2025.112224
Chicago: Ye, Z., F. Jiang, Q. Wang, K. Huang, and J. Huang. “IDEA: Image description enhanced CLIP-adapter for image classification (Accepted).” Pattern Recognition 171 (March 1, 2026). https://doi.org/10.1016/j.patcog.2025.112224.
ICMJE: Ye Z, Jiang F, Wang Q, Huang K, Huang J. IDEA: Image description enhanced CLIP-adapter for image classification (Accepted). Pattern Recognition. 2026 Mar 1;171.
MLA: Ye, Z., et al. “IDEA: Image description enhanced CLIP-adapter for image classification (Accepted).” Pattern Recognition, vol. 171, Mar. 2026. Scopus, doi:10.1016/j.patcog.2025.112224.
NLM: Ye Z, Jiang F, Wang Q, Huang K, Huang J. IDEA: Image description enhanced CLIP-adapter for image classification (Accepted). Pattern Recognition. 2026 Mar 1;171.