Disentangling Semantic-to-Visual Confusion for Zero-Shot Learning

Publication: Journal Article
Ye, Z; Hu, F; Lyu, F; Li, L; Huang, K
Published in: IEEE Transactions on Multimedia
January 1, 2022

Using generative models to synthesize visual features from semantic distributions has become one of the most popular approaches to zero-shot learning (ZSL) image classification in recent years. The triplet loss (TL) is widely used to generate realistic visual distributions from semantics by automatically searching for discriminative representations. However, the traditional TL cannot find reliable disentangled representations for unseen classes, because unseen classes are unavailable during ZSL training. To alleviate this drawback, we propose a multi-modal triplet loss (MMTL) that exploits multi-modal information to search a disentangled representation space. In this space, all classes can interact, which benefits the learning of disentangled class representations. Furthermore, we develop a novel model, the Disentangling Class Representation Generative Adversarial Network (DCR-GAN), which exploits the disentangled representations in the training, feature-synthesis, and final recognition stages. Benefiting from the disentangled representations, DCR-GAN fits a more realistic distribution over both seen and unseen features. Extensive experiments show that the proposed model outperforms state-of-the-art methods on four benchmark datasets.
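The paper's exact MMTL formulation is not reproduced on this page. As a rough illustration of the idea, the sketch below implements a batch-hard triplet loss in which visual features serve as anchors while class-semantic embeddings supply the positives and negatives, assuming both modalities have already been projected into a shared space. The function name, the margin value, and the cosine-distance choice are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def multimodal_triplet_loss(visual, semantic, labels, margin=0.5):
        """Hypothetical batch-hard triplet loss across two modalities.

        Anchors are visual features; positives/negatives are semantic
        embeddings of the same/different classes within the batch. Both
        modalities are assumed projected to a shared D-dimensional space.
        """
        visual = F.normalize(visual, dim=1)      # (B, D) visual anchors
        semantic = F.normalize(semantic, dim=1)  # (B, D) semantic embeddings

        # Pairwise cosine distance between every anchor and every semantic vector.
        dist = 1.0 - visual @ semantic.t()       # (B, B)

        same = labels.unsqueeze(1).eq(labels.unsqueeze(0))  # class-match mask
        hardest_pos = dist.masked_fill(~same, 0.0).max(dim=1).values
        hardest_neg = dist.masked_fill(same, float("inf")).min(dim=1).values

        # Standard triplet hinge; each batch must contain at least two classes.
        return F.relu(hardest_pos - hardest_neg + margin).mean()

    # Toy usage: 4 samples, 8-dim shared space, two classes.
    visual = torch.randn(4, 8)
    semantic = torch.randn(4, 8)
    labels = torch.tensor([0, 0, 1, 1])
    loss = multimodal_triplet_loss(visual, semantic, labels)

Because positives and negatives are drawn from the semantic modality rather than from other visual samples, the loss couples the two modalities directly, which is the basic mechanism the abstract attributes to MMTL.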

Published In

IEEE Transactions on Multimedia

DOI

10.1109/TMM.2021.3089017

EISSN

1941-0077

ISSN

1520-9210

Publication Date

January 1, 2022

Volume

24

Start / End Page

2828 / 2840

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 46 Information and computing sciences
  • 40 Engineering
  • 09 Engineering
  • 08 Information and Computing Sciences
 

Citation

APA
Ye, Z., Hu, F., Lyu, F., Li, L., & Huang, K. (2022). Disentangling Semantic-to-Visual Confusion for Zero-Shot Learning. IEEE Transactions on Multimedia, 24, 2828–2840. https://doi.org/10.1109/TMM.2021.3089017

Chicago
Ye, Z., F. Hu, F. Lyu, L. Li, and K. Huang. “Disentangling Semantic-to-Visual Confusion for Zero-Shot Learning.” IEEE Transactions on Multimedia 24 (January 1, 2022): 2828–40. https://doi.org/10.1109/TMM.2021.3089017.

ICMJE
Ye Z, Hu F, Lyu F, Li L, Huang K. Disentangling Semantic-to-Visual Confusion for Zero-Shot Learning. IEEE Transactions on Multimedia. 2022 Jan 1;24:2828–40.

MLA
Ye, Z., et al. “Disentangling Semantic-to-Visual Confusion for Zero-Shot Learning.” IEEE Transactions on Multimedia, vol. 24, Jan. 2022, pp. 2828–40. Scopus, doi:10.1109/TMM.2021.3089017.

NLM
Ye Z, Hu F, Lyu F, Li L, Huang K. Disentangling Semantic-to-Visual Confusion for Zero-Shot Learning. IEEE Transactions on Multimedia. 2022 Jan 1;24:2828–2840.
