Semantic Similarity Distance: Towards better text-image consistency metric in text-to-image generation

Publication, Journal Article
Tan, Z; Yang, X; Ye, Z; Wang, Q; Yan, Y; Nguyen, A; Huang, K
Published in: Pattern Recognition
December 1, 2023

Generating high-quality images from text remains a challenge in visual-language understanding, with text-image consistency being a major concern. In particular, the most popular metric, R-precision, may not accurately reflect text-image consistency, which can lead to misleading semantics in generated images. Despite its significance, the design of a better text-image consistency metric remains surprisingly under-explored in the community. In this paper, we take a step forward and develop a novel CLIP-based metric, Semantic Similarity Distance (SSD), which is both theoretically grounded from a distributional viewpoint and empirically verified on benchmark datasets. We also introduce Parallel Deep Fusion Generative Adversarial Networks (PDF-GAN), which use two novel components to mitigate inconsistent semantics and bridge the text-image semantic gap. A series of experiments indicates that, under the guidance of SSD, PDF-GAN substantially improves the consistency between texts and images while preserving acceptable image quality on the CUB and COCO datasets.

Published In

Pattern Recognition

DOI

10.1016/j.patcog.2023.109883

ISSN

0031-3203

Publication Date

December 1, 2023

Volume

144

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 4611 Machine learning
  • 4605 Data management and data science
  • 4603 Computer vision and multimedia computation
  • 0906 Electrical and Electronic Engineering
  • 0806 Information Systems
  • 0801 Artificial Intelligence and Image Processing

Citation

APA

Tan, Z., Yang, X., Ye, Z., Wang, Q., Yan, Y., Nguyen, A., & Huang, K. (2023). Semantic Similarity Distance: Towards better text-image consistency metric in text-to-image generation. Pattern Recognition, 144. https://doi.org/10.1016/j.patcog.2023.109883

Chicago

Tan, Z., X. Yang, Z. Ye, Q. Wang, Y. Yan, A. Nguyen, and K. Huang. “Semantic Similarity Distance: Towards better text-image consistency metric in text-to-image generation.” Pattern Recognition 144 (December 1, 2023). https://doi.org/10.1016/j.patcog.2023.109883.

ICMJE

Tan Z, Yang X, Ye Z, Wang Q, Yan Y, Nguyen A, et al. Semantic Similarity Distance: Towards better text-image consistency metric in text-to-image generation. Pattern Recognition. 2023 Dec 1;144.

MLA

Tan, Z., et al. “Semantic Similarity Distance: Towards better text-image consistency metric in text-to-image generation.” Pattern Recognition, vol. 144, Dec. 2023. Scopus, doi:10.1016/j.patcog.2023.109883.

NLM

Tan Z, Yang X, Ye Z, Wang Q, Yan Y, Nguyen A, Huang K. Semantic Similarity Distance: Towards better text-image consistency metric in text-to-image generation. Pattern Recognition. 2023 Dec 1;144.