The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models

Publication, Conference
Wu, C; Li, LE; Ermon, S; Haffner, P; Ge, R; Zhang, Z
Published in: Proceedings of Machine Learning Research
January 1, 2023

Compositionality is a common property in many modalities, including text and images, but the compositional generalization of multi-modal models is not well understood. In this paper, we identify two sources of visual-linguistic compositionality: linguistic priors and the interplay between images and texts. We show that current attempts to improve compositional generalization rely on linguistic priors rather than on information in the image, because the strength of the language model in detecting sentences that are syntactically and semantically likely overwhelms the vision part of the model. In particular, we find that a benchmark for compositionality mostly favors pure language models. Finally, we propose a new benchmark for compositionality without such linguistic priors.

Published In

Proceedings of Machine Learning Research

EISSN

2640-3498

Publication Date

January 1, 2023

Volume

239

Start / End Page

118 / 126

Citation

APA: Wu, C., Li, L. E., Ermon, S., Haffner, P., Ge, R., & Zhang, Z. (2023). The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models. In Proceedings of Machine Learning Research (Vol. 239, pp. 118–126).
Chicago: Wu, C., L. E. Li, S. Ermon, P. Haffner, R. Ge, and Z. Zhang. “The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models.” In Proceedings of Machine Learning Research, 239:118–26, 2023.
ICMJE: Wu C, Li LE, Ermon S, Haffner P, Ge R, Zhang Z. The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models. In: Proceedings of Machine Learning Research. 2023. p. 118–26.
MLA: Wu, C., et al. “The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models.” Proceedings of Machine Learning Research, vol. 239, 2023, pp. 118–26.
NLM: Wu C, Li LE, Ermon S, Haffner P, Ge R, Zhang Z. The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models. Proceedings of Machine Learning Research. 2023. p. 118–126.
