SumCSE: Summary as a transformation for Contrastive Learning

Publication, Conference
Thirukovalluru, R; Wang, X; Chen, J; Li, S; Lei, J; Jin, R; Dhingra, B
Published in: Findings of the Association for Computational Linguistics: NAACL 2024 - Findings
January 1, 2024

Sentence embedding models are typically trained using contrastive learning (CL), either using human annotations directly or by repurposing other annotated datasets. In this work, we explore the recently introduced paradigm of generating CL data using generative language models (LMs). In CL for computer vision (CV), compositional transformations (a series of operations applied to an image, e.g. cropping + color distortion), which modify the input image so that it retains minimal information, were shown to be very effective. We show that composing a 'Summary' transformation with diverse paraphrasing/contradicting transformations accomplishes the same and works very well in CL for sentence embeddings. Our final generated dataset (using Vicuna-13B) significantly outperforms the previous best unsupervised method (using ChatGPT) by 1.8 points, and SimCSE, a strong supervised baseline, by 0.3 points on the semantic text similarity (STS) benchmark.
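The composition described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the prompt templates, the `generate` callable standing in for an LM such as Vicuna-13B, the helper names, and the temperature value are hypothetical and are not taken from the paper. The loss is an InfoNCE-style objective with in-batch positives and hard negatives, in the manner of supervised SimCSE, and may differ from the authors' exact training setup.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical prompt templates; the paper's actual prompts are not shown here.
SUMMARY = "Summarize the following sentence: {text}"
PARAPHRASE = "Paraphrase the following sentence: {text}"
CONTRADICT = "Write a sentence that contradicts the following sentence: {text}"


def compose(generate: Callable[[str], str], *templates: str) -> Callable[[str], str]:
    """Return a transformation that applies each prompt template in order."""
    def transform(text: str) -> str:
        for template in templates:
            text = generate(template.format(text=text))
        return text
    return transform


def make_triplet(generate: Callable[[str], str], sentence: str) -> Tuple[str, str, str]:
    """Build (anchor, positive, hard negative) from composed transformations."""
    positive = compose(generate, SUMMARY, PARAPHRASE)(sentence)
    negative = compose(generate, SUMMARY, CONTRADICT)(sentence)
    return sentence, positive, negative


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(
    anchors: List[Sequence[float]],
    positives: List[Sequence[float]],
    negatives: List[Sequence[float]],
    temperature: float = 0.05,  # assumed value, not taken from the paper
) -> float:
    """InfoNCE-style loss: each anchor is scored against every positive and
    every hard negative in the batch; its own positive is the target."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        logits += [cosine(anchor, n) / temperature for n in negatives]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)
```

In use, `generate` would wrap a call to a generative LM and the vectors would come from the sentence encoder being trained; both are left abstract here.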

Published In

Findings of the Association for Computational Linguistics: NAACL 2024 - Findings

Publication Date

January 1, 2024

Start / End Page

3577 / 3588
 

Citation

APA: Thirukovalluru, R., Wang, X., Chen, J., Li, S., Lei, J., Jin, R., & Dhingra, B. (2024). SumCSE: Summary as a transformation for Contrastive Learning. In Findings of the Association for Computational Linguistics: NAACL 2024 - Findings (pp. 3577–3588).
Chicago: Thirukovalluru, R., X. Wang, J. Chen, S. Li, J. Lei, R. Jin, and B. Dhingra. “SumCSE: Summary as a transformation for Contrastive Learning.” In Findings of the Association for Computational Linguistics: NAACL 2024 - Findings, 3577–88, 2024.
ICMJE: Thirukovalluru R, Wang X, Chen J, Li S, Lei J, Jin R, et al. SumCSE: Summary as a transformation for Contrastive Learning. In: Findings of the Association for Computational Linguistics: NAACL 2024 - Findings. 2024. p. 3577–88.
MLA: Thirukovalluru, R., et al. “SumCSE: Summary as a transformation for Contrastive Learning.” Findings of the Association for Computational Linguistics: NAACL 2024 - Findings, 2024, pp. 3577–88.
NLM: Thirukovalluru R, Wang X, Chen J, Li S, Lei J, Jin R, Dhingra B. SumCSE: Summary as a transformation for Contrastive Learning. Findings of the Association for Computational Linguistics: NAACL 2024 - Findings. 2024. p. 3577–3588.
