Connecting Pre-trained Language Models and Downstream Tasks via Properties of Representations
Recently, researchers have found that representations learned by large-scale pre-trained language models are useful in various downstream tasks. However, there is little theoretical understanding of how pre-training performance relates to downstream-task performance. In this paper, we analyze how this performance transfer depends on properties of the downstream task and the structure of the representations. We consider a log-linear model in which a word is predicted from its context through a network whose last layer is a softmax. We show that even when the downstream task is highly structured and depends on a simple function of the hidden representation, a low pre-training loss does not guarantee good performance on the downstream task. On the other hand, we propose and empirically validate the existence of an “anchor vector” in the representation space, and show that this assumption, together with properties of the downstream task, guarantees performance transfer.
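To make the setup concrete, below is a minimal sketch of the kind of log-linear model the abstract describes: a context is mapped to a hidden representation and a word is predicted through a final softmax layer. The vocabulary size, dimension, and the mean-of-embeddings context encoder are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (illustrative, not the authors' exact construction):
# predict a word from its context via a log-linear model with a softmax
# as the last layer. Sizes and the context encoder are assumed.
import numpy as np

rng = np.random.default_rng(0)

vocab_size, dim = 1000, 64                         # assumed sizes
embeddings = rng.normal(size=(vocab_size, dim))    # input word embeddings
softmax_weights = rng.normal(size=(vocab_size, dim))  # last-layer weights

def context_representation(context_ids):
    """Hidden representation of a context: here, simply the average of
    the context word embeddings (an illustrative choice)."""
    return embeddings[context_ids].mean(axis=0)

def next_word_distribution(context_ids):
    """Log-linear prediction: softmax over the vocabulary of the inner
    product between the softmax weights and the context representation."""
    h = context_representation(context_ids)
    logits = softmax_weights @ h
    logits -= logits.max()                         # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Pre-training loss on one example: cross-entropy of the true next word.
context = np.array([3, 17, 256])
target = 42
loss = -np.log(next_word_distribution(context)[target])
print(f"cross-entropy loss: {loss:.3f}")
```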
Related Subject Headings
- 4611 Machine learning
- 1702 Cognitive Sciences
- 1701 Psychology