
Linear Transformers are Versatile In-Context Learners

Publication, Conference
Vladymyrov, M; von Oswald, J; Sandler, M; Ge, R
Published in: Advances in Neural Information Processing Systems
January 1, 2024

Recent research has demonstrated that transformers, particularly linear attention models, implicitly execute gradient-descent-like algorithms on data provided in-context during their forward inference step. However, their capability in handling more complex problems remains unexplored. In this paper, we prove that each layer of a linear transformer maintains a weight vector for an implicit linear regression problem and can be interpreted as performing a variant of preconditioned gradient descent. We also investigate the use of linear transformers in a challenging scenario where the training data is corrupted with different levels of noise. Remarkably, we demonstrate that for this problem linear transformers discover an intricate and highly effective optimization algorithm, surpassing or matching many reasonable baselines in performance. We analyze this algorithm and show that it is a novel approach incorporating momentum and adaptive rescaling based on noise levels. Our findings show that even linear transformers possess the surprising ability to discover sophisticated optimization strategies.
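The "gradient-descent-like" behavior referenced in the abstract builds on a known construction from prior work: a single linear self-attention layer, with a particular choice of weights, reproduces one gradient-descent step on an in-context linear regression problem. The sketch below is illustrative only; it is not code from this paper, and the variable names, weight construction, and hyperparameters are assumptions used to demonstrate that correspondence numerically.

# Minimal sketch (assumed construction, not from this paper): one linear-attention
# update on in-context tokens (x_i, y_i) matches one gradient-descent step on the
# implicit linear regression loss L(w) = 1/(2N) * sum_i (w^T x_i - y_i)^2.
import numpy as np

rng = np.random.default_rng(0)
d, N, eta = 4, 16, 0.1                      # feature dim, context size, GD step size

# In-context regression data and a query point
X = rng.normal(size=(N, d))                 # context inputs x_i
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=N)  # context targets y_i
x_q = rng.normal(size=d)                    # query input
w0 = np.zeros(d)                            # initial implicit weight vector

# (1) One explicit gradient-descent step on L(w)
grad = (X.T @ (X @ w0 - y)) / N
w1 = w0 - eta * grad
pred_gd = w1 @ x_q                          # prediction after one GD step

# (2) The same prediction produced by one linear-attention update.
# Tokens e_i = (x_i, y_i); the query token carries the current prediction w0^T x_q.
E = np.hstack([X, y[:, None]])              # context tokens, shape (N, d+1)
e_q = np.append(x_q, w0 @ x_q)              # query token

# Assumed weight construction: keys/queries read out the x-part of each token,
# values compute the residual w0^T x_i - y_i in the last coordinate.
W_KQ = np.zeros((d + 1, d + 1)); W_KQ[:d, :d] = np.eye(d)
W_V = np.zeros((d + 1, d + 1)); W_V[d, :d] = w0; W_V[d, d] = -1.0

# Linear attention (no softmax): e_q <- e_q - eta/N * sum_i W_V e_i * (e_i^T W_KQ e_q)
attn = (W_V @ E.T) @ (E @ W_KQ @ e_q)       # shape (d+1,)
e_q_new = e_q - (eta / N) * attn
pred_attn = e_q_new[d]                      # updated prediction stored in the y-slot

print(np.allclose(pred_gd, pred_attn))      # True: the two computations coincide

Stacking several such layers then corresponds to running multiple (preconditioned) descent steps, which is the setting the paper analyzes and extends to noise-corrupted in-context data.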


Published In

Advances in Neural Information Processing Systems

ISSN

1049-5258

Publication Date

January 1, 2024

Volume

37

Related Subject Headings

  • 4611 Machine learning
  • 1702 Cognitive Sciences
  • 1701 Psychology
 

Citation

APA: Vladymyrov, M., von Oswald, J., Sandler, M., & Ge, R. (2024). Linear Transformers are Versatile In-Context Learners. In Advances in Neural Information Processing Systems (Vol. 37).
Chicago: Vladymyrov, M., J. von Oswald, M. Sandler, and R. Ge. “Linear Transformers are Versatile In-Context Learners.” In Advances in Neural Information Processing Systems, Vol. 37, 2024.
ICMJE: Vladymyrov M, von Oswald J, Sandler M, Ge R. Linear Transformers are Versatile In-Context Learners. In: Advances in Neural Information Processing Systems. 2024.
MLA: Vladymyrov, M., et al. “Linear Transformers are Versatile In-Context Learners.” Advances in Neural Information Processing Systems, vol. 37, 2024.
NLM: Vladymyrov M, von Oswald J, Sandler M, Ge R. Linear Transformers are Versatile In-Context Learners. Advances in Neural Information Processing Systems. 2024.