
ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans

Publication, Conference
Shahbazi, A; Akbari, E; Salehi, D; Liu, X; Naderializadeh, N; Kolouri, S
Published in: Proceedings of Machine Learning Research
January 1, 2025

While self-attention has been instrumental in the success of Transformers, it can lead to overconcentration on a few tokens during training, resulting in suboptimal information flow. Enforcing doubly-stochastic constraints in attention matrices has been shown to improve structure and balance in attention distributions. However, existing methods rely on iterative Sinkhorn normalization, which is computationally costly. In this paper, we introduce a novel, fully parallelizable doubly-stochastic attention mechanism based on sliced optimal transport, leveraging Expected Sliced Transport Plans (ESP). Unlike prior approaches, our method enforces doubly stochasticity without iterative Sinkhorn normalization, significantly enhancing efficiency. To ensure differentiability, we incorporate a temperature-based soft sorting technique, enabling seamless integration into deep learning models. Experiments across multiple benchmark datasets, including image classification, point cloud classification, sentiment analysis, and neural machine translation, demonstrate that our enhanced attention regularization consistently improves performance across diverse applications. Our implementation code can be found at https://github.com/dariansal/ESPFormer.
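The contrast the abstract draws between iterative Sinkhorn normalization and a sliced-transport construction can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that). The function names `sinkhorn` and `sliced_plan`, the uniform averaging over slices, and the use of a hard sort are all assumptions made for illustration; the paper's slice weighting and its temperature-based soft sorting are not reproduced here. The sketch relies only on the fact that, for equally many points with uniform mass, the one-dimensional optimal matching pairs points in sorted order, so each slice contributes a permutation matrix and any average of them has unit row and column sums.

```python
import math
import random


def sinkhorn(scores, n_iters=50):
    """Baseline: alternately normalize rows and columns of exp(scores).

    The result is only approximately doubly stochastic and needs
    n_iters sequential passes over the matrix.
    """
    n = len(scores)
    k = [[math.exp(s) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row normalization: every row sums to 1.
        k = [[v / sum(row) for v in row] for row in k]
        # Column normalization: every column sums to 1.
        col_sums = [sum(k[i][j] for i in range(n)) for j in range(n)]
        k = [[k[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return k


def sliced_plan(queries, keys, n_slices=16, seed=0):
    """Illustrative sliced plan: average of 1D sorted matchings.

    Assumes len(queries) == len(keys) and uniform slice weights
    (both are simplifying assumptions of this sketch).
    """
    rng = random.Random(seed)
    n, d = len(queries), len(queries[0])
    plan = [[0.0] * n for _ in range(n)]
    for _ in range(n_slices):
        # Draw a random unit direction (one "slice").
        theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # Project queries and keys onto the slice.
        q_proj = [sum(a * b for a, b in zip(q, theta)) for q in queries]
        k_proj = [sum(a * b for a, b in zip(k, theta)) for k in keys]
        # Hard sort: the i-th smallest query is matched to the i-th smallest key.
        q_order = sorted(range(n), key=q_proj.__getitem__)
        k_order = sorted(range(n), key=k_proj.__getitem__)
        for qi, ki in zip(q_order, k_order):
            plan[qi][ki] += 1.0 / n_slices
    return plan
```

In this sketch the slices are independent of one another, so they could be computed in parallel, whereas each Sinkhorn pass depends on the previous one. The hard sort used here is not differentiable, which is the gap the abstract says the temperature-based soft sorting addresses.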


Published In

Proceedings of Machine Learning Research

EISSN

2640-3498

Publication Date

January 1, 2025

Volume

267

Start / End Page

54186 / 54202
 

Citation

APA: Shahbazi, A., Akbari, E., Salehi, D., Liu, X., Naderializadeh, N., & Kolouri, S. (2025). ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans. In Proceedings of Machine Learning Research (Vol. 267, pp. 54186–54202).
Chicago: Shahbazi, A., E. Akbari, D. Salehi, X. Liu, N. Naderializadeh, and S. Kolouri. “ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans.” In Proceedings of Machine Learning Research, 267:54186–202, 2025.
ICMJE: Shahbazi A, Akbari E, Salehi D, Liu X, Naderializadeh N, Kolouri S. ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans. In: Proceedings of Machine Learning Research. 2025. p. 54186–202.
MLA: Shahbazi, A., et al. “ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans.” Proceedings of Machine Learning Research, vol. 267, 2025, pp. 54186–202.
NLM: Shahbazi A, Akbari E, Salehi D, Liu X, Naderializadeh N, Kolouri S. ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans. Proceedings of Machine Learning Research. 2025. p. 54186–54202.
