Autoregressive Models in Vision: A Survey

Publication, Journal Article
Xiong, J; Liu, G; Huang, L; Wu, C; Wu, T; Mu, Y; Yao, Y; Shen, H; Wan, Z; Huang, J; Tao, C; Yan, S; Yao, H; Kong, L; Yang, H; Zhang, M ...
Published in: Transactions on Machine Learning Research
January 1, 2025

Autoregressive modeling has been a huge success in the field of natural language processing (NLP). Recently, autoregressive models have emerged as a significant area of focus in computer vision, where they excel in producing high-quality visual content. Autoregressive models in NLP typically operate on subword tokens. However, the representation strategy in computer vision can vary in different levels, i.e., pixel-level, token-level, or scale-level, reflecting the diverse and hierarchical nature of visual data compared to the sequential structure of language. This survey comprehensively examines the literature on autoregressive models applied to vision. To improve readability for researchers from diverse research backgrounds, we start with preliminary sequence representation and modeling in vision. Next, we divide the fundamental frameworks of visual autoregressive models into three general sub-categories, including pixel-based, token-based, and scale-based models based on the representation strategy. We then explore the interconnections between autoregressive models and other generative models. Furthermore, we present a multifaceted categorization of autoregressive models in computer vision, including image generation, video generation, 3D generation, and multimodal generation. We also elaborate on their applications in diverse domains, including emerging domains such as embodied AI and 3D medical AI, with about 250 related references. Finally, we highlight the current challenges to autoregressive models in vision with suggestions about potential research directions. We have also set up a GitHub repository to organize the papers included in this survey at: https://github.com/ChaofanTao/Autoregressive-Models-in-Vision-Survey.

Published In

Transactions on Machine Learning Research

EISSN

2835-8856

Publication Date

January 1, 2025

Volume

2025-March
 

Citation

APA: Xiong, J., Liu, G., Huang, L., Wu, C., Wu, T., Mu, Y., … Wong, N. (2025). Autoregressive Models in Vision: A Survey. Transactions on Machine Learning Research, 2025-March.
Chicago: Xiong, J., G. Liu, L. Huang, C. Wu, T. Wu, Y. Mu, Y. Yao, et al. “Autoregressive Models in Vision: A Survey.” Transactions on Machine Learning Research 2025-March (January 1, 2025).
ICMJE: Xiong J, Liu G, Huang L, Wu C, Wu T, Mu Y, et al. Autoregressive Models in Vision: A Survey. Transactions on Machine Learning Research. 2025 Jan 1;2025-March.
MLA: Xiong, J., et al. “Autoregressive Models in Vision: A Survey.” Transactions on Machine Learning Research, vol. 2025-March, Jan. 2025.
NLM: Xiong J, Liu G, Huang L, Wu C, Wu T, Mu Y, Yao Y, Shen H, Wan Z, Huang J, Tao C, Yan S, Yao H, Kong L, Yang H, Zhang M, Sapiro G, Luo J, Luo P, Wong N. Autoregressive Models in Vision: A Survey. Transactions on Machine Learning Research. 2025 Jan 1;2025-March.
