A Hierarchical Vision Transformer Using Overlapping Patch and Self-Supervised Learning
Transformer-based architectures have gradually replaced convolutional neural networks in computer vision. Compared with convolutional neural networks, Transformers can learn global image information and have stronger feature extraction capability. However, because they lack the inductive biases of convolutions, vision Transformers such as ViT require large amounts of data for pre-training. Local (window-based) Transformers effectively reduce computational complexity, but they struggle to establish long-range dependencies and perform worse on small-scale datasets. To address these problems, we propose the OPSe Transformer. A global attention module is appended to each stage of the vision Transformer; it uses slightly larger, overlapping key and value patches to enhance information exchange between adjacent windows and to aggregate global information within the local Transformer. In addition, a self-supervised proxy task is added to the architecture, and its corresponding loss function constrains training on the dataset, so that the vision Transformer learns spatial information within an image and trains more effectively. Comparative experiments on Tiny-ImageNet, CIFAR-10/100, and other datasets show that, compared with the baseline algorithm, our model improves accuracy by up to 3.91%.
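To make the overlapping key/value idea concrete, below is a minimal PyTorch sketch of window attention in which queries come from non-overlapping windows while keys and values are taken from slightly larger, overlapping patches around each window. The module name, the window/overlap parameters, and the use of F.unfold are illustrative assumptions, not the authors' exact implementation.

# Sketch only: illustrates overlapping key/value window attention.
# Names (OverlappingKVWindowAttention, window, overlap) are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class OverlappingKVWindowAttention(nn.Module):
    def __init__(self, dim, num_heads=4, window=7, overlap=2):
        super().__init__()
        self.dim, self.heads = dim, num_heads
        self.window, self.overlap = window, overlap
        self.kv_size = window + 2 * overlap          # enlarged, overlapping key/value patch
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)
        self.scale = (dim // num_heads) ** -0.5

    def forward(self, x):                            # x: (B, H, W, C), H and W divisible by window
        B, H, W, C = x.shape
        w, k = self.window, self.kv_size
        # Queries: non-overlapping w x w windows.
        q = self.q(x).reshape(B, H // w, w, W // w, w, C)
        q = q.permute(0, 1, 3, 2, 4, 5).reshape(B, -1, w * w, C)
        # Keys/values: overlapping k x k patches, one per window (stride = w, padded borders).
        kv = self.kv(x).permute(0, 3, 1, 2)          # (B, 2C, H, W)
        kv = F.unfold(kv, kernel_size=k, stride=w, padding=self.overlap)
        kv = kv.reshape(B, 2 * C, k * k, -1).permute(0, 3, 2, 1)   # (B, nWin, k*k, 2C)
        key, val = kv.chunk(2, dim=-1)

        def split_heads(t):                          # (B, nWin, N, C) -> (B, nWin, heads, N, C/heads)
            return t.reshape(*t.shape[:3], self.heads, C // self.heads).transpose(2, 3)

        q, key, val = split_heads(q), split_heads(key), split_heads(val)
        attn = (q @ key.transpose(-2, -1)) * self.scale          # (B, nWin, heads, w*w, k*k)
        out = (attn.softmax(dim=-1) @ val).transpose(2, 3).reshape(B, -1, w * w, C)
        out = out.reshape(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        return self.proj(out)

For example, with window=7 and overlap=2 each 7x7 query window attends to an 11x11 key/value patch around it, so adjacent windows share a band of keys and values; calling the module on a (2, 28, 28, 96) feature map returns a tensor of the same shape.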