Scene Text Recognition via Dual-path Network with Shape-driven Attention Alignment

Publication, Journal Article
Hu, Y; Dong, B; Huang, K; Ding, L; Wang, W; Huang, X; Wang, QF
Published in: ACM Transactions on Multimedia Computing, Communications and Applications
January 11, 2024

Scene text recognition (STR), a typical sequence-to-sequence problem, has drawn much attention recently in multimedia applications. To guarantee good performance, it is essential for STR to obtain aligned character-wise features from the whole-image feature maps. While most present works adopt fully data-driven attention-based alignment, this practice ignores specific character geometric information. In this article, we propose a novel shape-driven attention alignment method, built upon a group of learnable geometric points, that obtains character-wise features. Concretely, we first design a corner detector to generate a shape map to guide the attention alignments explicitly, where a series of points can be learned to represent character-wise features flexibly. We then propose a dual-path network with a mutual learning and cooperating strategy that successfully combines CNN with a ViT-based model, leading to further accuracy improvement. We conduct extensive experiments to evaluate the proposed method on various scene text benchmarks, including six popular regular and irregular datasets, two more challenging datasets (i.e., WordArt and OST), and three Chinese datasets. Experimental results indicate that our method can achieve superior performance over many state-of-the-art models at a comparable model size.
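For readers unfamiliar with attention-based alignment, the idea of pooling one feature vector per character from a whole-image feature map can be sketched roughly as follows. This is a generic illustration, not the authors' implementation: the function `align_characters`, the per-character `queries`, and the use of a shape map as an additive log-bias on the attention scores (`shape_prior`) are all assumptions made for the example, and the abstract does not specify how the shape map enters the attention computation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def align_characters(features, queries, shape_prior=None, eps=1e-6):
    """Pool one feature vector per character slot from a flattened feature map.

    features:    N positions x C channels (a flattened H*W feature map)
    queries:     T character slots x C channels (hypothetical learned queries)
    shape_prior: optional N non-negative weights, e.g. a flattened shape map
                 (assumption: applied as an additive log-bias on the scores)
    """
    channels = len(features[0])
    scale = math.sqrt(channels)
    aligned, weights = [], []
    for q in queries:
        # Scaled dot-product score between this character slot and every position.
        scores = [sum(qc * fc for qc, fc in zip(q, f)) / scale for f in features]
        if shape_prior is not None:
            # Positions with a near-zero prior are strongly suppressed.
            scores = [s + math.log(p + eps) for s, p in zip(scores, shape_prior)]
        attn = softmax(scores)
        weights.append(attn)
        # Attention-weighted sum of position features gives the character feature.
        aligned.append(
            [sum(a * f[c] for a, f in zip(attn, features)) for c in range(channels)]
        )
    return aligned, weights

# Toy example: 3 positions, 2 channels, 2 character slots.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[4.0, 0.0], [0.0, 4.0]]
aligned, weights = align_characters(features, queries, shape_prior=[1.0, 1.0, 0.0])
```

In this toy setup the third position has a prior of zero, so it receives almost no attention from either character slot, while each slot still focuses on the position that best matches its query. A fully data-driven variant corresponds to calling the function with `shape_prior=None`.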

Published In

ACM Transactions on Multimedia Computing, Communications and Applications

DOI

10.1145/3633517

EISSN

1551-6865

ISSN

1551-6857

Publication Date

January 11, 2024

Volume

20

Issue

4

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 4607 Graphics, augmented reality and games
  • 4606 Distributed computing and systems software
  • 4603 Computer vision and multimedia computation
  • 0806 Information Systems
  • 0805 Distributed Computing
  • 0803 Computer Software
 

Citation

APA: Hu, Y., Dong, B., Huang, K., Ding, L., Wang, W., Huang, X., & Wang, Q. F. (2024). Scene Text Recognition via Dual-path Network with Shape-driven Attention Alignment. ACM Transactions on Multimedia Computing, Communications and Applications, 20(4). https://doi.org/10.1145/3633517
Chicago: Hu, Y., B. Dong, K. Huang, L. Ding, W. Wang, X. Huang, and Q. F. Wang. “Scene Text Recognition via Dual-path Network with Shape-driven Attention Alignment.” ACM Transactions on Multimedia Computing, Communications and Applications 20, no. 4 (January 11, 2024). https://doi.org/10.1145/3633517.
ICMJE: Hu Y, Dong B, Huang K, Ding L, Wang W, Huang X, et al. Scene Text Recognition via Dual-path Network with Shape-driven Attention Alignment. ACM Transactions on Multimedia Computing, Communications and Applications. 2024 Jan 11;20(4).
MLA: Hu, Y., et al. “Scene Text Recognition via Dual-path Network with Shape-driven Attention Alignment.” ACM Transactions on Multimedia Computing, Communications and Applications, vol. 20, no. 4, Jan. 2024. Scopus, doi:10.1145/3633517.
NLM: Hu Y, Dong B, Huang K, Ding L, Wang W, Huang X, Wang QF. Scene Text Recognition via Dual-path Network with Shape-driven Attention Alignment. ACM Transactions on Multimedia Computing, Communications and Applications. 2024 Jan 11;20(4).
