
A Segment-Based Layout Aware Model for Information Extraction on Document Images

Publication, Conference
Ning, M; Wang, QF; Huang, K; Huang, X
Published in: Communications in Computer and Information Science
January 1, 2021

Information extraction (IE) on document images has attracted considerable attention recently due to its great potential for intelligent document analysis, where visual layout information is vital. However, most existing works consider visual layout information mainly at the token level, which unfortunately ignores long contexts and requires time-consuming annotation. In this paper, we propose to model document visual layout information at the segment level. First, we obtain a segment representation by integrating the segment-level layout information and the text embedding. Since only segment-level layout annotation is needed, our model has a lower annotation cost than the full annotation required at the token level. Then, word vectors are also extracted from each text segment to obtain a fine-grained representation. Finally, the segment and word vectors are fused to obtain the prediction results. Extensive experiments on benchmark datasets demonstrate the effectiveness of our method.
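The three steps named in the abstract (segment representation, word vectors, fusion) can be illustrated with a minimal sketch. The abstract does not specify the architecture, so everything here is an assumption made for illustration: the toy `embed_word` function stands in for a learned text embedding, mean pooling stands in for the segment text encoder, a normalized bounding box stands in for the segment-level layout information, and concatenation stands in for the fusion step. None of these names or choices come from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]


@dataclass
class Segment:
    """A text segment with one bounding box (hypothetical structure)."""
    words: List[str]
    box: Sequence[float]  # (x0, y0, x1, y1) in pixels


def embed_word(word: str, dim: int = 4) -> Vector:
    # Assumption: deterministic toy embedding standing in for a learned one.
    total = sum(ord(c) for c in word)
    return [((total * (i + 1)) % 97) / 97.0 for i in range(dim)]


def layout_features(box: Sequence[float], page_w: float, page_h: float) -> Vector:
    # Normalize the segment box by the page size so values fall in [0, 1].
    x0, y0, x1, y1 = box
    return [x0 / page_w, y0 / page_h, x1 / page_w, y1 / page_h]


def segment_vector(seg: Segment, page_w: float, page_h: float) -> Vector:
    # Step 1: combine segment-level layout with a pooled text embedding.
    if not seg.words:
        raise ValueError("segment must contain at least one word")
    word_vecs = [embed_word(w) for w in seg.words]
    dim = len(word_vecs[0])
    mean_text = [sum(v[i] for v in word_vecs) / len(word_vecs) for i in range(dim)]
    return mean_text + layout_features(seg.box, page_w, page_h)


def fused_word_vectors(seg: Segment, page_w: float, page_h: float) -> List[Vector]:
    # Steps 2 and 3: per-word vectors, each fused (here: concatenated)
    # with the vector of the segment the word belongs to.
    seg_vec = segment_vector(seg, page_w, page_h)
    return [embed_word(w) + seg_vec for w in seg.words]


if __name__ == "__main__":
    seg = Segment(words=["Total", "42.00"], box=(100, 200, 300, 220))
    for word, vec in zip(seg.words, fused_word_vectors(seg, 1000, 1000)):
        print(word, len(vec))
```

Note that only one box per segment is needed as input, which mirrors the abstract's point that segment-level layout annotation is cheaper than annotating every token. In a real model the fused vectors would feed a trained classifier that predicts a label per word.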

Published In

Communications in Computer and Information Science

DOI

10.1007/978-3-030-92307-5_88

EISSN

1865-0937

ISSN

1865-0929

Publication Date

January 1, 2021

Volume

1516 CCIS

Start / End Page

757 / 765

Citation

APA: Ning, M., Wang, Q. F., Huang, K., & Huang, X. (2021). A Segment-Based Layout Aware Model for Information Extraction on Document Images. In Communications in Computer and Information Science (Vol. 1516 CCIS, pp. 757–765). https://doi.org/10.1007/978-3-030-92307-5_88
Chicago: Ning, M., Q. F. Wang, K. Huang, and X. Huang. “A Segment-Based Layout Aware Model for Information Extraction on Document Images.” In Communications in Computer and Information Science, 1516 CCIS:757–65, 2021. https://doi.org/10.1007/978-3-030-92307-5_88.
ICMJE: Ning M, Wang QF, Huang K, Huang X. A Segment-Based Layout Aware Model for Information Extraction on Document Images. In: Communications in Computer and Information Science. 2021. p. 757–65.
MLA: Ning, M., et al. “A Segment-Based Layout Aware Model for Information Extraction on Document Images.” Communications in Computer and Information Science, vol. 1516 CCIS, 2021, pp. 757–65. Scopus, doi:10.1007/978-3-030-92307-5_88.
NLM: Ning M, Wang QF, Huang K, Huang X. A Segment-Based Layout Aware Model for Information Extraction on Document Images. Communications in Computer and Information Science. 2021. p. 757–765.
