A Segment-Based Layout Aware Model for Information Extraction on Document Images
Information extraction (IE) on document images has attracted considerable attention recently due to its great potential for intelligent document analysis, where visual layout information is vital. However, most existing works model visual layout information only at the token level, which ignores long-range context and requires time-consuming annotation. In this paper, we propose to model document visual layout information at the segment level. First, we obtain a segment representation by integrating the segment-level layout information with the text embedding. Since only segment-level layout annotation is needed, our model incurs a much lower annotation cost than the full token-level annotation required by prior work. Second, word vectors are extracted from each text segment to obtain a fine-grained representation. Finally, the segment and word vectors are fused to produce the prediction results. Extensive experiments on benchmark datasets demonstrate the effectiveness of our method.
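The pipeline described above (segment-level layout + text embedding, then fusion with per-word vectors) can be sketched as follows. This is a minimal illustration under our own assumptions, not the paper's implementation: the projection matrix `W_layout`, the bounding-box encoding, and the concatenation-based fusion are all hypothetical simplifications.

```python
import numpy as np

def segment_representation(layout_box, text_emb, W_layout):
    """Integrate segment-level layout information with the text embedding.

    layout_box: (x0, y0, x1, y1) normalized segment bounding box (assumed encoding).
    text_emb:   d-dim embedding of the segment's text.
    W_layout:   (d, 4) learned projection of the box into embedding space (hypothetical).
    """
    layout_emb = W_layout @ np.asarray(layout_box, dtype=float)
    return layout_emb + text_emb  # additive combination, as one simple choice

def fuse(segment_emb, word_embs):
    """Fuse the segment vector with each word vector from that segment.

    word_embs: (n_words, d) fine-grained word representations.
    Returns (n_words, 2d): each word vector concatenated with its segment context.
    """
    seg = np.tile(segment_emb, (word_embs.shape[0], 1))
    return np.concatenate([word_embs, seg], axis=1)
```

The fused per-word vectors would then feed a prediction head (e.g., a tagging classifier); that head is omitted here since the abstract does not specify it.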