Improving image caption performance with linguistic context
Image captioning aims to generate a description of an image using techniques from computer vision and natural language processing, where the framework of a Convolutional Neural Network (CNN) followed by a Recurrent Neural Network (RNN), typically an LSTM, is widely used. In recent years, attention-based CNN-LSTM networks have achieved significant progress owing to their ability to model global context. However, CNN-LSTMs do not model linguistic context explicitly, which is useful for further boosting performance. To overcome this issue, we propose a method that integrates an n-gram model into the attention-based image captioning framework, modelling the word transition probability during decoding to strengthen the linguistic context of the generated captions. We evaluated performance in terms of BLEU and METEOR on the MSCOCO 2014 benchmark dataset. Experimental results show the effectiveness of the proposed method: BLEU-1, BLEU-2, BLEU-3, and BLEU-4 improve by 0.2%, 0.7%, 0.6%, and 0.5%, respectively, and METEOR by 0.1.
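The abstract does not specify how the n-gram scores are combined with the decoder's predictions. A common way to realize such a scheme is log-linear interpolation between the LSTM's next-word distribution and an n-gram transition probability at each decoding step. The sketch below is only an illustration under that assumption: the function `fuse_scores`, the interpolation weight `lam`, and the smoothing floor are all hypothetical, not taken from the paper.

```python
import math

# Illustrative sketch: fuse an attention-based LSTM decoder's next-word
# log-probabilities with bigram transition probabilities, as one plausible
# reading of "modelling the word transition probability in decoding".
# The fusion rule and weight are assumptions, not the paper's formulation.

def fuse_scores(decoder_log_probs, bigram_probs, prev_word, lam=0.3):
    """Interpolate decoder log-probs with bigram log-probs.

    decoder_log_probs: {word: log P_lstm(word | image, history)}
    bigram_probs:      {(prev_word, word): P_ngram(word | prev_word)}
    lam:               n-gram interpolation weight (hypothetical value)
    """
    fused = {}
    for word, lp in decoder_log_probs.items():
        p_ng = bigram_probs.get((prev_word, word), 1e-6)  # smoothing floor
        fused[word] = (1.0 - lam) * lp + lam * math.log(p_ng)
    return fused

# Toy usage: the decoder alone slightly prefers "of" after "a", but the
# bigram model, trained on caption text, steers the choice toward "dog".
decoder_out = {"dog": math.log(0.30), "of": math.log(0.35), "the": math.log(0.35)}
bigram = {("a", "dog"): 0.5, ("a", "of"): 0.01, ("a", "the"): 0.02}
scores = fuse_scores(decoder_out, bigram, prev_word="a")
print(max(scores, key=scores.get))  # -> "dog"
```

In a full captioner this fusion would be applied inside beam search at every step, so the n-gram model can veto locally implausible word transitions without retraining the CNN-LSTM.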
Related Subject Headings
- Artificial Intelligence & Image Processing
- 46 Information and computing sciences