Automatic emotional spoken language text corpus construction from written dialogs in fictions
In this paper, we propose a novel method for automatically constructing an emotional spoken-language text corpus from written dialogs, and we release a large-scale Chinese emotional text dataset of short conversations extracted from thousands of works of fiction using the proposed method. Emotional spoken-language transcript resources in Chinese are relatively limited, and manually constructing a large-scale labeled corpus is slow and costly. This motivates us to explore more efficient alternatives. First, instead of annotating a large-scale corpus, we manually build a small-scale emotion dictionary in which each word carries an emotion tag. We then use these emotional words to heuristically search for emotional dialogs in fiction and classify them automatically. Second, we show how the new database can be used to boost the performance of emotion recognition on spoken language. The labeled dialogs can be used for supervised learning, while the unlabeled ones provide better word embeddings for semantic-level emotion recognition. We use the dialog corpus as an auxiliary dataset for speech emotion recognition and carry out experiments on texts generated by automatic speech recognition (ASR) from the speech signals in the Chinese Natural Emotional Audio-Visual Database (CHEAVD), an eight-class emotion recognition task. The text-based baseline obtains a macro average precision (MAP) of 37.08% and an accuracy of 31.13%. By pre-training neural networks on the labeled dialogs and over-sampling the minority classes, we achieve a MAP of 47.50% and an accuracy of 43.91%, which outperform the baseline by 10.42 and 12.78 percentage points, respectively.
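The dictionary-driven search described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the toy dictionary entries, the function names `tag_narration` and `extract_dialogs`, and the specific heuristic (look for emotion words in the narration that surrounds a quoted utterance on the same line, and keep a label only when exactly one emotion is cued) are all assumptions made for illustration.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical toy emotion dictionary: emotion word -> emotion tag.
# The real dictionary is built manually and is larger.
EMOTION_DICT = {
    "生气": "angry",
    "愤怒": "angry",
    "高兴": "happy",
    "开心": "happy",
    "伤心": "sad",
}

# A quoted utterance delimited by Chinese double quotation marks.
QUOTE_RE = re.compile(r"“([^”]+)”")


def tag_narration(narration: str) -> Optional[str]:
    """Return an emotion tag if the narration cues exactly one emotion."""
    tags = {tag for word, tag in EMOTION_DICT.items() if word in narration}
    # Assumed heuristic: ambiguous or missing cues leave the dialog unlabeled.
    return tags.pop() if len(tags) == 1 else None


def extract_dialogs(line: str) -> List[Tuple[str, Optional[str]]]:
    """Return (utterance, tag) pairs for one line of fiction text.

    The tag is None for dialogs without a single clear emotion cue.
    """
    narration = QUOTE_RE.sub("", line)  # text outside the quotation marks
    tag = tag_narration(narration)
    return [(m.group(1), tag) for m in QUOTE_RE.finditer(line)]


if __name__ == "__main__":
    sample = [
        "他生气地说：“你怎么又迟到了！”",
        "她笑了笑：“没关系。”",
    ]
    for text in sample:
        print(extract_dialogs(text))
```

Under these assumptions, utterances that receive a tag would form the labeled portion of the corpus, while those returned with `None` would still be kept as unlabeled text, for example for training word embeddings.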