DKU-Tencent submission to oriental language recognition AP18-OLR challenge
In this paper, we describe the DKU-Tencent system submitted to the AP18-OLR oriental language recognition challenge. The system pipeline consists of three main components: data augmentation, frame-level feature extraction, and utterance-level modeling. First, we apply speed perturbation to increase the amount and diversity of the training data. Second, we extract several kinds of frame-level features, including hand-crafted acoustic features and deep phonetic features. Third, we aggregate the frame-level features into fixed-dimensional utterance-level representations through i-vector and x-vector modeling. We also propose a deep residual network that produces utterance-level language posteriors in an end-to-end manner. Our primary submission achieves Cavg of 0.0499, 0.0146, and 0.0135 on the short-utterance, confusing-language, and open-set tasks of the evaluation set, respectively.
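As an illustration of the data augmentation step, the following is a minimal sketch of speed perturbation by resampling, not the implementation used in our system. The function name `speed_perturb`, the use of linear interpolation, and the factors 0.9 and 1.1 are assumptions made for this example; the abstract does not specify the perturbation factors or the resampling method.

```python
import numpy as np


def speed_perturb(waveform, factor):
    """Return a copy of `waveform` that plays `factor` times faster
    when interpreted at the original sample rate.

    Hypothetical helper for illustration; uses plain linear interpolation.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    waveform = np.asarray(waveform, dtype=np.float64)
    n_in = len(waveform)
    # A faster signal is shorter: the output has about n_in / factor samples.
    n_out = int(round(n_in / factor))
    # Fractional positions in the original signal that each output sample reads.
    src_positions = np.clip(np.arange(n_out) * factor, 0, n_in - 1)
    return np.interp(src_positions, np.arange(n_in), waveform)


# Usage: augment one utterance with assumed factors 0.9 and 1.1.
sample_rate = 16000  # assumed sample rate for this example
t = np.arange(sample_rate) / sample_rate
utterance = np.sin(2 * np.pi * 440.0 * t)  # 1 s synthetic tone as a stand-in
augmented = {f: speed_perturb(utterance, f) for f in (0.9, 1.0, 1.1)}
```

Each perturbed copy changes both the duration and the pitch of the utterance, so a training set augmented with two extra factors is roughly three times the original size.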