Improved Conditional Generative Adversarial Net Classification for Spoken Language Recognition

Publication, Conference
Miao, X; McLoughlin, I; Yao, S; Yan, Y
Published in: 2018 IEEE Spoken Language Technology Workshop SLT 2018 Proceedings
July 2, 2018

Recent research on generative adversarial nets (GANs) for language identification (LID) has shown promising results. In this paper, we further exploit the latent abilities of GANs, first by combining them with deep neural network (DNN)-based i-vector approaches and then by improving the LID model using conditional generative adversarial net (cGAN) classification. First, phoneme-dependent deep bottleneck features (DBF) combined with output posteriors of a pre-trained DNN for automatic speech recognition (ASR) are used to extract i-vectors in the normal way. These i-vectors are then classified using a cGAN, and we show an effective method within the cGAN to optimize parameters by combining both language identification and verification signals as supervision. Results show, first, that cGAN methods can significantly outperform DBF DNN i-vector methods where 49-dimensional i-vectors are used, but not where 600-dimensional i-vectors are used. Second, training a cGAN discriminator network for direct classification gives a further benefit for low-dimensional i-vectors as well as for short utterances with high-dimensional i-vectors. However, incorporating a dedicated discriminator network output layer for classification and optimizing both classification and verification loss brings benefits in all test cases.
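The abstract does not give the exact loss used, so the following is only a minimal sketch of how a classification (identification) loss and a verification loss might be combined into one training objective for a discriminator with two outputs. It assumes that "verification" refers to the discriminator's real/fake decision. The function names, the `weight` parameter, and the additive weighting are illustrative assumptions, not the authors' formulation.

```python
import math

# Hypothetical sketch: all names and the weighting scheme are assumptions,
# not taken from the paper.

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classification_loss(class_logits, true_class):
    """Cross-entropy over language classes (the identification signal)."""
    probs = softmax(class_logits)
    return -math.log(probs[true_class])

def verification_loss(real_prob, is_real):
    """Binary cross-entropy on an assumed real/fake discriminator output."""
    eps = 1e-12
    p = min(max(real_prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p) if is_real else -math.log(1.0 - p)

def combined_loss(class_logits, true_class, real_prob, is_real, weight=0.5):
    """Weighted sum of both losses; `weight` is an assumed hyperparameter."""
    return (classification_loss(class_logits, true_class)
            + weight * verification_loss(real_prob, is_real))
```

In such a setup, `class_logits` would come from the dedicated classification output layer and `real_prob` from the usual adversarial output, with the i-vector (and, in a conditional GAN, the language label) as input to the discriminator.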

Published In

2018 IEEE Spoken Language Technology Workshop SLT 2018 Proceedings

DOI

10.1109/SLT.2018.8639522

Publication Date

July 2, 2018

Start / End Page

98 / 104
 

Citation

APA: Miao, X., McLoughlin, I., Yao, S., & Yan, Y. (2018). Improved Conditional Generative Adversarial Net Classification for Spoken Language Recognition. In 2018 IEEE Spoken Language Technology Workshop SLT 2018 Proceedings (pp. 98–104). https://doi.org/10.1109/SLT.2018.8639522
Chicago: Miao, X., I. McLoughlin, S. Yao, and Y. Yan. “Improved Conditional Generative Adversarial Net Classification for Spoken Language Recognition.” In 2018 IEEE Spoken Language Technology Workshop SLT 2018 Proceedings, 98–104, 2018. https://doi.org/10.1109/SLT.2018.8639522.
ICMJE: Miao X, McLoughlin I, Yao S, Yan Y. Improved Conditional Generative Adversarial Net Classification for Spoken Language Recognition. In: 2018 IEEE Spoken Language Technology Workshop SLT 2018 Proceedings. 2018. p. 98–104.
MLA: Miao, X., et al. “Improved Conditional Generative Adversarial Net Classification for Spoken Language Recognition.” 2018 IEEE Spoken Language Technology Workshop SLT 2018 Proceedings, 2018, pp. 98–104. Scopus, doi:10.1109/SLT.2018.8639522.
NLM: Miao X, McLoughlin I, Yao S, Yan Y. Improved Conditional Generative Adversarial Net Classification for Spoken Language Recognition. 2018 IEEE Spoken Language Technology Workshop SLT 2018 Proceedings. 2018. p. 98–104.
