Deep Speaker Embeddings with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition

Publication, Conference
Cai, D; Cai, Z; Li, M
Published in: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings
July 2, 2018

Lexical content variability across utterances is the key challenge in text-independent speaker verification. In this paper, we investigate using a supervector, which can reduce the impact of lexical content mismatch among utterances, for supervised speaker embedding learning. A DNN acoustic model aligns a feature sequence to a set of senones and generates a centered and normalized first-order statistics supervector. Statistics vectors from similar senones are placed together and reshaped into an image to maintain local continuity and correlation. The supervector image is then fed into a residual convolutional neural network. The deep speaker embedding features are the outputs of the last hidden layer of the network, and we employ a PLDA back-end for the subsequent modeling. Experimental results show that the proposed method outperforms the conventional GMM-UBM i-vector system and is complementary to the DNN-UBM i-vector system. The score-level fusion system achieves 1.26% EER and a DCF10 cost of 0.260 on the NIST SRE 10 extended core condition 5 task.
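
The supervector construction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `supervector_image`, the `order` argument (a hypothetical permutation that places similar senones on adjacent rows), and the choice to normalize by senone occupancy alone are assumptions made for illustration. The abstract does not specify the exact normalization or the senone ordering procedure.

```python
import numpy as np


def supervector_image(features, posteriors, senone_means, order=None, eps=1e-8):
    """Build a (senones x feature_dim) image of first-order statistics.

    features:     (T, D) acoustic feature frames
    posteriors:   (T, C) per-frame senone posteriors from an acoustic model
    senone_means: (C, D) per-senone mean vectors used for centering
    order:        optional permutation of the C senones (hypothetical input
                  that groups similar senones on neighbouring rows)
    """
    features = np.asarray(features, dtype=float)
    posteriors = np.asarray(posteriors, dtype=float)
    senone_means = np.asarray(senone_means, dtype=float)

    # Zeroth-order statistics: soft frame count per senone, shape (C,)
    occupancy = posteriors.sum(axis=0)
    # First-order statistics: posterior-weighted sum of frames, shape (C, D)
    first_order = posteriors.T @ features

    # Center around the senone means, then normalize by occupancy.
    # Assumption: occupancy-only normalization; the paper may also scale
    # by per-senone covariances.
    centered = first_order - occupancy[:, None] * senone_means
    normalized = centered / (occupancy[:, None] + eps)

    # Reorder rows so that similar senones sit next to each other.
    if order is not None:
        normalized = normalized[np.asarray(order)]
    return normalized
```

Each row of the returned array holds the statistics for one senone, so the array can be treated as a single-channel image and passed to a convolutional network. The row order matters only if it reflects senone similarity, which is why the permutation is left as an explicit input here.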

Published In

2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings

DOI

10.23919/APSIPA.2018.8659595

Publication Date

July 2, 2018

Start / End Page

1478 / 1482
 

Citation

APA
Cai, D., Cai, Z., & Li, M. (2018). Deep Speaker Embeddings with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition. In 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings (pp. 1478–1482). https://doi.org/10.23919/APSIPA.2018.8659595

Chicago
Cai, D., Z. Cai, and M. Li. “Deep Speaker Embeddings with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition.” In 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings, 1478–82, 2018. https://doi.org/10.23919/APSIPA.2018.8659595.

ICMJE
Cai D, Cai Z, Li M. Deep Speaker Embeddings with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition. In: 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings. 2018. p. 1478–82.

MLA
Cai, D., et al. “Deep Speaker Embeddings with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition.” 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings, 2018, pp. 1478–82. Scopus, doi:10.23919/APSIPA.2018.8659595.

NLM
Cai D, Cai Z, Li M. Deep Speaker Embeddings with Convolutional Neural Network on Supervector for Text-Independent Speaker Recognition. 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2018 - Proceedings. 2018. p. 1478–1482.
