A Unified Deep Speaker Embedding Framework for Mixed-Bandwidth Speech Data

Publication, Conference
Cai, W; Li, M
Published in: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
January 1, 2021

This paper proposes a unified deep speaker embedding framework for modeling speech data with different sampling rates. Considering the narrowband spectrogram as a sub-image of the wideband spectrogram, we tackle the joint modeling problem of the mixed-bandwidth data in an image classification manner. From this perspective, we elaborate several mixed-bandwidth joint training strategies under different training and test data scenarios. The proposed systems are able to flexibly handle the mixed-bandwidth speech data in a single speaker embedding model without any additional downsampling, upsampling, bandwidth extension, or padding operations. We conduct extensive experimental studies on the VoxCeleb1 dataset. Furthermore, the effectiveness of the proposed approach is validated by the SITW and NIST SRE 2016 datasets.
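The sub-image view in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function name `spectrogram`, the 32 ms frame, the 10 ms hop, and the 1 kHz test tone are all assumptions chosen for illustration. With the frame length fixed in milliseconds, 16 kHz and 8 kHz audio share the same frequency-bin spacing, so the narrowband spectrogram lines up with the lower band of the wideband one.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_ms=32, hop_ms=10):
    """Magnitude spectrogram whose frame length is fixed in milliseconds.

    Hypothetical helper for illustration; the settings are assumptions.
    Returns an array of shape (freq_bins, frames).
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    frames = np.stack([
        signal[i * hop_len : i * hop_len + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)).T

# One second of a 1 kHz tone, sampled at 16 kHz (wideband) and 8 kHz (narrowband).
tone_hz = 1000.0
wide_signal = np.sin(2 * np.pi * tone_hz * np.arange(16000) / 16000)
narrow_signal = np.sin(2 * np.pi * tone_hz * np.arange(8000) / 8000)

wide = spectrogram(wide_signal, 16000)
narrow = spectrogram(narrow_signal, 8000)

# Frequency axes: the narrowband bins coincide with the lowest wideband bins.
wide_freqs = np.fft.rfftfreq(512, d=1 / 16000)
narrow_freqs = np.fft.rfftfreq(256, d=1 / 8000)
shared_bins = len(narrow_freqs)
axes_align = np.allclose(wide_freqs[:shared_bins], narrow_freqs)

print(wide.shape, narrow.shape, axes_align)
```

Because the two frequency axes agree over the shared band, a model that reads spectrograms as images could in principle consume either input without resampling or padding, which is the property the abstract describes. How the paper's network handles the differing input heights is not stated in this record.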

Published In

2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings

Publication Date

January 1, 2021

Start / End Page

1133 / 1138
 

Citation

APA: Cai, W., & Li, M. (2021). A Unified Deep Speaker Embedding Framework for Mixed-Bandwidth Speech Data. In 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings (pp. 1133–1138).
Chicago: Cai, W., and M. Li. “A Unified Deep Speaker Embedding Framework for Mixed-Bandwidth Speech Data.” In 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings, 1133–38, 2021.
ICMJE: Cai W, Li M. A Unified Deep Speaker Embedding Framework for Mixed-Bandwidth Speech Data. In: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings. 2021. p. 1133–8.
MLA: Cai, W., and M. Li. “A Unified Deep Speaker Embedding Framework for Mixed-Bandwidth Speech Data.” 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings, 2021, pp. 1133–38.
NLM: Cai W, Li M. A Unified Deep Speaker Embedding Framework for Mixed-Bandwidth Speech Data. 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings. 2021. p. 1133–1138.
