Scholars@Duke publication: SEF-Net: Speaker Embedding Free Target Speaker Extraction Network

SEF-Net: Speaker Embedding Free Target Speaker Extraction Network

Publication, Conference

Zeng, B; Suo, H; Wan, Y; Li, M

Published in: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech

January 1, 2023

Most target speaker extraction methods use the target speaker embedding as reference information. However, the speaker embedding extracted by a speaker recognition module may not be optimal for the target speaker extraction task. In this paper, we propose the Speaker Embedding Free target speaker extraction Network (SEF-Net), a novel target speaker extraction model that does not rely on speaker embeddings. SEF-Net uses cross multi-head attention in the transformer decoder to implicitly utilize the speaker information in the reference speech's conformer encoding outputs. Experimental results show that our proposed model achieves comparable performance to other target speaker extraction models. SEF-Net provides a feasible new solution to perform target speaker extraction without using a speaker embedding extractor or speaker recognition loss function.
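The cross multi-head attention idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the mixture and reference encodings are already available as lists of feature frames, omits the learned query/key/value projections and the conformer encoder entirely, and uses hypothetical function names (`cross_attention`, `multi_head_cross_attention`). It only shows how each mixture frame can attend over all reference frames, so that reference information is used without pooling it into a single speaker embedding.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    # Scaled dot-product attention: every query frame attends over all key frames.
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def multi_head_cross_attention(mixture, reference, num_heads):
    # Assumption: no learned projections; the feature dimension is simply
    # split into equal slices, one per head, and the head outputs are concatenated.
    dim = len(mixture[0])
    assert dim % num_heads == 0, "feature dimension must be divisible by num_heads"
    head_dim = dim // num_heads
    heads = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        q = [frame[lo:hi] for frame in mixture]      # queries from the mixture
        kv = [frame[lo:hi] for frame in reference]   # keys and values from the reference
        heads.append(cross_attention(q, kv, kv))
    # Concatenate the per-head outputs for each mixture frame.
    return [sum((heads[h][t] for h in range(num_heads)), [])
            for t in range(len(mixture))]


# Toy usage: 3 mixture frames and 2 reference frames, 4 features each.
mixture = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.3, 0.3]]
reference = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 0.0, 3.0]]
attended = multi_head_cross_attention(mixture, reference, num_heads=2)
```

The output has one frame per mixture frame, and each output frame is a weighted average of reference frames. In a real model the queries, keys, and values would pass through learned linear projections first.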

Duke Scholars

Author Ming Li DKU Faculty

Published In

Proceedings of the Annual Conference of the International Speech Communication Association Interspeech

DOI

10.21437/Interspeech.2023-1749

EISSN

2958-1796

ISSN

2308-457X

Publication Date

January 1, 2023

Volume

2023-August

Start / End Page

3452 / 3456

Citation

APA: Zeng, B., Suo, H., Wan, Y., & Li, M. (2023). SEF-Net: Speaker Embedding Free Target Speaker Extraction Network. In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech (Vol. 2023-August, pp. 3452–3456). https://doi.org/10.21437/Interspeech.2023-1749

Chicago: Zeng, B., H. Suo, Y. Wan, and M. Li. “SEF-Net: Speaker Embedding Free Target Speaker Extraction Network.” In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 2023-August:3452–56, 2023. https://doi.org/10.21437/Interspeech.2023-1749.

ICMJE: Zeng B, Suo H, Wan Y, Li M. SEF-Net: Speaker Embedding Free Target Speaker Extraction Network. In: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2023. p. 3452–6.

MLA: Zeng, B., et al. “SEF-Net: Speaker Embedding Free Target Speaker Extraction Network.” Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, vol. 2023-August, 2023, pp. 3452–56. Scopus, doi:10.21437/Interspeech.2023-1749.

NLM: Zeng B, Suo H, Wan Y, Li M. SEF-Net: Speaker Embedding Free Target Speaker Extraction Network. Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2023. p. 3452–3456.
