Scholars@Duke publication: Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System

Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System

Publication, Conference

Li, Z; Shi, Y; Xu, Y; Li, M

Published in: Proceedings IEEE International Conference on Multimedia and Expo

January 1, 2025

Speaker embedding based zero-shot Text-to-Speech (TTS) systems enable high-quality speech synthesis for unseen speakers using minimal data. However, these systems are vulnerable to adversarial attacks, in which an attacker adds imperceptible perturbations to the original speaker's audio waveform so that the synthesized speech sounds like another person. This vulnerability poses significant security risks, including speaker identity spoofing and unauthorized voice manipulation. This paper investigates two primary defense strategies to address these threats: adversarial training and adversarial purification. Adversarial training improves the model's robustness by including adversarial examples in the training process. Adversarial purification, on the other hand, employs diffusion probabilistic models to restore adversarially perturbed audio to its clean form. Experimental results demonstrate that these defense mechanisms can significantly reduce the impact of adversarial perturbations, enhancing the security and reliability of speaker embedding based zero-shot TTS systems in adversarial environments.
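The abstract does not say which attack algorithm the paper uses, so the following is only an illustrative sketch of the general idea: a small, bounded perturbation of a waveform that moves its speaker embedding toward another speaker. It assumes a projected-gradient (PGD-style) attack under an L-infinity bound, and it replaces the real speaker encoder with a toy linear projection. All names (`embed`, `pgd_attack`, `eps`, `alpha`) and all sizes are hypothetical and do not come from the paper.

```python
import math
import random


def embed(W, x):
    """Toy stand-in for a speaker encoder: a fixed linear projection (assumption)."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def norm(a):
    return math.sqrt(sum(v * v for v in a))


def cosine(a, b):
    return sum(p * q for p, q in zip(a, b)) / (norm(a) * norm(b))


def cosine_grad_wrt_input(W, x, target):
    """Gradient of cosine(embed(W, x), target) with respect to the waveform x."""
    e = embed(W, x)
    e_norm, t_norm = norm(e), norm(target)
    cos = cosine(e, target)
    # d cos / d e = (t / |t| - cos * e / |e|) / |e|
    g_e = [(t / t_norm - cos * v / e_norm) / e_norm for v, t in zip(e, target)]
    # Chain rule through the linear encoder: d cos / d x = W^T g_e
    return [sum(W[d][n] * g_e[d] for d in range(len(W))) for n in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def pgd_attack(W, x_clean, target, eps=0.01, alpha=0.0025, steps=20):
    """Signed-gradient ascent on similarity to the target embedding,
    projected back into the L-infinity ball of radius eps around x_clean.
    Returns the best-scoring iterate seen (the clean input if none improves)."""
    x_adv = list(x_clean)
    best, best_score = list(x_clean), cosine(embed(W, x_clean), target)
    for _ in range(steps):
        g = cosine_grad_wrt_input(W, x_adv, target)
        stepped = [v + alpha * sign(gv) for v, gv in zip(x_adv, g)]
        # Projection step: keep every sample within eps of the clean waveform.
        x_adv = [min(max(v, c - eps), c + eps) for v, c in zip(stepped, x_clean)]
        score = cosine(embed(W, x_adv), target)
        if score > best_score:
            best, best_score = list(x_adv), score
    return best


rng = random.Random(0)
N, D = 64, 8  # toy waveform length and embedding size (assumptions)
W = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(D)]
x_clean = [rng.uniform(-1, 1) for _ in range(N)]
target_emb = embed(W, [rng.uniform(-1, 1) for _ in range(N)])

x_adv = pgd_attack(W, x_clean, target_emb)
sim_before = cosine(embed(W, x_clean), target_emb)
sim_after = cosine(embed(W, x_adv), target_emb)
max_delta = max(abs(a - c) for a, c in zip(x_adv, x_clean))
print(f"similarity to target: {sim_before:.4f} -> {sim_after:.4f}, max |delta| = {max_delta:.4f}")
```

In this toy setting, adversarial training would correspond to generating `x_adv` during training and including it alongside `x_clean` when fitting the encoder. The diffusion-based purification described in the abstract is a separate preprocessing step and is not sketched here.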

Duke Scholars

Author: Ming Li (DKU Faculty)

Published In

Proceedings IEEE International Conference on Multimedia and Expo

DOI

10.1109/ICME59968.2025.11210164

EISSN

1945-788X

ISSN

1945-7871

Publication Date

January 1, 2025

Citation

APA: Li, Z., Shi, Y., Xu, Y., & Li, M. (2025). Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System. In Proceedings IEEE International Conference on Multimedia and Expo. https://doi.org/10.1109/ICME59968.2025.11210164

Chicago: Li, Z., Y. Shi, Y. Xu, and M. Li. “Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System.” In Proceedings IEEE International Conference on Multimedia and Expo, 2025. https://doi.org/10.1109/ICME59968.2025.11210164.

ICMJE: Li Z, Shi Y, Xu Y, Li M. Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System. In: Proceedings IEEE International Conference on Multimedia and Expo. 2025.

MLA: Li, Z., et al. “Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System.” Proceedings IEEE International Conference on Multimedia and Expo, 2025. Scopus, doi:10.1109/ICME59968.2025.11210164.

NLM: Li Z, Shi Y, Xu Y, Li M. Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System. Proceedings IEEE International Conference on Multimedia and Expo. 2025.
