Cross-lingual multi-speaker speech synthesis with limited bilingual training data

Publication, Journal Article
Cai, Z; Yang, Y; Li, M
Published in: Computer Speech and Language
January 1, 2023

Modeling voices for multiple speakers and multiple languages with one speech synthesis system has long been a challenge, especially in low-resource cases. This paper presents two approaches to achieve cross-lingual multi-speaker text-to-speech (TTS) and code-switching synthesis under two training scenarios: (1) cross-lingual synthesis with sufficient data, and (2) cross-lingual synthesis with limited data per speaker. Accordingly, a novel TTS model and a non-autoregressive multi-speaker voice conversion model are proposed. The TTS model designed for sufficient-data cases has a Tacotron-based structure that uses shared phonemic representations associated with numeric language ID codes. For the data-limited scenario, we adopt a framework that cascades several speech modules. In particular, we propose a non-autoregressive many-to-many voice conversion module to address multi-speaker synthesis in data-insufficient cases. Experimental results on speaker similarity show that the proposed voice conversion module maintains voice characteristics well in data-limited cases. Both approaches use limited bilingual data and demonstrate impressive performance in cross-lingual synthesis, delivering fluent foreign-language speech and even code-switching speech for monolingual speakers.

Published In

Computer Speech and Language

DOI

10.1016/j.csl.2022.101427

EISSN

1095-8363

ISSN

0885-2308

Publication Date

January 1, 2023

Volume

77

Related Subject Headings

  • Speech-Language Pathology & Audiology
  • 46 Information and computing sciences
  • 40 Engineering
  • 2004 Linguistics
  • 1702 Cognitive Sciences
  • 0801 Artificial Intelligence and Image Processing

Citation

APA

Cai, Z., Yang, Y., & Li, M. (2023). Cross-lingual multi-speaker speech synthesis with limited bilingual training data. Computer Speech and Language, 77. https://doi.org/10.1016/j.csl.2022.101427

Chicago

Cai, Z., Y. Yang, and M. Li. “Cross-lingual multi-speaker speech synthesis with limited bilingual training data.” Computer Speech and Language 77 (January 1, 2023). https://doi.org/10.1016/j.csl.2022.101427.

ICMJE

Cai Z, Yang Y, Li M. Cross-lingual multi-speaker speech synthesis with limited bilingual training data. Computer Speech and Language. 2023 Jan 1;77.

MLA

Cai, Z., et al. “Cross-lingual multi-speaker speech synthesis with limited bilingual training data.” Computer Speech and Language, vol. 77, Jan. 2023. Scopus, doi:10.1016/j.csl.2022.101427.

NLM

Cai Z, Yang Y, Li M. Cross-lingual multi-speaker speech synthesis with limited bilingual training data. Computer Speech and Language. 2023 Jan 1;77.