Scholars@Duke publication: AISHELL-3: A multi-speaker Mandarin TTS corpus

Publication, Conference

Shi, Y; Bu, H; Xu, X; Zhang, S; Li, M

Published in: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech

January 1, 2021

In this paper, we present AISHELL-3, a large-scale multi-speaker Mandarin speech corpus that can be used to train multi-speaker Text-To-Speech (TTS) systems. The corpus contains roughly 85 hours of emotion-neutral recordings from 218 native speakers of Mandarin Chinese. Auxiliary speaker attributes such as gender, age group, and native accent are explicitly annotated and provided in the corpus. Transcripts at both the Chinese character level and the pinyin level are provided along with the recordings. We also present data processing strategies and techniques suited to the characteristics of the corpus, and conduct experiments on multiple speech-synthesis systems to assess the quality of the generated speech samples, with promising results. The corpus is available online at openslr.org/93/ under the Apache v2.0 license.

Duke Scholars

Author Ming Li DKU Faculty

Published In

Proceedings of the Annual Conference of the International Speech Communication Association Interspeech

DOI

10.21437/Interspeech.2021-755

EISSN

2958-1796

ISSN

2308-457X

Publication Date

January 1, 2021

Volume

Start / End Page

3526 / 3530

Citation

APA: Shi, Y., Bu, H., Xu, X., Zhang, S., & Li, M. (2021). AISHELL-3: A multi-speaker Mandarin TTS corpus. In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech (Vol. 5, pp. 3526–3530). https://doi.org/10.21437/Interspeech.2021-755

Chicago: Shi, Y., H. Bu, X. Xu, S. Zhang, and M. Li. “AISHELL-3: A multi-speaker Mandarin TTS corpus.” In Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, 5:3526–30, 2021. https://doi.org/10.21437/Interspeech.2021-755.

ICMJE: Shi Y, Bu H, Xu X, Zhang S, Li M. AISHELL-3: A multi-speaker Mandarin TTS corpus. In: Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2021. p. 3526–30.

MLA: Shi, Y., et al. “AISHELL-3: A multi-speaker Mandarin TTS corpus.” Proceedings of the Annual Conference of the International Speech Communication Association Interspeech, vol. 5, 2021, pp. 3526–30. Scopus, doi:10.21437/Interspeech.2021-755.

NLM: Shi Y, Bu H, Xu X, Zhang S, Li M. AISHELL-3: A multi-speaker Mandarin TTS corpus. Proceedings of the Annual Conference of the International Speech Communication Association Interspeech. 2021. p. 3526–3530.
