TMCSpeech: A Chinese TV and Movie Speech Dataset with Character Descriptions and a Character-Based Voice Generation Model

Publication, Conference

Liu, D; Lin, Y; Xu, Y; Li, M

Published in: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

January 1, 2025

Recent research on text-guided speech synthesis has sparked considerable interest. This study explores the potential of leveraging publicly available internet video data for speech synthesis and character-based new voice generation. We introduce a multi-modal extraction pipeline for automating the creation of speech synthesis datasets, extracting accurate character speech segments and descriptions from online videos. Additionally, we propose a person-description-based controllable voice synthesis system, establishing a mapping from character descriptions to speaker representation vectors. This system transforms character descriptions into new vectors, serving as input for zero-shot VITS to generate character-specific voices. Both objective and subjective metrics affirm our approach’s capability to generate previously unheard character-specific voices with acceptable naturalness. We plan to release the annotation set of TMCSPEECH (We only provide our collected original video links and our annotated labels for non-commercial research purposes. Our shared annotation set does not contain any audio or video data. It is the user’s responsibility to decide whether to download the video data and whether their intended purpose with the downloaded data is allowed in their country). Our audio samples can be accessed online (https://raydonld.github.io/TMCSPEECH/).

Duke Scholars

Author Yueqian Lin

Author Ming Li DKU Faculty

Published In

Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

DOI

10.1007/978-3-031-78172-8_12

EISSN

1611-3349

ISSN

0302-9743

Publication Date

January 1, 2025

Volume

15306 LNCS

Start / End Page

177 / 189

Related Subject Headings

Artificial Intelligence & Image Processing
46 Information and computing sciences

Citation

APA

Liu, D., Lin, Y., Xu, Y., & Li, M. (2025). TMCSpeech: A Chinese TV and Movie Speech Dataset with Character Descriptions and a Character-Based Voice Generation Model. In Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics (Vol. 15306 LNCS, pp. 177–189). https://doi.org/10.1007/978-3-031-78172-8_12

Chicago

Liu, D., Y. Lin, Y. Xu, and M. Li. “TMCSpeech: A Chinese TV and Movie Speech Dataset with Character Descriptions and a Character-Based Voice Generation Model.” In Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 15306 LNCS:177–89, 2025. https://doi.org/10.1007/978-3-031-78172-8_12.

ICMJE

Liu D, Lin Y, Xu Y, Li M. TMCSpeech: A Chinese TV and Movie Speech Dataset with Character Descriptions and a Character-Based Voice Generation Model. In: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics. 2025. p. 177–89.

MLA

Liu, D., et al. “TMCSpeech: A Chinese TV and Movie Speech Dataset with Character Descriptions and a Character-Based Voice Generation Model.” Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, vol. 15306 LNCS, 2025, pp. 177–89. Scopus, doi:10.1007/978-3-031-78172-8_12.

NLM

Liu D, Lin Y, Xu Y, Li M. TMCSpeech: A Chinese TV and Movie Speech Dataset with Character Descriptions and a Character-Based Voice Generation Model. Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics. 2025. p. 177–189.
