Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm

Publication, Conference
Wu, Y; Shi, J; Yu, Y; Tang, Y; Qian, T; Lin, Y; Han, J; Bai, X; Watanabe, S; Jin, Q
Published in: MM 2024 Proceedings of the 32nd ACM International Conference on Multimedia
October 28, 2024

This research presents Muskits-ESPnet, a versatile toolkit that introduces new paradigms to Singing Voice Synthesis (SVS) by applying pretrained audio models in both continuous and discrete approaches. Specifically, we explore discrete representations derived from self-supervised learning (SSL) models and audio codecs. The toolkit offers significant advantages in versatility and intelligence, supporting multi-format inputs and adaptable data processing workflows for various SVS models. It features automatic music score error detection and correction, as well as a perception auto-evaluation module that imitates human subjective evaluation scores. Muskits-ESPnet is available at https://github.com/espnet/espnet.

Published In

MM 2024 Proceedings of the 32nd ACM International Conference on Multimedia

DOI

10.1145/3664647.3685000

Publication Date

October 28, 2024

Start / End Page

11279 / 11281
 

Citation

APA: Wu, Y., Shi, J., Yu, Y., Tang, Y., Qian, T., Lin, Y., … Jin, Q. (2024). Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm. In MM 2024 Proceedings of the 32nd ACM International Conference on Multimedia (pp. 11279–11281). https://doi.org/10.1145/3664647.3685000

Chicago: Wu, Y., J. Shi, Y. Yu, Y. Tang, T. Qian, Y. Lin, J. Han, X. Bai, S. Watanabe, and Q. Jin. “Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm.” In MM 2024 Proceedings of the 32nd ACM International Conference on Multimedia, 11279–81, 2024. https://doi.org/10.1145/3664647.3685000.

ICMJE: Wu Y, Shi J, Yu Y, Tang Y, Qian T, Lin Y, et al. Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm. In: MM 2024 Proceedings of the 32nd ACM International Conference on Multimedia. 2024. p. 11279–81.

MLA: Wu, Y., et al. “Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm.” MM 2024 Proceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 11279–81. Scopus, doi:10.1145/3664647.3685000.

NLM: Wu Y, Shi J, Yu Y, Tang Y, Qian T, Lin Y, Han J, Bai X, Watanabe S, Jin Q. Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm. MM 2024 Proceedings of the 32nd ACM International Conference on Multimedia. 2024. p. 11279–11281.
