An audio-based piano performance evaluation method using deep neural network-based acoustic modeling
In this paper, we propose an annotated piano performance evaluation dataset with 185 audio pieces and a method to evaluate the performances of piano beginners based on their audio recordings. The proposed framework consists of three parts: piano key posterior probability extraction, Dynamic Time Warping (DTW)-based matching, and performance score regression. First, a deep neural network model is trained to extract 88-dimensional piano key features from the Constant-Q Transform (CQT) spectrum. The proposed acoustic model shows high robustness to varying recording environments. Second, we employ the DTW algorithm on the high-level piano key feature sequences to align the input with the template. Based on the alignment, we extract multiple global matching features that reflect the similarity between the input and the template. Finally, we apply linear regression to these matching features, using expert-annotated scores in the training data, to estimate performance scores for test audio. Experimental results show that our automatic evaluation method achieves an average absolute score error of 2.64 on a score scale from 0 to 100 and an average correlation coefficient of 0.73 on our in-house YCU-MPPE-II dataset.
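The pipeline described above (CQT front end, 88-dimensional piano key features, DTW alignment, global matching features, and linear regression) can be illustrated with the following minimal Python sketch. It is not the paper's implementation: the trained DNN acoustic model is replaced by a crude log-compressed CQT proxy, and the specific set of matching features (normalized path cost, mean and standard deviation of frame-level distances, path-length ratio) is a hypothetical choice for illustration only.

```python
import numpy as np
import librosa
from sklearn.linear_model import LinearRegression

def piano_key_features(audio_path, sr=44100, hop_length=512):
    """Stand-in for the acoustic front end.  The real system feeds CQT
    frames to a trained DNN that outputs 88 piano-key posteriors; here a
    log-compressed, per-frame-normalized CQT over the 88 piano notes
    (A0-C8) is used purely as a placeholder."""
    y, sr = librosa.load(audio_path, sr=sr)
    C = np.abs(librosa.cqt(y, sr=sr, hop_length=hop_length,
                           fmin=librosa.note_to_hz('A0'),
                           n_bins=88, bins_per_octave=12))
    C = np.log1p(C)
    C /= (C.max(axis=0, keepdims=True) + 1e-8)   # crude "posterior" proxy
    return C                                      # shape: (88, n_frames)

def matching_features(student_feat, template_feat):
    """DTW-align two 88-dim feature sequences and summarize the alignment
    with a few global similarity statistics (hypothetical feature set)."""
    D, wp = librosa.sequence.dtw(X=template_feat, Y=student_feat,
                                 metric='cosine')
    total_cost = D[wp[0, 0], wp[0, 1]]            # cumulative path cost
    frame_dist = np.array([
        1.0 - np.dot(template_feat[:, i], student_feat[:, j]) /
        (np.linalg.norm(template_feat[:, i]) *
         np.linalg.norm(student_feat[:, j]) + 1e-8)
        for i, j in wp])
    return np.array([
        total_cost / len(wp),                     # normalized path cost
        frame_dist.mean(),                        # mean frame-level distance
        frame_dist.std(),                         # distance variability
        len(wp) / max(template_feat.shape[1], 1)  # path-length ratio (tempo deviation)
    ])

# Regression stage: X_train would hold the matching-feature vectors of the
# training recordings and y_train the expert-annotated scores (0-100).
# reg = LinearRegression().fit(X_train, y_train)
# predicted_score = reg.predict(test_features.reshape(1, -1))
```

In this sketch the regression stage is ordinary least squares from scikit-learn; the exact feature set, network architecture, and regression details of the actual system are described in the body of the paper.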