An audio based piano performance evaluation method using deep neural network based acoustic modeling

Publication, Conference
Pan, J; Li, M; Song, Z; Li, X; Liu, X; Yi, H; Zhu, M
Published in: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
January 1, 2017

In this paper, we propose an annotated piano performance evaluation dataset of 185 audio pieces and a method for evaluating the performance of piano beginners from their audio recordings. The proposed framework has three parts: piano key posterior probability extraction, Dynamic Time Warping (DTW) based matching, and performance score regression. First, a deep neural network model is trained to extract 88-dimensional piano key features from the Constant-Q Transform (CQT) spectrum. The proposed acoustic model is highly robust to the recording environment. Second, we apply the DTW algorithm to the high-level piano key feature sequences to align the input with the template. From the alignment, we extract multiple global matching features that reflect the similarity between the input and the template. Finally, we apply linear regression to these matching features, using the expert-annotated scores in the training data, to estimate performance scores for test audio. Experimental results show that our automatic evaluation method achieves an average absolute score error of 2.64 on a score range of 0 to 100 and an average correlation coefficient of 0.73 on our in-house YCU-MPPE-II dataset.
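The second and third stages of the framework (DTW alignment, global matching features, linear regression) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the Euclidean frame distance, and the two example features (path-normalized DTW cost and relative path length) are assumptions chosen for illustration, since the abstract does not say which matching features are used. The inputs stand in for the 88-dimensional piano key posterior sequences produced by the acoustic model.

```python
import numpy as np


def dtw_align(x, y):
    """Classic DTW between two feature sequences of shape (frames, dims).

    Returns the accumulated cost and the warping path as (i, j) pairs.
    Euclidean frame distance is an assumption made for this sketch.
    """
    n, m = len(x), len(y)
    # Pairwise distance between every frame of x and every frame of y.
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    # Backtrack from the end of both sequences to recover the path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return float(acc[n, m]), path


def matching_features(x, y):
    """Two hypothetical global matching features derived from the alignment."""
    cost, path = dtw_align(x, y)
    return np.array([cost / len(path), len(path) / max(len(x), len(y))])


def fit_linear(features, scores):
    """Least-squares linear regression with a bias term."""
    design = np.hstack([features, np.ones((len(features), 1))])
    weights, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return weights


def predict(weights, features):
    """Apply fitted weights (last entry is the bias) to new feature rows."""
    design = np.hstack([features, np.ones((len(features), 1))])
    return design @ weights
```

In use, one would compute `matching_features(input_seq, template_seq)` for each training recording, stack the results into a matrix, fit `fit_linear` against the expert scores, and call `predict` on the features of a test recording.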

Published In

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

DOI

10.21437/Interspeech.2017-866

EISSN

1990-9772

ISSN

2308-457X

Publication Date

January 1, 2017

Volume

2017-August

Start / End Page

3088 / 3092
 

Citation

APA: Pan, J., Li, M., Song, Z., Li, X., Liu, X., Yi, H., & Zhu, M. (2017). An audio based piano performance evaluation method using deep neural network based acoustic modeling. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH (Vol. 2017-August, pp. 3088–3092). https://doi.org/10.21437/Interspeech.2017-866

Chicago: Pan, J., M. Li, Z. Song, X. Li, X. Liu, H. Yi, and M. Zhu. “An audio based piano performance evaluation method using deep neural network based acoustic modeling.” In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2017-August:3088–92, 2017. https://doi.org/10.21437/Interspeech.2017-866.

ICMJE: Pan J, Li M, Song Z, Li X, Liu X, Yi H, et al. An audio based piano performance evaluation method using deep neural network based acoustic modeling. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2017. p. 3088–92.

MLA: Pan, J., et al. “An audio based piano performance evaluation method using deep neural network based acoustic modeling.” Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 2017-August, 2017, pp. 3088–92. Scopus, doi:10.21437/Interspeech.2017-866.

NLM: Pan J, Li M, Song Z, Li X, Liu X, Yi H, Zhu M. An audio based piano performance evaluation method using deep neural network based acoustic modeling. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. 2017. p. 3088–3092.
