An audio-based piano performance evaluation method using deep neural network-based acoustic modeling
In this paper, we propose an annotated piano performance evaluation dataset with 185 audio pieces and a method to evaluate the performances of piano beginners based on their audio recordings. The proposed framework consists of three parts: piano key posterior probability extraction, Dynamic Time Warping (DTW)-based matching, and performance score regression. First, a deep neural network model is trained to extract 88-dimensional piano key features from the Constant-Q Transform (CQT) spectrum. The proposed acoustic model shows high robustness to varying recording environments. Second, we employ the DTW algorithm on the high-level piano key feature sequences to align the input with the template. Based on the alignment, we extract multiple global matching features that reflect the similarity between the input and the template. Finally, we apply linear regression to these matching features, using expert-annotated scores in the training data, to estimate performance scores for test audio. Experimental results show that our automatic evaluation method achieves an average absolute score error of 2.64 on a score scale from 0 to 100 and an average correlation coefficient of 0.73 on our in-house YCU-MPPE-II dataset.
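The pipeline described above (CQT front end, 88-dimensional piano key features, DTW alignment, global matching features, and linear regression) can be illustrated with the following minimal Python sketch. It is not the paper's implementation: the trained DNN acoustic model is replaced by a crude log-compressed CQT proxy, and the specific set of matching features (normalized path cost, mean and standard deviation of frame-level distances, path-length ratio) is a hypothetical choice for illustration only.

```python
import numpy as np
import librosa
from sklearn.linear_model import LinearRegression

def piano_key_features(audio_path, sr=44100, hop_length=512):
    """Stand-in for the acoustic front end.  The real system feeds CQT
    frames to a trained DNN that outputs 88 piano-key posteriors; here a
    log-compressed, per-frame-normalized CQT over the 88 piano notes
    (A0-C8) is used purely as a placeholder."""
    y, sr = librosa.load(audio_path, sr=sr)
    C = np.abs(librosa.cqt(y, sr=sr, hop_length=hop_length,
                           fmin=librosa.note_to_hz('A0'),
                           n_bins=88, bins_per_octave=12))
    C = np.log1p(C)
    C /= (C.max(axis=0, keepdims=True) + 1e-8)   # crude "posterior" proxy
    return C                                      # shape: (88, n_frames)

def matching_features(student_feat, template_feat):
    """DTW-align two 88-dim feature sequences and summarize the alignment
    with a few global similarity statistics (hypothetical feature set)."""
    D, wp = librosa.sequence.dtw(X=template_feat, Y=student_feat,
                                 metric='cosine')
    total_cost = D[wp[0, 0], wp[0, 1]]            # cumulative path cost
    frame_dist = np.array([
        1.0 - np.dot(template_feat[:, i], student_feat[:, j]) /
        (np.linalg.norm(template_feat[:, i]) *
         np.linalg.norm(student_feat[:, j]) + 1e-8)
        for i, j in wp])
    return np.array([
        total_cost / len(wp),                     # normalized path cost
        frame_dist.mean(),                        # mean frame-level distance
        frame_dist.std(),                         # distance variability
        len(wp) / max(template_feat.shape[1], 1)  # path-length ratio (tempo deviation)
    ])

# Regression stage: X_train would hold the matching-feature vectors of the
# training recordings and y_train the expert-annotated scores (0-100).
# reg = LinearRegression().fit(X_train, y_train)
# predicted_score = reg.predict(test_features.reshape(1, -1))
```

In this sketch the regression stage is ordinary least squares from scikit-learn; the exact feature set, network architecture, and regression details of the actual system are described in the body of the paper.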