Scholars@Duke publication: Audio-Based Piano Performance Evaluation for Beginners with Convolutional Neural Network and Attention Mechanism

Publication, Journal Article

Wang, W; Pan, J; Yi, H; Song, Z; Li, M

Published in: IEEE ACM Transactions on Audio Speech and Language Processing

January 1, 2021

In this paper, we propose two audio-based piano performance evaluation systems for beginners. The first is a sequential, modularized system with three steps: Convolutional Neural Network (CNN)-based acoustic feature extraction, matching via dynamic time warping (DTW), and performance score regression. The second is an end-to-end system built from CNNs and an attention mechanism. It takes two acoustic feature sequences as input and directly predicts a performance score. We evaluate the two proposed methods on our new open-access Yingcai Piano Performance Evaluation Phase III Dataset (YCU-PPE-III), which contains more than 2000 piano audio pieces recorded in multiple real test sessions. Experimental results show that the modularized system achieves a mean absolute error (MAE) of 3.79 on a 0-100 point scale. The end-to-end system achieves an MAE of 4.40, which shows that it is possible to train a robust end-to-end piano performance evaluation system with only two thousand audio pieces.
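To make two terms from the abstract concrete, here is a minimal Python sketch of textbook DTW matching between two feature sequences and of the MAE metric. It is an illustration only, not the authors' implementation: the function names `dtw_cost` and `mean_absolute_error`, the Euclidean frame distance, and the toy sequences are all assumptions, and the paper's CNN feature extractor and score regressor are not reproduced.

```python
import math


def dtw_cost(a, b):
    """Total cost of the best DTW alignment between sequences a and b.

    Each element of a and b is a feature vector (a tuple of floats).
    The local cost is the Euclidean distance between two frames,
    which is an assumption made for this sketch.
    """
    n, m = len(a), len(b)
    # acc[i][j] = cheapest cost of aligning a[:i] with b[:j].
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            # Allowed steps: advance in a, advance in b, or advance in both.
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def mean_absolute_error(predicted, reference):
    """Average absolute difference between predicted and reference scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Toy one-dimensional "features": the performance holds the first frame longer.
reference = [(0.0,), (1.0,), (2.0,)]
performance = [(0.0,), (0.0,), (1.0,), (2.0,)]
alignment_cost = dtw_cost(reference, performance)

# Hypothetical predicted and teacher-assigned scores on a 0-100 scale.
mae = mean_absolute_error([80.0, 90.0], [84.0, 87.0])
```

In the toy example the held frame is absorbed by the warping path, so the alignment cost is zero even though the sequences differ in length. That tolerance to tempo variation is the usual reason for choosing DTW when comparing a performance against a reference.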

Duke Scholars

Author Ming Li DKU Faculty

Published In

IEEE ACM Transactions on Audio Speech and Language Processing

DOI

10.1109/TASLP.2021.3061267

EISSN

2329-9304

ISSN

2329-9290

Publication Date

January 1, 2021

Volume

29

Start / End Page

1119 / 1133

Citation

APA

Wang, W., Pan, J., Yi, H., Song, Z., & Li, M. (2021). Audio-Based Piano Performance Evaluation for Beginners with Convolutional Neural Network and Attention Mechanism. IEEE ACM Transactions on Audio Speech and Language Processing, 29, 1119–1133. https://doi.org/10.1109/TASLP.2021.3061267

Chicago

Wang, W., J. Pan, H. Yi, Z. Song, and M. Li. “Audio-Based Piano Performance Evaluation for Beginners with Convolutional Neural Network and Attention Mechanism.” IEEE ACM Transactions on Audio Speech and Language Processing 29 (January 1, 2021): 1119–33. https://doi.org/10.1109/TASLP.2021.3061267.

ICMJE

Wang W, Pan J, Yi H, Song Z, Li M. Audio-Based Piano Performance Evaluation for Beginners with Convolutional Neural Network and Attention Mechanism. IEEE ACM Transactions on Audio Speech and Language Processing. 2021 Jan 1;29:1119–33.

MLA

Wang, W., et al. “Audio-Based Piano Performance Evaluation for Beginners with Convolutional Neural Network and Attention Mechanism.” IEEE ACM Transactions on Audio Speech and Language Processing, vol. 29, Jan. 2021, pp. 1119–33. Scopus, doi:10.1109/TASLP.2021.3061267.
