Scholars@Duke publication: ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models

ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models

Publication, Conference

Zhang, J; Muhamed, A; Anantharaman, A; Wang, G; Chen, C; Zhong, K; Cui, Q; Xu, Y; Zeng, B; Chilimbi, T; Chen, Y

Published in: Proceedings of the Annual Meeting of the Association for Computational Linguistics

January 1, 2023

Knowledge Distillation (KD) (Hinton et al., 2015) is one of the most effective approaches for deploying large-scale pre-trained language models in low-latency environments by transferring the knowledge contained in the large-scale models to smaller student models. Previous KD approaches use the soft labels and intermediate activations generated by the teacher to transfer knowledge to the student model parameters alone. In this paper, we show that having access to non-parametric memory in the form of a knowledge base with the teacher’s soft labels and predictions can further enhance student capacity and improve generalization. To enable the student to retrieve from the knowledge base effectively, we propose a new Retrieval-augmented KD framework with a loss function that aligns the relational knowledge in teacher and student embedding spaces. We show through extensive experiments that our retrieval mechanism can achieve state-of-the-art performance for task-specific knowledge distillation on the GLUE benchmark (Wang et al., 2018a).
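The two ideas named in the abstract, retrieving teacher soft labels from a knowledge base and aligning relational knowledge between the teacher and student embedding spaces, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the use of cosine similarity, top-k retrieval, softmax weighting, the temperature value, and the KL-based form of the alignment loss are all hypothetical choices.

```python
import math

def softmax(xs, temperature=1.0):
    # Numerically stable softmax over a list of scores.
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    # Cosine similarity between two non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_soft_label(query, kb_embeddings, kb_soft_labels, k=2, temperature=0.1):
    # Assumed retrieval rule: take the k knowledge-base entries most similar
    # to the student query embedding and average their teacher soft labels,
    # weighted by a softmax over the similarities.
    sims = [cosine(query, e) for e in kb_embeddings]
    top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    weights = softmax([sims[i] for i in top], temperature)
    num_classes = len(kb_soft_labels[0])
    return [
        sum(w * kb_soft_labels[i][c] for w, i in zip(weights, top))
        for c in range(num_classes)
    ]

def relational_alignment_loss(student_embs, teacher_embs, temperature=0.1):
    # Assumed loss form: for each example, compare the similarity
    # distribution of its teacher embedding over all teacher embeddings with
    # that of its student embedding over the same teacher embeddings, using
    # KL(teacher || student), averaged over the batch.
    loss = 0.0
    for s, t in zip(student_embs, teacher_embs):
        p = softmax([cosine(t, other) for other in teacher_embs], temperature)
        q = softmax([cosine(s, other) for other in teacher_embs], temperature)
        loss += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return loss / len(student_embs)

# Toy knowledge base: teacher embeddings paired with teacher soft labels.
kb_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kb_soft_labels = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]

retrieved = retrieve_soft_label([1.0, 0.1], kb_embeddings, kb_soft_labels)
aligned_loss = relational_alignment_loss(kb_embeddings, kb_embeddings)
misaligned_loss = relational_alignment_loss(
    [[0.0, 1.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]
)
```

In this toy setup the loss is zero when the student embeddings coincide with the teacher embeddings and positive when they disagree, which is the property an alignment objective of this kind is meant to have.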

Duke Scholars

Author Yiran Chen Electrical and Computer Engineering

Published In

Proceedings of the Annual Meeting of the Association for Computational Linguistics

ISSN

0736-587X

Publication Date

January 1, 2023

Volume

2

Start / End Page

1128 / 1136

Citation

APA

Zhang, J., Muhamed, A., Anantharaman, A., Wang, G., Chen, C., Zhong, K., … Chen, Y. (2023). ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 2, pp. 1128–1136).

Chicago

Zhang, J., A. Muhamed, A. Anantharaman, G. Wang, C. Chen, K. Zhong, Q. Cui, et al. “ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models.” In Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2:1128–36, 2023.

ICMJE

Zhang J, Muhamed A, Anantharaman A, Wang G, Chen C, Zhong K, et al. ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models. In: Proceedings of the Annual Meeting of the Association for Computational Linguistics. 2023. p. 1128–36.

MLA

Zhang, J., et al. “ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models.” Proceedings of the Annual Meeting of the Association for Computational Linguistics, vol. 2, 2023, pp. 1128–36.

NLM

Zhang J, Muhamed A, Anantharaman A, Wang G, Chen C, Zhong K, Cui Q, Xu Y, Zeng B, Chilimbi T, Chen Y. ReAugKD: Retrieval-Augmented Knowledge Distillation For Pre-trained Language Models. Proceedings of the Annual Meeting of the Association for Computational Linguistics. 2023. p. 1128–1136.
