Scholars@Duke publication: Wasserstein contrastive representation distillation

Publication, Journal Article

Chen, L; Wang, D; Gan, Z; Liu, J; Henao, R; Carin, L

Published in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

January 1, 2021

The primary goal of knowledge distillation (KD) is to encapsulate the information of a model learned from a teacher network into a student network, with the latter being more compact than the former. Existing work, e.g., using Kullback-Leibler divergence for distillation, may fail to capture important structural knowledge in the teacher network and often lacks the ability for feature generalization, particularly in situations when teacher and student are built to address different classification tasks. We propose Wasserstein Contrastive Representation Distillation (WCoRD), which leverages both primal and dual forms of Wasserstein distance for KD. The dual form is used for global knowledge transfer, yielding a contrastive learning objective that maximizes the lower bound of mutual information between the teacher and the student networks. The primal form is used for local contrastive knowledge transfer within a mini-batch, effectively matching the distributions of features between the teacher and the student networks. Experiments demonstrate that the proposed WCoRD method outperforms state-of-the-art approaches on privileged information distillation, model compression and cross-modal transfer.
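The abstract names two uses of the Wasserstein distance but gives no formulas. Below is a minimal NumPy sketch, not the authors' implementation, of the two generic ingredients it alludes to. The first is an entropy-regularized primal optimal-transport cost (Sinkhorn iterations) between teacher and student features in one mini-batch. The second is an InfoNCE-style contrastive lower bound on the mutual information between teacher and student features. The cosine cost, the regularization `eps`, the temperature `tau`, the uniform sample weights and the function names are all assumptions for illustration. WCoRD's actual critic, its dual-form constraint and its loss weighting are not reproduced here.

```python
import numpy as np

# Hypothetical helpers for illustration only; not WCoRD's released code.

def _normalize(x):
    # Scale each feature vector (row) to unit L2 norm.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def cosine_cost(t, s):
    # Assumed ground cost: C[i, j] = 1 - cos(teacher_i, student_j), in [0, 2].
    return 1.0 - _normalize(t) @ _normalize(s).T

def sinkhorn_wasserstein(t, s, eps=0.1, n_iters=200):
    # Entropy-regularized primal OT between uniform empirical distributions
    # over the teacher and student features of one mini-batch.
    C = cosine_cost(t, s)
    n, m = C.shape
    a = np.full(n, 1.0 / n)          # uniform mass on teacher samples (assumption)
    b = np.full(m, 1.0 / m)          # uniform mass on student samples (assumption)
    K = np.exp(-C / eps)             # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):         # alternate scaling to match both marginals
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]  # approximate transport plan
    return float(np.sum(P * C)), P

def infonce_bound(t, s, tau=0.1):
    # InfoNCE bound: I(T; S) >= log N - L_NCE, where matching rows are positives.
    logits = (_normalize(t) @ _normalize(s).T) / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(np.log(t.shape[0]) + np.mean(np.diag(log_probs)))

# Usage: a student that mimics the teacher vs. an unrelated student.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))
student_close = teacher + 0.01 * rng.normal(size=(8, 16))
student_far = rng.normal(size=(8, 16))
w_close, plan = sinkhorn_wasserstein(teacher, student_close)
w_far, _ = sinkhorn_wasserstein(teacher, student_far)
mi_close = infonce_bound(teacher, student_close)
mi_far = infonce_bound(teacher, student_far)
```

In a distillation loop one would minimize the transport cost and maximize the bound with respect to the student's parameters, typically in an autodiff framework; NumPy is used here only to keep the sketch self-contained. The InfoNCE bound can never exceed log N, which is one reason contrastive objectives of this kind are usually computed with many negatives.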

Duke Scholars

Ricardo Henao (author): Biostatistics & Bioinformatics, Division of Translational Bi ...

Lawrence Carin (author): Electrical and Computer Engineering

Published In

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

DOI

10.1109/CVPR46437.2021.01603

ISSN

1063-6919

Publication Date

January 1, 2021

Start / End Page

16291 / 16300

Citation

APA: Chen, L., Wang, D., Gan, Z., Liu, J., Henao, R., & Carin, L. (2021). Wasserstein contrastive representation distillation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 16291–16300. https://doi.org/10.1109/CVPR46437.2021.01603

Chicago: Chen, L., D. Wang, Z. Gan, J. Liu, R. Henao, and L. Carin. “Wasserstein contrastive representation distillation.” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, January 1, 2021, 16291–300. https://doi.org/10.1109/CVPR46437.2021.01603.

ICMJE: Chen L, Wang D, Gan Z, Liu J, Henao R, Carin L. Wasserstein contrastive representation distillation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2021 Jan 1;16291–300.

MLA: Chen, L., et al. “Wasserstein contrastive representation distillation.” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Jan. 2021, pp. 16291–300. Scopus, doi:10.1109/CVPR46437.2021.01603.
