MGGA: Universal Perturbations against Deepfake via Multiple Model-based Gradient-Guided Feature Layer Attack

Publication, Conference
Cao, Z; Wang, Z; Wang, R; Yang, Y; Tian, F; Wu, G; Suzuki, A
Published in: Proceedings of the 1st Workshop on Deepfake Forensics: Detection, Attribution, Recognition and Adversarial Challenges in the Era of AI-Generated Media (DFF 2025)
October 27, 2025

The application of deepfake models to image editing has become increasingly popular, yet their malicious use poses significant risks. Recent active defense mechanisms achieve satisfactory results when the forgery model and dataset are held fixed, but their performance declines significantly on out-of-distribution samples. To address this issue, we propose a method that uses gradient information to guide attacks on intermediate feature layers, so that the perturbation targets the data's intrinsic features rather than model-specific training features. Because the adversarial perturbation is correlated with the data, it remains effective against multiple forgery models, even when those models are built on different infrastructures such as GANs and diffusion models (DMs). Furthermore, we incorporate the mixup technique to enhance the transferability of the adversarial perturbation across data. Our extensive experiments show that the proposed universal perturbation successfully distorts the outputs of various forgery models across different datasets.
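The abstract describes two mechanisms: gradient-guided perturbation of intermediate feature layers across several surrogate forgery models, and mixup to tie the perturbation to data-intrinsic features. The following is a minimal sketch of that recipe, assuming PyTorch surrogates; the function name universal_feature_attack, the hook placement, and all hyper-parameters are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn.functional as F

def universal_feature_attack(models, layers, data_loader,
                             eps=8 / 255, alpha=1 / 255, mix_lam=0.7,
                             image_shape=(1, 3, 256, 256)):
    """Learn one universal perturbation that distorts intermediate
    features of several surrogate forgery models (illustrative sketch)."""
    delta = torch.zeros(image_shape, requires_grad=True)  # universal noise
    feats = {}

    def make_hook(key):
        def hook(module, inputs, output):
            feats[key] = output  # capture this model's feature map
        return hook

    for i, (model, layer) in enumerate(zip(models, layers)):
        model.eval()
        for p in model.parameters():
            p.requires_grad_(False)  # only delta is optimised
        layer.register_forward_hook(make_hook(i))

    for x in data_loader:  # batches of clean images in [0, 1]
        # Mixup: blend each image with a random partner so the gradient
        # reflects shared, data-intrinsic features rather than one sample.
        x = mix_lam * x + (1.0 - mix_lam) * x[torch.randperm(x.size(0))]

        loss = 0.0
        for i, model in enumerate(models):
            model(x)                          # clean forward pass
            clean = feats[i].detach()
            model((x + delta).clamp(0, 1))    # perturbed forward pass
            loss = loss + F.mse_loss(feats[i], clean)

        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # ascend the feature distance
            delta.clamp_(-eps, eps)             # stay in the epsilon-ball
        delta.grad.zero_()

    return delta.detach()

In this framing, maximising the clean-versus-perturbed feature distance, rather than any single model's output loss, is what would let the same perturbation transfer across GAN- and DM-based forgery models.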

Published In

Proceedings of the 1st Workshop on Deepfake Forensics: Detection, Attribution, Recognition and Adversarial Challenges in the Era of AI-Generated Media (DFF 2025)

DOI

10.1145/3746265.3759661

Publication Date

October 27, 2025

Start / End Page

73 / 82

Citation

APA
Cao, Z., Wang, Z., Wang, R., Yang, Y., Tian, F., Wu, G., & Suzuki, A. (2025). MGGA: Universal Perturbations against Deepfake via Multiple Model-based Gradient-Guided Feature Layer Attack. In Proceedings of the 1st Workshop on Deepfake Forensics: Detection, Attribution, Recognition and Adversarial Challenges in the Era of AI-Generated Media (DFF 2025) (pp. 73–82). https://doi.org/10.1145/3746265.3759661

Chicago
Cao, Z., Z. Wang, R. Wang, Y. Yang, F. Tian, G. Wu, and A. Suzuki. “MGGA: Universal Perturbations against Deepfake via Multiple Model-based Gradient-Guided Feature Layer Attack.” In Proceedings of the 1st Workshop on Deepfake Forensics: Detection, Attribution, Recognition and Adversarial Challenges in the Era of AI-Generated Media (DFF 2025), 73–82, 2025. https://doi.org/10.1145/3746265.3759661.

ICMJE
Cao Z, Wang Z, Wang R, Yang Y, Tian F, Wu G, et al. MGGA: Universal Perturbations against Deepfake via Multiple Model-based Gradient-Guided Feature Layer Attack. In: Proceedings of the 1st Workshop on Deepfake Forensics: Detection, Attribution, Recognition and Adversarial Challenges in the Era of AI-Generated Media (DFF 2025). 2025. p. 73–82.

MLA
Cao, Z., et al. “MGGA: Universal Perturbations against Deepfake via Multiple Model-based Gradient-Guided Feature Layer Attack.” Proceedings of the 1st Workshop on Deepfake Forensics: Detection, Attribution, Recognition and Adversarial Challenges in the Era of AI-Generated Media (DFF 2025), 2025, pp. 73–82. Scopus, doi:10.1145/3746265.3759661.

NLM
Cao Z, Wang Z, Wang R, Yang Y, Tian F, Wu G, Suzuki A. MGGA: Universal Perturbations against Deepfake via Multiple Model-based Gradient-Guided Feature Layer Attack. Proceedings of the 1st Workshop on Deepfake Forensics: Detection, Attribution, Recognition and Adversarial Challenges in the Era of AI-Generated Media (DFF 2025). 2025. p. 73–82.
