MGGA: Universal Perturbations against Deepfake via Multiple Model-based Gradient-Guided Feature Layer Attack

Publication, Conference
Cao, Z; Wang, Z; Wang, R; Yang, Y; Tian, F; Wu, G; Suzuki, A
Published in: Proceedings of the 1st Workshop on Deepfake Forensics: Detection, Attribution, Recognition and Adversarial Challenges in the Era of AI-Generated Media (DFF 2025)
October 27, 2025

The application of deepfake models to image editing has become increasingly popular, yet their malicious use poses significant risks. Recent active defense mechanisms achieve satisfactory results when the forgery model and dataset are held fixed, but their performance declines significantly on out-of-distribution samples. To address this issue, we propose a method that uses gradient information to guide attacks on intermediate feature layers, so that the perturbation targets the data's intrinsic features rather than model-specific training features. Because the adversarial perturbation is correlated with the data, it remains effective against multiple forgery models, even when those models are built on different infrastructures such as GANs and diffusion models (DMs). Furthermore, we incorporate the mixup technique to enhance the transferability of the adversarial perturbation across data. Our extensive experiments show that the proposed universal perturbation successfully distorts the outputs of various forgery models across different datasets.
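The abstract describes two mechanisms: gradient-guided perturbation of intermediate feature layers across several surrogate forgery models, and mixup to tie the perturbation to data-intrinsic features. The following is a minimal sketch of that recipe, assuming PyTorch surrogates; the function name universal_feature_attack, the hook placement, and all hyper-parameters are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn.functional as F

def universal_feature_attack(models, layers, data_loader,
                             eps=8 / 255, alpha=1 / 255, mix_lam=0.7,
                             image_shape=(1, 3, 256, 256)):
    """Learn one universal perturbation that distorts intermediate
    features of several surrogate forgery models (illustrative sketch)."""
    delta = torch.zeros(image_shape, requires_grad=True)  # universal noise
    feats = {}

    def make_hook(key):
        def hook(module, inputs, output):
            feats[key] = output  # capture this model's feature map
        return hook

    for i, (model, layer) in enumerate(zip(models, layers)):
        model.eval()
        for p in model.parameters():
            p.requires_grad_(False)  # only delta is optimised
        layer.register_forward_hook(make_hook(i))

    for x in data_loader:  # batches of clean images in [0, 1]
        # Mixup: blend each image with a random partner so the gradient
        # reflects shared, data-intrinsic features rather than one sample.
        x = mix_lam * x + (1.0 - mix_lam) * x[torch.randperm(x.size(0))]

        loss = 0.0
        for i, model in enumerate(models):
            model(x)                          # clean forward pass
            clean = feats[i].detach()
            model((x + delta).clamp(0, 1))    # perturbed forward pass
            loss = loss + F.mse_loss(feats[i], clean)

        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # ascend the feature distance
            delta.clamp_(-eps, eps)             # stay in the epsilon-ball
        delta.grad.zero_()

    return delta.detach()

In this framing, maximising the clean-versus-perturbed feature distance, rather than any single model's output loss, is what would let the same perturbation transfer across GAN- and DM-based forgery models.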

Published In

Proceedings of the 1st Workshop on Deepfake Forensics: Detection, Attribution, Recognition and Adversarial Challenges in the Era of AI-Generated Media (DFF 2025)

DOI

10.1145/3746265.3759661

Publication Date

October 27, 2025

Start / End Page

73 / 82

Citation

APA
Cao, Z., Wang, Z., Wang, R., Yang, Y., Tian, F., Wu, G., & Suzuki, A. (2025). MGGA: Universal Perturbations against Deepfake via Multiple Model-based Gradient-Guided Feature Layer Attack. In Proceedings of the 1st Workshop on Deepfake Forensics: Detection, Attribution, Recognition and Adversarial Challenges in the Era of AI-Generated Media (DFF 2025) (pp. 73–82). https://doi.org/10.1145/3746265.3759661

Chicago
Cao, Z., Z. Wang, R. Wang, Y. Yang, F. Tian, G. Wu, and A. Suzuki. “MGGA: Universal Perturbations against Deepfake via Multiple Model-based Gradient-Guided Feature Layer Attack.” In Proceedings of the 1st Workshop on Deepfake Forensics: Detection, Attribution, Recognition and Adversarial Challenges in the Era of AI-Generated Media (DFF 2025), 73–82, 2025. https://doi.org/10.1145/3746265.3759661.

ICMJE
Cao Z, Wang Z, Wang R, Yang Y, Tian F, Wu G, et al. MGGA: Universal Perturbations against Deepfake via Multiple Model-based Gradient-Guided Feature Layer Attack. In: Proceedings of the 1st Workshop on Deepfake Forensics: Detection, Attribution, Recognition and Adversarial Challenges in the Era of AI-Generated Media (DFF 2025). 2025. p. 73–82.

MLA
Cao, Z., et al. “MGGA: Universal Perturbations against Deepfake via Multiple Model-based Gradient-Guided Feature Layer Attack.” Proceedings of the 1st Workshop on Deepfake Forensics: Detection, Attribution, Recognition and Adversarial Challenges in the Era of AI-Generated Media (DFF 2025), 2025, pp. 73–82. Scopus, doi:10.1145/3746265.3759661.

NLM
Cao Z, Wang Z, Wang R, Yang Y, Tian F, Wu G, Suzuki A. MGGA: Universal Perturbations against Deepfake via Multiple Model-based Gradient-Guided Feature Layer Attack. Proceedings of the 1st Workshop on Deepfake Forensics: Detection, Attribution, Recognition and Adversarial Challenges in the Era of AI-Generated Media (DFF 2025). 2025. p. 73–82.
