MGGA: Universal Perturbations against Deepfake via Multiple Model-based Gradient-Guided Feature Layer Attack
The application of deepfake models for image editing has become increasingly popular, yet their malicious use poses significant risks. Recent studies on active defense achieve satisfactory results when the forgery model and dataset are known in advance, but their performance declines significantly on out-of-distribution samples. To address this issue, we propose a method that uses gradient information to guide attacks on intermediate feature layers, so that the perturbation targets the data’s intrinsic features rather than model-specific training features. Because the adversarial perturbation is tied to the data, it remains effective against multiple forgery models, even when these models are built on different architectures such as GANs and diffusion models (DMs). Furthermore, we incorporate the mixup technique to enhance the transferability of the adversarial perturbation across data. Extensive experiments show that the proposed universal perturbation successfully distorts the outputs of various forgery models across different datasets.
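To make the described approach concrete, the following is a minimal PyTorch sketch (not the paper's released code) of a gradient-guided feature-layer attack combined with mixup. The `forgery_models` list, their `encoder` method returning intermediate features, and the step sizes are illustrative assumptions.

```python
# Minimal sketch, assuming PyTorch models whose `encoder` exposes intermediate features.
# Hypothetical names: `forgery_models`, `encoder`, `mixup_beta`; not the authors' API.
import torch
import torch.nn.functional as F


def craft_universal_perturbation(images, forgery_models,
                                 epsilon=8 / 255, alpha=1 / 255,
                                 steps=50, mixup_beta=0.4):
    """Craft one perturbation, shared across images, that distorts the
    intermediate features of several forgery models."""
    # Single perturbation broadcast over the whole batch (universal).
    delta = torch.zeros_like(images[0:1], requires_grad=True)
    beta = torch.distributions.Beta(mixup_beta, mixup_beta)

    for _ in range(steps):
        # Mixup: blend random pairs of inputs so the perturbation binds to
        # data-level features rather than to any single sample or model.
        lam = beta.sample().item()
        perm = torch.randperm(images.size(0))
        mixed = lam * images + (1 - lam) * images[perm]

        loss = 0.0
        for model in forgery_models:
            clean_feat = model.encoder(mixed).detach()   # reference features
            adv_feat = model.encoder(mixed + delta)      # features under attack
            loss = loss + F.mse_loss(adv_feat, clean_feat)

        # Gradient-guided ascent: push perturbed features away from clean ones,
        # then project back into the L-infinity ball of radius epsilon.
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-epsilon, epsilon)
        delta = delta.detach().requires_grad_(True)

    return delta.detach()
```

In this sketch the loss is summed over all available forgery models, so the resulting perturbation is not tied to any single model's decision boundary; whether this matches the paper's exact objective (e.g., its choice of feature layers or distance metric) is an assumption.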