Self-supervised Multi-Modal Video Forgery Attack Detection
Video forgery attacks threaten surveillance systems by replacing genuine video captures with realistic synthetic content, which can be powered by the latest augmented reality and virtual reality technologies. From the machine perception perspective, visual objects often have RF signatures that are naturally synchronized with them during recording. In contrast to video captures, these RF signatures are far more difficult to attack given their concealed and ubiquitous nature. In this work, we investigate multimodal video forgery attack detection using both the visual and wireless modalities. Since wireless-signal-based human perception is sensitive to the environment, we propose a self-supervised training strategy that enables the system to work without external annotation and thus adapt to different environments. Our method achieves perfect human detection accuracy and a high forgery attack detection accuracy of 94.38%, which is comparable to supervised methods. The code is publicly available at https://github.com/ChuiZhao/Secure-Mask.git
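To make the cross-modal idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation; see the linked repository for that) of how a wireless branch could be trained self-supervised from visual pseudo-labels and then used to flag forgeries. The class `RFHumanDetector`, the CSI dimensions, the coarse mask grid, and the agreement threshold are all illustrative assumptions: a small RF encoder is supervised by human masks derived from the video stream, and at test time a low agreement between the RF-predicted mask and the visual mask suggests the video no longer matches the physical scene.

```python
# Hypothetical sketch of self-supervised cross-modal forgery detection.
# Assumptions: CSI windows of size 90, a 16x16 coarse presence grid, and an
# illustrative agreement threshold of 0.5. None of these come from the paper.
import torch
import torch.nn as nn


class RFHumanDetector(nn.Module):
    """Toy encoder mapping a Wi-Fi CSI window to a coarse human-presence mask."""

    def __init__(self, csi_dim: int = 90, grid: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(csi_dim, 256), nn.ReLU(),
            nn.Linear(256, grid * grid),
        )
        self.grid = grid

    def forward(self, csi: torch.Tensor) -> torch.Tensor:
        # csi: (batch, csi_dim) -> mask logits: (batch, grid, grid)
        return self.net(csi).view(-1, self.grid, self.grid)


def self_supervised_step(model, optimizer, csi, visual_mask):
    """One training step: the video-derived mask acts as the pseudo-label."""
    logits = model(csi)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, visual_mask)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def consistency_score(rf_mask: torch.Tensor, visual_mask: torch.Tensor) -> float:
    """IoU-style agreement between the RF and visual human masks."""
    inter = (rf_mask * visual_mask).sum()
    union = ((rf_mask + visual_mask) > 0).float().sum()
    return (inter / union.clamp(min=1)).item()


if __name__ == "__main__":
    model = RFHumanDetector()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Dummy synchronized data standing in for real CSI windows and
    # video-derived segmentation pseudo-labels.
    csi = torch.randn(8, 90)
    visual_mask = (torch.rand(8, 16, 16) > 0.8).float()
    loss = self_supervised_step(model, optimizer, csi, visual_mask)

    # At inference, low cross-modal agreement flags a possible forgery.
    rf_mask = (torch.sigmoid(model(csi)) > 0.5).float()
    agreement = consistency_score(rf_mask, visual_mask)
    print(f"loss={loss:.3f}, cross-modal agreement={agreement:.2f}")
    if agreement < 0.5:  # illustrative threshold
        print("Potential video forgery detected.")
```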