Integrating frame-level boundary detection and deepfake detection for locating manipulated regions in partially spoofed audio forgery attacks
Partially fake audio, a variant of deep fake that involves manipulating audio utterances through the incorporation of fake or externally-sourced bona fide audio clips, constitutes a growing threat as an audio forgery attack impacting both human and artificial intelligence applications. Researchers have recently developed valuable databases to aid in the development of effective countermeasures against such attacks. While existing countermeasures mainly focus on identifying partially fake audio at the level of entire utterances or segments, this paper introduces a paradigm shift by proposing frame-level systems. These systems are designed to detect manipulated utterances and pinpoint the specific regions within partially fake audio where the manipulation occurs. Our approach leverages acoustic features extracted from large-scale self-supervised pre-training models, delivering promising results evaluated on diverse, publicly accessible databases. Additionally, we study the integration of boundary and deepfake detection systems, exploring their potential synergies and shortcomings. Importantly, our techniques have yielded impressive results. We have achieved state-of-the-art performance on the test dataset of the Track 2 of ADD 2022 challenge with an equal error rate of 4.4%. Furthermore, our methods exhibit remarkable performance in locating manipulated regions in Track 2 of the ADD 2023 challenge, resulting in a final ADD score of 0.6713 and securing the top position.
Duke Scholars
Published In
DOI
EISSN
ISSN
Publication Date
Volume
Related Subject Headings
- Speech-Language Pathology & Audiology
- 46 Information and computing sciences
- 40 Engineering
- 2004 Linguistics
- 1702 Cognitive Sciences
- 0801 Artificial Intelligence and Image Processing
Citation
Published In
DOI
EISSN
ISSN
Publication Date
Volume
Related Subject Headings
- Speech-Language Pathology & Audiology
- 46 Information and computing sciences
- 40 Engineering
- 2004 Linguistics
- 1702 Cognitive Sciences
- 0801 Artificial Intelligence and Image Processing