Integrating frame-level boundary detection and deepfake detection for locating manipulated regions in partially spoofed audio forgery attacks

Publication, Journal Article
Cai, Z; Li, M
Published in: Computer Speech and Language
April 1, 2024

Partially fake audio is a variant of deepfake in which an utterance is manipulated by splicing in fake or externally sourced bona fide audio clips. It is a growing audio forgery threat that affects both human and artificial intelligence applications. Researchers have recently developed valuable databases to aid in the development of effective countermeasures against such attacks. While existing countermeasures mainly focus on identifying partially fake audio at the level of entire utterances or segments, this paper proposes frame-level systems that both detect manipulated utterances and pinpoint the specific regions within partially fake audio where the manipulation occurs. Our approach leverages acoustic features extracted from large-scale self-supervised pre-training models and delivers promising results on diverse, publicly accessible databases. Additionally, we study the integration of boundary detection and deepfake detection systems, exploring their potential synergies and shortcomings. We achieve state-of-the-art performance on the test dataset of Track 2 of the ADD 2022 challenge, with an equal error rate of 4.4%. Furthermore, in locating manipulated regions in Track 2 of the ADD 2023 challenge, our methods obtain a final ADD score of 0.6713 and secure the top position.

Published In

Computer Speech and Language

DOI

EISSN

1095-8363

ISSN

0885-2308

Publication Date

April 1, 2024

Volume

85

Related Subject Headings

  • Speech-Language Pathology & Audiology
  • 46 Information and computing sciences
  • 40 Engineering
  • 2004 Linguistics
  • 1702 Cognitive Sciences
  • 0801 Artificial Intelligence and Image Processing
 
