
Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach.

Publication, Journal Article
Xiu, Y; Gorlatova, M
Published in: IEEE Transactions on Visualization and Computer Graphics
November 2025

The virtual content in augmented reality (AR) can introduce misleading or harmful information, leading to semantic misunderstandings or user errors. In this work, we focus on visual information manipulation (VIM) attacks in AR, where virtual content changes the meaning of real-world scenes in subtle but impactful ways. We introduce a taxonomy that categorizes these attacks into three formats: character, phrase, and pattern manipulation, and three purposes: information replacement, information obfuscation, and extra wrong information. Based on the taxonomy, we construct a dataset, AR-VIM, which consists of 452 raw-AR video pairs spanning 202 different scenes, each simulating a real-world AR scenario. To detect the attacks in the dataset, we propose a multimodal semantic reasoning framework, VIM-Sense. It combines the language and visual understanding capabilities of vision-language models (VLMs) with optical character recognition (OCR)-based textual analysis. VIM-Sense achieves an attack detection accuracy of 88.94% on AR-VIM, consistently outperforming vision-only and text-only baselines. The system achieves an average attack detection latency of 7.07 seconds in a simulated video processing framework and 7.17 seconds in a real-world evaluation conducted on a mobile Android AR application.
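The taxonomy described in the abstract can be written down as a small data structure. The following is a minimal sketch, not the authors' code: the class names `AttackFormat`, `AttackPurpose`, and `VimLabel` and the helper `all_labels` are hypothetical, and only the six category names come from the abstract. The abstract does not state whether every format and purpose pairing occurs in AR-VIM, so the nine combinations enumerated here are simply the possible label space.

```python
from dataclasses import dataclass
from enum import Enum


class AttackFormat(Enum):
    """The three attack formats named in the abstract."""
    CHARACTER = "character manipulation"
    PHRASE = "phrase manipulation"
    PATTERN = "pattern manipulation"


class AttackPurpose(Enum):
    """The three attack purposes named in the abstract."""
    INFORMATION_REPLACEMENT = "information replacement"
    INFORMATION_OBFUSCATION = "information obfuscation"
    EXTRA_WRONG_INFORMATION = "extra wrong information"


@dataclass(frozen=True)
class VimLabel:
    """Hypothetical label pairing one format with one purpose."""
    attack_format: AttackFormat
    purpose: AttackPurpose


def all_labels() -> list[VimLabel]:
    """Enumerate every format/purpose pairing (assumed label space)."""
    return [VimLabel(f, p) for f in AttackFormat for p in AttackPurpose]


if __name__ == "__main__":
    for label in all_labels():
        print(f"{label.attack_format.value} / {label.purpose.value}")
```

Keeping format and purpose as separate enumerations mirrors the two-axis structure of the taxonomy, so a sample can be filtered by either axis independently.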


Published In

IEEE Transactions on Visualization and Computer Graphics

DOI

10.1109/tvcg.2025.3616842

EISSN

1941-0506

ISSN

1077-2626

Publication Date

November 2025

Volume

31

Issue

11

Start / End Page

9645 / 9655

Related Subject Headings

  • Software Engineering
  • 46 Information and computing sciences
  • 0802 Computation Theory and Mathematics
  • 0801 Artificial Intelligence and Image Processing
 

Citation

APA
Xiu, Y., & Gorlatova, M. (2025). Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach. IEEE Transactions on Visualization and Computer Graphics, 31(11), 9645–9655. https://doi.org/10.1109/tvcg.2025.3616842

Chicago
Xiu, Yanming, and Maria Gorlatova. “Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach.” IEEE Transactions on Visualization and Computer Graphics 31, no. 11 (November 2025): 9645–55. https://doi.org/10.1109/tvcg.2025.3616842.

ICMJE
Xiu Y, Gorlatova M. Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach. IEEE transactions on visualization and computer graphics. 2025 Nov;31(11):9645–55.

MLA
Xiu, Yanming, and Maria Gorlatova. “Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach.” IEEE Transactions on Visualization and Computer Graphics, vol. 31, no. 11, Nov. 2025, pp. 9645–55. Epmc, doi:10.1109/tvcg.2025.3616842.

NLM
Xiu Y, Gorlatova M. Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach. IEEE transactions on visualization and computer graphics. 2025 Nov;31(11):9645–9655.
