Scholars@Duke publication: Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach.

Publication, Journal Article

Xiu, Y; Gorlatova, M

Published in: IEEE Transactions on Visualization and Computer Graphics

November 2025

The virtual content in augmented reality (AR) can introduce misleading or harmful information, leading to semantic misunderstandings or user errors. In this work, we focus on visual information manipulation (VIM) attacks in AR, where virtual content changes the meaning of real-world scenes in subtle but impactful ways. We introduce a taxonomy that categorizes these attacks into three formats: character, phrase, and pattern manipulation, and three purposes: information replacement, information obfuscation, and extra wrong information. Based on the taxonomy, we construct a dataset, AR-VIM, which consists of 452 raw-AR video pairs spanning 202 different scenes, each simulating a real-world AR scenario. To detect the attacks in the dataset, we propose a multimodal semantic reasoning framework, VIM-Sense. It combines the language and visual understanding capabilities of vision-language models (VLMs) with optical character recognition (OCR)-based textual analysis. VIM-Sense achieves an attack detection accuracy of 88.94% on AR-VIM, consistently outperforming vision-only and text-only baselines. The system achieves an average attack detection latency of 7.07 seconds in a simulated video processing framework and 7.17 seconds in a real-world evaluation conducted on a mobile Android AR application.
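As an illustration only, the taxonomy described in the abstract (three attack formats crossed with three attack purposes) can be written down as a small data structure. This is a minimal sketch, not the authors' code: the names `AttackFormat`, `AttackPurpose`, `VimLabel`, and `all_labels` are hypothetical and do not come from the paper or the AR-VIM dataset.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class AttackFormat(Enum):
    """The three manipulation formats named in the abstract."""
    CHARACTER = "character manipulation"
    PHRASE = "phrase manipulation"
    PATTERN = "pattern manipulation"


class AttackPurpose(Enum):
    """The three attack purposes named in the abstract."""
    REPLACEMENT = "information replacement"
    OBFUSCATION = "information obfuscation"
    EXTRA_WRONG = "extra wrong information"


@dataclass(frozen=True)
class VimLabel:
    """Hypothetical label attached to one raw-AR video pair."""
    attack_format: AttackFormat
    purpose: AttackPurpose


def all_labels() -> list[VimLabel]:
    """Enumerate every format/purpose combination in the taxonomy."""
    return [VimLabel(f, p) for f, p in product(AttackFormat, AttackPurpose)]


if __name__ == "__main__":
    for label in all_labels():
        print(f"{label.attack_format.value} / {label.purpose.value}")
```

Whether every format/purpose combination actually occurs in AR-VIM is not stated in the abstract; the enumeration only shows the shape of the label space.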

Duke Scholars

Author: Maria Gorlatova, Electrical and Computer Engineering

Published In

IEEE Transactions on Visualization and Computer Graphics

DOI

10.1109/tvcg.2025.3616842

EISSN

1941-0506

ISSN

1077-2626

Publication Date

November 2025

Volume

31

Issue

11

Start / End Page

9645 / 9655

Related Subject Headings

Software Engineering
46 Information and computing sciences
0802 Computation Theory and Mathematics
0801 Artificial Intelligence and Image Processing

Citation

APA

Xiu, Y., & Gorlatova, M. (2025). Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach. IEEE Transactions on Visualization and Computer Graphics, 31(11), 9645–9655. https://doi.org/10.1109/tvcg.2025.3616842

Chicago

Xiu, Yanming, and Maria Gorlatova. “Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach.” IEEE Transactions on Visualization and Computer Graphics 31, no. 11 (November 2025): 9645–55. https://doi.org/10.1109/tvcg.2025.3616842.

ICMJE

Xiu Y, Gorlatova M. Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach. IEEE transactions on visualization and computer graphics. 2025 Nov;31(11):9645–55.

MLA

Xiu, Yanming, and Maria Gorlatova. “Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach.” IEEE Transactions on Visualization and Computer Graphics, vol. 31, no. 11, Nov. 2025, pp. 9645–55. Epmc, doi:10.1109/tvcg.2025.3616842.
