Probing the Augmented Reality Scene Analysis Capabilities of Large Multimodal Models: Toward Reliable Real-Time Assessment Solutions
Augmented Reality (AR) is transforming everyday experiences across domains such as education, entertainment, and healthcare. As AR technologies become increasingly widespread, human-aligned and scalable AR quality evaluation is critical for optimizing immersive user experiences. This paper investigates the potential of Large Multimodal Models (LMMs) for automating AR quality assessment. We curate DiverseAR+, a new dataset of 1,405 scenes collected from diverse sources and environments, and use it to evaluate four commercial LMMs. Our results show that LMMs can perceive, describe, and judge AR content with promising accuracy. To deliver real-time, scalable, and robust AR quality evaluation, we propose a hybrid cloud-edge architecture that combines LMMs with traditional machine learning models and adapts to diverse network conditions. We argue that task-tailored AR-LMM systems can make AR experience evaluation more efficient, adaptive, and user-centered.
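The hybrid cloud-edge design described in the abstract can be illustrated with a minimal routing sketch. The paper does not publish an implementation, so the function names, latency budget, and scoring placeholders below are illustrative assumptions: a lightweight on-device predictor handles frames when network conditions make a cloud round trip impractical, and a cloud-hosted LMM judge is used otherwise.

```python
# Minimal sketch of the hybrid cloud-edge routing idea (assumed, not from the paper).

LATENCY_BUDGET_MS = 100.0  # hypothetical budget for a real-time AR quality score


def edge_quality_score(frame) -> float:
    """Lightweight traditional ML model running on device (assumed)."""
    # e.g., a small CNN or handcrafted-feature regressor returning a score in [0, 1]
    return 0.5  # placeholder


def cloud_lmm_quality_score(frame, prompt: str) -> float:
    """Cloud-hosted LMM judge (assumed API); slower but more capable."""
    # e.g., send the frame plus an AR-quality rubric prompt to a commercial LMM
    return 0.8  # placeholder


def assess_ar_frame(frame, estimated_rtt_ms: float) -> float:
    """Use the cloud LMM when the network allows; otherwise fall back to the edge model."""
    if estimated_rtt_ms < LATENCY_BUDGET_MS:
        try:
            return cloud_lmm_quality_score(
                frame,
                prompt="Rate the realism and placement of the virtual content.",
            )
        except Exception:
            pass  # network or service failure: degrade gracefully to the edge model
    return edge_quality_score(frame)
```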
Related Subject Headings
- Networking & Telecommunications
- 4606 Distributed computing and systems software
- 4009 Electronics, sensors and digital hardware
- 1005 Communications Technologies
- 0906 Electrical and Electronic Engineering
- 0805 Distributed Computing