Athena: A Plug-and-Play Advisor for Retrieval-Augmented Generation using VectorDB
Retrieval-Augmented Generation (RAG) has emerged as a popular technique for addressing several limitations of Large Language Model (LLM) systems, including static model knowledge, hallucination, and limited input sequence lengths. Although RAG mitigates these pitfalls, its inherent heterogeneity and configurability introduce new challenges. RAG performance is crucial for meeting the high-throughput, low-latency demands of LLM services. Different RAG components run on different hardware platforms, and each component's behavior depends on how the rest of the system is configured. For example, larger embeddings may improve retrieval accuracy, but they also increase the latency of embedding creation and indexing, degrading the system's performance and raising its energy consumption. A comprehensive characterization of an end-to-end RAG system therefore becomes necessary. In this work, we build Athena, an end-to-end RAG benchmarking framework that supports various embedding models, vector databases, index/search algorithms, and LLMs. By characterizing the system under a range of RAG settings built with Athena, we demystify RAG: we identify performance bottlenecks and quantify the impact of each sub-component on overall system performance. In addition, the plug-and-play, open-sourced Athena framework is designed to assist future RAG research.
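To make the pipeline stages the abstract refers to concrete, the following is a minimal, self-contained sketch of a RAG flow (embed, index, retrieve, augment the prompt). It is purely illustrative and not part of Athena: the bag-of-characters `embed` function and the brute-force `VectorIndex` are hypothetical stand-ins for a learned embedding model and a real vector database with approximate-nearest-neighbor search.

```python
import math

def embed(text, dim=16):
    """Toy bag-of-characters embedding; a real system uses a neural model,
    and a larger `dim` would mirror the accuracy/latency trade-off above."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class VectorIndex:
    """Brute-force in-memory index; stands in for a vector database."""
    def __init__(self):
        self.entries = []  # list of (embedding, document) pairs

    def add(self, doc):
        self.entries.append((embed(doc), doc))

    def search(self, query, k=2):
        # Rank stored documents by cosine similarity to the query embedding.
        q = embed(query)
        scored = sorted(self.entries,
                        key=lambda e: -sum(a * b for a, b in zip(e[0], q)))
        return [doc for _, doc in scored[:k]]

def build_prompt(query, index, k=2):
    """Augment the user query with retrieved context before calling the LLM."""
    context = "\n".join(index.search(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

index = VectorIndex()
for d in ["RAG retrieves documents to ground LLM answers.",
          "Vector databases store embeddings for similarity search.",
          "Larger embeddings can raise accuracy but add latency."]:
    index.add(d)

prompt = build_prompt("How does RAG use a vector database?", index)
```

Each stage in this sketch maps to a configurable Athena sub-component (embedding model, vector database, index/search algorithm, LLM), which is what makes the end-to-end characterization non-trivial.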