Generalizability of Large Language Model-Based Agents: A Comprehensive Survey
Publication: Journal Article
Zhang, M; Yang, Y; Xie, R; Dhingra, B; Zhou, S; Pei, J
Published in: ACM Computing Surveys
Large Language Model (LLM)-based agents have recently emerged as a new paradigm that extends the capabilities of LLMs beyond text generation to dynamic interaction with external environments. A critical challenge lies in ensuring their generalizability – the ability to maintain consistently high performance across varied instructions, tasks, environments, and domains, especially those different from the agent’s fine-tuning data. Despite growing interest, the concept of generalizability in LLM-based agents remains underdefined, and systematic approaches to measure and improve it are lacking. We provide the first comprehensive review of generalizability in LLM-based agents. We begin by clarifying the definition and boundaries of agent generalizability. We then review existing benchmarks. Next, we categorize strategies for improving generalizability into three groups: methods targeting the backbone LLM, methods targeting agent components, and methods targeting their interactions. Furthermore, we introduce the distinction between generalizable frameworks and generalizable agents and outline how generalizable frameworks can be translated into agent-level generalizability. Finally, we identify future directions, including the development of standardized evaluation frameworks, variance- and cost-based metrics, and hybrid approaches that integrate methodological innovations with agent architecture-level designs. We aim to establish a foundation for principled research on building LLM-based agents that generalize reliably across diverse real-world applications.