Variational Adversarial Training Towards Policies with Improved Robustness

Publication, Conference
Dong, J; Hsu, HL; Gao, Q; Tarokh, V; Pajic, M
Published in: Proceedings of Machine Learning Research
January 1, 2025

Reinforcement learning (RL), while the benchmark for policy formulation, often struggles to deliver robust solutions across varying scenarios, leading to marked performance drops under environmental perturbations. Traditional adversarial training, based on a two-player max-min game, is known to bolster the robustness of RL agents, but it faces two challenges: first, the complexity of the worst-case optimization problem may induce over-optimism, and second, the choice of a specific set of potential adversaries might lead to over-pessimism by considering implausible scenarios. In this work, we first observe that these two challenges do not cancel each other out. We therefore propose applying variational optimization over the worst-case distribution of adversaries instead of a single worst-case adversary. Moreover, to counteract over-optimism, we train the RL agent to maximize the lower quantile of the cumulative rewards under the worst-case adversary distribution. Our algorithm demonstrates a significant advancement over existing robust RL methods, corroborating the importance of the identified challenges and the effectiveness of our approach. To alleviate the computational overhead of this approach, we also propose a simplified variant with only minimal performance degradation. Extensive experiments validate that both approaches consistently yield policies with superior robustness.
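
To make the abstract's two ideas concrete, below is a minimal numpy sketch, not the authors' algorithm: it assumes a scalar linear "policy", a single additive perturbation parameter for the adversary, a Gaussian adversary distribution adapted only through its mean, and crude finite-difference updates in place of gradient-based RL training. All function and variable names (`rollout_return`, `lower_quantile_return`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_return(policy_param, adversary_param, n_steps=20):
    """Toy episode: a scalar agent tries to hold the state near 1.0
    while the adversary additively perturbs its action. Stands in
    for a full RL rollout."""
    state, total = 0.0, 0.0
    for _ in range(n_steps):
        action = policy_param * state + 1.0            # linear "policy"
        state = 0.9 * state + 0.1 * (action + adversary_param)
        total -= (state - 1.0) ** 2                    # reward: stay near 1.0
    return total

def lower_quantile_return(policy_param, adv_mean, adv_std, noise, q=0.1):
    """Lower q-quantile of returns under a Gaussian adversary
    distribution N(adv_mean, adv_std^2); `noise` carries fixed
    standard-normal draws so the finite differences below compare
    the same sampled adversaries (common random numbers)."""
    adversaries = adv_mean + adv_std * noise
    returns = np.array([rollout_return(policy_param, a) for a in adversaries])
    return np.quantile(returns, q)

policy, adv_mean, adv_std = 0.0, 0.0, 0.5
eps, lr, n_samples = 1e-2, 5e-2, 256
for step in range(300):
    noise = rng.standard_normal(n_samples)
    # Adversary-distribution step (the "min" player): shift the
    # distribution's mean to lower the agent's quantile objective.
    g = (lower_quantile_return(policy, adv_mean + eps, adv_std, noise)
         - lower_quantile_return(policy, adv_mean - eps, adv_std, noise)) / (2 * eps)
    adv_mean -= lr * g
    # Agent step (the "max" player): raise the lower quantile of
    # returns under the current adversary distribution.
    g = (lower_quantile_return(policy + eps, adv_mean, adv_std, noise)
         - lower_quantile_return(policy - eps, adv_mean, adv_std, noise)) / (2 * eps)
    policy += lr * g

print(f"policy={policy:.3f}  adversary mean={adv_mean:.3f}")
```

The two departures from standard max-min adversarial training are visible here: the inner player is a distribution over adversaries (only its mean is adapted in this toy) rather than a single worst-case adversary, and the agent's objective is a lower quantile of returns rather than the single worst return.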

Published In

Proceedings of Machine Learning Research

EISSN

2640-3498

Publication Date

January 1, 2025

Volume

258

Start / End Page

4681 / 4689

Citation

APA: Dong, J., Hsu, H. L., Gao, Q., Tarokh, V., & Pajic, M. (2025). Variational Adversarial Training Towards Policies with Improved Robustness. In Proceedings of Machine Learning Research (Vol. 258, pp. 4681–4689).
Chicago: Dong, J., H. L. Hsu, Q. Gao, V. Tarokh, and M. Pajic. “Variational Adversarial Training Towards Policies with Improved Robustness.” In Proceedings of Machine Learning Research, 258:4681–89, 2025.
ICMJE: Dong J, Hsu HL, Gao Q, Tarokh V, Pajic M. Variational Adversarial Training Towards Policies with Improved Robustness. In: Proceedings of Machine Learning Research. 2025. p. 4681–9.
MLA: Dong, J., et al. “Variational Adversarial Training Towards Policies with Improved Robustness.” Proceedings of Machine Learning Research, vol. 258, 2025, pp. 4681–89.
NLM: Dong J, Hsu HL, Gao Q, Tarokh V, Pajic M. Variational Adversarial Training Towards Policies with Improved Robustness. Proceedings of Machine Learning Research. 2025. p. 4681–4689.
