Scholars@Duke publication: Policy optimization as Wasserstein gradient flows

Publication, Conference

Zhang, R; Chen, C; Li, C; Carin, L

Published in: 35th International Conference on Machine Learning, ICML 2018

January 1, 2018

Policy optimization is a core component of reinforcement learning (RL), and most existing RL methods directly optimize the parameters of a policy by maximizing the expected total reward or a surrogate of it. Although these methods often achieve encouraging empirical success, the mathematical principle underlying policy-distribution optimization remains unclear. We place policy optimization into the space of probability measures and interpret it as Wasserstein gradient flows. On the probability-measure space, under specified circumstances, policy optimization becomes a convex problem in terms of distribution optimization. To make optimization feasible, we develop efficient algorithms by numerically solving the corresponding discrete gradient flows. Our technique is applicable to several RL settings and is related to many state-of-the-art policy-optimization algorithms. Empirical results verify the effectiveness of our framework, which often obtains better performance than related algorithms.

Duke Scholars

Author: Lawrence Carin, Electrical and Computer Engineering

Published In: 35th International Conference on Machine Learning, ICML 2018

Publication Date: January 1, 2018

Volume: 13

Start / End Page: 9134 / 9143

Citation

APA

Zhang, R., Chen, C., Li, C., & Carin, L. (2018). Policy optimization as Wasserstein gradient flows. In 35th International Conference on Machine Learning, ICML 2018 (Vol. 13, pp. 9134–9143).

Chicago

Zhang, R., C. Chen, C. Li, and L. Carin. “Policy optimization as Wasserstein gradient flows.” In 35th International Conference on Machine Learning, ICML 2018, 13:9134–43, 2018.

ICMJE

Zhang R, Chen C, Li C, Carin L. Policy optimization as Wasserstein gradient flows. In: 35th International Conference on Machine Learning, ICML 2018. 2018. p. 9134–43.

MLA

Zhang, R., et al. “Policy optimization as Wasserstein gradient flows.” 35th International Conference on Machine Learning, ICML 2018, vol. 13, 2018, pp. 9134–43.

NLM

Zhang R, Chen C, Li C, Carin L. Policy optimization as Wasserstein gradient flows. 35th International Conference on Machine Learning, ICML 2018. 2018. p. 9134–9143.
