
Optimistic Initialization for Exploration in Continuous Control

Publication, Conference
Lobel, S; Gottesman, O; Allen, C; Bagaria, A; Konidaris, G
Published in: Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022
June 30, 2022

Optimistic initialization underpins many theoretically sound exploration schemes in tabular domains; however, in the deep function approximation setting, optimism can quickly disappear if initialized naïvely. We propose a framework for more effectively incorporating optimistic initialization into reinforcement learning for continuous control. Our approach uses metric information about the state-action space to estimate which transitions are still unexplored, and explicitly maintains the initial Q-value optimism for the corresponding state-action pairs. We also develop methods for efficiently approximating these training objectives, and for incorporating domain knowledge into the optimistic envelope to improve sample efficiency. We empirically evaluate these approaches on a variety of hard exploration problems in continuous control, where our method outperforms existing exploration techniques.
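The idea in the abstract of keeping Q-values optimistic wherever the state-action space is still unexplored can be illustrated with a small sketch. This is not the paper's algorithm: the names `knownness`, `optimistic_q`, `radius`, and `q_max`, the Euclidean metric, and the linear blend are all assumptions chosen for illustration, and the "learned" Q-function is a stand-in callable rather than a neural network.

```python
import math

def knownness(query, visited, radius):
    """Hypothetical exploration measure in [0, 1].

    Returns 1.0 when `query` coincides with a visited state-action point
    and 0.0 when it is at least `radius` away from every visited point.
    Euclidean distance is an assumed choice of metric.
    """
    if not visited:
        return 0.0
    nearest = min(math.dist(query, v) for v in visited)
    return max(0.0, 1.0 - nearest / radius)

def optimistic_q(query, visited, learned_q, q_max, radius):
    """Blend a learned Q-estimate with an optimistic value `q_max`.

    Far from visited data the estimate stays at q_max (optimism is kept);
    close to visited data it falls back to the learned estimate.
    """
    k = knownness(query, visited, radius)
    return k * learned_q(query) + (1.0 - k) * q_max

# Usage: a single visited point at the origin and a constant learned Q.
visited = [(0.0, 0.0)]
learned_q = lambda state_action: 1.0
print(optimistic_q((0.0, 0.0), visited, learned_q, q_max=10.0, radius=1.0))
print(optimistic_q((0.5, 0.0), visited, learned_q, q_max=10.0, radius=1.0))
print(optimistic_q((2.0, 0.0), visited, learned_q, q_max=10.0, radius=1.0))
```

In this sketch the estimate moves from the learned value toward `q_max` as the query moves away from visited data, which gives a greedy agent a reason to try unvisited state-action pairs.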

ISBN: 9781577358763
Volume: 36
Start / End Page: 7612 / 7619

Citation

APA: Lobel, S., Gottesman, O., Allen, C., Bagaria, A., & Konidaris, G. (2022). Optimistic Initialization for Exploration in Continuous Control. In Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022 (Vol. 36, pp. 7612–7619).
Chicago: Lobel, S., O. Gottesman, C. Allen, A. Bagaria, and G. Konidaris. “Optimistic Initialization for Exploration in Continuous Control.” In Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022, 36:7612–19, 2022.
ICMJE: Lobel S, Gottesman O, Allen C, Bagaria A, Konidaris G. Optimistic Initialization for Exploration in Continuous Control. In: Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022. 2022. p. 7612–9.
MLA: Lobel, S., et al. “Optimistic Initialization for Exploration in Continuous Control.” Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022, vol. 36, 2022, pp. 7612–19.
NLM: Lobel S, Gottesman O, Allen C, Bagaria A, Konidaris G. Optimistic Initialization for Exploration in Continuous Control. Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022. 2022. p. 7612–7619.
