Removing the target network from deep Q-networks with the mellowmax operator
Published
Conference Paper
© 2019 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved. Deep Q-Network (DQN) is a learning algorithm that achieves human-level performance in high-dimensional domains like Atari games. We propose that using a softmax operator, Mellowmax, in DQN reduces its need for a separate target network, which is otherwise necessary to stabilize learning. We empirically show that, in the absence of a target network, the combination of Mellowmax and DQN outperforms DQN alone.
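For reference, the Mellowmax operator (Asadi and Littman, 2017) replaces the hard max over action values with a smooth, differentiable alternative, mm_ω(x) = (1/ω) log((1/n) Σᵢ exp(ω xᵢ)). The following is a minimal NumPy sketch of that formula; the function name and the value of ω are illustrative, not taken from the paper.

```python
import numpy as np

def mellowmax(q_values, omega=10.0):
    """Mellowmax over a vector of action values.

    mm_omega(x) = log(mean(exp(omega * x))) / omega
    As omega -> infinity this approaches max(x); as omega -> 0 it
    approaches the mean of x.
    """
    x = np.asarray(q_values, dtype=np.float64)
    c = x.max()  # subtract the max before exponentiating for numerical stability
    return c + np.log(np.mean(np.exp(omega * (x - c)))) / omega
```

In a DQN-style update, the bootstrapped target would then use mm_ω over the next-state action values in place of the usual max.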
Cited Authors
- Kim, S; Asadi, K; Littman, M; Konidaris, G
Published Date
- January 1, 2019
Published In
- Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume / Issue
- 4 /
Start / End Page
- 2060 - 2062
Electronic International Standard Serial Number (EISSN)
- 1558-2914
International Standard Serial Number (ISSN)
- 1548-8403
International Standard Book Number 13 (ISBN-13)
- 9781510892002
Citation Source
- Scopus