Removing the target network from deep Q-networks with the Mellowmax operator

Published

Conference Paper

© 2019 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved. Deep Q-Network (DQN) is a learning algorithm that achieves human-level performance in high-dimensional domains like Atari games. We propose that using a softmax operator, Mellowmax, in DQN removes its need for a separate target network, which is otherwise necessary to stabilize learning. We empirically show that, in the absence of a target network, the combination of Mellowmax and DQN outperforms DQN alone.
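The abstract describes swapping the hard max in DQN's bootstrap target for the Mellowmax operator, so that the online network's own next-state estimates can be bootstrapped from directly, without a frozen target network. The sketch below illustrates that idea, assuming the standard Mellowmax definition mm_ω(x) = (1/ω) log((1/n) Σ_i exp(ω x_i)) from Asadi & Littman (2017); the ω value, function names, and the td_target helper are illustrative choices, not the authors' implementation.

```python
import numpy as np

def mellowmax(q_values, omega=10.0, axis=-1):
    """Mellowmax operator: mm_omega(x) = (1/omega) * log((1/n) * sum_i exp(omega * x_i)).
    Computed with a max shift for numerical stability."""
    q = np.asarray(q_values, dtype=np.float64)
    shift = q.max(axis=axis, keepdims=True)
    return shift.squeeze(axis) + np.log(np.exp(omega * (q - shift)).mean(axis=axis)) / omega

def td_target(reward, next_q_online, done, gamma=0.99, omega=10.0):
    """One-step bootstrap target built from the online network's own next-state
    Q-values, aggregated with Mellowmax instead of the hard max that standard
    DQN applies to a separate, periodically frozen target network."""
    return reward + gamma * (1.0 - done) * mellowmax(next_q_online, omega=omega)

# Illustrative batch of 2 transitions with 3 actions each.
r = np.array([1.0, 0.0])
q_next = np.array([[0.5, 2.0, 1.0],
                   [0.1, 0.2, 0.3]])
d = np.array([0.0, 1.0])
print(td_target(r, q_next, d))
```

Because Mellowmax is a smooth, non-expansive alternative to the max, the resulting update is less prone to the instability that the target network is normally introduced to counteract; ω is a tunable temperature-like hyperparameter.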

Cited Authors

  • Kim, S; Asadi, K; Littman, M; Konidaris, G

Published Date

  • January 1, 2019

Published In

  • Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019)

Volume / Issue

  • 4 /

Start / End Page

  • 2060 - 2062

Electronic International Standard Serial Number (EISSN)

  • 1558-2914

International Standard Serial Number (ISSN)

  • 1548-8403

International Standard Book Number 13 (ISBN-13)

  • 9781510892002

Citation Source

  • Scopus