Removing the target network from deep Q-networks with the mellowmax operator
Deep Q-Network (DQN) is a learning algorithm that achieves human-level performance in high-dimensional domains like Atari games. We propose that using a softmax operator, Mellowmax, in DQN reduces its need for a separate target network, which is otherwise necessary to stabilize learning. We empirically show that, in the absence of a target network, the combination of Mellowmax and DQN outperforms DQN alone.
Kim, S; Asadi, K; Littman, M; Konidaris, G
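The abstract names the Mellowmax operator without defining it. As a rough illustration, the sketch below uses the commonly cited form mm_ω(x) = log((1/n) Σ exp(ω x_i)) / ω and plugs it into a one-step bootstrap target in place of the hard max. The function names, the value of `omega`, the discount `gamma`, and the use of NumPy are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def mellowmax(q_values, omega=5.0):
    """Mellowmax over the last axis: log(mean(exp(omega * q))) / omega.

    omega > 0 is a temperature-like parameter (value here is an assumption).
    """
    q = np.asarray(q_values, dtype=float)
    n = q.shape[-1]
    # Subtract the max before exponentiating to avoid overflow.
    m = q.max(axis=-1, keepdims=True)
    log_sum = np.log(np.exp(omega * (q - m)).sum(axis=-1))
    return m.squeeze(-1) + (log_sum - np.log(n)) / omega

def td_target(reward, next_q_values, done, gamma=0.99, omega=5.0):
    """One-step target that bootstraps with Mellowmax instead of max.

    next_q_values is assumed to come from the online network itself,
    i.e. no separate target network is involved in this sketch.
    """
    return reward + gamma * (1.0 - done) * mellowmax(next_q_values, omega)

if __name__ == "__main__":
    q_next = np.array([1.0, 2.0, 3.0])
    print(mellowmax(q_next, omega=5.0))
    print(td_target(reward=1.0, next_q_values=q_next, done=0.0))
```

For small `omega` the operator approaches the mean of the action values, and for large `omega` it approaches their maximum, so it interpolates between averaging and the hard max used in standard DQN.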