Publication
Iterated Deep Q-Network: Efficient Learning of Bellman Iterations for Deep Reinforcement Learning
Théo Vincent; Boris Belousov; Carlo D'Eramo; Jan Peters
European Workshop on Reinforcement Learning (EWRL-2023), 2023.
Abstract
Value-based reinforcement learning (RL) methods strive to obtain accurate approximations of optimal action-value functions. Notoriously, these methods heavily rely on the application of the optimal Bellman operator, which needs to be approximated from samples. Most approaches consider only a single Bellman iteration, which limits their power. In this paper, we introduce Iterated Deep Q-Network (iDQN), a new DQN-based algorithm that incorporates several consecutive Bellman iterations into the training loss. iDQN leverages the online network of DQN to build a target for a second online network, which in turn serves as a target for a third online network, and so forth, thereby taking into account future Bellman iterations. While using the same number of gradient steps, iDQN allows for better learning of the Bellman iterations compared to DQN. We evaluate iDQN against relevant baselines on 54 Atari 2600 games to showcase its benefit in terms of approximation error and performance. iDQN outperforms its closest baselines, DQN and Random Ensemble Mixture, while being orthogonal to more advanced DQN-based approaches.
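To make the chained-target idea concrete, below is a minimal sketch of a loss with the structure the abstract describes: each online network is regressed onto a Bellman target built from a frozen copy of the previous one. All names, the network architecture, and hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small fully connected action-value network (illustrative architecture)."""
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, num_actions),
        )

    def forward(self, state):
        return self.net(state)

def idqn_style_loss(online_nets, target_nets, batch, gamma=0.99):
    """Sum of K consecutive Bellman-iteration losses (hypothetical helper).

    Assumed convention: target_nets[k] is a periodically refreshed copy of
    online_nets[k - 1], with target_nets[0] playing the role of DQN's usual
    target network. Each online network thus chases the previous Bellman
    iterate, chaining K iterations into a single training loss.
    """
    s, a, r, s_next, done = batch
    loss = 0.0
    for q_online, q_target in zip(online_nets, target_nets):
        with torch.no_grad():
            # Empirical optimal Bellman operator applied to the previous iterate.
            bootstrap = q_target(s_next).max(dim=1).values
            td_target = r + gamma * (1.0 - done) * bootstrap
        q_sa = q_online(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = loss + nn.functional.mse_loss(q_sa, td_target)
    return loss
```

With a single online network (K = 1), this loss reduces to the standard DQN objective, which matches the abstract's framing of iDQN as a DQN-based method that adds further consecutive Bellman iterations at no extra cost in gradient steps.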