Publication
Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning With Iterated Q-Learning
Théo Vincent; Yogesh Tripathi; Tim Lukas Faust; Yaniv Oren; Jan Peters; Carlo D'Eramo
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2506.04398, Pages 1-22, arXiv, 2025.
Abstract
The use of target networks in deep reinforcement learning is a widely popular solution to mitigate the brittleness of semi-gradient approaches and stabilize learning. However, target networks notoriously require additional memory and delay the propagation of Bellman updates compared to an ideal target-free approach. In this work, we step out of the binary choice between target-free and target-based algorithms. We introduce a new method that uses a copy of the last linear layer of the online network as a target network, while sharing the remaining parameters with the up-to-date online network. This simple modification lets us keep the low memory footprint of target-free methods while leveraging the target-based literature. We find that combining our approach with the concept of iterated Q-learning, which consists of learning consecutive Bellman updates in parallel, helps improve the sample efficiency of target-free approaches. Our proposed method, iterated Shared Q-Learning (iS-QL), bridges the performance gap between target-free and target-based approaches across various problems while using a single Q-network, a step towards resource-efficient reinforcement learning algorithms.
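
The central architectural idea, keeping a single Q-network and duplicating only its final linear layer to serve as the target, can be illustrated with a short sketch. The snippet below is not the authors' implementation; it is a minimal PyTorch illustration under assumed names (SharedTargetQNet, sync_target) of how the feature extractor can be shared between online and target computations while only the copied last layer is frozen and periodically synchronized.

    # Minimal sketch (assumed names, not the paper's code): the feature
    # extractor is shared; only the last linear layer is copied as a target.
    import copy
    import torch
    import torch.nn as nn

    class SharedTargetQNet(nn.Module):
        def __init__(self, obs_dim: int, n_actions: int, hidden: int = 256):
            super().__init__()
            # Feature extractor, used by both online and target estimates.
            self.features = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            self.head = nn.Linear(hidden, n_actions)         # online last layer
            self.target_head = copy.deepcopy(self.head)      # copied last layer only
            for p in self.target_head.parameters():
                p.requires_grad_(False)                       # target head is frozen

        def online_q(self, obs: torch.Tensor) -> torch.Tensor:
            return self.head(self.features(obs))

        def target_q(self, obs: torch.Tensor) -> torch.Tensor:
            # Features come from the up-to-date online encoder;
            # only the head is the frozen copy.
            with torch.no_grad():
                return self.target_head(self.features(obs))

        def sync_target(self) -> None:
            # Refresh the target head from the online head.
            self.target_head.load_state_dict(self.head.state_dict())

In this reading, the memory overhead of the target reduces to one extra linear layer rather than a full network copy; the abstract's iterated Q-learning component would then train several such heads in parallel, each regressing towards the target produced by the previous one.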
