Publication
Streaming Reinforcement Learning under Partial Observability with Real-Time Recurrent Learning
Noah Farr; Aryaman Reddi; Carlo D'Eramo; Jan Peters
In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2605.24709, Pages 1-16, arXiv, 2026.
Abstract
Streaming reinforcement learning has emerged as an online learning paradigm that conforms
to the restrictions of natural learning agents that process data incrementally, i.e.
with a batch size of 1 and no replay buffer. While streaming RL has recently been shown
to scale with deep function approximation under full observability, partially observable
settings have remained out of reach. Truncated backpropagation through time collapses
to a one-step gradient horizon under the streaming setting, and exact real-time recurrent
learning is prohibitively expensive. We close this gap using recurrent trace units, a
diagonal recurrent architecture that enables exact RTRL with linear time and memory
complexity in the parameter count, and show that they integrate cleanly into existing
streaming algorithms across both discrete and continuous control. On a MemoryChain
diagnostic with chain lengths from 2 to 128, our method sustains performance where
streaming TBPTT(1) baselines using feedforward, GRU, and RTU networks collapse.
The streaming approach is competitive with batched PPO on five POPGym tasks and
recovers a substantial fraction of batched performance on partially observable
(masked) MuJoCo continuous control, despite using no replay buffer or batched
updates.
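
The abstract's claim that a diagonal recurrence makes exact RTRL linear in the parameter count can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the paper's recurrent trace unit: it uses a real-valued, purely linear diagonal recurrence h_t = lam * h_{t-1} + W x_t with a squared-error loss, and the names rtrl_diagonal, lam, W, s_lam, and s_W are hypothetical. Because unit i depends only on its own previous state, each parameter needs a single scalar sensitivity, so the sensitivity arrays have the same shapes as the parameters themselves. Parameters are held fixed here so the accumulated gradient can be checked for exactness; a streaming agent would instead apply an update at every step.

```python
import numpy as np

def rtrl_diagonal(xs, lam, W, targets):
    """Exact forward-mode (RTRL) gradients for h_t = lam * h_{t-1} + W @ x_t.

    Toy illustration only: a real-valued linear diagonal recurrence,
    not the RTU parameterization from the paper.
    """
    n, d = W.shape
    h = np.zeros(n)
    s_lam = np.zeros(n)       # s_lam[i]   = d h_t[i] / d lam[i]
    s_W = np.zeros((n, d))    # s_W[i, j]  = d h_t[i] / d W[i, j]
    g_lam = np.zeros(n)       # accumulated loss gradient w.r.t. lam
    g_W = np.zeros((n, d))    # accumulated loss gradient w.r.t. W
    loss = 0.0
    for x, y in zip(xs, targets):
        # Sensitivity recursions use h_{t-1}, so update them before h.
        s_lam = h + lam * s_lam
        s_W = x[None, :] + lam[:, None] * s_W
        h = lam * h + W @ x
        # Per-step squared-error loss and its gradient w.r.t. h_t.
        err = h - y
        loss += 0.5 * np.sum(err ** 2)
        # Chain rule: dL_t/dtheta = dL_t/dh_t * dh_t/dtheta (elementwise per unit).
        g_lam += err * s_lam
        g_W += err[:, None] * s_W
    return loss, g_lam, g_W
```

The per-step cost and the extra memory are both proportional to the number of entries in lam and W, whereas RTRL for a dense recurrent matrix would need a sensitivity of every hidden unit with respect to every parameter.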
