Publication

Streaming Reinforcement Learning under Partial Observability with Real-Time Recurrent Learning

Noah Farr; Aryaman Reddi; Carlo D'Eramo; Jan Peters

In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2605.24709, Pages 1-16, arXiv, 2026.

Abstract

Streaming reinforcement learning has emerged as an online learning paradigm that conforms to the restrictions of natural learning agents that process data incrementally, i.e., with a batch size of 1 and no replay buffer. While streaming RL has recently been shown to scale with deep function approximation with full observability, partially observable settings have remained out of reach. Truncated backpropagation through time collapses to a one-step gradient horizon under the streaming setting, and exact real-time recurrent learning is prohibitively expensive. We close this gap using recurrent trace units, a diagonal recurrent architecture that enables exact RTRL with linear time and memory complexity in the parameter count, and show that they integrate cleanly into existing streaming algorithms across both discrete and continuous control. On a MemoryChain diagnostic with chain lengths from 2 to 128, our method sustains performance where streaming TBPTT(1) baselines using feedforward, GRU, and RTU networks collapse. On five POPGym tasks and on partially observable MuJoCo continuous control, the streaming approach is competitive with batched PPO on POPGym and recovers a substantial fraction of batched performance on masked MuJoCo, despite using no replay buffer or batched updates.
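To illustrate why a diagonal recurrence makes exact RTRL cheap, the following is a minimal sketch and not the recurrent trace unit from the paper. It assumes a simplified real-valued linear recurrence h_t = lam * h_{t-1} + W x_t with an elementwise (diagonal) lam. All names (`rtrl_step`, `run`, `loss`) and the quadratic loss are hypothetical choices made for this example. Because unit i depends only on its own previous state, each parameter needs a single scalar sensitivity trace, so the traces have the same shape as the parameters and are updated forward in time with no stored history.

```python
import random

def rtrl_step(lam, W, h, e_lam, e_W, x):
    """One step of h_t = lam * h_{t-1} + W x_t, carrying exact RTRL traces."""
    n, d = len(W), len(x)
    new_h, new_e_lam, new_e_W = [], [], []
    for i in range(n):
        # d h_t[i] / d lam[i] = h_{t-1}[i] + lam[i] * d h_{t-1}[i] / d lam[i]
        new_e_lam.append(h[i] + lam[i] * e_lam[i])
        # d h_t[i] / d W[i][j] = x_t[j] + lam[i] * d h_{t-1}[i] / d W[i][j]
        new_e_W.append([x[j] + lam[i] * e_W[i][j] for j in range(d)])
        new_h.append(lam[i] * h[i] + sum(W[i][j] * x[j] for j in range(d)))
    return new_h, new_e_lam, new_e_W

def run(lam, W, xs):
    """Process a stream of inputs one at a time; return final state and traces."""
    n, d = len(W), len(xs[0])
    h = [0.0] * n
    e_lam = [0.0] * n                      # same shape as lam
    e_W = [[0.0] * d for _ in range(n)]    # same shape as W
    for x in xs:
        h, e_lam, e_W = rtrl_step(lam, W, h, e_lam, e_W, x)
    return h, e_lam, e_W

def loss(h):
    # Hypothetical loss on the final hidden state, for illustration only.
    return 0.5 * sum(v * v for v in h)

random.seed(0)
n, d, T = 4, 3, 20
lam = [random.uniform(0.5, 0.95) for _ in range(n)]
W = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
xs = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(T)]

h, e_lam, e_W = run(lam, W, xs)
# Chain rule: dL/dtheta = dL/dh * dh/dtheta, with dL/dh = h for this loss.
grad_lam = [h[i] * e_lam[i] for i in range(n)]
grad_W = [[h[i] * e_W[i][j] for j in range(d)] for i in range(n)]
```

The gradient here covers the full 20-step history even though only the current state and traces are kept in memory, which is the property a one-step truncation gives up. For a dense recurrent matrix the corresponding sensitivity object would have one entry per (hidden unit, parameter) pair, which is what makes general RTRL expensive.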

Further links

https://doi.org/10.48550/arXiv.2605.24709

2605.24709v1.pdf (pdf, 630 KB)