Publication

XQCfD: Accelerating Fast Actor-Critic Algorithms with Prior Data and Prior Policies

Daniel Palenicek; Florian Vogt; Joe Watson; Ingmar Posner; Danica Kragic; Jan Peters

In: Computing Research Repository eprint Journal (CoRR), Vol. abs/2605.10734, Pages 1-22, arXiv, 2026.

Abstract

For reinforcement learning in the real world, online exploration is expensive. A common practice in robotic reinforcement learning is to incorporate additional data to improve sample efficiency. Expert demonstration data is often crucial for solving hard exploration tasks with sparse rewards. While prior data is used to augment experience and pre-train models, we show that the design of existing algorithms fails to achieve the sample efficiency that is possible in this setting because they do not use pre-trained policies effectively. We propose XQCfD, which extends the sample-efficient XQC actor-critic to learn from demonstrations, using augmented replay buffers, pre-trained policies and stationary policy architectures designed to avoid rapidly ‘unlearning’ the strong initial policy, as prior works do. We show that our stationary network architecture enables better out-of-distribution policy improvement than standard network architectures due to its higher-entropy predictions. XQCfD achieves state-of-the-art performance across a range of complex manipulation tasks with sparse rewards from the popular Adroit, Robomimic and MimicGen benchmarks, notably with a low update-to-data ratio and no ensemble networks.

Further Links

https://doi.org/10.48550/arXiv.2605.10734

2605.10734v1.pdf (PDF, 1 MB)