
Publication

Neural Posterior Domain Randomization

Fabio Muratore; Theo Gruner; Florian Wiese; Boris Belousov; Michael Gienger; Jan Peters
In: Aleksandra Faust; David Hsu; Gerhard Neumann (Eds.). Proceedings of the 5th Conference on Robot Learning. Conference on Robot Learning (CoRL-2021), November 8-11, 2021, London, United Kingdom, pages 1532-1542, Proceedings of Machine Learning Research (PMLR), Vol. 164, PMLR, 2022.

Abstract

Combining domain randomization and reinforcement learning is a widely used approach to obtain control policies that can bridge the gap between simulation and reality. However, existing methods make limiting assumptions about the form of the domain parameter distribution, which prevent them from utilizing the full power of domain randomization. Typically, a restricted family of probability distributions (e.g., normal or uniform) is chosen a priori for every parameter. Furthermore, straightforward approaches based on deep learning require differentiable simulators, which are either not available or can only simulate a limited class of systems. Such rigid assumptions diminish the applicability of domain randomization in robotics. Building upon recently proposed neural likelihood-free inference methods, we introduce Neural Posterior Domain Randomization (NPDR), an algorithm that alternates between learning a policy from a randomized simulator and adapting the posterior distribution over the simulator’s parameters in a Bayesian fashion. Our approach only requires a parameterized simulator, coarse prior ranges, a policy (optionally with optimization routine), and a small set of real-world observations. Most importantly, the domain parameter distribution is not restricted to a specific family, parameters can be correlated, and the simulator does not have to be differentiable. We show that the presented method is able to efficiently adapt the posterior over the domain parameters to match the observed dynamics more closely. Moreover, we demonstrate that NPDR can learn transferable policies using fewer real-world rollouts than comparable algorithms.
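The alternating scheme described in the abstract can be illustrated with a short sketch. This is a minimal, hypothetical illustration rather than the authors' implementation: it assumes the `sbi` package's SNPE as the neural likelihood-free inference backend, and the functions `simulate_rollout`, `collect_real_rollouts`, and `train_policy` are stand-ins for the parameterized (non-differentiable) simulator, the small set of real-world observations, and the policy optimization routine.

```python
# Minimal sketch of an NPDR-style loop (assumption: sbi's SNPE as the
# likelihood-free inference backend; the three helper functions below are
# hypothetical placeholders, not the authors' code).
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

# Coarse prior ranges over the simulator's domain parameters (e.g., mass, friction).
prior = BoxUniform(low=torch.tensor([0.1, 0.01]), high=torch.tensor([2.0, 1.0]))


def simulate_rollout(policy, domain_params):
    # Stand-in: roll out `policy` in the simulator configured with `domain_params`
    # and return summary statistics of the resulting trajectory.
    return domain_params + 0.05 * torch.randn_like(domain_params)


def collect_real_rollouts(policy):
    # Stand-in for summary statistics of a small set of real-world rollouts.
    return torch.tensor([1.0, 0.3])


def train_policy(distribution):
    # Stand-in: learn a policy on the simulator randomized by sampling
    # domain parameters from `distribution`.
    return None


inference = SNPE(prior=prior)
proposal = None                      # sample from the coarse prior in the first round
policy = train_policy(prior)         # initial policy trained under the prior

for _ in range(3):                   # alternate system identification and policy learning
    # 1) Collect a small set of observations from the real system.
    x_real = collect_real_rollouts(policy)

    # 2) Simulate rollouts with domain parameters drawn from the current proposal.
    theta = prior.sample((500,)) if proposal is None else proposal.sample((500,))
    x_sim = torch.stack([simulate_rollout(policy, t) for t in theta])

    # 3) Update the posterior over the domain parameters in a Bayesian fashion;
    #    it is not restricted to a parametric family and can capture correlations.
    density_estimator = inference.append_simulations(theta, x_sim, proposal=proposal).train()
    posterior = inference.build_posterior(density_estimator).set_default_x(x_real)
    proposal = posterior

    # 4) Re-train the policy on the simulator randomized with the updated posterior.
    policy = train_policy(posterior)
```

In this sketch, the learned posterior takes the place of the hand-picked normal or uniform randomization distributions mentioned above, and no gradients through the simulator are required.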

Further Links