
Publication

Noise-conditioned Energy-based Annealed Rewards (NEAR): A Generative Framework for Imitation Learning from Observation

Anish Abhijit Diwan; Julen Urain; Jens Kober; Jan Peters
In: Computing Research Repository (CoRR), Vol. abs/2501.14856, pp. 1-22, arXiv, 2025.

Abstract

This paper introduces a new imitation learning framework based on energy-based generative models capable of learning complex, physics-dependent robot motion policies through state-only expert motion trajectories. Our algorithm, called Noise-conditioned Energy-based Annealed Rewards (NEAR), constructs several perturbed versions of the expert’s motion data distribution and learns smooth, well-defined representations of the data distribution’s energy function using denoising score matching. We propose to use these learnt energy functions as reward functions to learn imitation policies via reinforcement learning. We also present a strategy to gradually switch between the learnt energy functions, ensuring that the learnt rewards are always well-defined on the manifold of policy-generated samples. We evaluate our algorithm on complex humanoid tasks such as locomotion and martial arts and compare it with state-only adversarial imitation learning algorithms like Adversarial Motion Priors (AMP). Our framework sidesteps the optimisation challenges of adversarial imitation learning techniques and produces results comparable to AMP on several quantitative metrics across multiple imitation settings. Code and videos available at anishhdiwan.github.io/noise-conditioned-energy-based-annealed-rewards/
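To make the abstract's core mechanics concrete, here is a minimal PyTorch sketch of a noise-conditioned energy model trained with denoising score matching, with its negated energy used as an annealed reward. The network architecture, noise schedule, and names (EnergyNet, dsm_loss, annealed_reward, SIGMAS) are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the NEAR ingredients; all names and
# hyperparameters below are assumptions for illustration.
import torch
import torch.nn as nn

SIGMAS = torch.tensor([1.0, 0.5, 0.25, 0.1])  # assumed noise schedule, coarse to fine

class EnergyNet(nn.Module):
    """Noise-conditioned energy E(x, sigma) over expert state features."""
    def __init__(self, state_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
        # Condition the energy on the noise level by appending sigma as a feature.
        sig = sigma.expand(x.shape[0], 1)
        return self.net(torch.cat([x, sig], dim=-1)).squeeze(-1)

def dsm_loss(energy: EnergyNet, x: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """Denoising score matching at one noise level: the model score
    -dE/dx of the perturbed distribution should point from the noisy
    sample back towards the data, i.e. match -(x_tilde - x) / sigma^2."""
    x_tilde = (x + torch.randn_like(x) * sigma).requires_grad_(True)
    e = energy(x_tilde, sigma).sum()
    score = -torch.autograd.grad(e, x_tilde, create_graph=True)[0]
    target = -(x_tilde.detach() - x) / sigma**2
    # Standard sigma^2 weighting balances losses across noise levels.
    return 0.5 * sigma**2 * ((score - target) ** 2).sum(dim=-1).mean()

def annealed_reward(energy: EnergyNet, s: torch.Tensor, level: int) -> torch.Tensor:
    """Low energy near the expert manifold -> high reward; `level`
    indexes the noise schedule and is annealed as the policy improves."""
    with torch.no_grad():
        return -energy(s, SIGMAS[level])
```

In this reading, policy training starts at a coarse noise level, where the energy landscape is smooth even far from the expert data, and `level` moves towards finer noise as policy-generated samples approach the data manifold, mirroring the gradual switching between learnt energy functions described above.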

Further Links