
Publication

Improving Sample Efficiency of Example-Guided Deep Reinforcement Learning for Bipedal Walking

Rustam Galljamov; Guoping Zhao; Boris Belousov; André Seyfarth; Jan Peters
In: IEEE-RAS International Conference on Humanoid Robots (Humanoids 2022), November 28-30, Ginowan, Okinawa, Japan, pp. 587-593, IEEE, 2022.

Abstract

Reinforcement learning holds great promise for enabling bipedal walking in humanoid robots. However, despite encouraging recent results, training still requires significant amounts of time and resources, precluding fast iteration cycles in controller development. Therefore, faster training methods are needed. In this paper, we investigate a number of techniques for improving the sample efficiency of on-policy actor-critic algorithms and show that a significant reduction in training time is achievable with a few straightforward modifications of common algorithms, such as PPO and DeepMimic, tailored specifically to the problem of bipedal walking. Action space representation, symmetry prior induction, and cliprange scheduling proved effective at reducing sample complexity by a factor of 4.5. These results indicate that domain-specific knowledge can be readily utilized to reduce training times and thereby enable faster development cycles in challenging robotic applications.
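One of the modifications named above, cliprange scheduling, can be illustrated with a minimal sketch. The idea is to anneal PPO's clipping range over the course of training rather than keeping it fixed. The linear decay shape and the start/end values below are illustrative assumptions, not the exact schedule used in the paper:

```python
def cliprange_schedule(progress: float, start: float = 0.55, end: float = 0.1) -> float:
    """Linearly anneal PPO's clipping range over training.

    progress: fraction of training completed, in [0, 1]
              (e.g. timesteps_so_far / total_timesteps).
    start, end: illustrative values, not taken from the paper.

    A larger cliprange early in training permits larger policy updates,
    while a smaller cliprange late in training stabilizes fine-tuning.
    """
    progress = min(max(progress, 0.0), 1.0)  # clamp for safety
    return start + (end - start) * progress


# Example: query the schedule at a few points during training.
for p in (0.0, 0.5, 1.0):
    print(f"progress={p:.1f} -> cliprange={cliprange_schedule(p):.3f}")
```

In practice, a schedule like this would be passed as a callable to the training loop (many PPO implementations accept a function of training progress in place of a constant cliprange).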
