Publication
Solving Deep Memory POMDPs with Recurrent Policy Gradients
Daan Wierstra; Alexander Förster; Jan Peters; Jürgen Schmidhuber
In: Joaquim Marques de Sá; Luís A. Alexandre; Wlodzislaw Duch; Danilo P. Mandic (Eds.). Artificial Neural Networks - ICANN 2007, 17th International Conference, Proceedings. International Conference on Artificial Neural Networks (ICANN-2007), September 9-13, 2007, Porto, Portugal, Pages 697-706, Lecture Notes in Computer Science, Vol. 4668, Springer, 2007.
Abstract
This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method that learns limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) requiring long-term memory of past observations. The approach approximates a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a “Long Short-Term Memory” (LSTM) architecture, we outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car driving simulation task.
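The core idea, return-weighted log-probability gradients backpropagated through a recurrent policy, can be illustrated with a short sketch. The following is a minimal, hypothetical example, not the authors' implementation (the paper predates these libraries): it assumes PyTorch, a Gymnasium-style environment with discrete actions, and an illustrative `RecurrentPolicy` module whose LSTM state carries the memory of past observations.

```python
# Minimal sketch of a REINFORCE-style recurrent policy gradient:
# log-probabilities are weighted by returns and backpropagated
# through the LSTM's hidden state (backpropagation through time).
import torch
import torch.nn as nn
from torch.distributions import Categorical

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim, hidden_dim, n_actions):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim)   # memory over past observations
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs, state=None):
        # obs: (1, 1, obs_dim) -- one time step, batch size 1
        out, state = self.lstm(obs, state)
        logits = self.head(out.squeeze(0))
        return Categorical(logits=logits), state

def run_episode(env, policy):
    """Collect one episode, keeping log-probs so gradients can later
    flow back through the recurrent hidden state."""
    obs, _ = env.reset()          # assumes a Gymnasium-style API
    state, log_probs, rewards, done = None, [], [], False
    while not done:
        x = torch.as_tensor(obs, dtype=torch.float32).view(1, 1, -1)
        dist, state = policy(x, state)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated
    return log_probs, rewards

def reinforce_update(policy, optimizer, log_probs, rewards, gamma=0.99):
    # Each log pi(a_t | history_t) is weighted by the discounted return
    # from time t onward; .backward() propagates through time.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # variance reduction
    loss = -(torch.stack(log_probs).squeeze() * returns).sum()
    optimizer.zero_grad()
    loss.backward()      # BPTT through the LSTM
    optimizer.step()
```

In use, one would construct the policy and an optimizer (e.g. `torch.optim.Adam(policy.parameters(), lr=1e-3)`) and alternate `run_episode` with `reinforce_update`; the paper's method additionally employs refinements beyond this bare sketch.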