
Publication

Non-parametric policy gradients: a unified treatment of propositional and relational domains

Kristian Kersting; Kurt Driessens
In: William W. Cohen; Andrew McCallum; Sam T. Roweis (Eds.). Machine Learning, Proceedings of the Twenty-Fifth International Conference (ICML 2008). International Conference on Machine Learning (ICML-2008), June 5-9, Helsinki, Finland, Pages 456-463, ACM International Conference Proceeding Series, Vol. 307, ACM, 2008.

Abstract

Policy gradient approaches are a powerful instrument for learning how to interact with the environment. Existing approaches have focused on propositional and continuous domains only. Without extensive feature engineering, it is difficult - if not impossible - to apply them within structured domains, in which, for example, there is a varying number of objects and relations among them. In this paper, we describe a non-parametric policy gradient approach - called NPPG - that overcomes this limitation. The key idea is to apply Friedman's gradient boosting: policies are represented as a weighted sum of regression models grown in a stage-wise optimization. Employing off-the-shelf regression learners, NPPG can deal with propositional, continuous, and relational domains in a unified way. Our experimental results show that it can even improve on established results.
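To make the key idea concrete, the following is a minimal sketch (not the authors' implementation) of a gradient-boosted policy: the policy is a Gibbs distribution over a potential function built as a weighted sum of regression models, and each boosting stage fits an off-the-shelf regressor to point-wise estimates of the functional policy gradient. The feature map phi, the discrete action set, the class name BoostedPolicy, and the use of scikit-learn regression trees are illustrative assumptions, not details taken from the paper.

    # Hypothetical sketch of a gradient-boosted policy (illustrative only).
    # Assumes a discrete action set and a feature map phi(s, a) -> vector.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    class BoostedPolicy:
        """Policy as a Gibbs distribution over a boosted potential:
        pi(a|s) proportional to exp(Psi(s, a)),
        where Psi is a weighted sum of stage-wise regression models."""

        def __init__(self, phi, actions, learning_rate=0.5):
            self.phi = phi            # feature map for (state, action) pairs
            self.actions = actions    # list of discrete actions
            self.nu = learning_rate   # weight given to each new regression model
            self.models = []          # regression models grown stage-wise

        def potential(self, s, a):
            x = np.asarray(self.phi(s, a)).reshape(1, -1)
            return sum(self.nu * h.predict(x)[0] for h in self.models)

        def probs(self, s):
            z = np.array([self.potential(s, a) for a in self.actions])
            z -= z.max()              # numerical stability
            p = np.exp(z)
            return p / p.sum()

        def sample(self, s, rng):
            return rng.choice(len(self.actions), p=self.probs(s))

        def boost(self, states, action_indices, returns):
            """One boosting stage: fit a regressor to point-wise estimates
            of the functional gradient of the expected return and add it
            to the ensemble."""
            X, y = [], []
            for s, a_idx, G in zip(states, action_indices, returns):
                p = self.probs(s)
                for j, a in enumerate(self.actions):
                    X.append(self.phi(s, a))
                    # REINFORCE-style estimate: return times the gradient of
                    # log pi(a|s) with respect to Psi(s, a_j), which for a
                    # Gibbs policy is (indicator(j == a_idx) - pi(a_j|s)).
                    indicator = 1.0 if j == a_idx else 0.0
                    y.append(G * (indicator - p[j]))
            h = DecisionTreeRegressor(max_depth=3)
            h.fit(np.array(X), np.array(y))
            self.models.append(h)

Swapping the regression tree for a relational regression learner is what, in the paper's framing, lets the same scheme handle propositional, continuous, and relational domains uniformly.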
