Publication
Anticipatory action selection for human-robot table tennis
Zhikun Wang; Abdeslam Boularias; Katharina Mülling; Bernhard Schölkopf; Jan Peters
In: Artificial Intelligence (AIJ), Vol. 247, Pages 399-414, Elsevier, 2017.
Abstract
Anticipation can enhance the capability of a robot in its interaction with humans, where the robot predicts the humans' intention for selecting its own action. We present a novel framework of anticipatory action selection for human–robot interaction, which is capable to handle nonlinear and stochastic human behaviors such as table tennis strokes and allows the robot to choose the optimal action based on prediction of the human partner's intention with uncertainty. The presented framework is generic and can be used in many human–robot interaction scenarios, for example, in navigation and human–robot co-manipulation. In this article, we conduct a case study on human–robot table tennis. Due to the limited amount of time for executing hitting movements, a robot usually needs to initiate its hitting movement before the opponent hits the ball, which requires the robot to be anticipatory based on visual observation of the opponent's movement. Previous work on Intention-Driven Dynamics Models (IDDM) allowed the robot to predict the intended target of the opponent. In this article, we address the problem of action selection and optimal timing for initiating a chosen action by formulating the anticipatory action selection as a Partially Observable Markov Decision Process (POMDP), where the transition and observation are modeled by the IDDM framework. We present two approaches to anticipatory action selection based on the POMDP formulation, i.e., a model-free policy learning method based on Least-Squares Policy Iteration (LSPI) that employs the IDDM for belief updates, and a model-based Monte-Carlo Planning (MCP) method, which benefits from the transition and observation model by the IDDM. Experimental results using real data in a simulated environment show the importance of anticipatory action selection, and that POMDPs are suitable to formulate the anticipatory action selection problem by taking into account the uncertainties in prediction. We also show that existing algorithms for POMDPs, such as LSPI and MCP, can be applied to substantially improve the robot's performance in its interaction with humans.