
Publication

Robust Processing of Situated Spoken Dialogue

Pierre Lison
Master's thesis, Universität des Saarlandes, 12/2008.

Abstract

Spoken dialogue is often considered one of the most natural means of interaction between a human and a machine. It is, however, notoriously hard to process with NLP technology. As many corpus studies have shown, natural spoken dialogue is replete with disfluent, partial, elided or ungrammatical utterances, all of which are very hard to accommodate in a dialogue system. Furthermore, automatic speech recognition (ASR) is known to be a highly error-prone task, especially in complex, open-ended discourse domains. The combination of these two problems, ill-formed and/or misrecognised speech inputs, poses a major challenge to the development of robust dialogue systems.

This thesis presents an integrated approach for addressing these issues in the context of domain-specific dialogues for human-robot interaction (HRI). Several new techniques and algorithms have been developed to this end, and they can be divided into two main lines of work.

The first line of work pertains to speech recognition. We describe a new model for context-sensitive speech recognition, specifically suited to HRI. The underlying hypothesis is that, in situated human-robot interaction, ASR performance can be significantly improved by exploiting contextual knowledge about the physical environment (objects perceived in the visual scene) and the dialogue history (objects previously referred to within the current dialogue). The language model is dynamically updated as the environment changes, and is used to establish expectations about which words are most likely to be heard given the context.

The second line of work deals with the robust parsing of spoken inputs. We present a new approach for this task, based on an incremental parser for Combinatory Categorial Grammar (CCG). The parser takes word lattices as input and is able to handle ill-formed and misrecognised utterances by selectively relaxing and extending its set of grammatical rules; this is achieved by introducing non-standard CCG rules into the grammar. The most relevant interpretation is then selected by a discriminative model augmented with contextual information. The model includes a broad range of linguistic and contextual features, and can be trained with a simple perceptron algorithm.

All the algorithms presented in this thesis are fully implemented and integrated into a distributed cognitive architecture for autonomous robots. We performed an extensive evaluation of our approach using a set of Wizard-of-Oz experiments. The obtained results demonstrate very significant improvements in accuracy and robustness compared to the baseline.
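The context-sensitive recognition model is only described in prose above. As a rough illustration of the idea, the sketch below reweights a unigram language model so that words linked to salient objects (perceived in the visual scene or recently mentioned in the dialogue) become more expected. This is a minimal sketch under assumed simplifications: the function name, the `boost` parameter, and the unigram setting are all illustrative, not the thesis's actual model.

```python
def contextually_boosted_unigram(base_probs, salient_words, boost=5.0):
    """Hypothetical sketch: raise the probability of words associated
    with contextually salient objects, then renormalise so the result
    is still a proper distribution. The thesis's real model is richer;
    the names and the boost factor here are illustrative assumptions."""
    raw = {w: p * (boost if w in salient_words else 1.0)
           for w, p in base_probs.items()}
    z = sum(raw.values())
    return {w: p / z for w, p in raw.items()}

# Example: once a mug becomes visible in the scene, "mug" becomes a
# more expected hypothesis than the acoustically similar "bug".
lm = {"mug": 0.01, "bug": 0.02, "ball": 0.01, "the": 0.96}
updated = contextually_boosted_unigram(lm, salient_words={"mug"})
```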
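Similarly, the discriminative parse-selection step can be illustrated with a minimal perceptron sketch. The loop below implements the standard multi-class perceptron update the abstract alludes to; the sparse feature dictionaries, the `epochs` parameter, and the data layout are assumptions made for illustration, and the thesis's actual feature set over CCG derivations and context is far richer.

```python
from collections import defaultdict

def score(weights, features):
    """Dot product between the weight vector and a sparse feature dict."""
    return sum(weights[f] * v for f, v in features.items())

def train_perceptron(training_data, epochs=10):
    """Hypothetical sketch of perceptron-trained parse selection.
    training_data: list of (candidates, gold_index) pairs, where each
    candidate parse is represented as a sparse feature dict."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for candidates, gold in training_data:
            # Pick the highest-scoring candidate under current weights.
            predicted = max(range(len(candidates)),
                            key=lambda i: score(weights, candidates[i]))
            if predicted != gold:
                # Standard perceptron update: reward the gold parse's
                # features, penalise the wrongly chosen parse's features.
                for f, v in candidates[gold].items():
                    weights[f] += v
                for f, v in candidates[predicted].items():
                    weights[f] -= v
    return weights
```

In the setting described above, each candidate would be a CCG derivation produced by the relaxed incremental parser, featurised with linguistic and contextual cues.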
