Natural vs. Synthesized Speech in Spoken Dialog Systems Research - Comparing the Performance of Recognition Results

Tatjana Scheffler, Roland Roller, Florian Kretzschmar, Sebastian Möller, Norbert Reithinger

In: Tim Fingscheidt, Walter Kellermann (eds.). ITG-Fachbericht Sprachkommunikation 2012. ITG-Fachtagung (ITG-2012), September 26-28, Braunschweig, pages 127-130, ISBN 978-3-8007-3455-9, VDE Verlag, Berlin, 2012.


In this paper, we test the effect of using speech synthesis when interacting with a spoken dialog system (SDS). We use a user simulation to connect our speech synthesis to a real, state-of-the-art automatic speech recognition (ASR) component deployed in a working commercial SDS via a standard telephone line. In a series of experiments, we compare human-machine dialogs and their recognition scores with simulated dialogs using synthesis. Our results show that a good text-to-speech synthesis configuration rivals human speech both in recognition scores and in variability. This makes the speech interface in user simulation quite attractive.

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence