A major obstacle to the acceptance of speech synthesis is its lack of expressivity. To convey emotions or other expressive states appropriately, the sound of the synthetic voice must change; however, newer speech synthesis methods offer too little control over the relevant parameters.
In current speech synthesis technology, naturalness and flexibility are mutually exclusive: newer corpus-based unit selection methods often sound natural, but they can realise only a single speaking style, which is fixed when the speech corpus is recorded. In contrast, older methods such as formant or diphone synthesis are parametrisable but sound quite unnatural. No current synthesis method combines the naturalness of corpus-based synthesis with the parametrisability of earlier systems.
The PAVOQUE project aims to make a core contribution to reconciling synthesis quality with parametrisability. Within a current corpus-based speech synthesis system, it investigates methods for parametrising the key dimensions of vocal emotion expression: prosody (intonation and rhythm) and voice quality. Two strategies are pursued: parameter-based selection of units from the corpus, and post-processing of the synthetic speech signal with signal manipulation methods. Together, these should allow a high degree of expressivity while maintaining good speech signal quality.
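The first strategy, parameter-based unit selection, can be illustrated with a minimal sketch. The idea is that the target cost used to rank candidate corpus units also penalises deviation from the requested prosody (pitch, duration) and voice quality. The feature set, the `breathiness` proxy for voice quality, and the weights below are illustrative assumptions, not the project's actual cost function:

```python
from dataclasses import dataclass

@dataclass
class Unit:
    """A candidate speech unit annotated with prosodic and
    voice-quality features (hypothetical feature set)."""
    pitch_hz: float       # mean fundamental frequency
    duration_ms: float    # unit duration
    breathiness: float    # voice-quality proxy in [0, 1]

def target_cost(candidate: Unit, target: Unit,
                w_pitch: float = 1.0, w_dur: float = 0.5,
                w_vq: float = 2.0) -> float:
    """Weighted distance between a candidate unit and the expressive
    target specification; the weights steer selection toward units
    matching the requested prosody and voice quality."""
    return (w_pitch * abs(candidate.pitch_hz - target.pitch_hz) / target.pitch_hz
            + w_dur * abs(candidate.duration_ms - target.duration_ms) / target.duration_ms
            + w_vq * abs(candidate.breathiness - target.breathiness))

def select_unit(candidates: list[Unit], target: Unit) -> Unit:
    """Pick the candidate with the lowest target cost."""
    return min(candidates, key=lambda u: target_cost(u, target))

# Example: a breathy, lowered-pitch target (e.g. a 'soft' style)
target = Unit(pitch_hz=180, duration_ms=120, breathiness=0.8)
candidates = [
    Unit(220, 100, 0.1),  # neutral, tense
    Unit(185, 115, 0.7),  # close match
    Unit(250,  90, 0.9),  # breathy but high-pitched
]
best = select_unit(candidates, target)
```

In a full system this target cost would be combined with a join cost between adjacent units; raising `w_vq` relative to the prosodic weights trades prosodic accuracy for voice-quality fidelity, which is exactly the kind of control knob the project seeks to expose.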