Publication
Speecher: Towards Privacy Ensuring Decoder Only Speech Reconstruction Through Disentanglement for German Speech Anonymization Using Any-to-Many Voice Conversion
Arnab Das; Carlos Franzreb; Suhita Ghosh; Tim Polzehl; Sebastian Möller
In: ISCA Archive. Symposium on Security and Privacy in Speech Communication (SPSC-2024), located at SPSC, ISCA, 9/2024.
Abstract
Voice conversion (VC) has emerged as an essential tool for speaker anonymization, providing privacy in speech data. Recent reconstruction-based VC frameworks learn to reconstruct speech by disentangling content, pitch, and speaker representations. These methods often show poor content and prosody preservation. Furthermore, they are limited in their ability to perform cross-lingual voice conversion, where the source and target speech are in different languages, because the encoder and decoder components are inherently coupled to specific languages within the model architecture. We propose Speecher, a decoder-only, reconstruction-based VC framework trained with perceptual losses, and demonstrate that speech features can be extracted from pre-trained networks without additional encoder training. A thorough objective and subjective study using German speech data reveals that our framework improves prosody and content preservation while maintaining anonymization capabilities.