Publication
A Fast-Match Approach for Robust, Faster than Real-Time Speaker Diarization
Y. Huang; O. Vinyals; G. Friedland; Christian Müller; N. Mirghafori; C. Wooters
In: Proceedings of the tenth biannual IEEE workshop on Automatic Speech Recognition and Understanding. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU-2007), December 9-13, Kyoto, Japan, 2007.
Abstract
During the past few years, speaker diarization has achieved
satisfying accuracy in terms of speaker Diarization Error Rate
(DER). The most successful approaches, based on agglomerative
clustering, however, exhibit an inherent computational
complexity which makes real-time processing, especially in
combination with further processing steps, almost impossible.
In this article we present a framework to speed up agglomerative
clustering speaker diarization. The basic idea is
to adopt a computationally cheap method to reduce the hypothesis
space of the more expensive and accurate model selection
via the Bayesian Information Criterion (BIC). Two strategies,
one based on the pitch correlogram and one on the unscented-transform
approximation of the KL divergence, are used independently
as fast-match approaches to select the most likely
clusters to merge. We performed the experiments using the
existing ICSI speaker diarization system. The new system, using
the KL-divergence fast-match strategy, performs only 14% of the
total BIC comparisons needed in the baseline system and speeds
up the system by 41% without affecting the speaker Diarization
Error Rate (DER). The result is a robust, faster-than-real-time
speaker diarization system.
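
The fast-match idea described in the abstract can be illustrated with a minimal sketch. It is not the ICSI implementation. It assumes each cluster is modeled by a diagonal-covariance GMM stored as a list of (weight, means, variances) tuples, and it uses the sigma-point (unscented-transform) approximation of the KL divergence to shortlist cluster pairs. The names gmm_logpdf, unscented_kl, fast_match, pick_merge, and the expensive_score callback (a stand-in for the BIC-based comparison, where a higher value is assumed to mean a better merge) are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def gmm_logpdf(x, gmm):
    """Log-density at x of a diagonal GMM given as [(weight, means, variances), ...]."""
    terms = []
    for w, mu, var in gmm:
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        terms.append(ll)
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def unscented_kl(f, g):
    """Sigma-point approximation of KL(f || g) for diagonal GMMs.

    Each component of f contributes 2*d sigma points placed at
    mean +/- sqrt(d * variance_k) along every axis k.
    """
    total = 0.0
    for w, mu, var in f:
        d = len(mu)
        acc = 0.0
        for k in range(d):
            step = math.sqrt(d * var[k])
            for sign in (1.0, -1.0):
                x = list(mu)
                x[k] += sign * step
                acc += gmm_logpdf(x, f) - gmm_logpdf(x, g)
        total += w * acc / (2 * d)
    return total


def fast_match(models, top_n):
    """Return the top_n cluster pairs with the smallest symmetric KL estimate."""
    scored = []
    for i, j in combinations(range(len(models)), 2):
        sym = unscented_kl(models[i], models[j]) + unscented_kl(models[j], models[i])
        scored.append((sym, i, j))
    scored.sort()
    return [(i, j) for _, i, j in scored[:top_n]]


def pick_merge(models, expensive_score, top_n):
    """Run the expensive comparison only on the shortlisted pairs."""
    candidates = fast_match(models, top_n)
    return max(candidates, key=lambda pair: expensive_score(*pair))
```

With N clusters, a full search evaluates the expensive comparison on all N(N-1)/2 pairs per merge step, whereas this sketch evaluates it on only top_n of them. How the shortlist size is chosen, and how the stopping criterion interacts with it, is not specified in the abstract and is left open here.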