Publication
The ICSI 2007 Language Recognition System
Christian Müller; Joan-Isaac Biel
In: Proceedings of the Odyssey 2008 Workshop on Speaker and Language Recognition. Odyssey Workshop on Speaker and Language Recognition (Odyssey-2008), January 21-24, Stellenbosch, South Africa, ISCA Archive, 2008.
Abstract
In this paper, we describe the ICSI 2007 language recognition
system. The core phonotactic part of the system constitutes
a variant of the classical PPRLM (parallel phone recognizer
followed by language modeling) approach. However, the
phone recognizers are replaced by "phone-like" acoustic subword
unit recognizers (SWR) that are trained in an unsupervised
fashion without making use of any phonetically labeled
data. Analogously, the backend language modeling is replaced
by SVMs, a more powerful, discriminative classification
method. Besides sub-word unit n-grams, the SVM feature
vector includes vector quantized MFCC n-grams, representing
the short-term cepstral component of the system. A pair
of prosodic frontends is introduced as well by augmenting the
standard sub-word units with binned pitch and energy values,
respectively. Rank normalization is described as a normalization
method superior to mean-variance normalization for this
particular task. Preliminary results obtained on the LRE 2003
evaluation data set suggest that the multiplicity of frontends can
be effectively combined. The SWR frontend augmented with
pitch performs best, followed by the standard SWR frontend.
The results are discussed and objectives for (near) future
work are described.
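
The abstract names rank normalization as the preferred alternative to mean-variance normalization but does not define it. The sketch below assumes one common definition: each feature value is replaced by the fraction of background (training) values that lie below it. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from bisect import bisect_left


def fit_rank_norm(background):
    """Return the sorted background values for one feature dimension."""
    if not background:
        raise ValueError("background must not be empty")
    return sorted(background)


def rank_normalize(value, sorted_background):
    """Map a value to the fraction of background values strictly below it.

    The result always lies in [0, 1], regardless of the feature's scale.
    """
    return bisect_left(sorted_background, value) / len(sorted_background)


def mean_variance_normalize(value, background):
    """Standard z-score normalization, shown for comparison."""
    n = len(background)
    mean = sum(background) / n
    std = (sum((x - mean) ** 2 for x in background) / n) ** 0.5
    return (value - mean) / std if std > 0 else 0.0


# Toy n-gram frequencies for one feature: mostly zero, with one outlier.
background = [0.0, 0.0, 0.0, 1.0, 100.0]
table = fit_rank_norm(background)

ranked = [rank_normalize(v, table) for v in (0.0, 1.0, 100.0, 200.0)]
z_small = mean_variance_normalize(1.0, background)
```

Under this assumed definition the output is bounded and depends only on the ordering of values, so a single outlier in sparse, heavy-tailed count features does not compress the remaining values the way it can under mean-variance scaling.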