Various Robust Search Methods in a Hungarian
Speech Recognition System
Gábor Gosztolya, András Kocsor, László
Tóth, László Felföldi
In any speech recognition application we have to identify spoken words,
based on the information provided by various features. In this process a
large number of word-combinations must be tried out, and the best fitting
ones must be chosen. A reduction of this search space (i.e. the set of word sequences)
is quite important for both speed and memory reasons, because most of these
hypotheses will, for one reason or another, turn out to be quite unsuitable.
To tackle this problem, a number of standard algorithms are
available, such as Viterbi beam search, stack decoding,
forward-backward search and A* [1][2]. We have
implemented several of them, focusing mainly on an extension of the
general-purpose stack decoding method. Our OASIS Speech Laboratory
package incorporates most of these methods, and we tested them
on a set of Hungarian speech databases.
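The pruning idea behind stack decoding can be illustrated with a minimal sketch. It is not the OASIS implementation or the extension studied here, only a toy multi-stack variant under simplifying assumptions: the utterance is already split into word positions, each position comes with hypothetical per-word log scores, and `transition` is a hypothetical stand-in for a language-model score. The names `multi_stack_decode`, `candidates`, `transition` and `stack_size` are all illustrative.

```python
import heapq


def multi_stack_decode(candidates, transition, stack_size=3):
    """Toy multi-stack decoder (illustrative assumption, not the paper's method).

    candidates: one dict per word position, mapping word -> log score.
    transition: function (previous_word_or_None, word) -> log score.
    stack_size: maximum number of hypotheses kept per stack.
    Returns the best (score, word_tuple) pair found.
    """
    # Stack 0 holds only the empty hypothesis with log score 0.
    stacks = [[(0.0, ())]]
    for scores in candidates:
        extended = []
        # Extend every surviving hypothesis with every candidate word.
        for score, words in stacks[-1]:
            prev = words[-1] if words else None
            for word, word_score in scores.items():
                new_score = score + word_score + transition(prev, word)
                extended.append((new_score, words + (word,)))
        # Pruning step: keep only the stack_size best hypotheses.
        stacks.append(heapq.nlargest(stack_size, extended))
    return max(stacks[-1])


if __name__ == "__main__":
    toy_candidates = [{"a": -1.0, "b": -2.0}, {"c": -1.5, "d": -0.5}]
    best_score, best_words = multi_stack_decode(
        toy_candidates, lambda prev, word: 0.0, stack_size=2
    )
    print(best_score, best_words)
```

A small `stack_size` saves time and memory but may discard a hypothesis that would later have turned out best, which is the trade-off any such search-space reduction has to balance.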
In order to find the best fitting word-sequences, language information is
obviously quite important. Incorporating this kind of knowledge into a speech
recognition system usually means some kind of language model has to be used.
Although this paper focuses on the search process, we cannot ignore another
related point, that of choosing a good representation for the Hungarian
language.
[1] Jelinek, F., Statistical Methods for Speech Recognition, The MIT Press,
1997.
[2] Huang, X., Acero, A., Hon, H.-W., Spoken Language Processing,
Prentice Hall PTR, 2001.