Research Group on Artificial Intelligence
Hungarian Academy of Sciences, Szeged, Hungary
Sine-wave speech (SWS) is a three-tone replica of speech, conventionally created
by matching each constituent sinusoid in amplitude and frequency
with the corresponding vocal tract resonance (formant). We propose
an alternative technique where we take a high-quality multicomponent
sinusoidal representation and decimate this model so that there
are only three components per frame. In contrast to SWS, the resulting
signal contains only components that were present in the original
signal. Consequently, it preserves the harmonic fine structure of
voiced speech. Perceptual studies indicate that this signal is judged
more natural and intelligible than SWS. Furthermore, its tonal artifacts
can mostly be eliminated by the introduction of only a few additional
components, which leads to an intriguing speculation about grouping
issues.
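
The decimation step described above can be illustrated with a minimal sketch. The abstract does not state how the three components are chosen in each frame, so the selection rule here (keep the largest-amplitude components) is an assumption made only for illustration, and the names `decimate_frame` and `decimate_model` are hypothetical. Each frame is assumed to be a list of (frequency, amplitude) pairs produced by some sinusoidal analysis.

```python
from typing import List, Tuple

# One sinusoidal component: (frequency in Hz, linear amplitude).
Component = Tuple[float, float]


def decimate_frame(components: List[Component], keep: int = 3) -> List[Component]:
    """Keep at most `keep` components of one frame.

    ASSUMPTION: components are ranked by amplitude; the source does not
    specify the actual selection criterion. Every returned component is
    taken unchanged from the input, so none is synthesized anew.
    """
    strongest = sorted(components, key=lambda c: c[1], reverse=True)[:keep]
    # Return in ascending frequency order for readability.
    return sorted(strongest, key=lambda c: c[0])


def decimate_model(frames: List[List[Component]], keep: int = 3) -> List[List[Component]]:
    """Apply the per-frame decimation to a whole sinusoidal model."""
    return [decimate_frame(frame, keep) for frame in frames]


# Toy voiced frame: a few harmonics of a 100 Hz fundamental (made-up values).
frame = [
    (100.0, 0.2), (200.0, 0.5), (300.0, 0.9),
    (400.0, 0.4), (1200.0, 0.7), (2500.0, 0.6),
]
reduced = decimate_frame(frame)
```

Because the retained components are copied from the original model rather than retuned to formant frequencies, they remain harmonics of the fundamental in voiced frames. Raising `keep` above three corresponds to the introduction of a few additional components mentioned in the abstract.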