Harmonic Alternatives to Sine-Wave Speech

Tóth László, Kocsor András

Research Group on Artificial Intelligence
Hungarian Academy of Sciences, Szeged, Hungary

Sine-wave speech (SWS) is a three-tone replica of speech, conventionally created by matching each constituent sinusoid in amplitude and frequency to the corresponding vocal tract resonance (formant). We propose an alternative technique that starts from a high-quality multicomponent sinusoidal representation and decimates this model so that only three components remain per frame. In contrast to SWS, the resulting signal contains only components that were present in the original signal. Consequently, it preserves the harmonic fine structure of voiced speech. Perceptual studies indicate that this signal is judged more natural and more intelligible than SWS. Furthermore, its tonal artifacts can mostly be eliminated by adding only a few more components, which leads to an intriguing speculation about grouping issues.
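
The decimation step described above can be illustrated with a minimal sketch. The abstract does not state how the three surviving components are chosen, so the rule used here (keep the three largest-amplitude components in each frame) is an assumption for illustration only and is not necessarily the authors' criterion. The `Component` layout and the names `decimate_frame` and `decimate_model` are likewise hypothetical, and the amplitudes in the toy frame are made up.

```python
from typing import List, Tuple

# One sinusoidal component as (frequency in Hz, amplitude).
# This layout is a hypothetical choice for the sketch.
Component = Tuple[float, float]


def decimate_frame(components: List[Component], keep: int = 3) -> List[Component]:
    """Keep the `keep` largest-amplitude components of one frame.

    Assumed selection rule: rank by amplitude. The result is returned
    in ascending frequency order. No component is moved or re-estimated,
    so every survivor was present in the input frame.
    """
    strongest = sorted(components, key=lambda c: c[1], reverse=True)[:keep]
    return sorted(strongest, key=lambda c: c[0])


def decimate_model(frames: List[List[Component]], keep: int = 3) -> List[List[Component]]:
    """Apply the per-frame decimation to every frame of a sinusoidal model."""
    return [decimate_frame(frame, keep) for frame in frames]


# Toy voiced frame: harmonics of a 100 Hz fundamental, made-up amplitudes.
frame = [(100.0, 0.2), (200.0, 0.9), (300.0, 0.4), (400.0, 0.1),
         (500.0, 0.7), (600.0, 0.3), (700.0, 0.05), (800.0, 0.6)]

reduced = decimate_frame(frame)
print(reduced)
```

Because the surviving components are taken unchanged from the input, their frequencies remain multiples of the fundamental in a voiced frame, which is the property the abstract contrasts with conventional SWS, where the tones follow the formant frequencies instead.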