SPEECH(1) SPEECH(1)
NAME
speech - version 2.10
SYNOPSIS
speech infile.ext [ outfile ] -types [ -m012 ]
speech -i
DESCRIPTION
speech converts infile.ext to the type given by types. If
more than one type is given, the conversions are performed
one after another in the given order, and only the result
of the last conversion is saved. The input type is
determined by ext only. The output extension is determined
by the last output type. If outfile is not given, infile
(without its extension) is used.
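The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not code taken from speech; the function name output_name and the TYPE_EXTENSION table are hypothetical, and the letter-to-extension mapping is read off the TYPES section.

```python
import os

# Extension written by each conversion type letter (taken from TYPES).
TYPE_EXTENSION = {
    "b": "bmp", "c": "cps", "l": "lpc",
    "p": "phs", "s": "spc", "w": "wav",
}


def output_name(infile, types, outfile=None):
    """Predict the file that `speech infile.ext [outfile] -types` saves."""
    if not types:
        raise ValueError("at least one conversion type is required")
    for letter in types:
        if letter not in TYPE_EXTENSION:
            raise ValueError(f"unknown conversion type: {letter!r}")
    # Only the result of the last conversion is saved, so the last
    # type letter alone decides the output extension.
    extension = TYPE_EXTENSION[types[-1]]
    # Without outfile, the input name minus its extension is reused.
    base = outfile if outfile is not None else os.path.splitext(infile)[0]
    return f"{base}.{extension}"


print(output_name("voice.wav", "sb"))         # voice.bmp
print(output_name("voice.bmp", "sw", "res"))  # res.wav
```

The two calls mirror the commands in EXAMPLES.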
speech is a file conversion tool for playing and
experimenting with sounds. It is possible to make and view
sound preprocessing results (spectrum, cepstrum, LPC),
edit the resulting image, and convert it back to a sound
file. For instance, you can even play your favorite images
(well, probably after some editing), or draw words. Not
the least interesting way of learning phonology... You can
also stretch sounds while preserving pitch, with high
quality. speech was designed with human speech in mind,
but it is useful for other sound samples as well.
The file types supported by speech are Windows PCM mono (
*.wav ), Windows 256 color bitmap ( *.bmp ), and four
other types, namely spectrogram ( *.spc ), phase ( *.phs ),
cepstrum (energy spectrum) ( *.cps ) and LPC ( *.lpc ).
The last four were designed by the developers. Every bmp
file must have a corresponding spectrogram information file
( *.spi ) that makes conversion to sound possible. It is a
text file and can be edited. It is generated automatically
when performing conversion to bitmap. IMPORTANT :
The intensity information is in the palette index, not the
color (i.e. color number 255 is the most intense, color
number 0 is the least intense).
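To make the palette-index rule concrete, here is a small Python sketch. It assumes an 8-bit indexed bitmap whose palette indices run from 0 to 255; the function names and the linear mapping to the range 0.0 to 1.0 are illustrative assumptions, not the actual scaling used by speech.

```python
def grayscale_palette(size=256):
    """Palette in which entry i is the gray level i, as (r, g, b)."""
    return [(i, i, i) for i in range(size)]


def intensity_from_index(palette_index, size=256):
    """Map a palette index to an intensity between 0.0 and 1.0.

    The color stored in the palette entry is never consulted: only
    the index matters. The linear mapping is an assumption made for
    this illustration.
    """
    if not 0 <= palette_index < size:
        raise ValueError("palette index out of range")
    return palette_index / (size - 1)


# Recoloring the palette changes how the image looks, but not the
# intensity that an index stands for.
palette = grayscale_palette()
palette[255] = (255, 0, 0)            # paint the top entry red
print(intensity_from_index(255))      # 1.0
print(intensity_from_index(0))        # 0.0
```

With the default gray scale palette, index and brightness coincide; with a custom './bmp.pal' they may not, so edit images by palette index rather than by visible color.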
OPTIONS
-i Sends the default ini file to standard output.
-m Controls the type of messages speech sends to
standard error while running. -m0 means no messages
appear, -m1 means only error messages are shown and
-m2 means that both errors and warning/information
messages are shown.
TYPES
b Converts to a Windows bitmap. An spi file is also
written containing information about the parameters
of the discrete Fourier transformation used when
preprocessing sound data. This spi file is needed
for conversions from bmp -s.
Possible input: lpc, spc, phs, cps.
c Converts to a cps file that contains the cepstrum
(energy spectrum) of the sound data.
Possible input: wav.
l Converts to an LPC file.
Possible input: wav.
p Converts to a phs file that contains the phase
information of the discrete Fourier transformation
of the sound data.
Possible input: wav, bmp.
s Converts to an spc file (spectrogram) that contains
the spectrum information of the discrete Fourier
transformation of the sound data. Note that lpc
files are possible inputs.
Possible input: wav, bmp, lpc, cps.
w Converts to a Windows PCM sound file.
Possible input: wav, spc, dft.
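The "Possible input" lists above determine which chains of types make sense. The following Python sketch walks a chain step by step and reports the intermediate types. The tables are copied from this section; the function name check_chain is hypothetical, and whether speech itself rejects a chain in exactly this way is an assumption.

```python
# Extension produced by each type letter.
OUTPUT_EXT = {
    "b": "bmp", "c": "cps", "l": "lpc",
    "p": "phs", "s": "spc", "w": "wav",
}

# Input types listed as possible for each type letter.
POSSIBLE_INPUT = {
    "b": {"lpc", "spc", "phs", "cps"},
    "c": {"wav"},
    "l": {"wav"},
    "p": {"wav", "bmp"},
    "s": {"wav", "bmp", "lpc", "cps"},
    "w": {"wav", "spc", "dft"},
}


def check_chain(input_ext, types):
    """Return the (from, to) steps of a chain, or raise ValueError."""
    current = input_ext
    steps = []
    for letter in types:
        if letter not in POSSIBLE_INPUT:
            raise ValueError(f"unknown conversion type: {letter!r}")
        if current not in POSSIBLE_INPUT[letter]:
            raise ValueError(
                f"type {letter!r} does not list {current!r} as an input"
            )
        steps.append((current, OUTPUT_EXT[letter]))
        current = OUTPUT_EXT[letter]
    return steps


print(check_chain("wav", "sb"))  # [('wav', 'spc'), ('spc', 'bmp')]
print(check_chain("bmp", "sw"))  # [('bmp', 'spc'), ('spc', 'wav')]
```

For example, check_chain("wav", "b") raises an error, because a wav file must first pass through s (or another listed type) before it can become a bitmap.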
FILES
'./bmp.pal':
An HSI palette that is used for creating the
bitmaps. It does not need to exist; the default is
a gray scale palette.
'./dft.spc':
A default spectrogram file that is used when
converting to sound formats from types that contain
only phase information. Such conversions are not
possible without this file.
'./dft.phs':
A default phase file that is used when converting
to sound formats from types that contain only
spectrum information. Such conversions are not
possible without this file. In some cases it can be
overwritten by speech (see INIFILE, [WAV] section).
'./speech.ini':
File that contains the parameters for the
conversions. It is the only way to pass parameters
to the conversion algorithms. If the file does not
exist, speech tries to load the file given in the
environment variable SPEECH_INI. If that also fails,
the defaults are used (see option -i ).
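The lookup order described for './speech.ini' can be sketched as follows. This Python fragment only mirrors the order stated above (local file, then SPEECH_INI, then built-in defaults); the function name find_ini_file and its arguments are hypothetical and say nothing about how speech implements the search.

```python
import os


def find_ini_file(cwd=".", environ=None):
    """Return the parameter file to load, or None for built-in defaults."""
    environ = os.environ if environ is None else environ
    # 1. A speech.ini in the current directory is tried first.
    local = os.path.join(cwd, "speech.ini")
    if os.path.isfile(local):
        return local
    # 2. Otherwise the file named by SPEECH_INI is tried.
    from_env = environ.get("SPEECH_INI")
    if from_env and os.path.isfile(from_env):
        return from_env
    # 3. Neither exists: fall back to the defaults printed by -i.
    return None


print(find_ini_file() or "built-in defaults")
```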
INIFILE
This section describes the parameter file (see FILES,
'./speech.ini' ). To get the current defaults see option
-i.
[DFT] These values affect the conversions that involve
performing a Fourier transformation on some sound
data. We assume that the reader is familiar with
the parameters of this transformation, so we won't
go into details.
DataType
0 : short, 1 : long, 2 : float, 3 : double
The generated spc, phs and (temporary) dft
files will use these data types. Lower
values mean lower resolution but smaller
files. When getting odd results (especially
for cepstrum) try 3. Note that only the
integer-type files are portable at the
moment; sorry...
WindowType
0 : Rectangular, 1 : Hamming,
2 : Blackman-Harris
FrameSize
Must be less than BlockSize.
BlockSize
Must be a power of two.
FrameDistance
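The two stated constraints (BlockSize is a power of two, FrameSize is less than BlockSize) and the three window types can be illustrated in Python. The window formulas below are the common textbook symmetric definitions; that speech uses exactly these forms is an assumption, and the function names are hypothetical.

```python
import math


def check_dft_parameters(frame_size, block_size):
    """Raise ValueError unless the documented constraints hold."""
    if block_size < 1 or (block_size & (block_size - 1)) != 0:
        raise ValueError("BlockSize must be a power of two")
    if not 0 < frame_size < block_size:
        raise ValueError("FrameSize must be less than BlockSize")


def window(window_type, n):
    """Return n window samples for WindowType 0, 1 or 2."""
    if n < 2:
        raise ValueError("window length must be at least 2")
    if window_type == 0:          # Rectangular
        return [1.0] * n
    phase = [2.0 * math.pi * i / (n - 1) for i in range(n)]
    if window_type == 1:          # Hamming
        return [0.54 - 0.46 * math.cos(p) for p in phase]
    if window_type == 2:          # 4-term Blackman-Harris
        a0, a1, a2, a3 = 0.35875, 0.48829, 0.14128, 0.01168
        return [
            a0 - a1 * math.cos(p) + a2 * math.cos(2 * p) - a3 * math.cos(3 * p)
            for p in phase
        ]
    raise ValueError("WindowType must be 0, 1 or 2")


check_dft_parameters(frame_size=400, block_size=512)  # passes silently
print([round(w, 2) for w in window(1, 5)])
```

The values 400 and 512 are arbitrary illustrations, not the defaults of speech; use the -i option to see the real defaults.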
[LPC] These values affect the conversions that involve
performing an LPC transformation. We assume that
the reader is familiar with the parameters of this
transformation, so we won't go into details. Use
defaults if not sure.
Coefficients
WindowType
0 : Rectangular, 1 : Hamming,
2 : Blackman-Harris
FrameSize
FrameDistance
Preemphasis
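The Preemphasis parameter is not described further here. As a rough orientation only, the Python sketch below shows the first-order pre-emphasis filter commonly applied before LPC analysis, y[n] = x[n] - a * x[n-1]. That speech interprets Preemphasis as the coefficient a of this filter is an assumption, and the function name is hypothetical.

```python
def preemphasize(samples, coefficient):
    """Apply y[n] = x[n] - coefficient * x[n-1] to a list of samples.

    The first sample is passed through unchanged because it has no
    predecessor. Treating the Preemphasis setting as `coefficient`
    is an assumption made for this illustration.
    """
    if not samples:
        return []
    output = [samples[0]]
    for n in range(1, len(samples)):
        output.append(samples[n] - coefficient * samples[n - 1])
    return output


# A constant signal is attenuated, which is the point of the
# filter: it reduces low frequencies relative to high ones.
print(preemphasize([1.0, 1.0, 1.0], 0.5))  # [1.0, 0.5, 0.5]
```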
[BMP]
ScaleType
0 : No scaling, 1 : Bark scale,
2 : Cepstrally smoothed, 3 : Cepstrally
smoothed with Bark scale, 4 : PLP
(perceptual linear prediction) like. Bark
transform on the spectrum is believed to
describe human hearing better. Effective
for spectrogram input only.
[WAV]
Iterations
Effective when converting spectrogram or
phase to sound. Setting 1 means a simple
conversion using './dft.phs'/'./dft.spc'
(see FILES ). Higher values result in an
iterative improvement of
'./dft.phs'/'./dft.spc' and so a higher
quality sound. In this case,
'./dft.phs'/'./dft.spc' will be changed
after the conversion.
Stretch
1 has no effect; 2 will result in the
generation of a 2 times slower sound with
the same pitch. Other values are allowed
but the quality is highest (perfect:-)
with 2.
[CPS]
Smooth
This gives the minimal wavelength to be
used in cps to spc conversions. The
wavelength is in Hz. If you set 200, for
instance, then the resulting conversion
will smooth out the pitch information from
the spectrogram if the pitch is less than
200 Hz.
ENVIRONMENT
SPEECH_INI
The name of the file containing the parameters for
the conversions performed by speech. The default
is './speech.ini'.
EXAMPLES
speech -w voice.wav voice2
Generates the file 'voice2.wav'. If Stretch is 2 (see
INIFILE ) 'voice2.wav' is two times slower with the same
pitch.
speech -sb voice.wav
Generates the spectrogram in 'voice.bmp' and writes an spi
file 'voice.spi' containing the Fourier transformation
parameters (see FILES, './speech.ini' ).
speech -sw voice.bmp res
Converts 'voice.bmp' to 'res.wav', interpreting it as a
spectrogram and using the phase information in './dft.phs'
(see FILES ). The Fourier transformation parameters are
read from 'voice.spi', which has to exist. HINT: if the
image is drawn by you and is meant to be a spectrogram, then
use a phase file created from a stable vowel (speech -p
vowel.wav dft) and set Iterations to more than one (see
INIFILE ).
BUGS
Take care with the file formats. The double-type spc, phs
and dft files are not portable since they use the double
format of the system on which they were created. The
integer formats are portable, however. Note that outfile
and infile must be at least one character long. There must
be lots of bugs; bug reports are highly appreciated.
Wed Nov 18 20:23:52 CET 1998