speech manual





SPEECH(1)                                               SPEECH(1)


NAME
       speech - sound file conversion tool, version 2.10

SYNOPSIS
       speech infile.ext [ outfile ] -types [ -m012 ]
       speech -i

DESCRIPTION
       speech converts infile.ext to the type given by types.  If
       more than one type is given, the conversions are performed
       one after another in the given order, and only the result
       of the last conversion is saved. The input type is
       determined by ext only. The output extension is determined
       by the last output type.  If outfile is not given, the name
       of infile (with the new extension) is used.
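
       The naming rule above can be sketched as follows. This is a
       minimal illustration, not code from speech itself: the helper
       output_name is hypothetical, and it assumes that outfile is
       given without an extension, as in the EXAMPLES section.

```python
import os

# Extension written by each conversion type letter (from the TYPES section).
TYPE_TO_EXT = {"b": "bmp", "c": "cps", "l": "lpc", "p": "phs", "s": "spc", "w": "wav"}


def output_name(infile, types, outfile=None):
    """Return the name of the file a run would save (hypothetical helper)."""
    if not types:
        raise ValueError("at least one type letter is required")
    for letter in types:
        if letter not in TYPE_TO_EXT:
            raise ValueError("unknown type letter: " + letter)
    # Only the last conversion is saved, so only its extension matters.
    ext = TYPE_TO_EXT[types[-1]]
    # Assumption: without an explicit outfile, the input name minus its
    # extension is reused as the base name.
    base = outfile if outfile is not None else os.path.splitext(infile)[0]
    return base + "." + ext


print(output_name("voice.wav", "sb"))         # voice.bmp
print(output_name("voice.bmp", "sw", "res"))  # res.wav
```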

       speech is a file conversion tool that helps you play and
       experiment with sounds. You can compute and view sound
       preprocessing results (spectrum, cepstrum, LPC), edit the
       resulting image, and convert it back to a sound file. For
       instance, you can even play your favorite images (well,
       probably after some editing), or draw words. It is a no
       less interesting way of learning phonology...  You can
       also stretch sounds while preserving pitch, with high
       quality. speech was designed with human speech in mind,
       but it is useful for other sound samples as well.

       The file types supported by speech are Windows PCM mono
       ( *.wav ), Windows 256 color bitmap ( *.bmp ), and four
       other types, namely spectrogram ( *.spc ), phase ( *.phs ),
       cepstrum (energy spectrum) ( *.cps ) and LPC ( *.lpc ).
       The last four were designed by the developers.  Every bmp
       file must have a corresponding spectrogram information
       file ( *.spi ) that makes conversion to sound possible. It
       is a text file and can be edited.  It is generated
       automatically when converting to bitmap.

       IMPORTANT: The intensity information is in the palette
       index, not the color (i.e. color number 255 is the most
       intense, color number 0 is the least intense).
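
       To illustrate the palette-index convention, here is a small
       sketch. It is not taken from speech: the function names are
       hypothetical and the linear mapping is an assumption (the
       program may scale intensities differently). It only shows that
       the pixel value is an index from 0 to 255, and that the default
       gray scale palette happens to make brightness follow the index.

```python
def gray_palette():
    """256 (R, G, B) entries; entry i is the gray level i."""
    return [(i, i, i) for i in range(256)]


def to_palette_index(value, vmin, vmax):
    """Map a magnitude in [vmin, vmax] linearly onto indices 0..255.

    Assumption: a linear mapping, chosen only for illustration.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    clipped = min(max(value, vmin), vmax)
    return int(round(255 * (clipped - vmin) / (vmax - vmin)))


palette = gray_palette()
index = to_palette_index(0.8, 0.0, 1.0)
# The intensity is carried by `index`; the color is only how it is drawn.
print(index, palette[index])
```

       With a custom './bmp.pal' the colors change, but an image editor
       must still write the same indices for the sound to be unchanged.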

OPTIONS
       -i     Sends the default ini file to standard output.

       -m     Controls the type of messages speech sends to stan-
              dard error while running.  -m0  means  no  messages
              appear, -m1 means only error messages are shown and
              -m2 means that both errors and  warning/information
              messages are shown.

TYPES
       b      Converts  to  a Windows bitmap. An spi file is also
              written containing information about the parameters
              of  the  discrete  Fourier transformation used when
              preprocessing sound data. This spi file  is  needed
              for conversions from bmp (such as -s).
              Possible input: lpc, spc, phs, cps.

       c      Converts  to  a cps file that contains the cepstrum
              (energy spectrum) of the sound data.
              Possible input: wav.

       l      Converts to an LPC file.
              Possible input: wav.

       p      Converts to a phs  file  that  contains  the  phase
              information  of the discrete Fourier transformation
              of the sound data.
              Possible input: wav, bmp.

       s      Converts to an spc file (spectrogram) that contains
              the spectrum information of the discrete Fourier
              transformation of the sound data. Note that lpc
              files are possible inputs.
              Possible input: wav, bmp, lpc, cps.

       w      Converts to a Windows PCM sound file.
              Possible input: wav, spc, dft.
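
       Since each type accepts only certain inputs, a chain of types
       can be checked step by step. The sketch below copies the
       "Possible input" lists from this section; the function
       check_chain is hypothetical and speech may report errors
       differently.

```python
# "Possible input" lists copied from the TYPES section, keyed by type letter.
POSSIBLE_INPUT = {
    "b": {"lpc", "spc", "phs", "cps"},
    "c": {"wav"},
    "l": {"wav"},
    "p": {"wav", "bmp"},
    "s": {"wav", "bmp", "lpc", "cps"},
    "w": {"wav", "spc", "dft"},
}
# Extension produced by each type letter.
OUTPUT_EXT = {"b": "bmp", "c": "cps", "l": "lpc", "p": "phs", "s": "spc", "w": "wav"}


def check_chain(input_ext, types):
    """Return the extension produced by each step, or raise ValueError."""
    current = input_ext
    steps = []
    for letter in types:
        if letter not in POSSIBLE_INPUT:
            raise ValueError("unknown type letter: " + letter)
        if current not in POSSIBLE_INPUT[letter]:
            raise ValueError("-%s does not accept %s input" % (letter, current))
        current = OUTPUT_EXT[letter]
        steps.append(current)
    return steps


print(check_chain("wav", "sb"))  # ['spc', 'bmp']
print(check_chain("bmp", "sw"))  # ['spc', 'wav']
```

       This also shows why a wav file is turned into a bitmap with -sb
       rather than -b alone: b does not list wav as a possible input.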

FILES
       './bmp.pal':
              An HSI palette that is used for creating the
              bitmaps. It does not need to exist; the default is
              a gray scale palette.

       './dft.spc':
              A default spectrogram file that is used when
              converting to sound formats from types that contain
              only phase information. Such conversions are not
              possible without this file.

       './dft.phs':
              A default phase file that is used  when  converting
              to sound formats from types that contain only spec-
              trum information. Such conversions are not possible
              without  this  file.  In some cases it can be over-
              written by speech (see INIFILE, [WAV] section).

       './speech.ini':
              File that contains the parameters for the
              conversions.  It is the only way to pass parameters
              to the conversion algorithms. If the file does not
              exist, speech tries to load the file named in the
              environment variable SPEECH_INI.  If that also
              fails, defaults are used (see option -i ).
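
       The lookup order described for './speech.ini' can be sketched
       like this. The function find_ini is hypothetical and only
       mirrors the order stated above; it does not parse the file.

```python
import os


def find_ini(cwd=".", environ=None):
    """Return the parameter file to use, or None for built-in defaults."""
    environ = os.environ if environ is None else environ
    # 1. A speech.ini in the current directory is tried first.
    local = os.path.join(cwd, "speech.ini")
    if os.path.isfile(local):
        return local
    # 2. Otherwise the file named by SPEECH_INI is tried.
    from_env = environ.get("SPEECH_INI")
    if from_env and os.path.isfile(from_env):
        return from_env
    # 3. Neither exists: the caller falls back to the defaults
    #    (the ones that `speech -i` prints).
    return None


print(find_ini())
```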

INIFILE
       This section describes the parameter file (see FILES,
       './speech.ini' ).  To get the current defaults see option
       -i.

       [DFT]  These values affect the conversions that involve
              performing a Fourier transformation on some sound
              data.  We assume that the reader is familiar with
              the parameters of this transformation, so we won't
              go into details.

               DataType
                       0 : short, 1 : long, 2 : float, 3 : double
                       The generated spc, phs and (temporary) dft
                       files will use these data types. Lower
                       values mean lower resolution but smaller
                       files.  If you get odd results (especially
                       for cepstrum), try 3.  Note that only the
                       integer-type files are portable at the
                       moment; sorry...
               WindowType
                       0 : Rectangular, 1: Hamming, 2:  Blackman-
                       Harris
               FrameSize
                       Must be less than BlockSize.
               BlockSize
                       Must be a power of two.
               FrameDistance
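
              For readers less familiar with these parameters, the
              sketch below shows one common way they fit together in
              a framed discrete Fourier transform. It is not the code
              used by speech. In particular, it assumes that
              FrameDistance is the step (in samples) between the
              starts of consecutive frames and that each windowed
              frame is zero-padded to BlockSize; neither is stated in
              this manual. The function names are hypothetical.

```python
import cmath
import math


def window(kind, n):
    """Window coefficients; kind follows the WindowType numbering."""
    if n == 1 or kind == 0:          # 0: rectangular
        return [1.0] * n
    x = [2 * math.pi * i / (n - 1) for i in range(n)]
    if kind == 1:                    # 1: Hamming
        return [0.54 - 0.46 * math.cos(v) for v in x]
    if kind == 2:                    # 2: 4-term Blackman-Harris
        return [0.35875 - 0.48829 * math.cos(v)
                + 0.14128 * math.cos(2 * v)
                - 0.01168 * math.cos(3 * v) for v in x]
    raise ValueError("unknown WindowType")


def frame_spectra(samples, frame_size, block_size, frame_distance, window_type=1):
    """Magnitude spectrum (bins 0..block_size/2) of every frame."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("BlockSize must be a power of two")
    if not 0 < frame_size < block_size:
        raise ValueError("FrameSize must be less than BlockSize")
    if frame_distance <= 0:
        raise ValueError("FrameDistance must be positive")
    coeffs = window(window_type, frame_size)
    spectra = []
    # Assumption: FrameDistance is the hop between frame starts.
    for start in range(0, len(samples) - frame_size + 1, frame_distance):
        frame = samples[start:start + frame_size]
        block = [s * c for s, c in zip(frame, coeffs)]
        block += [0.0] * (block_size - frame_size)   # zero-pad to BlockSize
        # Plain O(N^2) DFT to keep the sketch dependency-free.
        spectrum = []
        for k in range(block_size // 2 + 1):
            acc = sum(v * cmath.exp(-2j * math.pi * k * n / block_size)
                      for n, v in enumerate(block))
            spectrum.append(abs(acc))
        spectra.append(spectrum)
    return spectra


# A cosine with a period of 4 samples, analysed with small example values.
signal = [math.cos(2 * math.pi * n / 4) for n in range(40)]
result = frame_spectra(signal, frame_size=12, block_size=16, frame_distance=4)
print(len(result), "frames of", len(result[0]), "bins")
```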

       [LPC]  These values affect the conversions that involve
              performing an LPC transformation.  We assume that
              the reader is familiar with the parameters of this
              transformation, so we won't go into details. Use
              the defaults if not sure.

               Coefficients
               WindowType
                       0  : Rectangular, 1: Hamming, 2: Blackman-
                       Harris
               FrameSize
               FrameDistance
               Preemphasis

       [BMP]

               ScaleType
                       0 : No scaling, 1 : Bark scale, 2 :
                       Cepstrally smoothed, 3 : Cepstrally smoothed
                       with Bark scale, 4 : PLP (perceptual linear
                       prediction) like.  A Bark transform of the
                       spectrum is believed to describe human
                       hearing better.  Effective for spectrogram
                       input only.
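
              As background on the Bark scale, the sketch below uses
              the widely quoted Zwicker-Terhardt approximation to
              convert a frequency in Hz to Bark. Which formula speech
              actually uses is not stated in this manual, so this is
              an assumed choice for illustration only.

```python
import math


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale.

    Assumption: chosen for illustration; speech may use another formula.
    """
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


# Equal steps in Hz become ever smaller steps in Bark at high frequencies,
# which is why a Bark-scaled image gives more room to the low frequencies.
for f in (250, 500, 1000, 2000, 4000):
    print(f, "Hz ->", round(hz_to_bark(f), 2), "Bark")
```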

       [WAV]

               Iterations
                       Effective when converting a spectrogram or
                       phase to sound. A value of 1 means a simple
                       conversion using './dft.phs'/'./dft.spc'
                       (see FILES ). Higher values iteratively
                       improve './dft.phs'/'./dft.spc' and so give
                       a higher quality sound. In this case,
                       './dft.phs'/'./dft.spc' will be changed
                       after the conversion.
               Stretch
                       1 has no effect; 2 generates a sound that
                       is two times slower with the same pitch.
                       Other values are allowed, but the quality
                       is highest (perfect:-) with 2.

       [CPS]

               Smooth
                       This gives the minimal wavelength to be
                       used in cps to spc conversions.  The value
                       is given in Hz.  If you set 200, for
                       instance, the conversion will smooth the
                       pitch information out of the spectrogram
                       if the pitch is less than 200 Hz.
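
              One way to read the Smooth value is as a cutoff on the
              cepstrum: a pitch of f Hz shows up near position
              (sample rate / f) in the cepstrum, so discarding
              everything at or beyond (sample rate / Smooth) removes
              lower pitches. The sketch below illustrates that idea.
              It is an assumption about the technique, not code from
              speech; the function names are hypothetical and the
              cepstrum is treated as a one-sided list.

```python
def lifter_cutoff(sample_rate_hz, smooth_hz):
    """Cepstrum index (in samples) that corresponds to smooth_hz."""
    if smooth_hz <= 0:
        raise ValueError("Smooth must be positive")
    return int(sample_rate_hz // smooth_hz)


def smooth_cepstrum(cepstrum, sample_rate_hz, smooth_hz):
    """Zero every coefficient at or beyond the cutoff index."""
    cutoff = lifter_cutoff(sample_rate_hz, smooth_hz)
    return [c if i < cutoff else 0.0 for i, c in enumerate(cepstrum)]


# With 8000 samples per second and Smooth = 200, a 100 Hz pitch peak
# sits near index 80 and is removed; the first 40 coefficients are kept.
print(lifter_cutoff(8000, 200))
```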

ENVIRONMENT
       SPEECH_INI
              The name of the file containing the parameters for
              the conversions performed by speech.  The default
              is './speech.ini', which takes precedence if it
              exists (see FILES).

EXAMPLES
               speech -w voice.wav voice2

       Generates the file 'voice2.wav'.  If  Stretch  is  2  (see
       INIFILE  )  'voice2.wav' is two times slower with the same
       pitch.

               speech -sb voice.wav

       Generates the spectrogram in 'voice.bmp' and writes an spi
       file 'voice.spi' containing the Fourier transformation
       parameters (see FILES, './speech.ini' ).

               speech -sw voice.bmp res

       Converts 'voice.bmp' to 'res.wav', interpreting it as a
       spectrogram and using the phase information in './dft.phs'
       (see FILES ). The Fourier transformation parameters are
       read from 'voice.spi', which has to exist. HINT: if you
       drew the image yourself and it is meant to be a
       spectrogram, then use a phase file created from a stable
       vowel (speech -p
       vowel.wav dft) and set Iterations to more  than  one  (see
       INIFILE ).

BUGS
       Be careful with the file formats. The double-type spc, phs
       and dft files are not portable, since they use the double
       format of the system on which they were created. The
       integer formats are portable, however.  Note that outfile
       and infile must be at least one character long.  There are
       surely many bugs; bug reports are highly appreciated.
Wed Nov 18 20:23:52 CET 1998