M     M  BBBB    RRRR      OOO    L        A
MM   MM  B   B   R   R    O   O   L       A A
M M M M  B   B   R   R   O     O  L      A   A
M  M  M  BBBB    RRR     O     O  L     AAAAAAA
M     M  B   B   R  R    O     O  L     A     A
M     M  B    B  R   R    O   O   L     A     A
M     M  BBBBB   R    R    OOO    LLLLL A     A

Version 2.03c

--------------------------------------------------------------
Table of Contents
--------------------------------------------------------------

1.0 License
2.0 A brief description of MBROLA v2.03
3.0 Distribution
4.0 Installation and Tests
5.0 Format of input and output files - Limitations
6.0 Joining the MBROLA project as a user
7.0 Joining the MBROLA project as a database provider
8.0 Acknowledgments
9.0 Contacting the author

--------------------------------------------------------------
1.0 License
--------------------------------------------------------------

This program  is  being provided to "you",   the licensee,  by Thierry
Dutoit, the "author",  under the following  license,  which applies to
any  program  or other work   which contains a   notice placed  by the
copyright holder saying it may be distributed under  the terms of this
license. The "program", below, refers to any such program or work.

By  obtaining, using and/or copying  this program, you  agree that you
have  read,    understood, and   will  comply   with  these  terms and
conditions:

Terms and conditions for the distribution of the program
--------------------------------------------------------

This program may not be sold or incorporated into any product which is
sold without prior permission from the author. 

When no charge  is made, this  program may  be copied and  distributed
freely, provided that  this   notice is copied and  distributed   with
it. Each time you redistribute  the program (or  any work based on the
program), the  recipient  automatically receives   a license from  the
original licensor  to copy or distribute the  program subject to these
terms and conditions.  You may not impose  any further restrictions on
the recipients'  exercise  of the rights  granted  herein. You are not
responsible  for  enforcing  compliance   by  third parties   to  this
License.

If you wish to incorporate the program  into other free programs whose
distribution conditions are different, write to the  author to ask for
permission.

If,  as a consequence  of a   court judgment or  allegation of  patent
infringement or  for any other  reason (not limited to patent issues),
conditions are  imposed on you (whether by   court order, agreement or
otherwise) that contradict the conditions of this license, they do not
excuse you   from  the conditions of   this   license.  If  you cannot
distribute so as to satisfy simultaneously your obligations under this
license and any other pertinent obligations, then as a consequence you
may not distribute   the program at all.   For  example, if  a  patent
license would not permit royalty-free redistribution of the program by
all those who receive copies directly or  indirectly through you, then
the only way  you could satisfy  both it and this  license would be to
refrain entirely from distribution of the program.

Terms and conditions on the use of the program
----------------------------------------------

Permission is  granted  to use    this software for    non-commercial,  
non-military purposes,    with and only with    the voice and language
databases  made  available by the author   from the MBROLA project www
homepage: 

   http://tcts.fpms.ac.be/synthesis/mbrola.html

In return, the author asks you to mention  the MBROLA reference paper:

    T. DUTOIT, V. PAGEL, N. PIERRET, F.  BATAILLE, O. VAN DER VRECKEN
    "The MBROLA Project: Towards a Set of High-Quality Speech
    Synthesizers Free of Use for Non-Commercial Purposes"
    Proc. ICSLP'96, Philadelphia, vol. 3, pp. 1393-1396.  

or,  for  a more general   reference  to Text-To-Speech synthesis, the
book: 

  An Introduction to Text-To-Speech Synthesis,
  forthcoming textbook, T. DUTOIT, Kluwer Academic Publishers, 1997.

in any scientific publication refering to work  for which this program
has been used. 

Disclaimer
----------

THIS  SOFTWARE CARRIES NO   WARRANTY, EXPRESSED OR IMPLIED.  THE  USER
ASSUMES ALL   RISKS, KNOWN  OR   UNKNOWN,  DIRECT OR  INDIRECT,  WHICH 
INVOLVE  THIS SOFTWARE  IN ANY  WAY. IN   PARTICULAR, THE AUTHOR  DOES
NOT TAKE ANY  COMMITMENT IN VIEW OF ANY  POSSIBLE THIRD  PARTY RIGHTS. 

--------------------------------------------------------------
2.0 A brief description of MBROLA v2.03
--------------------------------------------------------------

MBROLA v2.03 is  a speech synthesizer  based  on the  concatenation of
diphones. It takes a list of phonemes as input, together with prosodic
information  (duration of phonemes  and a piecewise linear description
of pitch),  and produces speech  samples on  16  bits (linear), at the
sampling  frequency  of the  diphone database.

It  is therefore NOT a Text-To-Speech (TTS) synthesizer, since it does
not accept raw text as input.  In order to obtain  a full TTS  system,
you need to use this synthesizer in combination with a text processing
system that produces phonetic and prosodic commands.

We maintain a web page with pointers to such freely available systems:

   http://tcts.fpms.ac.be/synthesis/mbrtts.html


This software is the heart of the MBROLA project, the aim of which is
to obtain a set of speech synthesizers for as many languages as
possible, free of use for non-commercial applications.

The terms of this project can be summarized as follows : 

  After some  official agreement between the  author of  this software
and the owner of a diphone database, the database  is processed by the
author and  adapted  to the  mbrola  format,  for free.  The resulting
mbrola diphone database  is made available  for non-commercial  use as
part of the MBROLA  project. Commercial rights  on the mbrola database
remain with the database provider,  for exclusive use with the  mbrola
software.  

The ultimate goal of this project is to boost academic research on
speech synthesis, and particularly on prosody generation, known as one
of the biggest challenges facing Text-To-Speech synthesizers in the
years to come.

More details can be found at the MBROLA project homepage : 

   http://tcts.fpms.ac.be/synthesis

The   synthesizer uses a synthesis    method  known itself as  MBROLA.

--------------------------------------------------------------
3.0 Distribution
--------------------------------------------------------------

This distribution of mbrola contains the following files : 

   MBROLA.exe or MBROLA: An executable file of the synthesizer
                 itself (the name depends on the target computer)
   README.TXT  : This file

As   such, it requires    an  MBROLA language/voice  database to   run
properly. A French male voice sampled at 16kHz has been made available
by  the author. Additional   languages  and  voices   are or  will  be
available in  the context of  the  MBROLA project. Please  consult the
MBROLA project homepage:  

   http://tcts.fpms.ac.be/synthesis

--------------------------------------------------------------
4.0 Installation and Tests
--------------------------------------------------------------

The following computers/OS are currently supported :

   SUN Sparc 5/S5R4 (Solaris2.4)
   HPUX9.0 and HPUX10.0 tested on :
      HP-UX A.09.05 A 9000/712
      HP-UX A.09.05 A 9000/715
      HP-UX A.09.01 E 9000/755
      HP-UX B.10.01 A 9000/710
      Not tested on HP9000/735, but should work properly. 
   VAX/VMS V6.2 (V5.5-2 won't work)
      Tested on :
      VAXstation 3100-M76 
      VAXstation 4000-90A
      VAXstation 4000-60
   DECALPHA(AXP)/VMS 6.2
      Tested on :
      DEC 3000 - M600 
      DEC 2000 Model 300
      AlphaStation 200 4/233
      AlphaStation 200 4/166
   IBM RS6000 Aix 4.12
   PC486/DOS6 (but other PCs/DOSs should do, too)
   PC486/WIN31
   PC486/WIN95 
   PC/LINUX 1.2.11
   PCPentium120/Solaris2.4
   OS/2
   BeBox

Please send acknowledgement when mbrola works  on a machine not listed
here. 

See the MBROLA  Homepage if your computer or  OS is not supported yet. 

Assuming  you have  copied the  right  .zip  file, create a  directory
mbrola (although this is not critical), copy  the mbrXXX.zip file into
it (in which XXX stands for a version number), and unzip the file: 

   unzip mbrXXX.zip (or pkunzip on PC/DOS)

You are now ready to synthesize your first French words.

First try:  mbrola

to see the terms and conditions on the use of this software.

Then try: mbrola -h

to get some help on how to use the software:

>USAGE: mbrola [-c CC] [-v VR] [-f FR] [-t TR] dbase command_file* output_file
> 
> A - instead of command_file or output_file means stdin or stdout
> Extension of output_file ( raw, au, wav, aiff) tells the wanted audio format
> 
> CC= Comment Char, escape sequence for a comment
> VR= Volume Ratio, float ratio applied to ouput samples
> FR= Frequency Ratio, float ratio applied to pitch points
> TR= Time Ratio, float ratio applied to phone durations

Now, in order to go further, you need to get a version of an MBROLA
language/voice database from the MBROLA project homepage. Let us
assume you have copied the FR1 database and referred to the
accompanying fr1.txt file for its installation.

Then try: mbrola fr1 bonjour.pho bonjour.wav

it uses the  format:
       mbrola diphone_database command_file1 command_file2 ... output_file

and creates a sound file for the word 'bonjour'. 

The output file basically consists of signed 16-bit integers,
corresponding to samples at the sampling frequency of the MBROLA
voice/language database (16 kHz for the diphone database supplied by
the author of MBROLA: Fr1).
MBROLA can produce different audio file formats (.au, .wav, .aiff,
.aif, and .raw), depending on the output_file extension. If the
extension is not recognized, the format is RAW (no header).
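
As a rough illustration of the raw format described above, the
following Python sketch wraps headerless 16-bit samples in a WAV
header using the standard wave module. It assumes mono audio, a
16 kHz sampling rate (as for Fr1) and little-endian samples; the byte
order of raw output is not stated in this file and may depend on the
machine. The function name raw_to_wav is hypothetical.

```python
import wave


def raw_to_wav(raw_path, wav_path, sample_rate=16000):
    """Wrap headerless 16-bit mono samples in a RIFF WAV header.

    Assumptions (not stated in this README): the raw file is mono and
    little-endian. sample_rate must match the voice database
    (16 kHz for Fr1).
    """
    with open(raw_path, "rb") as raw_file:
        samples = raw_file.read()
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono (assumption)
        wav_file.setsampwidth(2)            # 16 bits = 2 bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(samples)
```

In practice it is simpler to give the output file a .wav extension
and let mbrola write the header itself; the sketch is only meant to
show what a .raw file contains.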

Optional parameters let you shorten or lengthen the synthetic speech
and transpose it, by providing time and frequency ratios:

   mbrola -t 1.2 -f 0.8 -v 0.7 fr1 bonjour.pho bonjour.wav

for instance, will result in a RIFF Wav file bonjour.wav 1.2 times
longer than the previous one (slower rate), and containing speech in
which all fundamental frequency values have been multiplied by 0.8
(sounds lower). You can also set the values of these coefficients
directly in a .pho file by adding special escape sequences such as:

   ;; F=0.8
   ;; T=1.2

Option "-v" gives a VolumeRatio which multiplies each output sample.

The -c option lets you specify which symbol will be used as an escape
sequence for comments and commands in .pho files. The default value
is the semi-colon ';', but you may want to change this if your
phonetic alphabet uses this symbol, as in:

   mbrola -c ! fr1 test1.pho test2.pho test.wav

A - instead  of command_file or output_file means  stdin or stdout. On
multitasking machines, it is easy to run  the synthesizer in real time
to obtain audio output from the audio device, by using pipes. 
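
The command line described above can also be driven from a script.
The Python sketch below builds an argument list that follows the
USAGE line printed by "mbrola -h", and feeds .pho text through stdin
while collecting samples from stdout. The function names are
hypothetical, and the second function assumes that an mbrola
executable is on the PATH and that the database is installed.

```python
import subprocess


def mbrola_command(database, command_files, output_file,
                   time_ratio=None, freq_ratio=None, volume_ratio=None):
    """Build an argument list following the USAGE line of mbrola -h."""
    args = ["mbrola"]
    if time_ratio is not None:
        args += ["-t", str(time_ratio)]
    if freq_ratio is not None:
        args += ["-f", str(freq_ratio)]
    if volume_ratio is not None:
        args += ["-v", str(volume_ratio)]
    return args + [database] + list(command_files) + [output_file]


def synthesize_to_bytes(database, pho_text):
    """Send .pho text on stdin and return the raw samples read from stdout.

    Assumes mbrola is installed; "-" stands for stdin and stdout.
    """
    command = mbrola_command(database, ["-"], "-")
    result = subprocess.run(command, input=pho_text.encode("ascii"),
                            stdout=subprocess.PIPE, check=True)
    return result.stdout
```

For example, mbrola_command("fr1", ["bonjour.pho"], "bonjour.wav",
time_ratio=1.2) gives the arguments of "mbrola -t 1.2 fr1 bonjour.pho
bonjour.wav".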

Below are a number of machine dependent hints for best using mbrola.

On MSDOS/Windows or OS/2
------------------------

Type: mbrola fr1 bonjour.pho bonjour.wav

Then you can play the RIFF Wav file with the Windows sound utility.
On OS/2, pipes may be used as described below for Unix systems.

On modern Unix systems such as Solaris or HPUX or Linux
-------------------------------------------------------

mbrola fr1 bonjour.pho -.au | audioplay

where audioplay is your audio file player (the name varies with the
platform, e.g. splayer for HPUX).

If your audio player has problems with Sun .au files, try .wav or
.raw instead.

On Sun4 ( old audio interface )
-------------------------------

Try with:

   mbrola fr1 input.pho - | sox -t raw -sw -r 16000 - -t raw -Ub -r 8000 - > /dev/audio

(provided you have the public domain sox utility developed by
Ircam). You should hear 'bonjour' without the need to create
intermediate files.

Other solution:    The UTILITY.ZIP  file  available from   the  MBROLA
homepage provides RAW2SUN which does this conversion. 

On VAX or AXP workstations
--------------------------

To make   it easier  for   users to find  MBROLA,  you  should add the
following command to your system startup procedure: 

    $ DEFINE/SYSTEM/EXEC MBROLA_DIR disk:[dir]

where  "disk:[dir]" is the  name of the  directory you created for the
MBROLA_DIR files. You  could also  add  the following command to  your
system login command procedure: 

$ MBROLA   :== $MBROLA_DIR:MBROLA.EXE
$ RAW2SUN :== $MBROLA_DIR:RAW2SUN.EXE

To use the DECsound device:

$ MCR DECSOUND -volume 40 -play sound.au

See also the MBR_OLA.COM batch file in  the UTILITY.ZIP file available
from the  MBROLA Homepage if you cannot   play 16 bits  sound files on
your machine. 

--------------------------------------------------------------
5.0 Format of input and output files - Limitations
--------------------------------------------------------------

5.1 Phoneme commands
--------------------
The input file bonjour.pho in the above example simply contains : 

; bonjour 
_ 51 25 114
b 62 
o~ 127 48 170.42 
Z 110 53.5 116 
u 211 
R 150 50 91 
_ 91

This shows the format of the input data required  by MBROLA. Each line
contains a  phoneme name, a duration  (in ms), and a  series (possibly
none) of pitch  pattern points composed  of two float numbers each :
the  position of the pitch pattern  point within the  phoneme (in % of
its total duration), and the pitch value (in Hz) at this position.

In order to increase readability, it is also possible to enclose pitch
pattern points in parentheses. Hence, the first line of bonjour.pho
could be written:

   _ 51 (25,114)

This tells the synthesizer to produce a silence of 51 ms, and to put a
pitch pattern point of 114 Hz at 25% of 51 ms. Pitch pattern points
define a piecewise linear pitch curve. Notice that the pitch curve
they define is continuous, even across unvoiced phones: the program
automatically drops pitch information when synthesizing them.
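
To make the line format above concrete, here is a minimal Python
sketch of a reader for phoneme lines. It is not the parser used by
mbrola: the function name parse_pho_line is hypothetical, comment and
command lines are simply skipped, and a phone written without a
duration (such as a flush symbol) is rejected.

```python
import re


def parse_pho_line(line, comment_char=";"):
    """Parse one .pho line into (phoneme, duration_ms, pitch_points).

    pitch_points is a list of (position_percent, pitch_hz) pairs.
    Returns None for blank lines and for lines starting with the
    comment character (comments and ';;' commands alike).
    """
    text = line.strip()
    if not text or text.startswith(comment_char):
        return None
    parts = text.split(None, 1)
    phoneme = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    # Treat "(25,114)" like "25 114": parentheses and commas separate fields.
    fields = re.sub(r"[(),]", " ", rest).split()
    if not fields:
        raise ValueError(f"missing duration for phoneme {phoneme!r}")
    duration_ms = float(fields[0])
    values = [float(field) for field in fields[1:]]
    if len(values) % 2 != 0:
        raise ValueError("pitch points must be (position, pitch) pairs")
    return phoneme, duration_ms, list(zip(values[0::2], values[1::2]))
```

With this sketch, both "_ 51 25 114" and "_ 51 (25,114)" give the
phoneme "_", a duration of 51 ms and one pitch point at 25% with a
value of 114 Hz.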

The data on each line are separated by blank characters or tabs.
Comments can optionally be introduced in command files, starting
with a semi-colon ';'. This default can be overridden with the -c
option of the command line.

Another special escape sequence, ';;', allows the user to introduce
commands in the middle of .pho files, as described below. This escape
sequence is also affected by the -c option.

5.2 Changing the Freq Ratio or Time Ratio
-----------------------------------------
A command escape sequence followed by "T=xx" sets the time ratio to
xx. The fundamental frequency ratio is set in the same way by
replacing T with F, as in:

 ;; T = 1.2
 ;;F=0.8
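
The effect of the two ratios can be pictured with the short Python
sketch below, which scales phone durations by the time ratio and
pitch values by the frequency ratio, as stated in the help message.
It only illustrates the arithmetic on data held in memory; the
function name and the tuple layout are assumptions, and mbrola
applies the ratios internally.

```python
def apply_ratios(phones, time_ratio=1.0, freq_ratio=1.0):
    """Scale durations by time_ratio and pitch values by freq_ratio.

    phones is a list of (phoneme, duration_ms, pitch_points) tuples,
    where pitch_points is a list of (position_percent, pitch_hz) pairs.
    Positions are percentages of the phone duration, so they stay as is.
    """
    scaled = []
    for phoneme, duration_ms, pitch_points in phones:
        new_points = [(pos, hz * freq_ratio) for pos, hz in pitch_points]
        scaled.append((phoneme, duration_ms * time_ratio, new_points))
    return scaled
```

With T=1.2 and F=0.8, a phone "R 150 50 91" would be treated as if
it lasted about 180 ms with a pitch point of about 72.8 Hz at 50%.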

5.3 Renaming phonemes in a set
------------------------------
Command escape sequences may also define renaming tables for the
phoneme set. A line like:

;; RENAME A my_a

tells the synthesizer that the phoneme previously called A is now
called my_a. This facility is provided to make your life easier when
your Natural Language Processing unit does not comply with our SAMPA
alphabet. The only limitation is that the phoneme name can't contain
blank characters.

We suggest that you don't mix renaming commands and true .pho files,
for example by grouping all your rename commands in a '.set' file,
and then calling:

mbrola fr1 fr1.set command1.pho command2.pho output.wav

WARNING: chained renaming can lead to name collisions, as in:
;; RENAME y u
;; RENAME u ou

THIS LEADS TO UNPREDICTABLE RESULTS BECAUSE OF NAME COLLISIONS
(the old y and u will both be named ou)

which should be written:
;; RENAME u ou
;; RENAME y u

When cycles in renaming can't be avoided, as in:
;; RENAME # _
;; RENAME _ #

you should write:

;; RENAME # temp
;; RENAME _ #
;; RENAME temp _
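
The ordering problem can be checked with a small Python model of the
behaviour described above. This is not MBROLA's code, only a sketch
that assumes each RENAME command is applied, in order, to whatever
the phonemes are called at that moment.

```python
def apply_renames(phoneme_names, renames):
    """Apply RENAME commands in order to a list of current phoneme names.

    Each (old, new) pair renames every phoneme currently called old.
    This models the behaviour described in the text, nothing more.
    """
    names = list(phoneme_names)
    for old, new in renames:
        names = [new if name == old else name for name in names]
    return names


# Collision: y becomes u, then both are renamed to ou.
print(apply_renames(["y", "u"], [("y", "u"), ("u", "ou")]))
# Safe order: rename u first, then y.
print(apply_renames(["y", "u"], [("u", "ou"), ("y", "u")]))
# Swapping # and _ through a temporary name.
print(apply_renames(["#", "_"], [("#", "temp"), ("_", "#"), ("temp", "_")]))
```

The first call yields two phonemes with the same name, while the
other two keep the names distinct.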

Once the renaming has occurred, there is absolutely NO PERFORMANCE
DROP related to it, so use it rather than a pre-processor.

Before renaming anything as #, check section 5.4 below!

5.4 Flush the output stream
---------------------------
Notice, finally, that the synthesizer outputs chunks of synthetic
speech determined as sections of the piecewise linear pitch curve.
Phones inside a section of this curve are synthesized in one go. The
last one of each chunk, however, cannot be properly synthesized until
the next phone is known (since the program uses diphones as base
speech units). When using mbrola with pipes, this may be a problem.
Imagine, for instance, that mbrola is used to create a pipe-based
speaking clock on an HP:

  speaking_clock | mbrola fr1 - -.au | splayer

which tells the time, say, every 30 seconds. The last phone of each
time announcement will only be synthesized when the next announcement
starts. To bypass this problem, mbrola accepts a special command
phone, which flushes the synthesis buffer : "#"

This default character can be replaced by another symbol thanks to the
command:

;; FLUSH new_flush_symbol
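
A program feeding mbrola through a pipe could end each utterance as
in the Python sketch below. The function name is hypothetical, and
the sketch assumes that the flush phone is written alone on its own
line, which this file does not spell out. It also flushes its own
output stream, so that the text actually reaches the pipe.

```python
import sys


def write_utterance(phones, stream=sys.stdout, flush_symbol="#"):
    """Write .pho lines for one utterance, then the flush phone.

    phones is a list of (phoneme, duration_ms) pairs; pitch points are
    left out to keep the sketch short.
    """
    for phoneme, duration_ms in phones:
        stream.write(f"{phoneme} {duration_ms}\n")
    stream.write(flush_symbol + "\n")   # ask mbrola to synthesize pending phones
    stream.flush()                      # push the text through the pipe now
```

If the flush symbol has been changed with the FLUSH command, the new
symbol should be passed as flush_symbol.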

Limitations of the program
--------------------------
1. There may be up to 20 pitch pattern points in each phone, although
three or four are usually sufficient to copy natural prosody. We have
set a higher limit so as to enable the use of MBROLA to produce
synthetic singing voices, in which case long vowels with vibrato may
require a large number of pitch pattern points.

2. Phones can be synthesized with a maximum duration which depends on
the fundamental frequency with which they are produced. The higher the
frequency, the lower the maximum duration. For a frequency of 133 Hz,
the maximum duration is 7.5 sec. For a frequency of 66.5 Hz, it is
15 sec. For a frequency of 266 Hz, it is 3.75 sec.

3. Although pitch pattern points are optional, the synthesizer will
refuse to produce sequences of more than 250 phones with no pitch
information.
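
The limits listed above can be checked before calling mbrola, as in
the following Python sketch. The three duration figures quoted above
all satisfy duration x frequency = 997.5, so the sketch assumes that
the maximum duration is inversely proportional to the frequency; this
is inferred from the examples, not taken from the program. The
function names and the tuple layout are hypothetical.

```python
MAX_PITCH_POINTS = 20            # pitch pattern points allowed per phone
MAX_PHONES_WITHOUT_PITCH = 250   # longest run of phones with no pitch point


def max_duration_s(pitch_hz):
    """Longest phone duration in seconds at a given pitch (inferred rule)."""
    return 997.5 / pitch_hz


def check_phones(phones):
    """Return warnings for a list of (phoneme, duration_ms, pitch_points)."""
    warnings = []
    run_without_pitch = 0
    for index, (phoneme, duration_ms, pitch_points) in enumerate(phones):
        if len(pitch_points) > MAX_PITCH_POINTS:
            warnings.append(f"phone {index} ({phoneme}): too many pitch points")
        for _, pitch_hz in pitch_points:
            if pitch_hz > 0 and duration_ms / 1000.0 > max_duration_s(pitch_hz):
                warnings.append(
                    f"phone {index} ({phoneme}): too long at {pitch_hz} Hz")
                break
        run_without_pitch = 0 if pitch_points else run_without_pitch + 1
        if run_without_pitch > MAX_PHONES_WITHOUT_PITCH:
            warnings.append(
                f"phone {index} ({phoneme}): too many phones without pitch")
            run_without_pitch = 0
    return warnings
```

An empty list means that none of these limits was exceeded; it does
not guarantee that mbrola will accept the input.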

--------------------------------------------------------------
6.0 Joining the MBROLA project as a user 
--------------------------------------------------------------

For convenience, we have defined two mailing lists : 

* mbrola-interest@tcts.fpms.ac.be : a forum for MBROLA questions and
issues. Users who have a question or a comment, think they have found
a bug, or simply want to share .pho files or free applications
running on top of mbrola should send mail to mbrola-interest. Users
interested in discussing MBROLA post to and read from this list.
mbrola-interest@tcts.fpms.ac.be is also used by the maintainers of the
mbrola project to announce new releases, bug fixes, new voices and
languages, and other information of interest to all MBROLA users.

It is in your interest, as a user, to subscribe to the mbrola-interest
mailing list, by sending an e-mail to :
  
  mbrola-interest-request@tcts.fpms.ac.be

with the word 'subscribe'  in either the header or  the main  text. To
unsubscribe, just send another mail with 'unsubscribe'.

BUGS
----

If you detect a bug, or if you find an input  for which the quality of
the speech provided by mbrola  is not as  good as usual, first consult
the  FAQ file   from  the  MBROLA   Project homepage,   which will  be
frequently updated. 

If this is of no help, please send an e-mail to mbrola@tcts.fpms.ac.be
in which you include the .pho file with which the problem appears and
mention your machine architecture.

NEW DATABASES
-------------

If you want to participate in the mbrola project by providing a
diphone database (i.e. a set of sample files with one example of each
diphone in your language), refer to the mbrola WWW homepage, or send
an email to: mbrola@tcts.fpms.ac.be.

APPLICATIONS
------------

If you have used mbrola to build speaking apps on top of it (like
talking clocks, talking agendas, talking tools for handicapped
persons, etc.), and want to make them available to the community (for
free, of course, and for non-commercial, non-military applications, as
imposed by the mbrola license agreement), just make an announcement to
the mbrola mailing list:

   mbrola-interest@tcts.fpms.ac.be

COMMERCIAL VERSION
------------------

If you are interested in the commercial version of mbrola (source code
available), send  an email to   : mbrola@tcts.fpms.ac.be .

FEEDBACK
--------

If you simply find  this initiative useful, please  drop us a  note at
mbrola@tcts.fpms.ac.be. We have spent a lot of our time to provide you
with this program, and we would  like to get  some feedback in return. 

Don't forget, either, to mention the MBROLA reference paper :

    T. DUTOIT, V. PAGEL, N. PIERRET, F. BATAILLE, O. VAN DER VRECKEN
    "The MBROLA Project: Towards a Set of High-Quality Speech
    Synthesizers Free of Use for Non-Commercial Purposes"
    Proc. ICSLP'96, Philadelphia, vol. 3, pp. 1393-1396.

or, for a more    general reference to Text-To-Speech  synthesis,  the
book: 

  An Introduction to Text-To-Speech Synthesis,
  forthcoming textbook, T. DUTOIT, Kluwer Academic Publishers, 1997.

in any scientific publication referring to work for which this program
has been used.

--------------------------------------------------------------
7.0 Joining the MBROLA project as a database provider
--------------------------------------------------------------

One of the main attractions of the MBROLA project (and definitely its
most original aspect) lies in its ability to provide an ever growing
set of languages/voices to users.

To achieve this goal, the MBROLA project has itself been organized so
as to encourage other research labs or companies to share their
diphone databases.

The terms of this sharing policy can be summarized as follows : 

1. We shall only  use your database to  adapt it to the mbrola format, 
and destroy the copy when this is done.

2. The resulting mbrola  diphone  database will be copyright   Faculte
Polytechnique de  Mons - T.DUTOIT.  Non-commercial use of the database
in the framework  of the MBROLA  project will be automatically granted
to Internet users. In  return, we shall send  you a  license agreement
which will transfer all our commercial rights on  the database to you,
provided the database is used with and only with the MBROLA program.  

3. All these  details will be  fixed by some official agreement before
you send us anything.

If you want to create a database from scratch
---------------------------------------------

First, you should be aware that recording a diphone database is not a
trivial operation. If it is not performed carefully, the result can be
disappointing. FR1, for instance, required about one month of work,
even with the help of some efficient laboratory tools for signal
recording and editing. What is more, some phonetic knowledge of the
target language is necessary to create the initial corpus.

So  if you just think  of designing a new  diphone database as a game,
forget it.

If, on the contrary, you are willing to spend some time to provide the
MBROLA community with a new language or voice, or  if you already have
a diphone database and wish to share it in  mbrola format (and receive
in return the rights  for any commercial   exploitation of the  mbrola
diphone database we will create for you), welcome here. 

If you still want to create a database from scratch
---------------------------------------------------

Creating a database is typically achieved in four steps: 

  * Creating a text corpus
  * Recording the corpus
  * Segmenting the speech corpus
  * Equalizing diphones

Creating a text corpus
----------------------

Diphones are speech units that begin in the middle of the stable state
of  a phone and end  in  the middle of the  following  one. Their main
interest in  synthesis  is that they  minimize concatenation problems,
since they  involve  most  of  the transitions  and   co-articulations
between  phones,  while requiring an affordable  amount  of memory, as
their number remains relatively  small (as opposed to  other synthesis
units such as half-syllables or triphones).

Hence, the first step to build a diphone database consists of fixing a
list of all the phones of a language. Notice  that phones are acoustic
instances  of     phonemes. Phonemes are     themselves defined   on a
functional, linguistic level.

Obtaining a list of phones from a list of phonemes requires
enumerating allophones, i.e. acoustic versions of some phonemes that
differ significantly from the standard one, mostly due to
co-articulation constraints. Although it is not necessary to account
for all allophonic variations to build an intelligible synthesizer,
the naturalness of synthetic speech may be affected if too few
allophones are considered. In FR1, for example, we did not consider
allophones at all. As a result, some allophonic phenomena, such as
the devoicing of /R/ when followed or preceded by unvoiced plosives,
are only partially accounted for.

When a complete list of phones has emerged, including allophones if
possible, a corresponding list of diphones is immediately obtained,
and a list of words is carefully compiled, in such a way that each
diphone appears at least once (twice is better, for safety).
Unfavourable positions, like inside stressed syllables or in strongly
reduced (i.e. over-co-articulated) contexts, should be excluded. One
typically uses carrier sentences in which the word containing the
diphone considered is inserted. Notice that many diphones only appear
across word boundaries (i.e. not inside single words). A number of
diphones never appear at all. Hence, the task of creating a text
corpus which contains all existing ones is not trivial.
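
The bookkeeping described above can be sketched in Python as follows:
every ordered pair of phones is a candidate diphone, and a word list
can be checked for the pairs it does not cover yet. The function
names, the "left-right" naming of diphones and the layout of the word
list are assumptions made for the example; deciding which pairs do
not exist in the language remains manual work.

```python
from itertools import product


def candidate_diphones(phones):
    """List every ordered pair of phones as a candidate diphone.

    Not all pairs occur in a real language, so this is an upper bound
    that still has to be pruned by hand.
    """
    return [f"{left}-{right}" for left, right in product(phones, repeat=2)]


def missing_diphones(phones, corpus_words):
    """Return candidate diphones not covered by any transcribed word.

    corpus_words maps a word to its phone sequence (hypothetical layout).
    """
    covered = set()
    for sequence in corpus_words.values():
        for left, right in zip(sequence, sequence[1:]):
            covered.add(f"{left}-{right}")
    return [name for name in candidate_diphones(phones)
            if name not in covered]
```

With n phones there are n*n candidates, which is why the number of
diphones stays manageable compared with larger units.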

Recording the corpus
--------------------

The   corpus is  then read, by    a professional speaker  if possible,
digitally recorded, and stored in digital format. 

IMPORTANT : In order for the mbrola resynthesis operation to achieve
best results, the corpus should be read with the most monotonous
intonation possible (just like when reading a long and boring
enumeration). Even at the ends of words, the fundamental frequency
should remain constant. Since this is a totally unnatural way of
reading a text, the speaker should train before starting the
recording session.

NOTA BENE : If you already  have a diphone  database which you want to
make available in  mbrola format, contact  the author, even if  it has
not  been recorded with constant  pitch.  It is very  likely that your
database can be used anyway.

It is best to use high quality audio devices (microphone, pre-amp, A/D
converter). The sound recording tools provided with many low-price
commercial boards, for example, should be avoided, as they produce
undesired recording noise.

To roughly test the quality of your recording system, just plug the
microphone in, adjust the recording level, hold your breath, and
record. Or, if you can, short-circuit the microphone input of your
system, and record. Then inspect the recording noise. In the case of
FR1, the noise level only corrupted the last three bits of our data,
leaving thirteen significant bits.
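
One rough way to read such a test recording is to look at the peak
magnitude of the "silent" samples, as in the Python sketch below. It
is only an estimate based on the largest sample, the function name is
hypothetical, and the sample values are made up for illustration.

```python
def noise_bits(silence_samples):
    """Estimate how many low-order bits of 16-bit samples are noise.

    silence_samples is a sequence of signed integers recorded with no
    speech input. The peak magnitude tells how many bits are toggling.
    """
    peak = max(abs(sample) for sample in silence_samples)
    return peak.bit_length()


silence = [0, 3, -5, 2, -1, 4, -6, 1]     # made-up values for illustration
print(noise_bits(silence))                # 3 noisy bits
print(16 - noise_bits(silence))           # 13 significant bits
```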

Another important type of noise to avoid is ambient noise and
reverberation. In particular, the recording should be free of low
frequency noises, due to trucks passing in the neighborhood for
instance. Most of the time you won't hear them, but your microphone
will hardly fail to detect them, especially if it is a high quality
one. The best way to avoid them is to install your recording system
inside a professional soundproof room. For FR1, this is what we did.

Segmenting the corpus
---------------------

Once the corpus has been recorded, all diphones must be located,
either manually with the help of signal visualization tools, or
automatically thanks to segmentation algorithms, the decisions of
which are checked and corrected interactively. A diphone database is
finally created, which centralizes the results, in the form of : the
names of the diphones, the related waveforms, their durations, and
internal sub-splittings. As a matter of fact, the position of the
border between phones should be stored, so as to be able to modify
the duration of one half-phone without affecting the length of the
other one.

NOTA   BENE : For   optimal results with mbrola,  it   is best to keep 
diphones  in   context.  The MBROLA   resynthesis   operation, indeed,
includes some pitch  analysis,  which  itself achieves more   accurate
results when, say, 50 ms of speech are  kept at the  left and right of
each diphone.

Equalizing diphones
-------------------

Since diphones  to  be chained up  have generally  been extracted from
different words, that  is in different  phonetic contexts,  they often
present   amplitude  and  timbre mismatches.     Even  in the case  of
stationary  vocalic   sounds, for  instance,    a rough sequencing  of
diphones typically leads to audible discontinuities. 

Amplitude mismatches can be dealt with to some extent as early as the
construction of the diphone database, thanks to equalization. This
operation smoothly modifies the energy levels at the beginning and at
the end of segments, in such a way as to eliminate amplitude
mismatches (by setting the energy of all the phones of a given phoneme
to their average value).
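
The idea can be sketched in Python as follows. The text above does
not specify how energies are measured or how the smoothing is done,
so the sketch simply assumes one energy value per segment boundary
and a gain that varies linearly along the segment; the function names
are hypothetical and this is not the procedure used for FR1.

```python
import math


def equalization_gains(boundary_energies):
    """Compute one amplitude gain per segment boundary.

    boundary_energies holds the measured energy of each instance of
    the same phoneme at a diphone boundary. Multiplying the samples
    near a boundary by its gain brings the energy there to the average.
    """
    target = sum(boundary_energies) / len(boundary_energies)
    # Energy scales with the square of amplitude, hence the square root.
    return [math.sqrt(target / energy) for energy in boundary_energies]


def ramp_gain(samples, start_gain, end_gain):
    """Apply a gain that changes linearly from start_gain to end_gain."""
    last = max(len(samples) - 1, 1)
    return [sample * (start_gain + (end_gain - start_gain) * i / last)
            for i, sample in enumerate(samples)]
```

A segment would then be passed through ramp_gain with the gain of its
first boundary as start_gain and the gain of its last boundary as
end_gain.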

In  contrast,  timbre conflicts are    better tackled at  run-time, by 
the mbrola algorithm itself.

Notice, however, that equalization is only optional, as the mbrola
resynthesis operation (the one we shall perform to adapt your database
to the mbrola format) also includes some equalization facilities.

IMPORTANT
---------

If you want to build a new diphone database, please contact the author
first.  He  will help you  as  much as  he  can, by providing phonetic
information if available for instance.

In all cases,  make a first  dummy trial : create  a  small corpus for 
a few diphones,  record them, segment them,  equalize them if you can, 
and send the result directly  to the author.  He will test your  data,
tell you how  good it is, and what  should be done  to make it better. 

If you want to share an existing database
-----------------------------------------

Read the information  above to see  if your database has been designed
and recorded correctly. Contact the author (see below) anyway.

--------------------------------------------------------------
8.0 Acknowledgments
--------------------------------------------------------------

I would like  to  thank Vincent Pagel  for  his intensive programming, 
testing, and debugging of this program,  and for all sorts of fruitful
discussions.                                                           

I also thank Sam Przyswa (Paris/FR), Fred Englert (Frankfurt/DE),
Arnaud Gaudinat (University of Geneva, CH), Cyrille Mastchenko
(Paris/FR), Michael C. Thornburgh (USA), Eric Keller (University of
Lausanne, CH), Bruno Langlois (Quebec/CA) and Christophe M. Vallat
(Domerat/FR) for their help in the compilation of MBROLA.

Arnaud Gaudinat, Vincent Pagel and Michael M. Cohen (University of
California - Santa Cruz) have arranged mirror sites.

Last  but not least, I am  also greatly  indebted to Francois Bataille
for having supported the creation of this internet project.

--------------------------------------------------------------
9.0 Contacting the author
--------------------------------------------------------------
Dr Thierry Dutoit

Faculte Polytechnique de Mons, TCTS Lab,
31, bvd Dolez, B-7000 Mons, Belgium.
tel   : /32/65/374133
fax   : /32/65/374129
e-mail: mbrola@tcts.fpms.ac.be, for general information, 
questions on the installation of software and databases.
