Speaker Adaptation in HMM-Based Persian Speech Synthesis

Bahmaninezhad, Fahimeh | 2012

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 43487 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Sameti, Hossein
  7. Abstract:
  8. Text-to-speech synthesis, one of the key technologies in speech processing, generates a speech signal from arbitrary text with a target speaker’s voice characteristics and with various speaking styles and emotional expressions. Statistical parametric speech synthesis has recently been shown to be very effective in generating acceptable synthesized speech. This study therefore focuses on one instance of these techniques, hidden Markov model-based speech synthesis. In text-to-speech systems, it is desirable to synthesize high-quality speech from a small amount of speech data; this goal can be achieved with a speaker adaptation framework, which we apply here for the first time to Persian speech synthesis. In speaker adaptation systems, an average voice model is first trained on data from several speakers; this model is then transformed using adaptation data from the target speaker to obtain the adapted model; finally, the synthesized speech is generated from the adapted model. We first describe the general structure of a hidden Markov model-based speech synthesis system and its components. We then introduce adaptation algorithms in the speech synthesis framework and examine different adaptation techniques. The implementations use the FARSDAT database. Limitations of this database in the context of speech synthesis reduce the quality of the synthesized speech. We therefore propose a structure consistent with the characteristics of this database in order to significantly enhance the quality of the synthesized speech. According to the reported results, the proposed system ranked best among all the speech synthesis systems in terms of similarity and quality, followed by the conventional adaptation system and the speaker-dependent system, in CMOS and MOS scoring.

  9. Keywords:
  10. Persian Speech Synthesis ; Hidden Markov Model ; Average Voice Model ; Speaker Adaptation ; FARSDAT Database
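
The adaptation step described in the abstract (transforming an average voice model with a small amount of target-speaker data) can be illustrated with a deliberately simplified sketch. This is not the thesis's algorithm, which the record does not specify: it assumes a single global bias shared by all states, identity covariances, and a known frame-to-state alignment. All names (`estimate_global_bias`, `adapt_means`, the state labels) are hypothetical.

```python
def estimate_global_bias(average_means, adaptation_frames):
    """Estimate one bias vector shared by all states.

    average_means: dict mapping state name -> mean vector (list of floats)
    adaptation_frames: list of (state name, observation vector) pairs,
        i.e. target-speaker frames with an assumed known alignment.
    Under the stated assumptions, the maximum-likelihood bias is the
    average of (observation - average-voice mean) over all frames.
    """
    dim = len(next(iter(average_means.values())))
    total = [0.0] * dim
    for state, obs in adaptation_frames:
        mean = average_means[state]
        for d in range(dim):
            total[d] += obs[d] - mean[d]
    n = len(adaptation_frames)
    return [t / n for t in total]


def adapt_means(average_means, bias):
    """Shift every state mean by the shared bias, including unseen states."""
    return {
        state: [m + b for m, b in zip(mean, bias)]
        for state, mean in average_means.items()
    }


# Toy average voice model with three states (values are made up).
average_means = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]}

# Target-speaker adaptation data covers only states "a" and "b".
adaptation_frames = [
    ("a", [1.5, 1.0]),
    ("a", [1.5, 1.0]),
    ("b", [3.5, 3.0]),
    ("b", [3.5, 3.0]),
]

bias = estimate_global_bias(average_means, adaptation_frames)
adapted = adapt_means(average_means, bias)
```

Because the transform is shared across states, state "c" is moved toward the target speaker even though no adaptation frame was observed for it. That sharing is what lets adaptation work with a small amount of target-speaker data; practical systems typically use richer transforms (for example, affine transforms tied across groups of states) rather than a single bias.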
