Cross-Lingual Speaker Adaptation for Statistical Parametric Speech Synthesis

Saleh, Fatemeh Sadat; Sameti, Hossein

Saleh, Fatemeh Sadat | 2013

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 44836 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Sameti, Hossein
Abstract:
Speech synthesis and its applications have attracted considerable attention in recent years. The main purpose of speech synthesis is to produce a speech signal with the natural characteristics of human speech, such as prosody and emotion. Among existing speech synthesis methods, statistical parametric methods are more promising because of their higher flexibility compared with other methods. One application of speech synthesis is speech-to-speech translation. In such systems, the voice generated in the target language should have the same characteristics as the input voice in the source language. The main purpose of this research is to review and evaluate cross-lingual speaker adaptation methods in statistical parametric speech synthesis. In this research, for the first time, English speech is synthesized with hidden Markov models so that it carries the characteristics of a Persian speaker whose speech data were used as the adaptation data. Recent research has mostly focused on phone mapping and state mapping from the source language average voice model to the target language average voice model. In these methods, the source language adaptation data influence a single leaf of the target language average voice model's tree. In our proposed method, by contrast, a statistical layer is used to model the effect of the source language adaptation data on multiple leaves of the target language average voice model's tree. Moreover, because this method uses a full context mapping, prosodic features are also mapped, and this mapping results in more natural synthesized speech. As a result of this research, speech has been synthesized in the target language that is more similar to the original speaker and more natural than that produced by many recent methods.
Keywords:
Average Voice Model ; Full Context Mapping ; Speech Synthesis ; Cross Lingual Speaker Adaptation ; Hidden Markov Model
