Speaker Adaptation in Eigen Voice Space for Statistical Parametric Speech Synthesis

Shams, Boshra; Sameti, Hossein


Shams, Boshra | 2014

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 45313 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Sameti, Hossein
Abstract:
Various speaker adaptation methods have recently been proposed for HMM-based speech synthesis. Adaptation techniques matter because they make it possible to build a system that generates high-quality speech with the target speaker's characteristics from a limited amount of adaptation data.
In this research, we focus on clustering-based adaptation and develop a new method that uses eigenvoices to adapt to a new speaker. We employ this approach for the first time in HSMM-based speech synthesis systems, with the goal of reducing both the number of parameters and the amount of adaptation data the system needs. In the proposed method, several speaker-dependent models are trained first. For each model, we combine its parameters into a supervector. We then apply PCA to the supervectors to compute the eigenvectors, and reduce the dimension of the resulting space by keeping only K of them, known as eigenvoices. In this way, each model is represented by only a few parameters. The adapted model is a linear combination of the eigenvoices; its weights are estimated from the adaptation data by an algorithm based on ML estimation, which places the adapted model in the K-dimensional subspace. Compared with previous methods, our approach reduces the number of model parameters and yields a high-quality synthesis system from only a small amount of adaptation data.
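The steps described above (supervectors, PCA, K eigenvoices, ML weight estimation) can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes the supervector holds only concatenated state mean vectors, that every Gaussian has identity covariance, and that the frame-to-state alignment of the adaptation data is already known, so the ML estimate of the weights reduces to a least-squares problem. The function names and the `dim` parameter are hypothetical and chosen for illustration.

```python
import numpy as np

def train_eigenvoices(supervectors, k):
    """PCA over speaker supervectors (one row per speaker-dependent model).

    Returns the mean supervector and the top-k principal directions
    (the eigenvoices), each of length D.
    """
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def estimate_weights(mean, eigenvoices, frames, states, dim):
    """Estimate eigenvoice weights from aligned adaptation frames.

    Assumption (not from the source): identity covariances and a known
    state alignment, so maximizing the likelihood is equivalent to
    minimizing the squared error between each frame and its state mean.
    """
    k = eigenvoices.shape[0]
    A = np.zeros((k, k))
    b = np.zeros(k)
    for o, s in zip(frames, states):
        sl = slice(s * dim, (s + 1) * dim)  # block of the supervector for state s
        E = eigenvoices[:, sl]              # (k, dim) eigenvoice slices for state s
        A += E @ E.T
        b += E @ (o - mean[sl])
    # lstsq tolerates a rank-deficient A when some states are never observed.
    return np.linalg.lstsq(A, b, rcond=None)[0]

def adapted_supervector(mean, eigenvoices, weights):
    """Adapted model: mean supervector plus a weighted sum of eigenvoices."""
    return mean + weights @ eigenvoices
```

Under these assumptions only K weights are estimated per target speaker, which is why a small amount of adaptation data can suffice. A full HSMM system would also need to account for covariances, state durations, and state occupancy probabilities, which this sketch leaves out.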
In this project, we implemented the proposed system with two different methods of unifying the context-clustering decision trees of the speaker-dependent models. The first uses the MDL principle, and the second uses MLLR-MAP adaptation techniques for unification. According to the reported results, the CMOS test shows a slight improvement in synthesized speech quality. Furthermore, we adapted each stream independently, which resulted in a significant enhancement of the synthesized speech quality. We also evaluated the proposed method with MOS tests of quality and similarity, using different amounts of adaptation data. The experiments show that, compared with other methods, the speech synthesized by our method converges rapidly to a desirable quality.
Keywords:
Speaker Adaptation ; Average Voice Model ; Speech Synthesis ; Hidden Markov Model ; Eigenspace ; Eigenvoice ; Supervector
