Average voice modeling based on unbiased decision trees

Bahmaninezhad, F ; Sharif University of Technology | 2013

  1. Type of Document: Article
  2. DOI: 10.1007/978-3-642-38847-7_12
  3. Publisher: Springer, 2013
  4. Abstract:
  5. Speaker-adaptive speech synthesis based on the hidden semi-Markov model (HSMM) has been shown to be highly effective when only a limited amount of speech data is available. This effectiveness can be increased further by training the average voice model appropriately, and this study presents a new method for doing so. The method guarantees that data from every speaker contributes to all the leaves of the decision tree. A small amount of training data and highly diverse contexts across the training speakers both degrade the quality of the average voice model considerably, and in turn harm the adapted model and the synthetic speech. The proposed method takes these difficulties into account in order to train a high-quality average voice model. As the experiments indicate, the proposed method outperforms the conventional one not only in the quality of the synthetic speech but also in its similarity to the natural voice: it increases the CMOS test score by 0.6 relative to the conventional method.
  6. Keywords:
  7. Average voice models ; CMOS test ; Hidden semi-Markov models ; High quality ; Small training ; Speech data ; Synthetic speech ; Experiments ; Speech processing ; Speech synthesis ; Decision trees
  8. Source: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Mons ; Volume 7911 LNAI , June , 2013 , Pages 89-96 ; 03029743 (ISSN) ; 9783642388460 (ISBN)
  9. URL: http://link.springer.com/chapter/10.1007/978-3-642-38847-7_12