
Soft context clustering for F0 modeling in HMM-based speech synthesis

Khorram, S ; Sharif University of Technology | 2015

  1. Type of Document: Article
  2. DOI: 10.1186/1687-6180-2015-2
  3. Publisher: Springer International Publishing, 2015
  4. Abstract: This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional 'hard' decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in synthesizing natural-sounding high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this 'divide-and-conquer' approach leads to data sparsity, with the consequence that it suffers from poor generalization, meaning that it is unable to accurately predict parameters for models of unseen contexts: the hard decision tree is a weak function approximator. To alleviate this, we propose the soft decision tree, which is a binary decision tree with soft decisions at the internal nodes. In this soft clustering method, internal nodes select both their children with certain membership degrees; therefore, each node can be viewed as a fuzzy set with a context-dependent membership function. The soft decision tree improves model generalization and provides a superior function approximator because it is able to assign each context to several overlapped leaves. In order to use such a soft decision tree to predict the parameters of the HMM output probability distribution, we derive the smoothest (maximum entropy) distribution which captures all partial first-order moments and a global second-order moment of the training samples. Employing such a soft decision tree architecture with maximum entropy distributions, a novel speech synthesis system is trained using maximum likelihood (ML) parameter re-estimation, and synthesis is achieved via maximum output probability parameter generation. In addition, a soft decision tree construction algorithm optimizing a log-likelihood measure is developed. Both subjective and objective evaluations were conducted and indicate a considerable improvement over the conventional method.
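The core idea in the abstract — internal nodes selecting both children with membership degrees, so that each context is spread over several overlapping leaves — can be sketched as follows. This is an illustrative toy, not the paper's implementation: the `sigmoid` gate on a linear function of the context vector and the `Node` structure are assumptions standing in for the paper's context-dependent membership functions.

```python
import math


def sigmoid(x):
    """Logistic gate in (0, 1); illustrative choice of membership function."""
    return 1.0 / (1.0 + math.exp(-x))


class Node:
    """Binary soft-tree node: internal nodes gate, leaves carry an id."""

    def __init__(self, weight=None, bias=0.0, left=None, right=None, leaf_id=None):
        self.weight = weight      # gate weights over the context vector (internal only)
        self.bias = bias          # gate bias (internal only)
        self.left = left
        self.right = right
        self.leaf_id = leaf_id    # set only for leaf nodes


def leaf_memberships(node, context, degree=1.0, out=None):
    """Accumulate each leaf's membership degree for one context vector.

    Unlike a hard decision tree, BOTH children are visited: the gate value g
    splits the incoming degree between them, so every leaf receives a soft
    weight and the weights over all leaves sum to 1.
    """
    if out is None:
        out = {}
    if node.leaf_id is not None:          # leaf: deposit the accumulated degree
        out[node.leaf_id] = out.get(node.leaf_id, 0.0) + degree
        return out
    g = sigmoid(sum(w * c for w, c in zip(node.weight, context)) + node.bias)
    leaf_memberships(node.left, context, degree * g, out)
    leaf_memberships(node.right, context, degree * (1.0 - g), out)
    return out
```

A hard tree is the limit where each gate saturates to 0 or 1, assigning the context to exactly one leaf; the soft gates instead give every unseen context a nonzero weight on multiple leaves, which is the generalization mechanism the abstract describes.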
  5. Keywords: Decision tree-based clustering ; F0 modeling ; Maximum entropy model ; Soft context clustering ; Soft decision tree ; Binary trees ; Decision trees ; Entropy ; Hidden Markov models ; Markov processes ; Maximum likelihood ; Maximum likelihood estimation ; Membership functions ; Probability distributions ; Speech ; Speech synthesis ; Trees (mathematics) ; Trellis codes ; Context clustering ; F0 model ; HMM ; HMM-based speech synthesis ; Maximum entropy modeling ; Soft decision ; Statistical parametric speech synthesis ; Tree-based ; Data mining
  6. Source: EURASIP Journal on Advances in Signal Processing ; Volume 2015, Issue 1, January 2015 ; 16876172 (ISSN)
  7. URL: http://asp.eurasipjournals.springeropen.com/articles/10.1186/1687-6180-2015-2