Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis

Khorram, S ; Sharif University of Technology

  1. Type of Document: Article
  2. DOI: 10.1186/1687-4722-2014-12
  3. Abstract:
  4. Decision tree-clustered context-dependent hidden semi-Markov models (HSMMs) are typically used in statistical parametric speech synthesis to represent probability densities of acoustic features given contextual factors. This paper addresses three major limitations of this decision tree-based structure: (i) The decision tree structure lacks adequate context generalization. (ii) It is unable to express complex context dependencies. (iii) Parameters generated from this structure represent sudden transitions between adjacent states. In order to alleviate the above limitations, many former papers applied multiple decision trees with an additive assumption over those trees. Similarly, the current study uses multiple decision trees as well, but instead of the additive assumption, it is proposed to train the smoothest distribution by maximizing entropy measure. Obviously, increasing the smoothness of the distribution improves the context generalization. The proposed model, named hidden maximum entropy model (HMEM), estimates a distribution that maximizes entropy subject to multiple moment-based constraints. Due to the simultaneous use of multiple decision trees and maximum entropy measure, the three aforementioned issues are considerably alleviated. Relying on HMEM, a novel speech synthesis system has been developed with maximum likelihood (ML) parameter re-estimation as well as maximum output probability parameter generation. Additionally, an effective and fast algorithm that builds multiple decision trees in parallel is devised. Two sets of experiments have been conducted to evaluate the performance of the proposed system. In the first set of experiments, HMEM with some heuristic context clusters is implemented. This system outperformed the decision tree structure in small training databases (i.e., 50, 100, and 200 sentences). In the second set of experiments, the HMEM performance with four parallel decision trees is investigated using both subjective and objective tests. All evaluation results of the second experiment confirm significant improvement of the proposed system over the conventional HSMM.
  5. Keywords:
  6. Decision tree-based context clustering ; Hidden Markov model (HMM)-based speech synthesis ; Maximum entropy ; Entropy ; Experiments ; Hidden Markov models ; Maximum entropy methods ; Speech synthesis ; Trees (mathematics) ; Context clustering ; Context-dependent acoustic modeling ; Hidden semi-Markov models ; Maximum entropy modeling ; Overlapped context clusters ; Parallel decision trees ; Speech synthesis system ; Statistical parametric speech synthesis ; Decision trees
  7. Source: EURASIP Journal on Audio, Speech, and Music Processing ; Vol. 2014, Issue 1, 2014 ; ISSN: 1687-4714
  8. URL: http://asmp.eurasipjournals.com/content/2014/1/12