
Speech synthesis based on gaussian conditional random fields

Article, Communications in Computer and Information Science; Vol. 427, 2014, pp. 183-193. Khorram, S.; Bahmaninezhad, F.; Sameti, H.; Sharif University of Technology

Abstract

Hidden Markov Model (HMM)-based speech synthesis (HTS) has recently been confirmed to be the most effective method for generating natural speech. However, it lacks adequate context generalization when the training data are limited. As a solution, the current study provides a new context-dependent speech modeling framework based on Gaussian Conditional Random Field (GCRF) theory. Using this model, an innovative speech synthesis system has been developed that can be viewed as an extension of the Context-Dependent Hidden Semi-Markov Model (CD-HSMM). A novel Viterbi decoder, along with a stochastic gradient ascent algorithm, was applied to train the model parameters. Also, a fast and efficient parameter...

Context-dependent acoustic modeling based on hidden maximum entropy model for statistical parametric speech synthesis

Article, EURASIP Journal on Audio, Speech, and Music Processing; Vol. 2014, Issue 1, 2014; ISSN: 1687-4714. Khorram, S.; Sameti, H.; Bahmaninezhad, F.; King, S.; Drugman, T.; Sharif University of Technology

Abstract

Decision tree-clustered context-dependent hidden semi-Markov models (HSMMs) are typically used in statistical parametric speech synthesis to represent probability densities of acoustic features given contextual factors. This paper addresses three major limitations of this decision tree-based structure: (i) the decision tree structure lacks adequate context generalization; (ii) it is unable to express complex context dependencies; (iii) parameters generated from this structure exhibit sudden transitions between adjacent states. To alleviate these limitations, many previous papers applied multiple decision trees with an additive assumption over those trees. Similarly, the current...
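The additive assumption over multiple decision trees can be illustrated with a minimal sketch. Each tree answers yes/no contextual questions to reach exactly one leaf, and the predicted mean of an acoustic feature is the sum of the leaf values selected in every tree. The question names (`is_vowel`, `is_stressed`, `is_phrase_final`), the tree shapes, and the leaf values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical context representation: yes/no answers to contextual questions.
Context = Dict[str, bool]


@dataclass
class Leaf:
    value: float  # this leaf's additive contribution to the predicted mean


@dataclass
class Node:
    question: str  # name of a yes/no contextual question (illustrative)
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


Tree = Union[Node, Leaf]


def leaf_value(tree: Tree, context: Context) -> float:
    """Follow the hard yes/no answers down to exactly one leaf."""
    while isinstance(tree, Node):
        # Questions missing from the context are treated as answered "no".
        tree = tree.yes if context.get(tree.question, False) else tree.no
    return tree.value


def additive_mean(trees: List[Tree], context: Context) -> float:
    """Additive assumption: the mean is the sum of one selected leaf per tree."""
    return sum(leaf_value(t, context) for t in trees)


# Two hypothetical trees, each modeling a different group of contextual factors.
phone_tree = Node("is_vowel", yes=Leaf(5.0), no=Leaf(4.6))
prosody_tree = Node(
    "is_stressed",
    yes=Node("is_phrase_final", yes=Leaf(0.1), no=Leaf(0.3)),
    no=Leaf(-0.2),
)

ctx = {"is_vowel": True, "is_stressed": True, "is_phrase_final": False}
print(additive_mean([phone_tree, prosody_tree], ctx))
```

Because each tree only has to capture one group of factors, a full context that never occurred in training can still receive a mean assembled from leaves that were each well trained, which is one way to read the generalization argument behind additive models.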

Soft context clustering for F0 modeling in HMM-based speech synthesis

Article, EURASIP Journal on Advances in Signal Processing; Volume 2015, Issue 1, January 2015; ISSN: 1687-6172. Khorram, S.; Sameti, H.; King, S.; Sharif University of Technology

Springer International Publishing 2015

Abstract

This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional ‘hard’ decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in synthesizing natural-sounding high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this ‘divide-and-conquer’ approach leads to data sparsity, with the consequence that it suffers...
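The contrast between hard and soft clustering can be sketched as follows. In a hard tree a context reaches a single leaf; in the soft variant below, every internal node passes the context to both children with complementary weights from a logistic gate, so the prediction is a membership-weighted average of all leaf means. The logistic gate, the `sharpness` parameter, and the question name are assumptions made for illustration; the paper's actual soft decision tree formulation and training procedure may differ.

```python
import math
from dataclasses import dataclass
from typing import Dict, Union

# Hypothetical context: degree of "yes" for each contextual question, in [0, 1].
Context = Dict[str, float]


@dataclass
class Leaf:
    mean: float  # e.g. a log-F0 mean stored at this leaf (illustrative value)


@dataclass
class SoftNode:
    question: str
    sharpness: float  # assumed gate slope; large values approach a hard split
    yes: Union["SoftNode", Leaf]
    no: Union["SoftNode", Leaf]


def p_yes(node: SoftNode, context: Context) -> float:
    """Logistic membership of the context in the node's 'yes' branch."""
    z = node.sharpness * (context.get(node.question, 0.0) - 0.5)
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_mean(tree: Union[SoftNode, Leaf], context: Context) -> float:
    """Every leaf contributes, weighted by the product of gate values on its path."""
    if isinstance(tree, Leaf):
        return tree.mean
    w = p_yes(tree, context)
    return w * soft_mean(tree.yes, context) + (1.0 - w) * soft_mean(tree.no, context)


soft = SoftNode("is_stressed", sharpness=4.0, yes=Leaf(5.2), no=Leaf(4.8))
hard_like = SoftNode("is_stressed", sharpness=1000.0, yes=Leaf(5.2), no=Leaf(4.8))

ctx = {"is_stressed": 1.0}
print(soft_mean(soft, ctx))       # blends both leaves, closer to the "yes" leaf
print(soft_mean(hard_like, ctx))  # effectively picks the "yes" leaf only
```

With a small `sharpness`, a context shares statistics with neighboring leaves instead of relying on a single, possibly sparse, leaf; as `sharpness` grows the gate saturates and the sketch reduces to ordinary hard clustering.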

Improving Speech Signal Models for Statistical Parametric Speech Synthesis

Ph.D. Dissertation, Sharif University of Technology. Khorram, Soheil (Author); Sameti, Hossein (Supervisor)

Abstract

Statistical parametric speech synthesis (SPSS) has dominated the speech synthesis research area over the last decade due to its remarkable advantages, such as high intelligibility and flexibility. Decision tree-clustered context-dependent hidden semi-Markov models are typically used in SPSS to represent probability densities of acoustic features given contextual factors. This research addresses four major limitations of this decision tree-based structure: (a) the decision tree structure lacks adequate context generalization; (b) it is unable to express complex context dependencies; (c) parameters generated from this structure exhibit sudden transitions between adjacent states; (d) This...
