Persian Speech Synthesis Using Hidden Markov Models

Bahaadini, Sara; Sameti, Hossein

Bahaadini, Sara | 2011

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 43090 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Sameti, Hossein
Abstract:
Research on Persian speech synthesis over the last ten years has been sparse and scattered. No comprehensive framework that properly implements and adapts statistical speech synthesis methods for Persian has yet been developed. In this thesis, recent statistical parametric speech synthesis methods, including CLUSTERGEN, traditional HMM-based speech synthesis, and its STRAIGHT version, are implemented and adapted for the Persian language. A Comparison Category Rating (CCR) test is carried out to compare these methods with each other and with the unit selection method. Listeners score samples on the comparison mean opinion score (CMOS) scale, and the methods are ranked by averaging the CCR scores. The results show that the STRAIGHT-based system produces the best quality, with a score of 2.02. The traditional HMM-based and unit selection systems rank second and third, with scores of -0.13 and -0.14 respectively. Finally, CLUSTERGEN produces the worst quality among these four systems, with a score of -1.7.
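The ranking step described above, averaging CCR scores per system, can be illustrated with a minimal sketch. The abstract does not describe the test protocol, so the following are assumptions made only for illustration: each rating is a triple (system_a, system_b, score), the score is the listener's CMOS judgment of system_a relative to system_b on a -3 to +3 scale, and the same rating counts with opposite sign for system_b. The function name and the toy ratings are hypothetical and are not data from the thesis.

```python
from collections import defaultdict


def rank_by_mean_ccr(ratings):
    """Rank systems by their mean CCR score (illustrative assumption).

    ratings: iterable of (system_a, system_b, score), where score rates
    system_a relative to system_b; system_b receives the negated score.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for system_a, system_b, score in ratings:
        totals[system_a] += score
        counts[system_a] += 1
        totals[system_b] -= score
        counts[system_b] += 1
    means = {name: totals[name] / counts[name] for name in totals}
    # Highest mean score first.
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Toy ratings for three hypothetical systems, not results from the thesis.
toy_ratings = [("A", "B", 2), ("A", "C", 3), ("B", "C", 1), ("B", "A", -1)]
ranking = rank_by_mean_ccr(toy_ratings)
for name, mean_score in ranking:
    print(f"{name}: {mean_score:+.2f}")
```

Under this antisymmetric convention the per-system means tend to balance around zero, which matches the mix of positive and negative scores reported in the abstract.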
One of the main problems of the traditional HMM-based system is the buzziness of the synthesized speech, which is caused by the excitation signal. In this thesis, a new statistical parametric speech synthesis system is built. It uses a new vocoder that takes the residual signal as an ideal excitation. The residual signal is composed of a deterministic part and a stochastic part, and we propose phoneme-based modeling of both. PCA coefficients are used to model the deterministic part. Two methods are proposed for modeling the deterministic part: a hard method and a soft method. In hard modeling, the PCA transformation is switched whenever the phoneme changes. This causes sudden changes in the produced speech, which make it sound unnatural. To solve this problem, an SPD-PCA method is proposed, in which each frame belongs to three phonemes with different percentages. For modeling the stochastic part, five different methods are proposed; these methods use filter banks and signal energy. The proposed statistical parametric speech synthesis method uses four groups of features: mel-cepstral coefficients, fundamental frequency, PCA coefficients, and noise coefficients. These are modeled by a multi-stream HMM. The PCA coefficients and the fundamental frequency use multi-space probability distribution (MSD) HMMs, as they have no value in unvoiced regions. The proposed system is compared with the STRAIGHT and traditional HMM-based systems. STRAIGHT produces the best quality, with a score of 1.35; the proposed system ranks second, with a score of -0.83; and the traditional HMM-based system is last, with a score of -1.52.
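The difference between the hard and soft methods can be pictured with a small sketch. The abstract does not say how the membership percentages are computed or how the per-phoneme PCA models are combined, so this is only one plausible reading: each phoneme has its own PCA model (a mean vector and a list of basis vectors), and a frame is rebuilt as a weighted sum of the per-phoneme reconstructions, with weights that sum to 1. All names, the toy models, and the weights are hypothetical.

```python
def reconstruct(mean, basis, coeffs):
    """Rebuild a frame from PCA coefficients: mean + sum_k coeffs[k] * basis[k]."""
    frame = list(mean)
    for coeff, vector in zip(coeffs, basis):
        for i, value in enumerate(vector):
            frame[i] += coeff * value
    return frame


def soft_reconstruct(models, weights, coeffs):
    """Blend per-phoneme reconstructions using membership weights.

    models:  {phoneme: (mean, basis)}  hypothetical per-phoneme PCA models
    weights: {phoneme: weight}         assumed to sum to 1
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("membership weights must sum to 1")
    dim = len(next(iter(models.values()))[0])
    blended = [0.0] * dim
    for phoneme, weight in weights.items():
        mean, basis = models[phoneme]
        frame = reconstruct(mean, basis, coeffs)
        for i in range(dim):
            blended[i] += weight * frame[i]
    return blended


# Toy two-dimensional models with one principal component each.
toy_models = {
    "a": ([0.0, 0.0], [[1.0, 0.0]]),
    "b": ([1.0, 1.0], [[0.0, 1.0]]),
    "c": ([2.0, 2.0], [[1.0, 1.0]]),
}
# Hard method as a special case: the frame belongs entirely to one phoneme.
hard_frame = soft_reconstruct(toy_models, {"b": 1.0}, [2.0])
# Soft method: the frame is shared among three phonemes.
soft_frame = soft_reconstruct(toy_models, {"a": 0.2, "b": 0.6, "c": 0.2}, [2.0])
```

In this reading, letting the weights vary smoothly from frame to frame avoids the abrupt switch between transformations that the hard method produces at phoneme boundaries.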
Keywords:
Hidden Markov Model; Persian Speech Synthesis; Residual Signal; Comparison Category Rating (CCR) Test; SPD-PCA Method
