Prosody generation in TTS system for Azeri

Damadi, M. S ; Sharif University of Technology | 2010

Type of Document: Article
DOI: 10.1109/AIM.2010.5695772
Publisher: 2010
Abstract:
Naturalness is essential to achieving a high-quality waveform in Text-to-Speech (TTS) systems. The naturalness of the waveform is highly correlated with phonetic coverage and with prosodic features such as loudness, duration, and pitch. This paper addresses the implementation of a prosodic TTS system for Azeri. The TTS system to which the prosodic information is added is a concatenative synthesizer based on diphones. To add prosody and increase naturalness, we obtain a primary pitch curve for each word, based on the location of the stressed syllable. We then modify this curve according to sentence-type effects to obtain the final pitch contour. As far as we know, the output of this system is the first prosodic Azeri synthetic speech ever created. Subjective listening tests have confirmed the high intelligibility and acceptable naturalness of the synthesized speech.
Keywords:
Intonation ; Stress ; Concatenation ; Diphone ; F0 contour ; Pitch pattern ; Asymptotic analysis ; Mechatronics ; Speech intelligibility ; Speech synthesis ; Intelligent mechatronics
Source: IEEE/ASME International Conference on Advanced Intelligent Mechatronics, AIM, 6 July 2010 through 9 July 2010, Montreal, QC ; 2010 , Pages 1335-1338 ; 9781424480319 (ISBN)
URL: http://ieeexplore.ieee.org/document/5695772
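
Illustrative sketch: The abstract describes a two-stage pitch model: a primary pitch curve per word, placed according to the stressed syllable, followed by a sentence-type adjustment. The Python sketch below shows one way such a scheme could be laid out. It is not the authors' implementation: the function names, the linear interpolation, the default frequencies, and the declination and final-rise factors are all assumptions chosen for illustration, since the abstract does not give the actual rules.

```python
from typing import List


def word_pitch_targets(n_syllables: int, stressed: int,
                       base_hz: float = 120.0,
                       peak_hz: float = 160.0) -> List[float]:
    """Return one F0 target (Hz) per syllable of a word.

    Assumed shape: the peak sits on the stressed syllable and the
    targets fall off linearly toward base_hz with distance from it.
    """
    if not 0 <= stressed < n_syllables:
        raise ValueError("stressed index must lie inside the word")
    # Largest possible distance from the stressed syllable in this word;
    # fall back to 1 for monosyllables to avoid dividing by zero.
    span = max(stressed, n_syllables - 1 - stressed) or 1
    targets = []
    for i in range(n_syllables):
        weight = 1.0 - abs(i - stressed) / span
        targets.append(base_hz + (peak_hz - base_hz) * weight)
    return targets


def apply_sentence_type(words: List[List[float]], sentence_type: str,
                        declination: float = 0.15,
                        final_rise: float = 0.25) -> List[float]:
    """Join word-level targets and adjust them for the sentence type.

    Assumed rules: every sentence gets a linear declination across the
    utterance; a "question" additionally raises the last word's targets.
    """
    flat = [f0 for word in words for f0 in word]
    n = len(flat)
    contour = []
    for k, f0 in enumerate(flat):
        position = k / (n - 1) if n > 1 else 0.0  # 0 at start, 1 at end
        contour.append(f0 * (1.0 - declination * position))
    if sentence_type == "question" and words:
        for k in range(n - len(words[-1]), n):
            contour[k] *= 1.0 + final_rise
    return contour


# Two words: three syllables with final stress, two syllables with final stress.
words = [word_pitch_targets(3, 2), word_pitch_targets(2, 1)]
statement = apply_sentence_type(words, "statement")
question = apply_sentence_type(words, "question")
```

In a diphone synthesizer, per-syllable targets like these would still need to be interpolated to frame level and imposed on the concatenated units by a pitch-modification method; that step is outside the scope of this sketch.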