Prediction of prosodic data in Persian text-to-speech systems using recurrent neural network

Farrokhi, A ; Sharif University of Technology | 2003

Type of Document: Article
DOI: 10.1049/el:20031151
Publisher: 2003
Abstract:
A simplified four-layer recurrent neural network (RNN) based architecture is introduced to generate prosodic information for improving naturalness in Persian text-to-speech (TTS) systems. The proposed RNN uses the first two layers at word level and the last two layers at syllable level to provide the TTS system with major prosodic parameters, including pitch contour, energy contour, length of syllables, length and onset time of vowels, and duration of pauses. The experimental results show improved accuracy in predicting prosodic parameters compared with similar prosody generation systems of higher complexity.
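The two-level layout described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The following are all assumptions made for illustration: Elman-style tanh recurrence, the layer sizes, the feature dimensions, a linear fourth layer, a single value per prosodic parameter, and every name used (`make_params`, `predict_prosody`, `syl_to_word`). The abstract states only that two layers operate at word level and two at syllable level.

```python
import numpy as np

# Hypothetical output order; the abstract lists these parameters but not their encoding.
PROSODY_TARGETS = ["pitch", "energy", "syllable_length",
                   "vowel_length", "vowel_onset", "pause_duration"]


def init_layer(rng, n_in, n_hidden):
    """Weights for one Elman-style recurrent layer (assumed layer type)."""
    return {
        "W_in": rng.normal(0.0, 0.1, (n_hidden, n_in)),
        "W_rec": rng.normal(0.0, 0.1, (n_hidden, n_hidden)),
        "b": np.zeros(n_hidden),
    }


def run_recurrent(layer, inputs):
    """Run one recurrent layer over a sequence; returns one state per step."""
    h = np.zeros(layer["b"].shape[0])
    states = []
    for x in inputs:
        h = np.tanh(layer["W_in"] @ x + layer["W_rec"] @ h + layer["b"])
        states.append(h)
    return np.stack(states)


def make_params(rng, n_word_feat, n_syl_feat, n_hidden=8):
    """Randomly initialised (untrained) parameters; sizes are illustrative."""
    return {
        "word1": init_layer(rng, n_word_feat, n_hidden),         # layer 1, word level
        "word2": init_layer(rng, n_hidden, n_hidden),            # layer 2, word level
        "syl1": init_layer(rng, n_syl_feat + n_hidden, n_hidden),  # layer 3, syllable level
        "W_out": rng.normal(0.0, 0.1, (len(PROSODY_TARGETS), n_hidden)),  # layer 4
        "b_out": np.zeros(len(PROSODY_TARGETS)),
    }


def predict_prosody(params, word_feats, syl_feats, syl_to_word):
    """Map word and syllable features to one prosody vector per syllable.

    syl_to_word[i] is the index of the word that contains syllable i.
    """
    w1 = run_recurrent(params["word1"], word_feats)
    w2 = run_recurrent(params["word2"], w1)
    # Each syllable sees its own features plus the state of its parent word.
    syl_in = np.concatenate([syl_feats, w2[syl_to_word]], axis=1)
    s1 = run_recurrent(params["syl1"], syl_in)
    # Layer 4: linear read-out at syllable level (an assumption).
    return s1 @ params["W_out"].T + params["b_out"]


rng = np.random.default_rng(0)
params = make_params(rng, n_word_feat=4, n_syl_feat=3)
word_feats = rng.normal(size=(3, 4))       # 3 words
syl_feats = rng.normal(size=(5, 3))        # 5 syllables
syl_to_word = np.array([0, 0, 1, 2, 2])    # syllable-to-word alignment
prosody = predict_prosody(params, word_feats, syl_feats, syl_to_word)
```

Here `prosody` holds one row per syllable and one column per entry in `PROSODY_TARGETS`. The weights are untrained, so the values themselves carry no meaning. The sketch shows only how word-level context can be passed down to syllable-level prediction.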
Keywords:
Computer simulation ; Speech synthesis ; Speech analysis ; Speech ; Recurrent neural networks ; Linguistics
Source: Electronics Letters ; Volume 39, Issue 25, 2003, Pages 1868-1869 ; 00135194 (ISSN)
URL: https://digital-library.theiet.org/content/journals/10.1049/el_20031151