Prediction of prosodic data in Persian text-to-speech systems using recurrent neural network

Farrokhi, A ; Sharif University of Technology | 2003

  1. Type of Document: Article
  2. DOI: 10.1049/el:20031151
  3. Publisher: 2003
  4. Abstract:
  5. A simplified four-layer recurrent neural network (RNN) based architecture is introduced to generate prosodic information for improving naturalness in Persian text-to-speech (TTS) systems. The proposed RNN uses the first two layers at word level and the last two layers at syllable level to provide the TTS system with the major prosodic parameters: pitch contour, energy contour, length of syllables, length and onset time of vowels, and duration of pauses. The experimental results show improved accuracy in the prediction of prosodic parameters compared with similar prosody generation systems of higher complexity.
  6. Keywords:
  7. Computer simulation ; Speech synthesis ; Speech analysis ; Speech ; Recurrent neural networks ; Linguistics
  8. Source: Electronics Letters ; Volume 39, Issue 25, 2003, Pages 1868-1869 ; 00135194 (ISSN)
  9. URL: https://digital-library.theiet.org/content/journals/10.1049/el_20031151