Sequence-to-Sequence Voice Conversion Using Deep Learning

Shadbash, Hamed; Sameti, Hossein

Shadbash, Hamed | 2019

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 52925 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Sameti, Hossein
Abstract:
Beyond the linguistic content that conveys the speaker's purpose and intent, human speech carries other information, such as the speaker's identity, gender, and approximate age, intonation and manner of expression, emotional state, and which parts of the utterance are emphasized. "Voice conversion" seeks to change the speaker-dependent content of an audio signal while leaving the speaker-independent content (especially the linguistic content) unchanged. In other words, the goal of voice conversion is to modify a speech signal produced by one person so that it sounds as if the same utterance had been spoken by someone else. This thesis first briefly outlines different approaches to this problem. It then focuses on one specific architecture: a vector-quantized conditional autoencoder with a convolutional encoder and WaveNet as the decoder. Finally, given the high computational cost of training this model, new techniques for training it more effectively on a single GPU are introduced. One of these is a new method for sampling from WaveNet; according to our evaluation, outputs obtained with this method are preferred over those of the conventional method 73.2% of the time.
Keywords:
Deep Learning ; Sequence to Sequence Learning ; Voice Conversion ; Conditional Autoencoder ; Vector Quantized Autoencoder ; Convolutional Encoder ; WaveNet ; Sequence-to-Sequence Voice Conversion
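
Illustrative sketch:
The abstract describes the model only at a high level. As a minimal sketch of the vector-quantization bottleneck in such a conditional autoencoder, the Python below maps each encoder output frame to its nearest codebook vector and then appends a target-speaker embedding before decoding. The function names, the toy codebook, and the embedding values are illustrative assumptions and are not taken from the thesis; the convolutional encoder, the WaveNet decoder, and the thesis's sampling method are not reproduced here.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    frames: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each encoder output frame to its nearest codebook vector.

    Returns the chosen codebook indices and the quantized frames.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for frame in frames:
        # Nearest codebook entry; ties resolve to the lowest index.
        best = min(
            range(len(codebook)),
            key=lambda k: squared_distance(frame, codebook[k]),
        )
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


def condition_on_speaker(
    quantized: Sequence[Sequence[float]],
    speaker_embedding: Sequence[float],
) -> List[List[float]]:
    """Append the (hypothetical) target-speaker embedding to every frame."""
    return [list(frame) + list(speaker_embedding) for frame in quantized]


if __name__ == "__main__":
    # Toy values, chosen only for illustration.
    codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
    frames = [[0.1, -0.2], [0.9, 1.2], [3.0, 3.5]]
    target_speaker = [0.5]

    indices, quantized = quantize(frames, codebook)
    decoder_input = condition_on_speaker(quantized, target_speaker)
    print(indices)
    print(decoder_input)
```

In this sketch the discrete codes stand in for the speaker-independent content, and swapping the speaker embedding is what would change the perceived speaker once a decoder turns the conditioned frames back into audio.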
