- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 52925 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Sameti, Hossein
- Abstract:
- Beyond the linguistic content that conveys the speaker's purpose and intent, human speech carries other information, such as the speaker's identity, gender, and approximate age, the intonation and manner of expression, the speaker's emotional state, and the parts of the utterance that are emphasized. "Voice conversion" seeks to change the speaker-dependent content of an audio signal while leaving the speaker-independent content (especially the linguistic content) unchanged. In other words, the goal of voice conversion is to modify a speech signal produced by one person so that it sounds as if the same utterance had been spoken by someone else. This thesis first briefly outlines different approaches to this problem. It then focuses on one architecture: a vector quantized conditional autoencoder with a convolutional encoder and WaveNet as the decoder. Finally, given the high computational cost of training this model, new tricks for training it more effectively on a single GPU are introduced. One of these is a new method for sampling from WaveNet; according to our evaluation, outputs obtained with this method are preferred over those of the conventional method 73.2% of the time.
- Keywords:
- Deep Learning ; Sequence to Sequence Learning ; Voice Conversion ; Conditional Autoencoder ; Vector Quantized Autoencoder ; Convolutional Encoder ; WaveNet ; Sequence-to-Sequence Voice Conversion
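The abstract names a vector quantized conditional autoencoder but does not spell out its bottleneck. As a rough illustration only, the sketch below shows the general idea behind such a bottleneck: each encoder frame is replaced by its nearest codebook vector, and the result is combined with a target-speaker embedding before being passed to a decoder. It is not the thesis's implementation. The function names, the toy codebook, the speaker embedding, and the use of plain concatenation for conditioning are all assumptions made for this example, and the convolutional encoder, the WaveNet decoder, and the proposed sampling method are not modelled.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    frames: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each frame to the index and value of its nearest codebook entry."""
    indices: List[int] = []
    quantized: List[List[float]] = []
    for frame in frames:
        best = min(
            range(len(codebook)),
            key=lambda k: squared_distance(frame, codebook[k]),
        )
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


def condition_on_speaker(
    quantized: Sequence[Sequence[float]],
    speaker_embedding: Sequence[float],
) -> List[List[float]]:
    """Append the same speaker embedding to every quantized frame.

    Concatenation is an assumption of this sketch; a real decoder may
    inject the speaker identity differently.
    """
    return [list(code) + list(speaker_embedding) for code in quantized]


# Toy values, chosen only for illustration.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
encoder_output = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]]
target_speaker = [0.5, -0.5, 0.25]  # hypothetical speaker embedding

indices, quantized = quantize(encoder_output, codebook)
decoder_input = condition_on_speaker(quantized, target_speaker)
print(indices)
print(decoder_input[1])
```

In this toy setup, swapping `target_speaker` for a different embedding changes only the speaker-dependent part of `decoder_input`, while the discrete codes stay the same. That separation is the intuition behind using such a bottleneck for voice conversion.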