Deep Learning for Speech Recognition

Azadi Yazdi, Saman | 2014

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 45288 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Sameti, Hossein
Abstract:
Speech recognition is one of the earliest goals of speech processing. The goal of this thesis is to apply deep learning to speech recognition. In recent years, only small improvements in speech recognition accuracy had been reported. Deep learning is a new learning approach that has improved results in many machine learning tasks. Deep learning has already improved speech recognition for English. Building on those results, this thesis attempts to improve accuracy for the Persian language over both common and recent recognition methods.
First, the overall structure of a typical speech recognition system is introduced by describing its modules. A deep multilayer perceptron is then trained on Farsdat, a Persian-language speech dataset. Its accuracy is compared with that of the common HMM-GMM approach. To further demonstrate its advantage, its accuracy is also compared with that of a newer approach, HMM-SGMM (subspace Gaussian mixture models). The deep networks improved the phoneme error rate (PER) by 2.25% compared with HMM-GMM and by 0.46% compared with HMM-SGMM.
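The comparison above is stated in terms of phoneme error rate. The sketch below illustrates how PER is commonly computed. It aligns each recognized phone sequence with its reference by edit distance. PER is then the total number of substitutions, deletions and insertions divided by the total number of reference phones. This is the standard definition, not necessarily the exact scoring tool used in the thesis. The phone symbols in the usage example are hypothetical and are not taken from the Farsdat phone set.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    n, m = len(ref), len(hyp)
    # d[i][j] holds the distance between ref[:i] and hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i reference phones
    for j in range(m + 1):
        d[0][j] = j  # insert all j hypothesis phones
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return d[n][m]


def phoneme_error_rate(refs, hyps):
    """Corpus-level PER in percent: total edit errors / total reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


if __name__ == "__main__":
    # Hypothetical phone sequences, for illustration only
    refs = [["s", "a", "l", "a", "m"], ["b", "a", "d"]]
    hyps = [["s", "a", "l", "m"], ["p", "a", "d", "a"]]
    print(phoneme_error_rate(refs, hyps))  # 3 errors / 8 phones -> 37.5
```

In this example, the first utterance has one deletion. The second has one substitution and one insertion. That gives 3 errors over 8 reference phones. Errors are summed over the whole test set before dividing. Averaging per-utterance rates would weight short utterances too heavily.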
Keywords:
Speech Recognition ; Neural Networks ; Deep Learning ; Deep Multilayer Perceptron