On the Use of Artificial Neural Networks in Automatic Speech Recognition

Hassani, Adel; Ghorshi, Mohammad Ali; Khayyat, Amir Ali Akbar

Hassani, Adel | 2015

Type of Document: M.Sc. Thesis
Language: English
Document No: 48217 (55)
University: Sharif University of Technology, International Campus, Kish Island
Department: Science and Engineering
Advisor(s): Ghorshi, Mohammad Ali; Khayyat, Amir Ali Akbar
Abstract:
This thesis uses Artificial Neural Networks (ANNs) in place of Hidden Markov Models (HMMs) for Automatic Speech Recognition (ASR). The HMM is one of the most dominant Bayesian network technologies and the most successful model in current ASR systems. However, excessive training time is a major issue in HMM-based speech recognition. The thesis presents an ANN language model for human speech that maps spectral features of speech, namely the formants, the cepstrum (Mel-Frequency Cepstral Coefficients, MFCCs) and the Power Spectral Density (PSD), taken as features of samples of specific words, into a discrete vector space. The network is trained with feedforward and recurrent topologies to operate as a probability estimator. The smooth nature of the resulting distributions reduces perplexity for limited data sets of the vocabulary. Therefore, optimized feature extraction and discrete conversion algorithms are employed for an efficient implementation of the feedforward and Elman recurrent neural networks during training. A word-class interpretation of the neural network inputs and outputs, which are defined by their relevant features, is shown to obtain improved perplexity over the formant, MFCC and PSD model when training data is limited.
Keywords:
Feedforward Neural Network ; Artificial Neural Network ; Automatic Speech Recognition ; Power Spectral Density (PSD) Analysis ; Formant Speech Synthesis ; Elman Recurrent Neural Networks (ERNN)
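
Illustrative note: the abstract describes an Elman recurrent network operating as a probability estimator over per-word speech features. The sketch below is not taken from the thesis. It shows only a generic Elman forward pass in which each feature frame updates a context (hidden) state and a softmax turns the output into word-class probabilities. The layer sizes, the random weights, the tanh activation and the names `elman_forward`, `w_in`, `w_ctx` and `w_out` are all assumptions made for illustration. Training and feature extraction are omitted.

```python
import math
import random


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def elman_forward(frames, w_in, w_ctx, w_out):
    """Run a generic Elman network over a sequence of feature frames.

    The hidden state from the previous frame is fed back as context.
    Returns one probability vector over word classes per frame.
    """
    hidden = [0.0] * len(w_ctx)  # context units start at zero
    posteriors = []
    for frame in frames:
        pre = [a + b for a, b in zip(matvec(w_in, frame), matvec(w_ctx, hidden))]
        hidden = [math.tanh(p) for p in pre]
        posteriors.append(softmax(matvec(w_out, hidden)))
    return posteriors


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; stands in for trained weights (assumption)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: 13 features per frame, 8 hidden units, 4 word classes.
rng = random.Random(0)
N_FEATURES, N_HIDDEN, N_WORDS, N_FRAMES = 13, 8, 4, 5
frames = [[rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)] for _ in range(N_FRAMES)]
w_in = random_matrix(N_HIDDEN, N_FEATURES, rng)
w_ctx = random_matrix(N_HIDDEN, N_HIDDEN, rng)
w_out = random_matrix(N_WORDS, N_HIDDEN, rng)

posteriors = elman_forward(frames, w_in, w_ctx, w_out)
```

Each entry of `posteriors` is a distribution over the hypothetical word classes for one frame. With untrained weights the values are close to uniform, so the sketch only shows the data flow, not recognition performance.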
