
Large Vocabulary Isolated Word Recognition Using Neural Networks

Hajitabar, Alireza | 2017

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 49767 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Sameti, Hossein
  7. Abstract: Speech recognition is an important topic in speech processing. In this thesis, we perform Isolated Word Recognition (IWR) on a large-vocabulary dataset. Previous work on large-vocabulary IWR has used Hidden Markov Models (HMMs), Gaussian Mixture Models (GMMs), and hybrid methods; our approach, by contrast, is based on Deep Neural Networks (DNNs). DNNs have recently shown excellent performance in many speech and image processing applications. A key factor in speech recognition is the availability of appropriate datasets. Before this work, there was no adequate Persian speech corpus for isolated word recognition, and the Persian IWR systems reported so far have been quite limited in both lexicon size and performance. In this research, we collected a large speech corpus and implemented a complete IWR system for large-vocabulary recognition of Persian names. The resulting corpus, called Farsname, contains more than 20,000 utterances covering 5,235 distinct words, spoken by 226 speakers from all provinces of Iran; each speaker read 88 names on average. All major smartphone brands were used to record the dataset. Using the Kaldi speech recognition toolkit, we achieved a word error rate (WER) of 10.34% on the Farsname corpus, which is a good result given that the data were recorded in open, noisy environments. We also applied a phoneme-based end-to-end approach using DBLSTM-CTC (a deep bidirectional LSTM trained with Connectionist Temporal Classification), which improved the WER by 11.3%.
  8. Keywords: Hidden Markov Model; Gaussian Mixture Modeling; Hybrid Methods; Neural Networks; Isolated Word Recognition; Dataset Gathering
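The results above are reported as word error rates. As a minimal sketch (not the thesis's or Kaldi's scoring code), corpus-level WER is the word-level Levenshtein distance between each reference and hypothesis, summed over the test set and divided by the total number of reference words: WER = (S + D + I) / N, with S substitutions, D deletions, I insertions and N reference words. In isolated word recognition each utterance holds a single word, so when the recognizer outputs one word per utterance this reduces to the fraction of misrecognized utterances. The function names `edit_distance` and `corpus_wer` and the sample names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    # prev holds the previous DP row; row 0 means "j insertions from an empty ref".
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # column 0: i deletions to reach an empty hyp
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,             # delete ref[i-1]
                curr[j - 1] + 1,         # insert hyp[j-1]
                prev[j - 1] + sub_cost,  # match or substitute
            )
        prev = curr
    return prev[len(hyp)]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        total_edits += edit_distance(ref_words, hypothesis.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words


if __name__ == "__main__":
    # Hypothetical IWR test set: one spoken name per utterance, one misrecognized.
    results = [("ali", "ali"), ("reza", "reza"), ("sara", "zahra"), ("maryam", "maryam")]
    print(f"WER = {corpus_wer(results):.2%}")  # WER = 25.00%
```

Summing edits and words over the whole set, rather than averaging per-utterance rates, keeps longer references from being underweighted; for one-word utterances both views coincide.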
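A DBLSTM-CTC model emits, for every audio frame, a probability distribution over output labels (here phonemes) plus a special blank symbol. The sketch below shows greedy (best-path) CTC decoding: take the most likely label per frame, merge consecutive repeats, then drop blanks. It is an illustration under stated assumptions, not the thesis's decoder: the integer label indices, the use of index 0 for blank, and the function names are hypothetical, and a complete IWR system would still need to map the decoded phoneme sequence to a vocabulary word (for example through a pronunciation lexicon), which the abstract does not describe.

```python
from typing import List, Sequence


def collapse_ctc_path(path: Sequence[int], blank: int = 0) -> List[int]:
    """Apply the CTC collapsing rule: merge repeated labels, then remove blanks."""
    output: List[int] = []
    prev = None
    for label in path:
        # A label is emitted only when it differs from the previous frame and is not blank;
        # a blank between two identical labels therefore allows a genuine double label.
        if label != prev and label != blank:
            output.append(label)
        prev = label
    return output


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Best-path decoding: argmax label per frame, then collapse the path."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    return collapse_ctc_path(best_path, blank)


if __name__ == "__main__":
    # Hypothetical per-frame posteriors over {blank=0, phoneme 1, phoneme 2}.
    probs = [
        [0.9, 0.05, 0.05],
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.6, 0.2, 0.2],
        [0.1, 0.1, 0.8],
    ]
    print(ctc_greedy_decode(probs))  # [1, 2]
```

Greedy decoding is the simplest CTC decoder; beam search, optionally constrained by a lexicon or language model, usually finds better label sequences at higher cost.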
