Nevisa, a Persian continuous speech recognition system

Sameti, H ; Sharif University of Technology | 2008

  1. Type of Document: Article
  2. DOI: 10.1007/978-3-540-89985-3_60
  3. Publisher: 2008
  4. Abstract:
  5. This paper reviews Nevisa, a Persian speech recognition engine. Nevisa is an HMM-based, large-vocabulary, speaker-independent continuous speech recognition system. Like most successful recognition systems, it uses MFCCs with some modifications as speech signal features. It also utilizes a VAD based on signal energy and zero-crossing rate. The acoustic models are trained under the maximum likelihood estimation criterion, the core of which is the classical segmental k-means and Baum-Welch algorithms. The system is based on phoneme modeling and uses a synchronous beam search over a lexicon tree to decode acoustic utterances. Language modeling for Persian has been implemented in two forms: statistical (n-gram) and grammatical. Nevisa is equipped with out-of-vocabulary capability for applications with small vocabularies. To compensate for reduced accuracy in noisy environments, powerful robustness methods are utilized. Model-based approaches such as PMC, MLLR, and MAP, feature robustness methods such as CMS, PCA, RCC, and VTLN, and speech enhancement methods such as spectral subtraction and Wiener filtering were investigated. Some of these methods were modified to achieve higher robustness. Nevisa was trained on the Farsdat database. To evaluate system accuracy, a clean test set was selected from Farsdat, and four noisy tasks with different noise types were recorded in different real environments. With the robustness methods applied, the performance of Nevisa in real environments is similar to that in clean conditions. © 2008 Springer-Verlag
  6. Keywords:
  7. Grammar ; Language modeling ; Nevisa ; Persians ; Robustness ; Search and decoding ; Computational linguistics ; Computer science ; Continuous speech recognition ; Decoding ; Maximum likelihood estimation ; Natural language processing systems ; Speech enhancement ; Feature extraction
  8. Source: 13th International Computer Society of Iran Computer Conference on Advances in Computer Science and Engineering, CSICC 2008, Kish Island, 9 March 2008 through 11 March 2008 ; Volume 6 CCIS , 2008 , Pages 485-492 ; 18650929 (ISSN); 3540899847 (ISBN); 9783540899846 (ISBN)
  9. URL: https://link.springer.com/chapter/10.1007/978-3-540-89985-3_60
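
The abstract mentions a VAD based on signal energy and zero-crossing rate but gives no details. The sketch below illustrates the general idea only and is not Nevisa's implementation: the function names, frame sizes, thresholds, and the decision rule are all assumptions chosen for illustration. A frame is labeled speech if its short-time energy is high, or if it has moderate energy together with a high zero-crossing rate (a rough cue for unvoiced sounds).

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a sequence of samples into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def simple_vad(samples, frame_len=200, hop=100,
               energy_thresh=0.01, zcr_thresh=0.25):
    """Return one True/False speech label per frame.

    All thresholds are hypothetical; a real system would tune them
    or estimate them from the background noise level.
    """
    labels = []
    for frame in frame_signal(samples, frame_len, hop):
        energy = short_time_energy(frame)
        zcr = zero_crossing_rate(frame)
        loud = energy > energy_thresh
        # Moderate energy plus many zero crossings: possibly unvoiced speech.
        noisy_consonant = energy > 0.1 * energy_thresh and zcr > zcr_thresh
        labels.append(loud or noisy_consonant)
    return labels


# Toy input: 400 samples of silence, then 400 samples of a 200 Hz tone
# sampled at 8 kHz (a stand-in for voiced speech).
silence = [0.0] * 400
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
labels = simple_vad(silence + tone)
print(labels)
```

On this toy input the frames that lie wholly in the silent part are labeled non-speech, and frames that overlap the tone are labeled speech. Fixed thresholds like these would not hold up in the noisy conditions the abstract describes, which is one reason practical detectors adapt them to the environment.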