A POS-based fuzzy word clustering algorithm for continuous speech recognition systems

Momtazi, S ; Sharif University of Technology | 2007

  1. Type of Document: Article
  2. DOI: 10.1109/ISSPA.2007.4555528
  3. Publisher: 2007
  4. Abstract:
  5. Word-based n-gram language models are widely used in continuous speech recognition systems. Such models must be extracted from large corpora. Because Persian corpora are not rich, the language models extracted from them are not reliable. For this reason, most researchers extract class n-grams instead of word n-grams. This research presents a new approach to fuzzy word clustering in which each word can be assigned to more than one class. The fuzzy c-means algorithm is used for clustering, and its various parameters are examined. The algorithm was applied to the 20,000 most frequent Persian words extracted from the "Persian Text Corpus". The extracted language models were evaluated by the perplexity criterion, and the results show a considerable reduction in perplexity. The language model was also evaluated in a speaker-independent continuous speech recognition system, where it improved the system accuracy. ©2007 IEEE
  6. Keywords:
  7. Boolean functions ; Clustering algorithms ; Computational linguistics ; Computer networks ; Continuous speech recognition ; Diesel engines ; Flow of solids ; Fuzzy clustering ; Linguistics ; Signal processing ; Software agents ; Speech ; Speech analysis ; Continuous speech ; Language modelling ; Speech recognition
  8. Source: 2007 9th International Symposium on Signal Processing and its Applications, ISSPA 2007, Sharjah, 12 February 2007 through 15 February 2007 ; 2007 ; 1424407796 (ISBN); 9781424407798 (ISBN)
  9. URL: https://ieeexplore.ieee.org/document/4555528?reload=true&arnumber=4555528
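
Illustrative note: the abstract names the fuzzy c-means algorithm, in which each item receives a degree of membership in every cluster instead of a single hard label. The sketch below is a generic textbook version of that algorithm, not the authors' implementation. The function name `fuzzy_c_means`, the fuzzifier `m = 2.0`, the stopping tolerance, and the use of plain numeric vectors as inputs are all assumptions made for illustration; this record does not describe how the paper turns words into feature vectors or which parameter values it settles on.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Generic fuzzy c-means. Returns (centers, memberships).

    points: list of equal-length numeric vectors (hypothetical word features).
    c: number of clusters; m: fuzzifier (> 1), larger means softer clusters.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Each center is the mean of all points weighted by membership**m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Recompute memberships from relative distances to the centers.
        shift = 0.0
        for i in range(n):
            dist = [
                max(sum((points[i][d] - centers[j][d]) ** 2
                        for d in range(dim)) ** 0.5, 1e-12)
                for j in range(c)
            ]
            new_row = [
                1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                          for k in range(c))
                for j in range(c)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than the tolerance.
        if shift < tol:
            break

    return centers, u
```

Each row of the returned membership matrix sums to 1, so a word that behaves like more than one class can carry substantial weight in several clusters at once, which is the property the abstract relies on when assigning a word to more than one class.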