

A new word clustering method for building n-gram language models in continuous speech recognition systems

Article, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Volume 5027 LNAI, 2008, Pages 286-293. 18-20 June 2008, Wroclaw. ISSN: 03029743; ISBN: 354069045X, 9783540690450. Authors: Bahrani, M; Sameti, H; Hafezi, N; Momtazi, S (Sharif University of Technology).

2008

Abstract

In this paper, a new method for automatic word clustering is presented. We used this method to build n-gram language models for Persian continuous speech recognition (CSR) systems. In this method, each word is represented by a feature vector that captures the statistics of the parts of speech (POS) of that word. The feature vectors are clustered using the k-means algorithm. This method reduces the high time complexity that is a drawback of other automatic clustering methods. It also mitigates the high perplexity of manual clustering methods. The experimental results are based on the "Persian Text Corpus", which contains about 9 million words. The extracted language models are evaluated...
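The clustering step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the POS statistics are computed, which distance is used, how k is chosen, or how centroids are initialized. Here each word's feature vector is assumed to be its normalized POS-tag distribution over a tagged corpus, distances are squared Euclidean, and centroids start from a seeded random sample of distinct vectors. The function names `pos_feature_vectors` and `kmeans` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def pos_feature_vectors(tagged_corpus, pos_tags):
    """Map each word to its normalized POS distribution (assumed feature definition)."""
    counts = defaultdict(Counter)
    for word, pos in tagged_corpus:
        counts[word][pos] += 1
    vectors = {}
    for word, c in counts.items():
        total = sum(c.values())
        # One dimension per POS tag, in the fixed order given by pos_tags.
        vectors[word] = [c[p] / total for p in pos_tags]
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Lloyd's k-means over a dict word -> vector; returns word -> cluster id."""
    words = sorted(vectors)
    distinct = sorted({tuple(v) for v in vectors.values()})
    if k > len(distinct):
        raise ValueError("k exceeds the number of distinct feature vectors")
    rng = random.Random(seed)  # assumed initialization: seeded random sample
    centroids = [list(v) for v in rng.sample(distinct, k)]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each word goes to its nearest centroid.
        new_assignment = {
            w: min(range(k), key=lambda j: squared_distance(vectors[w], centroids[j]))
            for w in words
        }
        if new_assignment == assignment:
            break  # converged: no word changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [vectors[w] for w in words if assignment[w] == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Toy usage with a hypothetical English tagged corpus (not the Persian Text Corpus).
corpus = [("run", "V"), ("run", "N"), ("walk", "V"), ("dog", "N"),
          ("cat", "N"), ("quick", "ADJ"), ("blue", "ADJ")]
vecs = pos_feature_vectors(corpus, ["N", "V", "ADJ"])
clusters = kmeans(vecs, k=3)
```

Because each word is reduced to a short POS vector, one k-means iteration costs roughly O(V·k·P) for V words, k clusters and P tags. This is one plausible reading of the time-complexity advantage the abstract claims. The resulting cluster ids could then serve as word classes in a class-based n-gram model, which is an assumption about how the clusters are used.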
