A new word clustering method for building n-gram language models in continuous speech recognition systems

Bahrani, M ; Sharif University of Technology | 2008

  1. Type of Document: Article
  2. DOI: 10.1007/978-3-540-69052-8_30
  3. Publisher: Springer-Verlag, 2008
  4. Abstract:
  5. This paper presents a new method for automatic word clustering and applies it to building n-gram language models for Persian continuous speech recognition (CSR) systems. Each word is represented by a feature vector that captures the statistics of the parts of speech (POS) of that word, and the feature vectors are clustered with the k-means algorithm. The method reduces time complexity, which is a drawback of other automatic clustering methods, and it mitigates the high perplexity associated with manual clustering methods. The experiments are based on the "Persian Text Corpus", which contains about 9 million words. The extracted language models are evaluated by the perplexity criterion, and the results show a considerable reduction in perplexity. The word error rate of the CSR system is also reduced by about 16% compared with a manual clustering method. © 2008 Springer-Verlag Berlin Heidelberg
  6. Keywords:
  7. Artificial intelligence ; Chlorine compounds ; Computational linguistics ; Continuous speech recognition ; Database systems ; Error analysis ; Flow of solids ; Intelligent control ; Linguistics ; Speech ; Speech analysis ; Speech recognition ; Technology ; Vectors ; Automatic clustering ; Class n-gram models ; Continuous speech ; Feature vector ; Feature vectors ; International conferences ; K-means algorithms ; Language modelling ; Manual clustering ; N-gram language models ; Part of speech ; Persian ; Persian text corpus ; Text corpora ; Time complexities ; Word clustering ; Word error rate ; Intelligent systems
  8. Source: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 18 June 2008 through 20 June 2008, Wroclaw ; Volume 5027 LNAI , 2008 , Pages 286-293 ; 03029743 (ISSN) ; 354069045X (ISBN); 9783540690450 (ISBN)
  9. URL: https://link.springer.com/chapter/10.1007/978-3-540-69052-8_30
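
The clustering step summarized in the abstract (one POS-statistics feature vector per word, grouped by k-means) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy English corpus, the two-tag tagset, the function names, the use of relative tag frequencies as features, and the farthest-first seeding of the centroids are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter, defaultdict


def pos_feature_vectors(tagged_corpus, tagset):
    """Map each word to its relative POS-tag frequencies (one dimension per tag).

    Assumption: "statistics of parts of speech" is read here as relative
    tag frequencies; the paper may define the features differently.
    """
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    vectors = {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        vectors[word] = [tag_counts[t] / total for t in tagset]
    return vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means with deterministic farthest-first seeding.

    The seeding strategy is an assumption chosen to keep the sketch
    reproducible; it is not taken from the paper.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(far))

    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def cluster_words(tagged_corpus, tagset, k):
    """Return a word -> cluster-id mapping from POS-based feature vectors."""
    vectors = pos_feature_vectors(tagged_corpus, tagset)
    words = sorted(vectors)
    labels = kmeans([vectors[w] for w in words], k)
    return dict(zip(words, labels))


# Hypothetical toy data: (word, POS tag) pairs, N = noun, V = verb.
toy_corpus = [
    ("dog", "N"), ("cat", "N"), ("run", "V"),
    ("eat", "V"), ("run", "N"), ("run", "V"),
]
clusters = cluster_words(toy_corpus, ["N", "V"], k=2)
print(clusters)
```

In a class n-gram model the resulting cluster ids would play the role of word classes. A common formulation, which the abstract does not confirm as the one used in the paper, approximates the bigram probability as P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i), where c_i is the class of w_i.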