Persian text classification based on topic models

Ahmadi, P ; Sharif University of Technology | 2016

  1. Type of Document: Article
  2. DOI: 10.1109/IranianCEE.2016.7585495
  3. Publisher: Institute of Electrical and Electronics Engineers Inc , 2016
  4. Abstract:
  5. With the extensive growth in information, text classification, as one of the text mining methods, plays a vital role in organizing and managing information. Most text classification methods represent a document collection as a Bag of Words (BOW) model and then use the histogram of words as the classification features. In this representation, however, the number of features is very large, so text classification incurs a serious computational cost. Moreover, the BOW representation is unable to recognize semantic relations between words. Recently, topic-model approaches have been successfully applied to text classification to overcome the problems of BOW. Our main goal in this paper is to investigate the possibility of applying topic models to Persian text classification and to compare the feature processing techniques of BOW with those of topic-model-based approaches. The experimental results show that the topic-model approach to representing Persian documents yields at least a 9% accuracy improvement over the BOW-based algorithm. © 2016 IEEE
  6. Keywords:
  7. Persian text ; Topic models ; Data mining ; Information retrieval ; Information retrieval systems ; Semantics ; Text processing ; Bag of words ; Classification features ; Computational cost problems ; Document classification ; Management information ; Persians ; Text classification methods ; Topic model ; Classification (of information)
  8. Source: 24th Iranian Conference on Electrical Engineering, ICEE 2016, 10 May 2016 through 12 May 2016 ; 2016 , Pages 86-91 ; 9781467387897 (ISBN)
  9. URL: http://ieeexplore.ieee.org/document/7585495
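
The contrast the abstract draws between BOW features and topic-model features can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the abstract does not name a specific topic model, so Latent Dirichlet Allocation (LDA) with a small collapsed Gibbs sampler stands in for "topic model"; the toy corpus `DOCS`, the function names `bow_features` and `lda_topic_features`, and the hyperparameters `alpha`, `beta`, and `iters` are hypothetical. Tokenizing real Persian text and training a classifier on the features are out of scope here.

```python
import random
from collections import Counter

# Hypothetical toy corpus of pre-tokenized documents (not data from the paper).
DOCS = [
    ["football", "team", "goal", "match", "team"],
    ["match", "goal", "player", "football"],
    ["market", "bank", "price", "stock", "bank"],
    ["stock", "price", "market", "economy"],
]

def bow_features(docs):
    """Return (vocabulary, vectors): one word-count histogram per document."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for word, count in Counter(doc).items():
            vec[index[word]] = count
        vectors.append(vec)
    return vocab, vectors

def lda_topic_features(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Assumed stand-in topic model: LDA fitted by collapsed Gibbs sampling.

    Returns one vector of topic proportions (length num_topics) per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][index[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                wi = index[w]
                z = assignments[d][n]
                doc_topic[d][z] -= 1
                topic_word[z][wi] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wi] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][wi] += 1
                topic_total[z] += 1
    # Smoothed topic proportions serve as the low-dimensional features.
    features = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        features.append([(doc_topic[d][k] + alpha) / denom for k in range(num_topics)])
    return features

vocab, bow = bow_features(DOCS)
topics = lda_topic_features(DOCS, num_topics=2)
print("BOW feature length:", len(bow[0]))
print("Topic feature length:", len(topics[0]))
```

In this sketch each BOW vector has one entry per vocabulary word, while each topic vector has only `num_topics` entries, which is the dimensionality reduction the abstract refers to. Either set of vectors could then be passed to any standard classifier.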