Utilizing Latent Topic Models for Persian Document Classification and Providing Appropriate Solutions to Improve It
M.Sc. Thesis, Sharif University of Technology; Bahrani, Mohammad (Supervisor); Vazirnezhad, Bahram (Co-Advisor)
Abstract
High-precision text classification has become a challenging problem in computational linguistics and natural language processing. Access to a suitable data set, the choice of the best method, and the selection of salient linguistic features have always been the basic concerns of this process. This study, based on the Bijan Khan Corpus, represents each document as a keyword vector weighted by tf-idf. These vectors serve as input to latent topic model algorithms, including probabilistic latent semantic analysis (PLSA). The output of this algorithm is a feature vector for each document, which is then used to train different classifiers such as K...
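The pipeline outlined in the abstract (tf-idf keyword vectors fed into PLSA, whose per-document topic mixtures P(z|d) become feature vectors) can be illustrated with a minimal, dependency-free Python sketch. It is not the thesis implementation. The helper names `tf_idf` and `plsa`, the smoothed idf formula log((1+N)/(1+df)) + 1, the use of fractional tf-idf weights in place of raw term counts in the EM updates, and the English toy documents standing in for the Bijan Khan Corpus are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def tf_idf(docs):
    """Weight each document's terms by tf-idf.

    docs: list of non-empty token lists. Returns (vocab, matrix), where
    matrix[d][w] is the weight of vocab[w] in document d. Assumption: a
    smoothed idf, log((1 + N) / (1 + df)) + 1, so every weight is positive.
    """
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    matrix = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            row[index[term]] = tf * idf
        matrix.append(row)
    return vocab, matrix


def _normalize(values):
    total = sum(values)
    return [v / total for v in values]


def plsa(matrix, n_topics, n_iter=50, seed=0):
    """Fit PLSA with EM on a document-term weight matrix.

    Returns (p_z_d, p_w_z): p_z_d[d] is the topic mixture P(z|d) used as the
    document's feature vector; p_w_z[z] is the word distribution P(w|z).
    Assumption: tf-idf weights replace the usual integer counts n(d, w).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(matrix), len(matrix[0])
    p_z_d = [_normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        acc_w_z = [[0.0] * n_words for _ in range(n_topics)]
        acc_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                weight = matrix[d][w]
                if weight == 0.0:
                    continue
                # E-step: P(z|d,w) is proportional to P(w|z) * P(z|d)
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(joint)
                for z in range(n_topics):
                    resp = weight * joint[z] / total
                    acc_w_z[z][w] += resp
                    acc_z_d[d][z] += resp
        # M-step: renormalize the weighted responsibilities
        p_w_z = [_normalize(row) for row in acc_w_z]
        p_z_d = [_normalize(row) for row in acc_z_d]
    return p_z_d, p_w_z


if __name__ == "__main__":
    docs = [
        "football match goal team".split(),
        "team won the football cup".split(),
        "stock market price rise".split(),
        "market shares price fell".split(),
    ]
    vocab, X = tf_idf(docs)
    features, _ = plsa(X, n_topics=2, n_iter=100)
    for row in features:
        print([round(p, 3) for p in row])
```

Each row of `features` sums to 1 and is the kind of low-dimensional document vector that would then be handed to a classifier. Which classifiers the thesis trained is cut off in the abstract, so none is shown here.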