
Cluster-based sparse topical coding for topic mining and document clustering

Ahmadi, P.; Sharif University of Technology | 2018

  1. Type of Document: Article
  2. DOI: 10.1007/s11634-017-0280-3
  3. Publisher: Springer Verlag, 2018
  4. Abstract: In this paper, we introduce a document clustering method based on Sparse Topical Coding, called Cluster-based Sparse Topical Coding. Topic modeling can improve textual document clustering by describing documents with bag-of-words models and projecting them into a topic space; the latent semantic descriptions derived by the topic model can then be used as features in the clustering process. In our proposed method, document clustering and topic modeling are integrated in a unified framework in order to achieve the highest performance. This framework combines Sparse Topical Coding, which is responsible for topic mining, with K-means, which discovers the latent clusters in the document collection. Experimental results on widely used datasets show that our proposed method significantly outperforms traditional clustering methods as well as other topic-model-based clustering methods. Our method achieves improvements of 4% to 39% in clustering accuracy and of 2% to more than 44% in normalized mutual information. © Springer-Verlag Berlin Heidelberg 2017
  5. Keywords: Document clustering; K-means; Sparse topical coding; Topic model; Cluster analysis; Clustering algorithms; Codes (symbols); Information retrieval; Semantics; Bag-of-words models; Clustering accuracy; Clustering process; Normalized mutual information; Topic Modeling; Data mining
  6. Source: Advances in Data Analysis and Classification; Volume 12, Issue 3, 2018, Pages 537-558; ISSN 1862-5347
  7. URL: https://link.springer.com/article/10.1007/s11634-017-0280-3
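The abstract describes document representations in a topic space being clustered with K-means, with results scored by clustering accuracy and normalized mutual information (NMI). The Python sketch below covers only the simpler two-stage baseline that such methods are compared against. It is not the paper's unified Cluster-based Sparse Topical Coding framework. It assumes per-document topic codes are already available (the toy vectors are made up) and runs plain Lloyd's K-means on them. Accuracy is computed under the best one-to-one cluster-to-class mapping, found by brute force over permutations, which is practical only for small k and assumes no more clusters than classes. NMI is normalized by the geometric mean of the two entropies, which is one common convention; the paper's exact normalization is not stated here. The function names `kmeans`, `clustering_accuracy`, and `nmi` are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import permutations


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means with squared Euclidean distance (hypothetical helper)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def clustering_accuracy(true, pred):
    """Fraction correct under the best one-to-one cluster-to-class mapping.

    Brute force over permutations (fine for small k); assumes the number of
    clusters does not exceed the number of classes.
    """
    classes = sorted(set(true))
    clusters = sorted(set(pred))
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        best = max(best, sum(mapping[p] == t for t, p in zip(true, pred)))
    return best / len(true)


def nmi(true, pred):
    """Mutual information normalized by the geometric mean of the entropies."""
    n = len(true)
    ct, cp = Counter(true), Counter(pred)
    joint = Counter(zip(true, pred))
    mi = sum(nij / n * math.log(n * nij / (ct[a] * cp[b])) for (a, b), nij in joint.items())
    h_t = -sum(c / n * math.log(c / n) for c in ct.values())
    h_p = -sum(c / n * math.log(c / n) for c in cp.values())
    if h_t == 0 or h_p == 0:  # degenerate case: a single class or a single cluster
        return 1.0 if h_t == h_p else 0.0
    return mi / math.sqrt(h_t * h_p)


if __name__ == "__main__":
    # Toy 2-topic codes for six documents (made-up data, not from the paper).
    codes = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]
    truth = [0, 0, 0, 1, 1, 1]
    pred = kmeans(codes, k=2)
    print("ACC:", clustering_accuracy(truth, pred))
    print("NMI:", round(nmi(truth, pred), 4))
```

Both metrics are invariant to how cluster IDs are numbered, which is why they suit unsupervised evaluation. For larger k, the permutation search would normally be replaced by the Hungarian algorithm.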