Cluster-based sparse topical coding for topic mining and document clustering
Ahmadi, P ; Sharif University of Technology | 2018
- Type of Document: Article
- DOI: 10.1007/s11634-017-0280-3
- Publisher: Springer Verlag, 2018
- Abstract:
- In this paper, we introduce a document clustering method based on Sparse Topical Coding, called Cluster-based Sparse Topical Coding. Topic modeling can improve textual document clustering by describing documents via bag-of-words models and projecting them into a topic space. The latent semantic descriptions derived by the topic model can then be used as features in a clustering process. In our proposed method, document clustering and topic modeling are integrated in a unified framework in order to achieve the highest performance. This framework includes Sparse Topical Coding, which is responsible for topic mining, and K-means, which discovers the latent clusters in the document collection. Experimental results on widely used datasets show that our proposed method significantly outperforms traditional and other topic-model-based clustering methods. Our method achieves from 4% to 39% improvement in clustering accuracy and from 2% to more than 44% improvement in normalized mutual information. © Springer-Verlag Berlin Heidelberg 2017
- Keywords:
- Document clustering ; K-means ; Sparse topical coding ; Topic model ; Cluster analysis ; Clustering algorithms ; Codes (symbols) ; Information retrieval ; Semantics ; Bag-of-words models ; Clustering accuracy ; Clustering process ; Normalized mutual information ; Topic Modeling ; Data mining
- Source: Advances in Data Analysis and Classification ; Volume 12, Issue 3, 2018, Pages 537-558 ; 18625347 (ISSN)
- URL: https://link.springer.com/article/10.1007/s11634-017-0280-3
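
The abstract reports results in terms of clustering accuracy and normalized mutual information (NMI). As a rough illustration of how these two measures are commonly computed, the following Python sketch may help. It is not taken from the article: the function names are hypothetical, the accuracy search tries every one-to-one mapping of clusters to classes by brute force (practical only for a handful of clusters), and the NMI normalization by the geometric mean of the two entropies is an assumption, since the record does not state which variant the authors use.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Fraction of documents matched under the best one-to-one
    mapping of cluster ids to class labels (brute-force search)."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    overlap = Counter(zip(cluster_ids, true_labels))
    # Pad with dummy classes (None) when there are more clusters than classes.
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for assignment in permutations(targets, len(clusters)):
        hits = sum(overlap[(c, t)] for c, t in zip(clusters, assignment))
        best = max(best, hits)
    return best / len(true_labels)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(true_labels, cluster_ids):
    """Mutual information divided by sqrt(H(true) * H(pred)).
    Assumption: geometric-mean normalization; other variants exist."""
    n = len(true_labels)
    count_t = Counter(true_labels)
    count_c = Counter(cluster_ids)
    joint = Counter(zip(true_labels, cluster_ids))
    mi = sum(
        (nij / n) * math.log(n * nij / (count_t[t] * count_c[c]))
        for (t, c), nij in joint.items()
    )
    denom = math.sqrt(entropy(true_labels) * entropy(cluster_ids))
    return mi / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    truth = [0, 0, 0, 1, 1, 1]
    predicted = [1, 1, 0, 0, 0, 0]
    print(clustering_accuracy(truth, predicted))
    print(normalized_mutual_information(truth, predicted))
```

Both measures are invariant to how cluster ids are numbered, which is why the accuracy function searches over mappings rather than comparing labels directly. For larger numbers of clusters, an assignment solver such as the Hungarian algorithm would replace the brute-force loop.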