Harmony K-means algorithm for document clustering

Mahdavi, M ; Sharif University of Technology | 2009

  1. Type of Document: Article
  2. DOI: 10.1007/s10618-008-0123-0
  3. Publisher: 2009
  4. Abstract:
  5. Fast, high-quality document clustering is crucial for organizing information and search engine results, enhancing web crawling, and supporting information retrieval and filtering. Recent studies have shown that the most commonly used partition-based clustering algorithm, K-means, is more suitable for large datasets. However, K-means can converge to a local optimum rather than the global one. In this paper we propose a novel Harmony K-means Algorithm (HKA) that performs document clustering based on the Harmony Search (HS) optimization method. Finite Markov chain theory is used to prove that the HKA converges to the global optimum. To demonstrate the effectiveness and speed of the HKA, we apply it to several standard datasets and compare it with other meta-heuristic and model-based document clustering approaches. Experimental results show that the HKA converges to the best known optimum faster than the other methods while producing clusters of comparable quality. © 2008 Springer Science+Business Media, LLC
  6. Keywords:
  7. Data-sets ; Document clustering ; Finite markov chains ; Global optimum ; Harmony search ; High qualities ; K-means algorithm ; Large data sets ; Local optimal solutions ; Markov chain ; Meta-heuristic ; Model-based ; Optimization methods ; Partition-based clustering ; Search engine results ; Web crawling ; Cluster analysis ; Global optimization ; Heuristic methods ; Information retrieval ; Information retrieval systems ; Information services ; Markov processes ; Optimization ; Search engines ; Security of data ; Clustering algorithms
  8. Source: Data Mining and Knowledge Discovery ; Volume 18, Issue 3 , 2009 , Pages 370-391 ; 13845810 (ISSN)
  9. URL: https://link.springer.com/article/10.1007/s10618-008-0123-0
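
The abstract describes combining Harmony Search with K-means, but this record does not give the algorithm's details. The sketch below is therefore only an illustration of the general idea, not the authors' HKA. It assumes that each harmony is a vector of cluster labels, that pitch adjustment is simplified to a random relabel, and that every new harmony is refined by a single K-means reassignment step. The names `harmony_kmeans`, `hms` (harmony memory size), `hmcr` (harmony memory considering rate), and `par` (pitch adjusting rate) are hypothetical, following common Harmony Search terminology.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids(points, labels, k):
    """Mean of each cluster; an empty cluster falls back to the first point
    (a simplification made for this sketch)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else list(points[0])
        for c in range(k)
    ]


def cost(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    cents = centroids(points, labels, k)
    return sum(sq_dist(p, cents[c]) for p, c in zip(points, labels))


def kmeans_step(points, labels, k):
    """One K-means reassignment: move every point to its nearest centroid."""
    cents = centroids(points, labels, k)
    return [min(range(k), key=lambda c: sq_dist(p, cents[c])) for p in points]


def harmony_kmeans(points, k, hms=6, hmcr=0.9, par=0.3, iters=200, seed=0):
    """Illustrative Harmony Search over label vectors with a K-means refinement.

    All parameter names and defaults are assumptions for this sketch.
    """
    rng = random.Random(seed)
    n = len(points)
    # Harmony memory: hms random label vectors.
    memory = [[rng.randrange(k) for _ in range(n)] for _ in range(hms)]
    for _ in range(iters):
        new = []
        for i in range(n):
            if rng.random() < hmcr:
                # Memory consideration: copy this point's label from a stored harmony.
                label = rng.choice(memory)[i]
                if rng.random() < par:
                    # Pitch adjustment, simplified here to a random relabel.
                    label = rng.randrange(k)
            else:
                # Random selection from the full label range.
                label = rng.randrange(k)
            new.append(label)
        # Local refinement of the improvised harmony.
        new = kmeans_step(points, new, k)
        # Replace the worst stored harmony if the new one is strictly better.
        worst = max(range(hms), key=lambda j: cost(points, memory[j], k))
        if cost(points, new, k) < cost(points, memory[worst], k):
            memory[worst] = new
    return min(memory, key=lambda h: cost(points, h, k))
```

For documents, `points` would typically be term-weight vectors; the sketch uses squared Euclidean distance only to stay short, and a cosine-based measure could be substituted in `sq_dist`.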