Improving density-based methods for hierarchical clustering of web pages

Haghir Chehreghani, M ; Sharif University of Technology | 2008

  1. Type of Document: Article
  2. DOI: 10.1016/j.datak.2008.06.006
  3. Publisher: 2008
  4. Abstract:
  5. The rapid increase of information on the web makes it necessary to improve information management techniques. One of the most important of these techniques is clustering web data. In this paper, we propose a new 3-phase clustering method that first finds dense units in a data set using density-based algorithms. The distances within the dense units are stored in sorted order in a structure such as a min heap. In the extraction stage, these distances are extracted one by one, and their effects on the clustering process are examined. Finally, in the combination stage, clustering is completed using improved versions of the well-known single and average linkage methods. All steps of the methods run in O(n log n) time. The proposed methods have the benefit of low complexity, and experimental results show that they generate high-quality clusters. Other experiments show that they also provide additional advantages, such as clustering by sampling. © 2008 Elsevier B.V. All rights reserved.
  6. Keywords:
  7. Chlorine compounds ; Cluster analysis ; Flow of solids ; Industrial management ; Information management ; Management information systems ; Average linkage ; Clustering methods ; Clustering processes ; Clustering web ; Data sets ; Density-based ; Density-based algorithms ; Density-based approaches ; Hierarchical clustering ; High-quality ; Low-complexity ; Management techniques ; Single linkage ; Time complexities ; Web clustering ; Web pages ; Clustering algorithms
  8. Source: Data and Knowledge Engineering ; Volume 67, Issue 1, 2008, Pages 30-50 ; 0169023X (ISSN)
  9. URL: https://www.sciencedirect.com/science/article/abs/pii/S0169023X08000815
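
The extraction stage described in the abstract (distances kept in a min heap and popped one by one, followed by a single-linkage style merge) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `single_linkage_from_heap`, the use of all pairwise Euclidean distances instead of distances restricted to dense units, and the union-find merge are assumptions made for illustration. Because it pushes every pair onto the heap, this sketch costs O(n^2 log n), not the O(n log n) reported for the paper's method.

```python
import heapq
from itertools import combinations
from math import dist


def single_linkage_from_heap(points, num_clusters):
    """Hypothetical helper: merge clusters by popping distances from a min heap.

    Returns (labels, merges), where labels[i] is the cluster representative
    of point i and merges lists the (i, j, distance) pairs that caused a merge.
    """
    # Assumption: all pairwise distances are used. The paper instead
    # restricts itself to distances inside dense units.
    heap = [
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    ]
    heapq.heapify(heap)

    # Union-find structure tracking which cluster each point belongs to.
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    remaining = len(points)
    merges = []
    # Extraction: pop the smallest distance and examine its effect.
    while heap and remaining > num_clusters:
        d, i, j = heapq.heappop(heap)
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Single linkage: the closest pair across two clusters joins them.
            parent[root_j] = root_i
            remaining -= 1
            merges.append((i, j, d))

    labels = [find(i) for i in range(len(points))]
    return labels, merges
```

Calling `single_linkage_from_heap([(0, 0), (0, 1), (10, 10), (10, 11)], 2)` groups the first two points together and the last two together. The order of entries in `merges` also records the hierarchy, since each merge happens at a distance no smaller than the previous one.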