A novel method to find appropriate ε for DBSCAN

Esmaelnejad, J ; Sharif University of Technology | 2010

  1. Type of Document: Article
  2. DOI: 10.1007/978-3-642-12145-6_10
  3. Publisher: 2010
  4. Abstract:
  5. Clustering is one of the most useful methods of data mining, in which a set of real or abstract objects is categorized into clusters. The DBSCAN clustering method, one of the best-known density-based clustering methods, places points in dense areas into the same clusters. In DBSCAN a point is said to be dense if the ε-radius circular area around it contains at least MinPts points. To find such dense areas, region queries are fired. Two points are defined as density connected if the distance between them is less than ε and at least one of them is dense. Finally, density-connected parts of the data set are extracted as clusters. A significant drawback of this method is that its parameters (ε and MinPts) are very hard for a user to guess, so it is better to remove them or to replace them with other parameters that are simpler to estimate. In this paper, we focus on the DBSCAN algorithm and try to replace ε with another parameter named ρ (the noise ratio of the data set). This does not reduce the number of parameters, but ρ is usually much simpler to set than ε. In some applications the user even knows the noise ratio of the data set in advance. Being a relative (not absolute) measure is another advantage of ρ over ε. We also propose a novel visualization technique that may help users set the ε value interactively. Experimental results are presented to show that our algorithm obtains results similar to those of the original DBSCAN with ε set to an appropriate value.
  6. Keywords:
  7. Parameter Estimation ; Abstract object ; Clustering ; Clustering methods ; Data sets ; DBSCAN algorithm ; Density-based Clustering ; Noise ratio ; Novel methods ; Two-point ; Visualization technique ; Data mining ; Database systems ; Visualization
  8. Source: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 24 March 2010 through 26 March 2010 ; Volume 5990 LNAI, Issue PART 1, 2010, Pages 93-102 ; 03029743 (ISSN) ; 3642121446 (ISBN)
  9. URL: http://link.springer.com/chapter/10.1007%2F978-3-642-12145-6_10
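
The abstract's idea of trading ε for a noise ratio ρ can be illustrated with a small sketch. This is not the authors' algorithm, which the record does not describe. It assumes a common heuristic instead: compute each point's distance to its MinPts-th nearest point (counting the point itself), sort those distances, and take the value at the (1 - ρ) position as ε. The function names `kth_neighbor_distances` and `eps_from_noise_ratio` are hypothetical. With the resulting ε, at most roughly a fraction ρ of the points fail the density test; some of those may still join a cluster as border points, so the actual noise fraction can be lower.

```python
import math


def kth_neighbor_distances(points, min_pts):
    """Distance from each point to its min_pts-th nearest point, counting itself."""
    result = []
    for p in points:
        # Brute-force O(n^2 log n) scan; fine for a small illustration.
        dists = sorted(math.dist(p, q) for q in points)
        result.append(dists[min_pts - 1])
    return result


def eps_from_noise_ratio(points, min_pts, rho):
    """Pick eps so that about a fraction (1 - rho) of the points are dense.

    Assumption: a point is dense when the closed eps-ball around it holds
    at least min_pts points, the point itself included.
    """
    n = len(points)
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must be in [0, 1)")
    if not 1 <= min_pts <= n:
        raise ValueError("min_pts must be between 1 and the number of points")
    kdist = sorted(kth_neighbor_distances(points, min_pts))
    # Number of points that should pass the density test (small tolerance
    # guards against floating-point rounding just above an integer).
    n_dense = max(1, math.ceil((1.0 - rho) * n - 1e-9))
    return kdist[n_dense - 1]


if __name__ == "__main__":
    # Four points on a unit square plus one far-away outlier.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(eps_from_noise_ratio(data, min_pts=3, rho=0.2))
```

In the example, one point out of five is an outlier, so ρ = 0.2 selects an ε that covers the square's points and leaves the outlier non-dense. The returned ε could then be passed to any standard DBSCAN implementation together with the same MinPts.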