
Active Constrained Clustering Using Instance-Level Constraints Ranking

Abin, Ahmad Ali | 2014

  1. Type of Document: Ph.D. Dissertation
  2. Language: Farsi
  3. Document No: 45917 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Beigy, Hamid
  7. Abstract:
  8. Cluster analysis is one of the basic tools for exploring the underlying structure of a given data set, and it is used in a wide variety of engineering and scientific disciplines. It can be broadly defined as the process of dividing a set of objects into clusters, each representing a meaningful sub-group of the data. Clustering techniques require the definition of a similarity measure between patterns, and this measure is not easy to specify in the absence of prior knowledge about cluster shapes. Recently, constrained clustering has emerged as an efficient approach to data clustering and to learning the similarity measure between patterns. It has become popular because it can take advantage of side information in the form of must-link (ML) and cannot-link (CL) constraints when such information is available. Incorporating domain knowledge into clustering by adding constraints enables users to specify desirable properties of the result and improves the robustness of the clustering algorithm. The initial work in constrained clustering has led to further study of the impact of incorporating constraints into clustering algorithms, particularly when applied to real-world problems.

    An important issue that has arisen in constrained clustering is minimizing the cost of constraint acquisition. Existing constrained clustering methods assume that the algorithm is fed a good, passively chosen set of constraints, and they report clustering performance averaged over multiple randomly generated constraint sets. This assumption does not hold in real-life applications. A randomly selected constraint set does not always improve the quality of the results. In addition, averaging over several trials is not possible in many applications because of the nature of the given problem or the cost and difficulty of constraint acquisition.

    In this research, the problem of constrained clustering along with active selection of clustering constraints is studied. Active selection of clustering constraints means selecting the most useful set of ML and CL constraints to be presented to the clustering algorithm. The research in this document can be classified into three main areas: 1) constrained clustering using side information, 2) active selection of clustering constraints, and 3) joint constrained clustering and active selection of clustering constraints.

    In the first step, the author focused on the problem of constrained clustering, considering three issues: 1) preserving the data structure, 2) the degree of importance of constraints, and 3) learning in known directions. The author has proposed a unified framework in this area that learns a distance metric in both linear and non-linear setups. Next, a sequential approach has been proposed for active selection of clustering constraints. The proposed approach considers three basic issues in the constraint selection process: 1) the validity of constraints, 2) the distribution of constraints, and 3) dependencies among the utilities of constraints.

    Finally, the joint problem of constrained clustering and active selection of clustering constraints is studied in an aggregated framework. The aim of this study is to select useful constraints given the current state of the clustering algorithm and then to use these constraints to improve clustering accuracy in an iterative model. Four important issues have been considered in this study: 1) the non-linear nature of the data, 2) overlaps between clusters, 3) noise and outliers, and 4) the dependency of constraint utility on the clustering algorithm. The proposed framework extends ideas from the fuzzy and possibilistic domains to multiple kernel learning and configures them within a learning strategy based on side information. The proposed methods were evaluated on several standard real and synthetic datasets.
  9. Keywords:
  10. Clustering ; Core ; Constrained Clustering ; Must-Link Constraint ; Cannot-Link Constraint ; Active Constraint Selection
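
As a minimal illustration of the must-link and cannot-link constraints mentioned in the abstract, the Python sketch below counts how many pairwise constraints a given cluster assignment violates. It is not taken from the dissertation: the function name `count_violations`, the pair-of-indices representation, and the toy data are assumptions chosen only for illustration.

```python
from typing import Iterable, Sequence, Tuple

# A constraint is a pair of sample indices (hypothetical representation).
Pair = Tuple[int, int]


def count_violations(
    labels: Sequence[int],
    must_link: Iterable[Pair],
    cannot_link: Iterable[Pair],
) -> int:
    """Count the constraints that a cluster assignment fails to satisfy."""
    # A must-link pair is violated when its two samples land in different clusters.
    ml_violations = sum(1 for i, j in must_link if labels[i] != labels[j])
    # A cannot-link pair is violated when its two samples share a cluster.
    cl_violations = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return ml_violations + cl_violations


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 1]          # toy cluster assignment for five samples
    must_link = [(0, 1), (1, 2)]      # pairs that should share a cluster
    cannot_link = [(0, 4), (2, 3)]    # pairs that should be separated
    print(count_violations(labels, must_link, cannot_link))  # prints 2
```

In the toy assignment, the must-link pair (1, 2) is split across clusters and the cannot-link pair (2, 3) is grouped together, so two constraints are violated. A count of this kind is one simple way to check a clustering against side information; the dissertation's own methods (metric learning and active constraint selection) are more involved and are not reproduced here.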
