Search for: clustering-algorithms
Total 129 records

    A novel pre-processing method to reduce noise effects in a prototype-based clustering algorithm

    Article 2008 International Conference on Information and Knowledge Engineering, IKE 2008, Las Vegas, NV, 14 July 2008 through 17 July 2008 ; July, 2008, Pages 587-593 ; 1601320752 (ISBN); 9781601320759 (ISBN) Taghikhaki, Z ; Minaei, B ; Masoum, A ; Sharif University of Technology
    In this paper we introduce a preprocessing method to reduce noise effects in noise-prone environments. Prototype-based clustering algorithms are sensitive to noise because noisy data affect the result as much as true data do; this distorts the calculation of cluster centers and in turn reduces accuracy. Therefore, these algorithms cannot be applied in noise-prone environments, and if they are applied there, the results cannot be trusted. To overcome such problems we reduce and in some cases eliminate the noisy data. Part of our method is also applied at the source of the generated data in a network. Then noisy data, which are numerous in noisy environments, are eliminated and... 

    GDCLU: A new grid-density based clustring algorithm

    Article Proceedings - 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, SNPD 2012, 8 August 2012 through 10 August 2012 ; August, 2012, Pages 102-107 ; 9780769547619 (ISBN) Esfandani, G ; Sayyadi, M ; Namadchian, A ; Sharif University of Technology
    This paper addresses the density-based clustering problem in data mining, where clusters are established based on the density of regions. The most well-known algorithm proposed in this area is DBSCAN [1], which employs two parameters influencing the shape of the resulting clusters. Consequently, one of the major weaknesses of this algorithm is its inability to handle clusters in multi-density environments. In this paper, a new density-based grid clustering algorithm, GDCLU, is proposed which uses a new definition for dense regions. It determines dense grids based on the densities of their neighbors. This new definition enables GDCLU to handle differently shaped clusters in multi-density environments. Also... 

    Unsupervised induction of persian semantic verb classes based on syntactic information

    Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Warsaw ; Volume 7912 LNCS, June, 2013, Pages 112-124 ; 03029743 (ISSN) ; 9783642386336 (ISBN) Aminian, M ; Rasooli, M. S ; Sameti, H ; Sharif University of Technology
    Automatic induction of semantic verb classes is one of the most challenging tasks in computational lexical semantics with a wide variety of applications in natural language processing. The large number of Persian speakers and the lack of such semantic classes for Persian verbs have motivated us to use unsupervised algorithms for Persian verb clustering. In this paper, we have done experiments on inducing the semantic classes of Persian verbs based on Levin's theory for verb classes. Syntactic information extracted from dependency trees is used as base features for clustering the verbs. Since there has been no manual classification of Persian verbs prior to this paper, we have prepared a... 

    GoSCAN: Decentralized scalable data clustering

    Article Computing ; Volume 95, Issue 9, 2013, Pages 759-784 ; 0010485X (ISSN) Mashayekhi, H ; Habibi, J ; Voulgaris, S ; Van Steen, M ; Sharif University of Technology
    Identifying clusters is an important aspect of analyzing large datasets. Clustering algorithms classically require access to the complete dataset. However, as huge amounts of data are increasingly originating from multiple, dispersed sources in distributed systems, alternative solutions are required. Furthermore, data and network dynamicity in a distributed setting demand adaptable clustering solutions that offer accurate clustering models at a reasonable pace. In this paper, we propose GoSCAN, a fully decentralized density-based clustering algorithm which is capable of clustering dynamic and distributed datasets without requiring central control or message flooding. We identify two major... 

    An algorithm for discovering clusters of different densities or shapes in noisy data sets

    Article Proceedings of the ACM Symposium on Applied Computing ; March, 2013, Pages 144-149 ; 9781450316569 (ISBN) Khani, F ; Hosseini, M. J ; Abin, A. A ; Beigy, H ; Sharif University of Technology
    In clustering spatial data, we are given a set of points in R^n and the objective is to find the clusters (representing spatial objects) in the set of points. Finding clusters with different shapes, sizes, and densities in data with noise and potential outliers is a challenging task. This problem is studied especially in the machine learning community and has many applications. We present a novel clustering technique that can considerably alleviate the mentioned issues. In the proposed algorithm, we let the structure of the data set itself find the clusters; this is done by having points actively send feedback to and receive feedback from each other. The idea of the proposed method is to transform the input... 

    How to extend visibility polygons by mirrors to cover invisible segments

    Article 11th International Conference and Workshops on Algorithms and Computation, WALCOM 2017, 29 March 2017 through 31 March 2017 ; Volume 10167 LNCS, 2017, Pages 42-53 ; 03029743 (ISSN); 9783319539249 (ISBN) Vaezi, A ; Ghodsi, M ; Sharif University of Technology
    Springer Verlag  2017
    Given a simple polygon P with n vertices, the visibility polygon (VP) of a point q (VP(q)), or a segment pq (VP(pq)) inside P can be computed in linear time. We propose a linear time algorithm to extend the VP of a viewer (point or segment), by converting some edges of P into mirrors, such that a given non-visible segment (formula present) can also be seen from the viewer. Various definitions for the visibility of a segment, such as weak, strong, or complete visibility are considered. Our algorithm finds every edge such that, when converted to a mirror, makes (formula present) visible to our viewer. We find out exactly which interval of (formula present) becomes visible, by... 

    Attaining higher quality for density based algorithms

    Article 1st International Conference on Web Reasoning and Rule Systems, RR 2007, Innsbruck, 7 June 2007 through 8 June 2007 ; Volume 4524 LNCS, 2007, Pages 329-338 ; 03029743 (ISSN); 354072981X (ISBN); 9783540729815 (ISBN) Haghir Chehreghani, M ; Abolhassani, H ; Haghir Chehreghani, M ; Sharif University of Technology
    Springer Verlag  2007
    So far several methods have been proposed for clustering the web. On the other hand, many algorithms have been developed for clustering relational data, but their usage for the Web is yet to be investigated. One main category of such algorithms is density-based methods, which provide high quality results. In this paper, a new density-based algorithm is first introduced and then compared with other algorithms of this category. The proposed algorithm has some interesting properties and capabilities, such as hierarchical clustering and sampling, that make it suitable for clustering web data. © Springer-Verlag Berlin Heidelberg 2007

    MSDBSCAN: Multi-density scale-independent clustering algorithm based on DBSCAN

    Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 19 November 2010 through 21 November 2010, Chongqing ; Volume 6440 LNAI, Issue PART 1, November, 2010, Pages 202-213 ; 03029743 (ISSN) ; 3642173152 (ISBN) Esfandani, G ; Abolhassani, H ; Sharif University of Technology
    A good approach in data mining is density-based clustering, in which clusters are constructed based on the density of shape regions. The prominent algorithm in the density-based clustering family is DBSCAN [1], which uses two global density parameters: the minimum number of points for a dense region and epsilon, the neighborhood distance. One of the weaknesses of this algorithm is its unsuitability for multi-density data sets, where different regions have different densities and so the same epsilon does not work for all of them. In this paper, a new density-based clustering algorithm, MSDBSCAN, is proposed. MSDBSCAN uses a new definition for core point and dense region. The... 
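
    The role of DBSCAN's two global parameters can be illustrated with a minimal sketch. This is not MSDBSCAN or code from the paper; the helper names `region_query` and `is_core_point` and the sample coordinates are assumptions made for illustration. It only shows the standard core-point test: a point is a core point when at least `min_pts` points (itself included) lie within distance `eps` of it.

```python
import math

def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core_point(points, i, eps, min_pts):
    """Standard DBSCAN core-point test using the two global parameters eps and min_pts."""
    return len(region_query(points, i, eps)) >= min_pts

# Hypothetical data: one tight group and one loose group of four points each.
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
sparse = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)]
points = dense + sparse

# A single eps tuned for the tight group leaves every point of the loose group non-core.
core_flags = [is_core_point(points, i, eps=0.5, min_pts=4) for i in range(len(points))]
print(core_flags)
```

    With `eps=0.5` the tight group consists entirely of core points while the loose group has none, which is the multi-density difficulty the abstract describes.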

    Active distance-based clustering using k-medoids

    Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 19 April 2016 through 22 April 2016 ; Volume 9651, 2016, Pages 253-264 ; 03029743 (ISSN) ; 9783319317526 (ISBN) Aghaee, A ; Ghadiri, M ; Soleymani Baghshah, M ; Sharif University of Technology
    Springer Verlag  2016
    The k-medoids algorithm is a partitional, centroid-based clustering algorithm which uses pairwise distances of data points and tries to directly decompose a dataset with n points into a set of k disjoint clusters. However, k-medoids requires all distances between data points, which are not easy to obtain in many applications. In this paper, we introduce a new method which requires only a small proportion of the whole set of distances and estimates an upper bound for the unknown distances using the queried ones. The algorithm uses the triangle inequality to calculate this upper-bound estimate. Our method is built upon a recursive approach to... 
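
    The triangle-inequality idea can be sketched generically as follows. This is not the paper's recursive method, whose details are cut off in the abstract; it is an assumed illustration in which the function name `triangle_upper_bounds` and the sample distances are hypothetical. Chaining d(a, c) <= d(a, b) + d(b, c) over the known distances amounts to taking shortest paths in the graph of known pairs, computed here with Floyd-Warshall.

```python
import math

def triangle_upper_bounds(n, known):
    """Upper-bound every pairwise distance among n points from a partial set of known distances.

    known maps index pairs (i, j) to a measured metric distance. Repeated use of the
    triangle inequality gives the shortest-path length over known pairs as an upper bound.
    Pairs not connected through any known distances keep the bound math.inf.
    """
    ub = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), d in known.items():
        ub[i][j] = ub[j][i] = min(ub[i][j], d)
    # Floyd-Warshall relaxation: route each pair through every intermediate point k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if ub[i][k] + ub[k][j] < ub[i][j]:
                    ub[i][j] = ub[i][k] + ub[k][j]
    return ub

# Hypothetical example: only three of the six pairwise distances were queried.
known = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.5}
ub = triangle_upper_bounds(4, known)
print(ub[0][2], ub[1][3], ub[0][3])
```

    The bounds hold for any metric, but they can be loose; how the queried pairs are chosen determines how useful they are.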

    Supplier selection using a clustering method based on a new distance for interval type-2 fuzzy sets: A case study

    Article Applied Soft Computing Journal ; Volume 38, 2016, Pages 213-231 ; 15684946 (ISSN) Heidarzade, A ; Mahdavi, I ; Mahdavi Amiri, N ; Sharif University of Technology
    Elsevier Ltd  2016
    Supplier selection is a decision-making process to identify and evaluate suppliers for making contracts. Here, we use interval type-2 fuzzy values to show the decision makers' preferences and also introduce a new formula to compute the distance between two interval type-2 fuzzy sets. The performance of the proposed distance formula in comparison with the normalized Hamming, normalized Hamming based on the Hausdorff metric, normalized Euclidean and the signed distances is evaluated. The results show that the signed distance has the same trend as our method, but the other three methods are not appropriate for interval type-2 fuzzy sets. Using this approach, we propose a hierarchical... 

    UALM: unsupervised active learning method for clustering low-dimensional data

    Article Journal of Intelligent and Fuzzy Systems ; Volume 32, Issue 3, 2017, Pages 2393-2411 ; 10641246 (ISSN) Javadian, M ; Bagheri Shouraki, S ; Sharif University of Technology
    In this paper the Unsupervised Active Learning Method (UALM), a novel clustering method based on the Active Learning Method (ALM) is introduced. ALM is an adaptive recursive fuzzy learning algorithm inspired by some behavioral features of human brain functionality. UALM is a density-based clustering algorithm that relies on discovering densely connected components of data, where it can find clusters of arbitrary shapes. This approach is a noise-robust clustering method. The algorithm first blurs the data points as ink drop patterns, then summarizes the effects of all data points, and finally puts a threshold on the resulting pattern. It uses the connected-component algorithm for finding... 

    A clustering fuzzification algorithm based on ALM

    Article Fuzzy Sets and Systems ; Volume 389, 2020, Pages 93-113 Javadian, M ; Malekzadeh, A ; Heydari, G ; Bagheri Shouraki, S ; Sharif University of Technology
    Elsevier B.V  2020
    In this paper, we propose a fuzzification method for clusters produced from a clustering process, based on the Active Learning Method (ALM). ALM is a soft computing methodology based on the hypothesis that the human brain interprets information in pattern-like images. The proposed fuzzification method is applicable to all non-fuzzy clustering algorithms as a post-process. The most outstanding advantage of this method is the ability to determine the membership degrees of each data point to all clusters based on the density and shape of the clusters. It is worth mentioning that for existing fuzzy clustering algorithms such as FCM the membership degree is usually determined as a function of... 

    A fuzzy clustering algorithm for finding arbitrary shaped clusters

    Article 6th IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2008, Doha, 31 March 2008 through 4 April 2008 ; 2008, Pages 559-566 ; 9781424419685 (ISBN) Soleymani Baghshah, M ; Bagheri Shouraki, S ; Sharif University of Technology
    Until now, many algorithms have been introduced for finding arbitrary shaped clusters, but none of these algorithms is able to identify all sorts of cluster shapes and structures that are encountered in practice. Furthermore, the time complexity of the existing algorithms is usually high and applying them to large datasets is time-consuming. In this paper, a novel fast clustering algorithm is proposed. This algorithm distinguishes clusters of different shapes using a two-stage clustering approach. In the first stage, the data points are grouped into a relatively large number of fuzzy ellipsoidal sub-clusters. Then, connections between sub-clusters are established according to the Bhattacharyya... 

    Communities detection for advertising by futuristic greedy method with clustering approach

    Article Big Data ; Volume 9, Issue 1, 2021, Pages 22-40 ; 21676461 (ISSN) Bakhthemmat, A ; Izadi, M ; Sharif University of Technology
    Mary Ann Liebert Inc  2021
    Community detection in social networks is one of the advertising methods in electronic marketing. One of the approaches to find communities in large social networks is to use greedy methods, because these methods perform very fast. Greedy methods are generally designed based on local decisions; thus, inappropriate local decisions may result in an improper global solution. The use of a greedy improved index with a futuristic approach can, to some extent, prevent inappropriate local choices. Our proposed method determines the influential nodes in the social network based on the followers and following and new futuristic greedy index. It classifies the nodes based on the influential nodes by... 

    Scalable semi-supervised clustering by spectral kernel learning

    Article Pattern Recognition Letters ; Volume 45, Issue 1, August, 2014, Pages 161-171 ; 01678655 (ISSN) Soleymani Baghshah, M ; Afsari, F ; Bagheri Shouraki, S ; Eslami, E ; Sharif University of Technology
    Kernel learning is one of the most important and recent approaches to constrained clustering. Many kernel learning methods have been introduced for clustering when side information in the form of pairwise constraints is available. However, almost all of the existing methods either learn a whole kernel matrix or learn a limited number of parameters. Although the non-parametric methods that learn a whole kernel matrix are capable of finding clusters of arbitrary structures, they are computationally very expensive and feasible only on small data sets. In this paper, we propose a kernel learning method that shows flexibility in the number of variables between... 

    An analytical delumping methodology for PC-SAFT with application to reservoir fluids

    Article Fluid Phase Equilibria ; Volume 339, 2013, Pages 40-51 ; 03783812 (ISSN) Assareh, M ; Ghotbi, C ; Pishvaie, M. R ; Mittermeir, G. M ; Sharif University of Technology
    The strong foundations of statistical associating fluid theory (SAFT) equations of state allow modeling over a wide range of scales and applications. Equilibrium calculations are very time-consuming in the SAFT-based family of equations of state; therefore, the number of components used in describing a fluid mixture must be reduced by grouping. On the other hand, some applications require retrieving the detailed fluid description from equilibrium calculations performed on the lumped fluid description. The purpose of this paper is to develop a systematic approach for lumping and delumping with equilibrium calculations using the Perturbed Chain (PC)-SAFT equation of state. The methodology... 

    Using minimum matching for clustering with balancing constraints

    Article 2009 Second ISECS International Colloquium on Computing, Communication, Control, and Management, CCCM 2009, Sanya, 8 August 2009 through 9 August 2009 ; Volume 1, 2009, Pages 225-228 ; 9781424442461 (ISBN) Shirali Shahreza, S ; Abolhassani, H ; Shirali Shahreza, M. H ; Yangzhou University; Guangdong University of Business Studies; Wuhan Institute of Technology; IEEE SMC TC on Education Technology and Training; IEEE Technology Management Council ; Sharif University of Technology
    Clustering is a major task in data mining which is used in many applications. However, general clustering is inappropriate for many applications where some constraints should be applied. One category of these constraints is the cluster size constraint. In this paper, we propose a new algorithm for solving clustering with balancing constraints by using minimum matching. We compare our algorithm with the method proposed by Banerjee and Ghosh, which uses stable matching, and show that our algorithm converges to the final solution in fewer iterations. ©2009 IEEE

    Visibility extension via mirror-edges to cover invisible segments

    Article Theoretical Computer Science ; Volume 789, 2019, Pages 22-33 ; 03043975 (ISSN) Vaezi, A ; Ghodsi, M ; Sharif University of Technology
    Elsevier B.V  2019
    Given a simple polygon P with n vertices, the visibility polygon (VP) of a point q, or a segment pq inside P can be computed in linear time. We propose a linear time algorithm to extend the VP of a viewer (point or segment), by converting some edges of P into mirrors, such that a given non-visible segment uw can also be seen from the viewer. Various definitions for the visibility of a segment, such as weak, strong, or complete visibility are considered. Our algorithm finds every edge that, when converted to a mirror, makes uw visible to our viewer. We find out exactly which interval of uw becomes visible, by every edge middling as a mirror, all in linear time. In other words, in this... 

    ECG beat classification based on a cross-distance analysis

    Article 6th International Symposium on Signal Processing and Its Applications, ISSPA 2001, Kuala Lumpur, 13 August 2001 through 16 August 2001 ; Volume 1, 2001, Pages 234-237 ; 0780367030 (ISBN); 9780780367036 (ISBN) Shahram, M ; Nayebi, K ; Sharif University of Technology
    IEEE Computer Society  2001
    This paper presents a multi-stage algorithm for classifying QRS complexes into normal and abnormal categories using unsupervised sequential beat clustering and a cross-distance analysis algorithm. After the sequential beat clustering, a search algorithm based on the relative similarity of the created classes is used to detect the main normal class. Then other classes are labeled based on a distance measurement from the main normal class. Evaluation results on the MIT-BIH ECG database exhibit an error rate of less than 1% for normal and abnormal discrimination and 0.2% for clustering of 15 types of arrhythmia in the MIT-BIH database. © 2001 IEEE

    A sensitivity study of FILTERSIM algorithm when applied to DFN modeling

    Article Journal of Petroleum Exploration and Production Technology ; Volume 4, Issue 2, June, 2014, Pages 153-174 ; 21900558 (ISSN) Ahmadi, R ; Masihi, M ; Rasaei, M. R ; Eskandaridalvand, K ; Shahalipour, R ; Sharif University of Technology
    Realistic description of fractured reservoirs primarily demands a comprehensive understanding of fracture networks and their geometry, including various individual fracture parameters as well as network connectivities. Newly developed multiple-point geostatistical simulation methods like SIMPAT and FILTERSIM are able to model connectivity and complexity of fracture networks more effectively than traditional variogram-based methods. This approach is therefore adopted in this paper. Among the multiple-point statistics algorithms, FILTERSIM has the advantage of requiring less computational effort than SIMPAT by applying filters and modern dimensionality reduction techniques to the...