A combined approach based on K-means and modified electromagnetism-like mechanism for data clustering

Article; International Journal of Information Technology and Decision Making; Volume 16, Issue 5, 2017, Pages 1279-1307; 02196220 (ISSN); Mehdizadeh, E; Teimouri, M; Zaretalab, A; Akhavan Niaki, S. T; Sharif University of Technology

Abstract

Clustering is one of the useful methods in many scientific fields. It is a classification process to group data in specific clusters based on their similarities. Many heuristic and meta-heuristic algorithms have been successfully applied in the literature to solve clustering problems. Among them, the K-means is one of the best due to its simplicity and computational efficiency. However, it suffers from several drawbacks, the most significant of which is its dependency on the initial state that leads to trapping in local optima. In this paper, the K-means method is combined with a modified electromagnetism-like mechanism (MEM) algorithm to develop a new algorithm called K-MEM in order to...
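The dependency on the initial state mentioned in this abstract can be illustrated with a minimal sketch of plain K-means (Lloyd's iterations). This is not the K-MEM algorithm, whose details are not given in the truncated abstract; the function names `sq_dist` and `kmeans` and the four-point toy dataset are assumptions made for illustration only.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Plain K-means from the given initial centroids; returns (centroids, SSE)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


# Toy data (hypothetical): the corners of a wide, short rectangle, k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, sse_good = kmeans(points, [(0, 0.5), (4, 0.5)])  # left/right split
_, sse_bad = kmeans(points, [(2, 0), (2, 1)])       # bottom/top split
```

Both runs converge immediately, but to different partitions: the left/right split has a sum of squared errors of 1.0, while the bottom/top split is a stable local optimum with a sum of squared errors of 16. Hybrid methods such as the one described in this record aim to escape the second kind of solution.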

Supervised fuzzy partitioning

Article; Pattern Recognition; Volume 97, 2020; Ashtari, P; Nateghi Haredasht, F; Beigy, H; Sharif University of Technology

Elsevier Ltd 2020

Abstract

Centroid-based methods, including k-means and fuzzy c-means, are known as effective and easy-to-implement approaches to clustering in many applications. However, these algorithms cannot be directly applied to supervised tasks. This paper thus presents a generative model extending the centroid-based clustering approach to be applicable to classification and regression tasks. Given an arbitrary loss function, the proposed approach, termed Supervised Fuzzy Partitioning (SFP), incorporates label information into its objective function through a surrogate term penalizing the empirical risk. Entropy-based regularization is also employed to fuzzify the partition and to weight features,...

K-means-G*: Accelerating k-means clustering algorithm utilizing primitive geometric concepts

Article; Information Sciences; Volume 618, 2022, Pages 298-316; 00200255 (ISSN); Ismkhan, H; Izadi, M; Sharif University of Technology

Elsevier Inc 2022

Abstract

The k-means is the most popular clustering algorithm, but because it needs too many distance computations, its speed drops dramatically on high-dimensional data. Although some quite fast variants have been proposed in the literature, there is still much room for improvement on high-dimensional large-scale datasets. The method proposed here, k-means-g*, is based on a simple geometric concept: for four distinct points, if the distances between all pairs except one are known, then a lower bound can be determined for the unknown distance. Utilizing this technique in the assignment step of the k-means, many high-dimensional distance computations can be easily skipped, where small amount...
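The abstract is truncated before the four-point bound is stated, so the sketch below does not reproduce k-means-g*. Instead it swaps in the simpler, well-known triangle-inequality bound to show the general idea of skipping distance computations in the assignment step: if the distance between the current best centroid and a candidate centroid is at least twice the distance from the point to the best centroid, the candidate cannot be closer. The names `dist` and `assign_with_pruning` and the toy data are assumptions for illustration.

```python
import math


def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign_with_pruning(points, centroids):
    """Nearest-centroid assignment that skips provably useless distances.

    Uses the triangle inequality d(p, c_j) >= d(c_best, c_j) - d(p, c_best):
    when d(c_best, c_j) >= 2 * d(p, c_best), centroid c_j cannot beat c_best.
    Returns the labels and the number of skipped point-centroid distances.
    """
    k = len(centroids)
    # Centroid-to-centroid distances, computed once per assignment step.
    cc = [[dist(centroids[i], centroids[j]) for j in range(k)] for i in range(k)]
    labels, skipped = [], 0
    for p in points:
        best, d_best = 0, dist(p, centroids[0])
        for j in range(1, k):
            if cc[best][j] >= 2 * d_best:
                skipped += 1  # lower bound already rules out centroid j
                continue
            d = dist(p, centroids[j])
            if d < d_best:
                best, d_best = j, d
        labels.append(best)
    return labels, skipped


# Toy data (hypothetical): three well-separated centroids in the plane.
centroids = [(0, 0), (10, 0), (0, 10)]
points = [(0.5, 0.5), (9, 1), (1, 9), (4, 4)]
labels, skipped = assign_with_pruning(points, centroids)
```

The pruned assignment returns the same labels as an exhaustive nearest-centroid search while avoiding some point-to-centroid distance evaluations; the saving matters most when each distance is expensive, as in high-dimensional data.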

Preclustering algorithms for imprecise points

Article; Algorithmica; Volume 84, Issue 6, 2022, Pages 1467-1489; 01784617 (ISSN); Abam, M. A; de Berg, M; Farahzad, S; Haji Mirsadeghi, M. O; Saghafian, M; Sharif University of Technology

Springer 2022

Abstract

We study the problem of preclustering a set B of imprecise points in R^d: we wish to cluster the regions specifying the potential locations of the points such that, no matter where the points are located within their regions, the resulting clustering approximates the optimal clustering for those locations. We consider k-center, k-median, and k-means clustering, and obtain the following results. Let B := {b_1, ..., b_n} be a collection of disjoint balls in R^d, where each ball b_i specifies the possible locations of an input point p_i. A partition C of B into subsets is called an (f(k), α)-preclustering (with respect to the specific k-clustering variant under consideration) if (i) C consists of...

List Estimation

M.Sc. Thesis; Sharif University of Technology; Shahrivari, Farzad (Author); Amini, Arash (Supervisor); Aminzadeh Gohari, Amin (Co-Advisor)

Abstract

Let X be an unknown vector of size n which is to be estimated from a known m × 1 vector Y. According to the MMSE criterion, the best estimator (denoted X̂(Y)) is the estimator which minimizes the mean squared error. Now, consider a list decoding problem in which the sender delivers a list of codes instead of a single decoder. Assume that it is allowed to use multiple parallel estimators (X̂_1(Y), X̂_2(Y), ..., X̂_k(Y)) instead of delivering a single estimation of samples. The goal is to find the best possible list of estimators, in a way that the mean squared error is optimized between the multiple X̂_i(Y) (i = 1, 2, ..., k). As a medical example, imagine an MRI device which produces three...

Home Healthcare Routing and Scheduling Problem During the Covid-19 Pandemic with Uncertainties

M.Sc. Thesis; Sharif University of Technology; Nabavizadeh Rafsanjani, Najmeh (Author); Rafiee, Majid (Supervisor)

Abstract

In summary, this thesis presents a new mathematical model for Home Health Care (HHC) services during the Corona era, which aims to increase the efficiency and quality of services provided by these organizations while ensuring compliance with quarantine protocols. The model is an extension of the VRPPDTW formulation and considers relevant features such as patient classification, caregiver classification, work and break regulations, workload balancing, and multi-depot capabilities. The optimization can be performed with two separate objective functions, one to minimize traveling and idle costs and the other to minimize the total working time of caregivers. The contradictions between two...

Preclustering algorithms for imprecise points

Article; 17th Scandinavian Symposium and Workshops on Algorithm Theory, SWAT 2020, 22 June 2020 through 24 June 2020; Volume 162, 2020; Abam, M. A; de Berg, M; Farahzad, S; Haji Mirsadeghi, M. O; Saghafian, M; Sharif University of Technology

Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing 2020

Abstract

We study the problem of preclustering a set B of imprecise points in R^d: we wish to cluster the regions specifying the potential locations of the points such that, no matter where the points are located within their regions, the resulting clustering approximates the optimal clustering for those locations. We consider k-center, k-median, and k-means clustering, and obtain the following results. Let B := {b_1, ..., b_n} be a collection of disjoint balls in R^d, where each ball b_i specifies the possible locations of an input point p_i. A partition C of B into subsets is called an (f(k), α)-preclustering (with respect to the specific k-clustering variant under consideration) if (i) C consists of...

Online exams and the COVID-19 pandemic: a hybrid modified FMEA, QFD, and k-means approach to enhance fairness

Article; SN Applied Sciences; Volume 3, Issue 10, 2021; 25233971 (ISSN); Haghshenas Gorgani, H; Shabani, S; Sharif University of Technology

Springer Nature 2021

Abstract

The COVID-19 pandemic caused an increasing demand for online academic classes, which led to a demand for effective online exams under limitations on time and resources. Consequently, holding online exams with sufficient reliability and effectiveness became one of the most critical and challenging subjects in higher education. Therefore, it is essential to have a preventive algorithm to allocate time and financial resources effectively. In the present study, a fair test with sufficient validity is first defined, and then, by analogy with an engineering product, the design process is implemented on it. For this purpose, a hybrid method based on FMEA, which is a preventive method to...

A bi-objective Hybrid Algorithm to Reduce Noise and Data Dimension in Diabetes Disease Diagnosis Using Support Vector Machines

M.Sc. Thesis; Sharif University of Technology; Alirezaei, Mahsa (Author); Akhavan Niaki, Taghi (Supervisor)

Abstract

There is a significant amount of data in the healthcare domain, and it is infeasible to process such a volume of data manually in order to diagnose diseases and develop a treatment method in the short term. Diabetes mellitus has attracted the attention of data miners for several reasons, the most important of which are its significant effects on the health and well-being of affected people and the economic burden it places on the health care system. Researchers are trying to find a statistical correlation between the causes of this disease and factors like the patient's lifestyle, hereditary information, etc. The purpose of data mining is to discover rules that facilitate the early diagnosis...

Routing and Scheduling of Home Health Care Problem Under Uncertainty

M.Sc. Thesis; Sharif University of Technology; Khodabandeh, Pouria (Author); Rafiee, Majid (Supervisor); Kayvanfar, Vahid (Co-Supervisor)

Abstract

Home health care is one of the newest methods of providing services to patients in developed societies, one that can respond to the individual lifestyle of the modern age and the increase in life expectancy. In this study, a new mathematical model is developed taking into account the flexibility of the starting and ending locations of each nurse, according to the specific requirements of each service. In this context, there are some special services that require picking up materials and health equipment from the laboratory or force the nurse to return to the laboratory to deliver the specimens and equipment. In the next step, this model is expanded to downgrading aspects by adding the objective of minimizing...

Clustering for Large-Scale Datasets

Ph.D. Dissertation; Sharif University of Technology; Ismkhan, Hassan (Author); Izadi, Mohammad (Supervisor)

Abstract

So far, many clustering algorithms have been proposed; however, they lose their speed on large-scale datasets. Large-scale datasets are those with a large number of points whose dimensionality is also high. For low-dimensional datasets with many points, classic methods such as tree-based structures can easily speed up the algorithms. In this thesis, to accelerate data clustering, the number of distance computations is reduced, because the high number of distance computations is the main reason that clustering algorithms are slow. To reach this goal, this thesis considers that accelerating clustering algorithms requires accelerating the tasks of searching for...

An efficient hybrid approach based on K-means and generalized fashion algorithms for cluster analysis

Article; 2015 AI and Robotics, IRANOPEN 2015 - 5th Conference on Artificial Intelligence and Robotics, Qazvin, Iran, 12 April 2015; April 2015, Pages 1-7; 9781479987337 (ISBN); Aghamohseni, A; Ramezanian, R; Sharif University of Technology

Institute of Electrical and Electronics Engineers Inc 2015

Abstract

Clustering is the process of grouping data objects into a set of disjoint classes, called clusters, so that objects within a class are highly similar to one another and dissimilar to the objects in other classes. The k-means algorithm is a simple and efficient algorithm that is widely used for data clustering. However, its performance depends on the initial state of the centroids, and it may become trapped in local optima. Many studies on clustering have sought to overcome this local-optima obstacle. The Fashion Algorithm is one effective method for searching the problem space to find a near-optimal solution. This paper presents a hybrid optimization algorithm based on Generalized Fashion Algorithm...

A combination of PSO and K-means methods to solve haplotype reconstruction problem

Article; 2009 International Conference on Innovations in Information Technology, IIT '09, 15 December 2009 through 17 December 2009; 2009, Pages 190-194; 9781424456987 (ISBN); Sharifian R, S; Baharian, A; Asgarian, E; Rasooli, A; Sharif University of Technology

Abstract

Disease association study is of great importance among the various fields of study in bioinformatics. Computational methods are especially advantageous when experimental approaches fail to obtain accurate results. Haplotypes are believed to be the biological data most responsible for genetic diseases. In this paper, the problem of reconstructing haplotypes from error-containing SNP fragments is discussed. For this purpose, two new methods are proposed that combine k-means clustering and the particle swarm optimization algorithm. The methods and their implementation results on real biological and simulation datasets are presented, which show that they outperform the methods...

A weighted K-means clustering approach to solve the redundancy allocation problem of systems having components with different failures

Article; Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability; Volume 233, Issue 6, 2019, Pages 925-942; 1748006X (ISSN); Karimi, B; Akhavan Niaki, S. T; Miriha, S. M; Ghare Hasanluo, M; Javanmard, S; Sharif University of Technology

SAGE Publications Ltd 2019

Abstract

A nonlinear integer programming model is developed in this article to solve redundancy allocation problems with multiple components having different failure rates in the series-parallel configuration using an active strategy. The main objective of this research is to select the number and the type of each component in the subsystems so that the reliability of the system is maximized under certain constraints. To this aim, a weighted K-means clustering method is proposed, in which the analytical network process is employed to assign weights to the components of each cluster. As the proposed model belongs to the class of nondeterministic polynomial-time hard (NP-hard) problems, precise solution methods...

Three hybrid GAs for discounted fixed charge transportation problems

Article; Cogent Engineering; Volume 5, Issue 1, 2018; 23311916 (ISSN); Ghassemi Tari, F; Hashemi, Z; Sharif University of Technology

Cogent OA 2018

Abstract

The problem of allocating a heterogeneous fleet of vehicles to the existing distribution network for dispensing products from a manufacturing firm to a set of depots is considered. It is assumed that a heterogeneous fleet of vehicles, with given capacities and total costs consisting of a discounted fixed cost and a variable cost proportional to the amount shipped, is employed for handling products. To minimize the total transportation costs, the problem is modeled in the form of a nonlinear mixed integer program. Due to the NP-hard complexity of the mathematical model, three prioritized K-means clustering hybrid GAs, incorporating two new heuristic algorithms, are proposed. The efficiency of the algorithms...

A novel pre-processing method to reduce noise effects in a prototype-based clustering algorithm

Article; 2008 International Conference on Information and Knowledge Engineering, IKE 2008, Las Vegas, NV, 14 July 2008 through 17 July 2008; July 2008, Pages 587-593; 1601320752 (ISBN); 9781601320759 (ISBN); Taghikhaki, Z; Minaei, B; Masoum, A; Sharif University of Technology

2008

Abstract

In this paper we introduce a preprocessing method to reduce noise effects in noise-prone environments. Prototype-based clustering algorithms are sensitive to noise because noisy data have the same effect as true data, which affects the calculation of cluster centers and thus reduces accuracy. Therefore, these algorithms cannot be applied in noise-prone environments, and if they are applied there, the results cannot be trusted. To overcome such problems we reduce and in some cases eliminate the noisy data. Also, a part of our method is applied at the source of the generated data in a network. Then noisy data, which are numerous in noisy environments, are eliminated and...

An innovative implementation of Circular Hough Transform using eigenvalues of Covariance Matrix for detecting circles

Article; Proceedings Elmar - International Symposium Electronics in Marine, 14 September 2011 through 16 September 2011, Zadar; 2011, Pages 397-400; 13342630 (ISSN); 9789537044121 (ISBN); Tooei, M. H. D. H; Mianroodi, J. R; Norouzi, N; Khajooeizadeh, A; Sharif University of Technology

2011

Abstract

In this paper, a fast and accurate algorithm for identifying circular objects in images is proposed. The presented method is a robust, fast and optimized adaptation of Circular Hough Transform (CHT), Eigenvalues of Covariance Matrix and K-means clustering techniques. Results are greatly improved, in both runtime and quality, by implementing an iterative K-means clustering algorithm and by establishing an exponential growth instead of updating values in the parameter space of CHT through summation. In fact, using the Eigenvalues of Covariance Matrix as a validating method, a well-balanced compromise between the speed and accuracy of results is achieved. This method is tested on several real world...

Hybridization of k-means and harmony search methods for web page clustering

Article; Proceedings - 2008 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2008, 9 December 2008 through 12 December 2008, Sydney, NSW; 2008, Pages 329-335; 9780769534961 (ISBN); Forsati, R; Meybodi, M. R; Mahdavi, M; Ghari Neiat, A; Sharif University of Technology

2008

Abstract

Clustering is currently one of the most crucial techniques for dealing with the massive amount of heterogeneous information on the web, which is beyond human capacity to digest. Recent studies have shown that the most commonly used partitioning-based clustering algorithm, the K-means algorithm, is more suitable for large datasets. However, the K-means algorithm can generate a locally optimal solution. In this paper we present novel harmony search clustering algorithms that deal with document clustering based on the harmony search optimization method. By modeling clustering as an optimization problem, first, we propose a pure harmony search based clustering algorithm that finds near global...

Solving MEC and MEC/GI problem models, using information fusion and multiple classifiers

Article; Innovations'07: 4th International Conference on Innovations in Information Technology, IIT, Dubai, 18 November 2007 through 20 November 2007; 2007, Pages 397-401; 9781424418411 (ISBN); Asgarian, E; Moeinzadeh, M. H; Mohammadzadeht, J; Ghazinezhad, A; Habibi, J; Najafi Ardabili, A; Sharif University of Technology

IEEE Computer Society 2007

Abstract

Mutations in Single Nucleotide Polymorphisms (SNPs - different variant positions (1%) from human genomes) are responsible for some genetic diseases. As a consequence, obtaining all SNPs from human populations is one of the primary goals of recent studies in human genomics. The two sequences of these SNPs in diploid human organisms are called haplotypes. In this paper, we study the problems of haplotype reconstruction from SNP fragments with and without genotype information. Designing serial and parallel classifiers was the center of our research. A genetic algorithm and K-means were the two components of our approaches. This combination helps us to cover the weaknesses of a single classifier. ©2008 IEEE

Climate Classification of the MENA (Middle East and North Africa) by Introducing a New Index for Clustering Validation

M.Sc. Thesis; Sharif University of Technology; Rajabi, Reza (Author); Moghim, Sanaz (Supervisor)

Abstract

Clustering provides valuable information for the discovery of climatic zones. To use clustering approaches, a similarity measure, a clustering algorithm, and a clustering validity index should be determined. To find climatic zones over the Middle East and North Africa (MENA), this study performs k-means clustering with Euclidean distance as the similarity measure on four monthly precipitation datasets (CRU, GPCC, UDEL, and PREC/L) and two monthly temperature datasets (CRU, NOAA GHCN-CAMS). This study aims to validate clustering results and find a proper number of clusters. For this purpose, five traditional validity indices are examined on experimental datasets. Results show significant differences...
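The abstract does not name the five validity indices it examines, nor the new index it introduces. As a hedged illustration of how a validity index can guide the choice of the number of clusters under Euclidean distance, the sketch below computes the mean silhouette width, a commonly used index chosen here as an assumption. The function name `silhouette`, the toy points, and the two candidate labelings are hypothetical and unrelated to the climate datasets.

```python
import math


def silhouette(points, labels):
    """Mean silhouette width of a labeling, using Euclidean distance.

    For each point, a is the mean distance to the other members of its own
    cluster and b is the smallest mean distance to any other cluster; the
    point's score is (b - a) / max(a, b). Singleton clusters score 0.
    Assumes at least two clusters and no cluster of identical points only.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton cluster contributes a score of 0
        # The distance from p to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Toy data (hypothetical): three well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
s3 = silhouette(points, [0, 0, 1, 1, 2, 2])  # three clusters
s2 = silhouette(points, [0, 0, 0, 0, 1, 1])  # two clusters (first two merged)
```

On this toy data the three-cluster labeling scores higher than the two-cluster one, so this index would favor k = 3. Different indices can disagree on real data, which is the situation the abstract reports.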
