
A bi-objective Hybrid Algorithm to Reduce Noise and Data Dimension in Diabetes Disease Diagnosis Using Support Vector Machines

Alirezaei, Mahsa | 2017

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 50218 (01)
  4. University: Sharif University of Technology
  5. Department: Industrial Engineering
  6. Advisor(s): Akhavan Niaki, Taghi
  7. Abstract:
  8. The healthcare domain generates a large volume of data, and processing it manually to diagnose diseases and develop treatments in the short term is infeasible. Diabetes mellitus has attracted the attention of data miners for several reasons, chief among them its significant effects on the health and well-being of affected people and the economic burden it places on the health care system. Researchers are trying to find statistical correlations between this disease and factors such as the patient's lifestyle and hereditary information. The purpose of data mining here is to discover rules that facilitate the early diagnosis and control of diabetes. In this research, the proposed method is applied to the database of diabetic patients accessible from the website of the University of California. First, missing values are imputed using the KNN method, with data assigned to the mode of each class. Then, K-Means clustering is used to eliminate outliers. The study focuses mainly on feature selection using multi-objective metaheuristic algorithms that aim to maximize accuracy and minimize the number of features. Four metaheuristic algorithms, namely NSGA-II, MOPSO, MOICA, and MOFA, are applied to the data. Finally, a support vector machine (SVM) is used as the classification algorithm, and 10-fold cross-validation is adopted to verify the results. The NSGA-II, MOFA, and MOICA algorithms each select 5 of the 8 available features, while the MOPSO algorithm selects 4. The accuracy of the MOFA and MOICA algorithms is 100%, while the accuracies of NSGA-II and MOPSO are 98.2% and 94.6%, respectively. The dataset is examined following the CRISP-DM methodology using MATLAB, SPSS, Rapid Miner, Minitab, and R software.
  9. Keywords:
  10. Support Vector Machine (SVM) ; Feature Selection ; Meta Heuristic Algorithm ; K-means Clustering ; Diabetes ; Disease Diagnosis
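
The bi-objective criterion described in the abstract (maximize accuracy, minimize the number of selected features) can be illustrated with a small Pareto-dominance check. This is only a sketch and not the thesis implementation: the helper names `dominates` and `pareto_front` are hypothetical, and Python is used here for illustration although the abstract lists MATLAB, SPSS, Rapid Miner, Minitab, and R. The only data used are the accuracy and feature-count figures reported in the abstract.

```python
from typing import Dict, List, Tuple

# (accuracy in percent, number of selected features), as reported in the abstract.
Result = Tuple[float, int]
reported: Dict[str, Result] = {
    "NSGA-II": (98.2, 5),
    "MOPSO": (94.6, 4),
    "MOFA": (100.0, 5),
    "MOICA": (100.0, 5),
}


def dominates(a: Result, b: Result) -> bool:
    """Return True if result a Pareto-dominates result b.

    a dominates b when it is no worse on both objectives (accuracy is
    maximized, feature count is minimized) and strictly better on one.
    """
    acc_a, k_a = a
    acc_b, k_b = b
    no_worse = acc_a >= acc_b and k_a <= k_b
    strictly_better = acc_a > acc_b or k_a < k_b
    return no_worse and strictly_better


def pareto_front(results: Dict[str, Result]) -> List[str]:
    """Return the names of all results that no other result dominates."""
    return sorted(
        name
        for name, point in results.items()
        if not any(
            dominates(other_point, point)
            for other_name, other_point in results.items()
            if other_name != name
        )
    )


front = pareto_front(reported)
print(front)
```

Under these two objectives, the NSGA-II result is dominated by the MOFA and MOICA results (higher accuracy with the same 5 features), while the MOPSO result stays on the front because it uses fewer features even though its accuracy is lower.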
