Application of Data Mining in Prediction of Diabetes type 2
Bagherzadeh Khiabani, Farideh | 2013
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 45217 (01)
- University: Sharif University of Technology
- Department: Industrial Engineering
- Advisor(s): Akhavan Niaki, Taghi
- Abstract:
- Advances in computer-based data storage have led to an extraordinary increase in medical data, as in every other field. As a result, physicians face the problem of making use of the stored data, and traditional manual analysis is inadequate for such large volumes. Yet the ability to extract useful information from these data is critical to the quality of medical care. Data mining techniques emerged to meet this need: applied to raw data, they extract knowledge that can help doctors make decisions.
In this study, we pursue four goals. First, to select an appropriate method for handling missing values, several data imputation methods are evaluated; an error measure and methods for comparing the imputation techniques are presented. The results indicate that the iterative robust model-based method performs best among the methods investigated. Second, three base classifiers (Naïve Bayes, Decision Tree and SVM) are combined in a stacking model to achieve higher accuracy; the accuracy of the resulting model is 87.10%. Third, we investigate how combining multiple imputations of the missing values, generated by adding noise to the values imputed by the selected method, affects the performance criteria. Finally, the results show that data mining models can easily be made sensitive to experts' opinions about the types of errors by means of a cost matrix. The experiments use the actual Pima Indian Diabetes data set provided by the University of California. The data set is mined according to the CRISP-DM methodology using R, SPSS and RapidMiner.
- Keywords:
- Data Mining ; Classification ; Diabetes ; Missing Value Imputation Methods ; Stacking Method ; Cost-Sensitive Learning
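The abstract's last goal, making a model sensitive to expert opinion about error types through a cost matrix, can be illustrated with a minimal sketch. It is not taken from the thesis (which used R, SPSS and RapidMiner); the cost values, the label names and the functions `expected_cost` and `cost_sensitive_label` are all hypothetical, and the sketch assumes a classifier that already outputs a probability of diabetes.

```python
# Hypothetical cost matrix, indexed as COST[actual][predicted].
# The numbers are illustrative assumptions, not values from the thesis:
# a missed diabetic case is taken to be five times as costly as a false alarm.
COST = {
    "diabetic": {"diabetic": 0.0, "healthy": 5.0},  # false negative costs 5
    "healthy": {"diabetic": 1.0, "healthy": 0.0},   # false positive costs 1
}


def expected_cost(p_diabetic, predicted, cost=COST):
    """Expected cost of predicting `predicted` when P(diabetic) = p_diabetic."""
    return (p_diabetic * cost["diabetic"][predicted]
            + (1.0 - p_diabetic) * cost["healthy"][predicted])


def cost_sensitive_label(p_diabetic, cost=COST):
    """Pick the label with the lowest expected cost instead of the most probable one."""
    return min(("diabetic", "healthy"),
               key=lambda label: expected_cost(p_diabetic, label, cost))


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.6):
        print(p, cost_sensitive_label(p))
```

With these assumed costs the decision threshold moves from a probability of 0.5 down to 1/6, so a patient with a 30% predicted probability is flagged as diabetic, whereas a symmetric cost matrix would label the same patient healthy.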