
Statistical Labeling, Cluster-Based Approach for Improving Fraud Detection Classification Performance in Unbalanced Datasets

Khodabandeh Yalabadi, Ali | 2020

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 53036 (01)
  4. University: Sharif University of Technology
  5. Department: Industrial Engineering
  6. Advisor(s): Shadrokh, Shahram; Khedmati, Majid
  7. Abstract:
  8. Researchers are increasingly working on classifiers designed to predict the minority class. In this work, we attempt to improve fraud detection performance with the minimum possible complexity. To this end, we increase the model's sensitivity to minority-class samples, which addresses the problem of the model ignoring these instances. We then use clustering to group similar inputs based on their features and split each class into smaller bins. Because the prediction probability threshold influences final performance, we define a separate statistical hypothesis test for each cluster to check whether a prediction falls within the expected range. With this method, the model is not limited to a single threshold for all of the data. When a new instance arrives, it is assigned to the best-fitting cluster according to its feature values, and the model decides whether the instance follows the probability distribution of the other members of that cluster. If it does, the predicted class label is the label shared by the other cluster members; otherwise, the instance is tested against the other class.
  9. Keywords:
  10. Classification ; Clustering ; Statistical Hypothesis ; Prediction Probability Threshold ; Data Prediction
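
The labeling step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the `Cluster` class, the `label_instance` function, the nearest-centroid assignment, the mean ± z·std acceptance interval, and the fallback rule are all assumptions chosen for illustration, and the example scores are made up. It assumes that clustering has already been done and that each cluster stores the classifier's predicted probabilities for its members.

```python
import math
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Cluster:
    """Hypothetical container for one bin of a class (not from the thesis)."""
    centroid: tuple   # cluster centre in feature space
    label: int        # class of the members (1 = fraud, 0 = legitimate)
    scores: list      # classifier probabilities of the members (at least two)

    def accepts(self, score, z=1.96):
        # Assumed test: accept if the score lies within mean +/- z * std
        # of the member scores, i.e. a cluster-specific acceptance range.
        mu, sigma = mean(self.scores), stdev(self.scores)
        return abs(score - mu) <= z * sigma


def nearest(clusters, x):
    # Assign by Euclidean distance to the centroid (an assumption).
    return min(clusters, key=lambda c: math.dist(c.centroid, x))


def label_instance(clusters, x, score, z=1.96):
    """Label a new instance from its features `x` and classifier `score`."""
    home = nearest(clusters, x)
    if home.accepts(score, z):
        return home.label
    # Rejected by its own cluster: test against the other class.
    others = [c for c in clusters if c.label != home.label]
    if others:
        alt = nearest(others, x)
        if alt.accepts(score, z):
            return alt.label
    # Assumed fallback when neither test accepts the score.
    return home.label


# Made-up example data: one legitimate bin and one fraud bin.
clusters = [
    Cluster((0.0, 0.0), 0, [0.05, 0.10, 0.08, 0.12]),
    Cluster((5.0, 5.0), 1, [0.70, 0.80, 0.75, 0.85]),
]
typical = label_instance(clusters, (0.2, 0.1), 0.09)
atypical = label_instance(clusters, (0.2, 0.1), 0.78)
```

Here each cluster carries its own acceptance range, so no single global probability threshold is applied; an instance that sits near the legitimate bin but receives a score typical of the fraud bin is relabeled by the second test.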
