Binary classification of imbalanced datasets: The case of CoIL challenge 2000

Khalilpour Darzi, M. R ; Sharif University of Technology | 2019

  1. Type of Document: Article
  2. DOI: 10.1016/j.eswa.2019.03.024
  3. Publisher: Elsevier Ltd , 2019
  4. Abstract:
  5. This paper presents several approaches based on data mining techniques to solve the prediction task of the Computational Intelligence and Learning (CoIL) Challenge 2000. The prediction task of the contest is a direct mailing problem, and the goal is to improve the response rate. The main difficulty in this competition is that the dataset is imbalanced: the class distribution of the target attribute is highly unbalanced. This in turn causes a high error rate in identifying minority class samples. Methods at three levels (data-level, algorithm-level, and hybrid) are used to overcome this issue. The specificity, sensitivity, precision-recall, and ROC criteria are employed to compare the performance of the methods. Among the methods proposed in this paper, the best one performs much better than the winner of the competition. © 2019 Elsevier Ltd
  6. Keywords:
  7. Classification ; Cost sensitive learning ; Data mining ; Direct mail ; Classification (of information) ; Insurance ; Mail handling ; Sampling ; Algorithm level ; Binary classification ; Cost-sensitive learning ; Direct mailing ; Direct mails ; Imbalanced Data-sets ; Prediction tasks ; Response rate
  8. Source: Expert Systems with Applications ; Volume 128 , 2019 , Pages 169-186 ; 09574174 (ISSN)
  9. URL: https://www.sciencedirect.com/science/article/abs/pii/S0957417419301861
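
The abstract names data-level and algorithm-level remedies for class imbalance and evaluates them with sensitivity, specificity, and precision. The sketch below illustrates those ideas in general terms only; this record does not say which specific techniques the paper uses. Random oversampling stands in as one common data-level technique, and inverse-frequency class weights as one common input to cost-sensitive (algorithm-level) learning. All function names and the toy labels are hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, minority=1, seed=0):
    """Data-level sketch: duplicate random minority rows until the
    minority class is as large as the rest of the data."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    if not minority_idx:
        raise ValueError("no minority samples to oversample")
    majority_count = len(labels) - len(minority_idx)
    # Number of extra copies needed (zero if already balanced).
    needed = max(0, majority_count - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(needed)]
    idx = list(range(len(labels))) + extra
    return [rows[i] for i in idx], [labels[i] for i in idx]


def class_weights(labels):
    """Algorithm-level sketch: inverse-frequency weights that a
    cost-sensitive learner could apply to its loss."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def _ratio(num, den):
    # Avoid ZeroDivisionError when a class is absent.
    return num / den if den else 0.0


def confusion_metrics(y_true, y_pred, positive=1):
    """Sensitivity (recall), specificity, and precision from labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
    }


if __name__ == "__main__":
    # Toy labels only, not the CoIL data: 94 negatives, 6 positives.
    labels = [0] * 94 + [1] * 6
    rows = [[i] for i in range(len(labels))]
    _, balanced = random_oversample(rows, labels)
    print(Counter(balanced))
    print(class_weights(labels))
    print(confusion_metrics([1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
                            [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]))
```

A hybrid method, in the abstract's sense, would combine both kinds of step, for example by resampling the training data and then fitting a weighted learner. On a highly unbalanced target, plain accuracy can look good while sensitivity stays near zero, which is why the per-class metrics above are the more informative comparison.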