Using a classifier pool in accuracy based tracking of recurring concepts in data stream classification

Hosseini, M. J. ; Sharif University of Technology | 2013

  1. Type of Document: Article
  2. DOI: 10.1007/s12530-012-9064-3
  3. Publisher: 2013
  4. Abstract:
  5. Data streams have unique properties that make them suitable for precisely modeling many real data mining applications. The most challenging property of data streams is the occurrence of "concept drift". Recurring concepts are a type of concept drift seen in most real-world problems. Detecting recurring concepts makes it possible to exploit knowledge obtained earlier in the learning process, which lets the learner adapt quickly whenever a concept reappears. In this paper, we propose a learning algorithm called Pool and Accuracy based Stream Classification, with some variations, which takes advantage of maintaining a pool of classifiers to track recurring concepts. Each classifier describes an existing concept. Consecutive batches of instances are first classified by the pool of classifiers. Two approaches are presented for this task: the active classifier and weighted classifiers methods. The true labels are then revealed and the pool is updated at the end of the batch. The pool is updated using one of the following methods: exact Bayesian, Bayesian, and Heuristic. As the algorithm may assign multiple classifiers to a single concept, a classifier merging process is used to resolve this problem. Experimental results on real and artificial datasets show the effectiveness of the weighted classifiers method when dealing with datasets that exhibit sudden concept drift. In addition, the proposed updating methods outperform existing algorithms on datasets with arbitrary attributes. Finally, experiments show the superiority of using the merging process on large datasets.
  6. Keywords:
  7. Merging process ; Concept drift ; Recurring concepts ; Stream mining ; Artificial datasets ; Bayesian ; Concept drifting ; Concept drifts ; Data mining applications ; Data stream ; Data stream classifications ; Large datasets ; Learning process ; Ensemble learning ; Multiple classifiers ; Precise modeling ; Real-world problem ; Stream classification ; Two Approaches ; Updating methods ; Data communication systems ; Data mining ; Lakes ; Learning algorithms ; Viscosity measurement ; Heuristic methods
  8. Source: Evolving Systems ; Volume 4, Issue 1, 2013, Pages 43-60 ; 1868-6478 (ISSN)
  9. URL: http://link.springer.com/article/10.1007%2Fs12530-012-9064-3
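
The batch loop described in the abstract (classify a batch with the pool, reveal the true labels, then update the pool) can be illustrated with a rough sketch. This is not the authors' implementation: the names `CentroidClassifier`, `ClassifierPool`, and `reuse_threshold` are hypothetical, the nearest-centroid learner is a stand-in for any base classifier, and the accuracy-threshold update rule is a simplification assumed here in place of the paper's exact Bayesian, Bayesian, and Heuristic methods. The merging process is omitted.

```python
from collections import Counter, defaultdict


class CentroidClassifier:
    """Toy nearest-centroid learner (hypothetical stand-in for a base classifier)."""

    def __init__(self):
        self.sums = {}            # label -> per-feature running sums
        self.counts = Counter()   # label -> number of instances seen

    def learn(self, X, y):
        for features, label in zip(X, y):
            sums = self.sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                sums[i] += value
            self.counts[label] += 1

    def predict(self, features):
        def sq_dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(features, self.sums[label]))
        return min(self.sums, key=sq_dist)

    def accuracy(self, X, y):
        return sum(self.predict(f) == t for f, t in zip(X, y)) / len(y)


class ClassifierPool:
    """Pool of classifiers, one per concept (assumed simplification)."""

    def __init__(self, reuse_threshold=0.8):
        self.reuse_threshold = reuse_threshold  # assumed parameter, not from the paper
        self.pool = []
        self.weights = []

    def predict_batch(self, X):
        """Weighted-vote prediction, loosely following the 'weighted classifiers' idea."""
        if not self.pool:
            return [None] * len(X)
        predictions = []
        for features in X:
            votes = defaultdict(float)
            for clf, weight in zip(self.pool, self.weights):
                votes[clf.predict(features)] += weight
            predictions.append(max(votes, key=votes.get))
        return predictions

    def update(self, X, y):
        """Once labels are revealed, reuse the best-matching classifier or add a new one.

        Returns the pool index of the classifier assigned to this batch.
        """
        accuracies = [clf.accuracy(X, y) for clf in self.pool]
        if accuracies and max(accuracies) >= self.reuse_threshold:
            index = accuracies.index(max(accuracies))
            self.pool[index].learn(X, y)          # recurring concept: refine it
            self.weights = accuracies
        else:
            new_clf = CentroidClassifier()        # unseen concept: start a new classifier
            new_clf.learn(X, y)
            self.pool.append(new_clf)
            self.weights = accuracies + [1.0]
            index = len(self.pool) - 1
        return index
```

In this sketch, a batch whose labels match an earlier concept is routed back to the classifier that already describes it, so the pool does not grow when a concept reappears. Weighting each vote by the classifier's accuracy on the most recent batch is one simple choice among many.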