A Semi-Supervised Ensemble Learning Algorithm for Nonstationary Data Streams Classification

Hosseini, Mohammad Javad | 2012

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 44001 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Beigy, Hamid
  7. Abstract:
  8. Recent advances in storage and processing have made it possible to gather information automatically, which in turn leads to a fast and continuous flow of data. Data produced and stored in this way are called data streams. Data streams have many applications, such as processing financial transactions, data recorded by various sensors, or data collected by web services. Data streams are produced at high speed, in large volume, and with considerable dynamism, and they have unique properties that make them suitable for accurately modeling many real-world data mining applications. The main challenge of data streams is the occurrence of concept drift, which can be of four types: sudden, gradual, incremental, or recurring. In addition, because labeling instances is costly, it is often assumed that only a fraction of the instances are labeled. The goal of this thesis, titled “A semi-supervised ensemble learning algorithm for non-stationary data streams classification”, is to use ensemble algorithms to classify instances in an environment with these properties. In this thesis, a new algorithm for semi-supervised classification of non-stationary data streams is presented which uses ensemble classifiers. The method is intended to recognize the recurring concepts of a data stream and to use them in classifying the instances. Recurring concept detection is used to improve on existing algorithms, which either lack this mechanism or do not properly recognize or use the concepts. The algorithm maintains a pool of classifiers, each of which represents one concept. The algorithm classifies instances in batches. Thereafter, the instances, along with their revealed labels, are used to update the classifiers in the pool.
First, a supervised version of the algorithm is presented and compared with existing supervised methods. Then, a semi-supervised version of the algorithm is presented, which differs in some respects from the supervised one. This version uses the unlabeled instances alongside the labeled ones in the learning task. The experiments show that both the supervised and the semi-supervised versions of the algorithm are more effective than existing methods in several respects.
  9. Keywords:
  10. Data Stream ; Ensemble Learning ; Concept Drift ; Semi-Supervised Learning ; Recurring Concepts
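
The batch loop described in the abstract (classify a batch, then use the revealed labels to update a pool in which each classifier represents one concept) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the nearest-centroid base learner, the accuracy threshold used to decide whether a batch belongs to a known concept, the pool size limit, and all names (`CentroidClassifier`, `ConceptPool`, `threshold`, `max_size`) are assumptions made for illustration, and only the supervised case is shown.

```python
from collections import defaultdict


class CentroidClassifier:
    """Toy nearest-centroid learner standing in for one concept (an assumption)."""

    def __init__(self):
        self.sums = {}                  # label -> per-feature running sum
        self.counts = defaultdict(int)  # label -> number of instances seen

    def update(self, X, y):
        for x, label in zip(X, y):
            if label not in self.sums:
                self.sums[label] = [0.0] * len(x)
            self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
            self.counts[label] += 1

    def predict_one(self, x):
        best_label, best_dist = None, float("inf")
        for label, s in self.sums.items():
            n = self.counts[label]
            dist = sum((v - si / n) ** 2 for v, si in zip(x, s))
            if dist < best_dist:
                best_label, best_dist = label, dist
        return best_label

    def accuracy(self, X, y):
        if not self.sums or not y:
            return 0.0
        hits = sum(self.predict_one(x) == label for x, label in zip(X, y))
        return hits / len(y)


class ConceptPool:
    """Hypothetical pool of classifiers, one per concept."""

    def __init__(self, threshold=0.8, max_size=5):
        self.threshold = threshold  # assumed cut-off for "known concept"
        self.max_size = max_size    # assumed cap on the pool size
        self.pool = []
        self.active = None          # classifier used for the next batch

    def predict(self, X):
        if self.active is None:
            return [None] * len(X)
        return [self.active.predict_one(x) for x in X]

    def update(self, X, y):
        # Reuse the pool member that best explains the revealed labels;
        # otherwise treat the batch as a new concept.
        best = max(self.pool, key=lambda c: c.accuracy(X, y), default=None)
        if best is None or best.accuracy(X, y) < self.threshold:
            best = CentroidClassifier()
            self.pool.append(best)
            if len(self.pool) > self.max_size:
                self.pool.pop(0)    # drop the oldest concept
        best.update(X, y)
        self.active = best


# Two concepts with flipped labels; the first one recurs in the third batch.
X = [[-1.0], [-2.0], [1.0], [2.0]]
concept_a = [0, 0, 1, 1]
concept_b = [1, 1, 0, 0]

model = ConceptPool()
for labels in (concept_a, concept_b, concept_a):
    predictions = model.predict(X)  # classify the batch first
    model.update(X, labels)         # then learn from the revealed labels
```

In this toy run the third batch is matched to the classifier built from the first batch rather than creating a third pool entry, which is the kind of reuse that recurring concept detection aims for. A semi-supervised variant would also have to exploit unlabeled instances, which this sketch does not attempt.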
