Multi-class Semi-supervised Classification of Data Streams

Sepehr, Arman | 2013

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 44879 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Beigy, Hamid
  7. Abstract:
  8. Recent advances in storage and processing have made it possible to gather information automatically, which in turn produces a fast and continuous flow of data. Data produced and stored in this way are called data streams. They arise in many applications, such as processing financial transactions, recording data from various sensors, and collecting data through web services. Data streams are generated at high speed, in large volume, and with considerable dynamism, and they have unique properties that make them suitable for precisely modeling many real data mining applications. The main challenge in data streams is concept drift, which can take four forms: sudden, gradual, incremental, or recurring. In addition, because labeling instances is costly, it is often assumed that only a fraction of the instances are labeled. This thesis, titled “Multi-class Semi-supervised Classification of Data Streams”, uses ensemble algorithms to classify instances in an environment with these properties. It presents a new algorithm for semi-supervised classification of non-stationary data streams that maintains an ensemble of classifiers for each class. The algorithm classifies instances in batches; thereafter, the instances along with their revealed labels are used to update the classifiers. The algorithm is designed in two main steps. In the first step, a supervised algorithm for data streams is introduced. In the second step, a method for managing unlabeled data is designed and, building on the supervised algorithm, a semi-supervised ensemble algorithm for classifying data streams in the presence of concept drift is introduced. Implementation results and analyses show that the proposed algorithms perform better in terms of accuracy, running time, or both, compared to similar state-of-the-art methods.
  9. Keywords:
  10. Learning Algorithm ; Multiclass Classification ; Data Stream ; Ensemble Learning ; Semisupervised Classification
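
The batch-wise loop outlined in the abstract (classify a batch, then update one ensemble per class from the revealed labels) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the names `CentroidScorer` and `PerClassEnsembleStream`, the centroid-based members, the `max_members` limit, and the drop-oldest rule for handling drift are all assumptions made for illustration. Unlabeled instances (label `None`) are simply skipped during the update, because the abstract does not say how the thesis manages them.

```python
from collections import deque


class CentroidScorer:
    """Hypothetical toy ensemble member: scores a point by closeness to a class centroid."""

    def __init__(self, points):
        n = len(points)
        dim = len(points[0])
        self.centroid = [sum(p[d] for p in points) / n for d in range(dim)]

    def score(self, x):
        # Negative squared distance: a larger value means closer to the class.
        return -sum((a - b) ** 2 for a, b in zip(x, self.centroid))


class PerClassEnsembleStream:
    """Keeps one bounded ensemble per class and processes the stream in batches."""

    def __init__(self, max_members=3):
        self.max_members = max_members
        self.ensembles = {}  # class label -> deque of members, oldest first

    def predict(self, x):
        if not self.ensembles:
            return None  # nothing has been learned yet

        def avg_score(label):
            members = self.ensembles[label]
            return sum(m.score(x) for m in members) / len(members)

        return max(self.ensembles, key=avg_score)

    def update(self, batch):
        # Group the labeled instances by class; unlabeled ones are ignored here.
        by_class = {}
        for x, y in batch:
            if y is not None:
                by_class.setdefault(y, []).append(x)
        for label, points in by_class.items():
            ens = self.ensembles.setdefault(label, deque(maxlen=self.max_members))
            # Appending to a full deque drops its oldest member (assumed drift handling).
            ens.append(CentroidScorer(points))

    def process_batch(self, batch):
        # Classify first, then learn from whatever labels were revealed.
        preds = [self.predict(x) for x, _ in batch]
        self.update(batch)
        return preds


model = PerClassEnsembleStream(max_members=2)
batch1 = [((0.0, 0.0), "a"), ((0.2, 0.1), None), ((5.0, 5.0), "b"), ((5.1, 4.9), None)]
batch2 = [((0.1, 0.1), None), ((4.8, 5.2), "b"), ((9.0, 0.0), "c")]
first = model.process_batch(batch1)   # no model yet, so every prediction is None
second = model.process_batch(batch2)  # class "c" is unknown until after this batch
```

Bounding each per-class ensemble and discarding the oldest member is one simple way to let a model follow a changing concept. A real system would more likely weight or prune members by their recent accuracy.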
