Data Stream Classification in Presence of Concept Drift Using Ensemble Learning

Sobhani, Parinaz | 2011

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 42253 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Beigy, Hamid
  7. Abstract:
     Traditional machine learning classification techniques assume that data have a stationary distribution. This assumption no longer holds for recent applications in which tremendous amounts of data are generated at unprecedented rates with evolving patterns. As the number of applications facing these challenges grows, classification of data streams has become an important area of machine learning. Examples of such data stream applications include text streams, surveillance video streams, credit card fraud detection, market basket analysis, information filtering, and computer security. An appropriate method for such problems should adapt to drifting concepts by revising and refining itself as new data become available, without needing to store all of the data. The superiority of ensemble methods over single classifiers for data streams has been shown both theoretically and experimentally. Accordingly, this thesis proposes a new drift detection method and a new ensemble method for classifying data streams. The proposed classification method works in conjunction with the proposed drift detection method: it builds a new classifier on each new chunk of data as it becomes available and, in addition, uses boosting techniques to exploit prior knowledge efficiently. Experiments on artificial, semi-artificial, and real datasets show that the method performs well in both changing and stationary environments and can detect and adapt to different kinds of concept drift. In addition, the algorithm outperforms three state-of-the-art methods, Learn++.NSE, IMORL, and WSEA, and its advantage over these methods becomes more noticeable as the severity and speed of concept drift increase.
  8. Keywords:
     Machine Learning; Classification; Concept Drift; Data Stream; Ensemble Learning
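The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general pattern it describes: train a new classifier on each incoming chunk, use boosting-style instance weights derived from the current ensemble's mistakes, and let a drift detector decide when old members are stale. Every name and rule here is an assumption, not the thesis's method: the hypothetical `Stump` base learner, the hypothetical `ChunkEnsemble` class, the instance-weighting rule (modeled loosely on the one used by Learn++.NSE, where correctly classified samples are down-weighted by the ensemble error rate), the log-odds member weights computed on the newest chunk only, and the use of the standard DDM detector (Gama et al., 2004) as a stand-in for the thesis's own drift detector.

```python
import math
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (feature vector, binary label 0/1)


class Stump:
    """Decision stump fitted on weighted samples (hypothetical base learner)."""

    def fit(self, chunk: List[Sample], w: List[float]) -> "Stump":
        best = None
        for f in range(len(chunk[0][0])):
            for thr in sorted({x[f] for x, _ in chunk}):
                for pol in (0, 1):
                    # Weighted error of "predict pol if x[f] > thr, else 1 - pol"
                    err = sum(wi for (x, y), wi in zip(chunk, w)
                              if (pol if x[f] > thr else 1 - pol) != y)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol)
        _, self.f, self.thr, self.pol = best
        return self

    def predict(self, x: List[float]) -> int:
        return self.pol if x[self.f] > self.thr else 1 - self.pol


class DDM:
    """Drift Detection Method (Gama et al., 2004) over a stream of 0/1 errors."""

    def __init__(self, min_samples: int = 30):
        self.min_samples = min_samples
        self.reset()

    def reset(self) -> None:
        self.n, self.p = 0, 0.0
        self.p_min, self.s_min = math.inf, math.inf

    def add(self, error: int) -> bool:
        """Feed one prediction error (1 = wrong); return True if drift is signalled."""
        self.n += 1
        self.p += (error - self.p) / self.n  # running error rate
        s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_samples:
            return False
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + 3 * self.s_min:  # drift level
            self.reset()
            return True
        return False


class ChunkEnsemble:
    """Hypothetical chunk-based boosted ensemble, not the thesis's exact algorithm."""

    def __init__(self, max_members: int = 10):
        self.members: List[Stump] = []
        self.alphas: List[float] = []
        self.max_members = max_members
        self.detector = DDM()

    def predict(self, x: List[float]) -> int:
        # Weighted majority vote; an empty ensemble predicts 0
        vote = sum(a if h.predict(x) == 1 else -a
                   for h, a in zip(self.members, self.alphas))
        return 1 if vote > 0 else 0

    def update(self, chunk: List[Sample]) -> bool:
        """Test-then-train on one chunk; return True if drift was detected."""
        wrong = [self.predict(x) != y for x, y in chunk]
        drift = False
        if self.members:
            for e in wrong:
                drift = self.detector.add(int(e)) or drift
        if drift:
            self.members, self.alphas = [], []  # discard the outdated concept
        # Boosting-style weights: mistakes get 1, correct samples get the error rate
        if self.members:
            e_t = max(sum(wrong) / len(chunk), 1e-3)
            raw = [1.0 if wr else e_t for wr in wrong]
        else:
            raw = [1.0] * len(chunk)
        total = sum(raw)
        self.members.append(Stump().fit(chunk, [r / total for r in raw]))
        self.members = self.members[-self.max_members:]
        # Re-weight every member by its log-odds accuracy on the newest chunk
        self.alphas = []
        for h in self.members:
            err = sum(h.predict(x) != y for x, y in chunk) / len(chunk)
            err = min(max(err, 1e-3), 1 - 1e-3)
            self.alphas.append(max(math.log((1 - err) / err), 0.0))
        return drift
```

In use, `update` is called once per chunk as it arrives, and `predict` classifies individual samples between chunks. Discarding every old member on a detected drift is the bluntest possible reaction; a real method would more likely down-weight or prune members selectively, which is one of the places where the thesis's own design would differ from this sketch.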