Online Stream Classification Using Bayesian Non-Parametric Models

Hosseini, Abbas | 2014

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 45907 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Rabiee, Hamid Reza
  7. Abstract: The emergence of applications such as spam detection and online advertising, coupled with the dramatic growth of user-generated content, has drawn increasing attention to stream classification. The data stream in such applications is large or even unbounded, and the system is often required to respond online. A main challenge of stream classification is that the process generating the data is often non-stationary. This phenomenon, known as concept drift, calls for an adaptive approach that can manage drift in an online fashion. This thesis presents a probabilistic non-parametric generative model for stream classification that handles concept drift efficiently and adjusts its complexity over time. This is realized by building a model-selection-based classifier via a mixture model with a potentially infinite number of components. To model the emergence and death of concepts, the method uses the temporal Dirichlet process mixture model. Moreover, unlike recent methods, the proposed model handles concept drift by adapting the data-concept association without an unnecessary i.i.d. assumption among the data of a batch. This allows the model to classify data efficiently using fewer and simpler base classifiers. The number of concepts and the data-concept association are found through inference on the proposed model. An online inference algorithm based on Gibbs sampling is proposed for this non-conjugate, time-dependent, non-parametric model. Extensive experimental results on several stream datasets demonstrate the effectiveness of the proposed model.
  8. Keywords: Concept Drift; Data Stream Classification; Bayesian Nonparametric Model; Online Inference
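
The abstract's idea that concepts emerge and die over time can be illustrated with a small sketch. The code below is not the thesis's model or algorithm. It is a minimal illustration of a Chinese-restaurant-process-style prior in which each past data point's contribution to its concept decays with age, so concepts that stop receiving data lose prior mass. The exponential decay kernel, the function names `decayed_counts` and `concept_prior`, and the parameters `alpha` and `decay_rate` are all assumptions made for illustration. The sketch covers only the prior over concept assignment, with no likelihood term and no Gibbs sampling.

```python
import math


def decayed_counts(assignments, now, decay_rate):
    """Return a time-decayed weight for each concept.

    assignments: list of (timestamp, concept_id) pairs for earlier data.
    Each point contributes exp(-decay_rate * age), an assumed kernel.
    """
    weights = {}
    for t, concept in assignments:
        age = now - t
        weights[concept] = weights.get(concept, 0.0) + math.exp(-decay_rate * age)
    return weights


def concept_prior(assignments, now, alpha=1.0, decay_rate=0.5):
    """Prior probability of joining each existing concept or a new one.

    The key "new" is reserved for a new concept, so concept ids in
    `assignments` should not use that string (a hypothetical convention).
    """
    weights = decayed_counts(assignments, now, decay_rate)
    total = sum(weights.values()) + alpha
    prior = {concept: w / total for concept, w in weights.items()}
    prior["new"] = alpha / total
    return prior


# Concept "a" was seen twice long ago, concept "b" once recently.
history = [(0, "a"), (1, "a"), (9, "b")]
print(concept_prior(history, now=10))
```

With `decay_rate=0.0` the weights reduce to plain counts and the function gives the standard Chinese restaurant process prior. With a positive rate, the recently used concept "b" receives more prior mass than the older concept "a" even though "a" has more points.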
