A Semisupervised Classification Algorithm for Data Streams Using Decision Tree Algorithm
Gholipour Shahraki, Ameneh | 2013
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 44354 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Beigy, Hamid
- Abstract:
- In the information era, many problems involve input data that arrives as a continuous, unbounded stream. Intrusion detection in networks and filtering spam from legitimate email are examples of such problems. Traditional classification algorithms perform poorly in these settings, so novel algorithms are needed to tackle them. Among classification algorithms, decision trees have significant advantages, such as not depending on any parameters and being robust to outliers and irrelevant attributes. Moreover, the results of a decision tree are easy to interpret and to extract rules from. We therefore propose an algorithm for classifying data streams that is based on a decision tree algorithm. In addition, labeling data in most data streams is difficult, time-consuming, and costly, so classification problems usually suffer from a lack of labeled data. Under these conditions, supervised learners do not perform acceptably, and semisupervised versions of learning algorithms are required. Semisupervised methods try to exploit unlabeled data, in addition to labeled data, in their learning procedure. In this dissertation we propose both a supervised and a semisupervised algorithm. The design proceeds in three steps. In the first step, a supervised regression algorithm for data streams is introduced. In the second step, this algorithm is extended and its structure is modified so that it can perform classification. In the third step, a method for managing unlabeled data is designed and, building on the previous algorithm, a semisupervised decision tree algorithm for classifying data streams in the presence of concept drift is introduced.
Results of implementations and analyses show that our algorithms perform better in terms of accuracy, running time, or both than similar state-of-the-art methods proposed by others.
- Keywords:
- Decision Making Tree ; Data Stream ; Learning Algorithm ; Concept Drift ; Semisupervised Classification