- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 43813 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Beigy, Hamid
- Abstract:
- Spam filtering is one of the large-scale applications of machine learning, and much research in the field has addressed it. Most of this work falls under batch learning or offline incremental learning. In batch learning, the learning process is carried out once, over all of the training data. In applications such as spam filtering, where the training data is large relative to the available memory and is generated as a stream, incremental learning is required, in which the learning phase is repeated periodically. In each iteration of an offline incremental learning algorithm, the learner learns from a new batch of data, whereas in online incremental learning, the model built by the learner is updated upon receiving each new training example. Since the speed at which new types of spam are detected affects the performance of a spam filter, applying online learning to this area can be beneficial.
Ensemble learning classifies by combining several base classifiers and has proven effective in a variety of classification problems; ensemble learners generally outperform their base classifiers. These approaches have also been widely applied in online settings, where they have likewise proven effective.
In this thesis, in addition to reviewing online learning techniques for spam filtering, we propose a new online ensemble algorithm for this application. The proposed approach was applied to existing datasets such as TREC2005, and the results were compared with those of other algorithms. Our algorithm demonstrates higher efficiency and about 3% better accuracy than the other algorithms.
- Keywords:
- Ensemble Learning ; Online Learning ; Spam Filtering
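The abstract does not specify the proposed algorithm, so the following is not it. It is a minimal sketch of the two ideas the abstract combines: an online learner whose model is updated after every message, and an online ensemble of such learners. It assumes binary word features, mistake-driven perceptrons as hypothetical base classifiers, and Oza and Russell's online bagging (each base model trains on each incoming message k ~ Poisson(1) times) with majority voting, a textbook technique swapped in for illustration. The evaluation loop classifies each message before learning its label (test-then-train). All names (`tokenize`, `OnlinePerceptron`, `OnlineBagging`, `prequential_accuracy`) are hypothetical, and the stream is invented toy data, not TREC2005.

```python
import math
import random
import re


def tokenize(text):
    """Binary bag-of-words features: the set of lowercased word tokens."""
    return set(re.findall(r"[a-z0-9$]+", text.lower()))


class OnlinePerceptron:
    """Sparse perceptron whose weights change as soon as a labelled message arrives."""

    def __init__(self):
        self.w = {}
        self.bias = 0.0

    def score(self, tokens):
        # Unseen tokens contribute 0 and are not stored.
        return self.bias + sum(self.w.get(t, 0.0) for t in tokens)

    def update(self, tokens, is_spam):
        y = 1.0 if is_spam else -1.0
        if y * self.score(tokens) <= 0:  # mistake-driven: update only on a (margin) error
            self.bias += y
            for t in tokens:
                self.w[t] = self.w.get(t, 0.0) + y


def poisson1(rng):
    """Sample k ~ Poisson(1) with Knuth's multiplication method."""
    limit, k, p = math.exp(-1.0), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


class OnlineBagging:
    """Online bagging: each base model sees each message k ~ Poisson(1) times."""

    def __init__(self, n_models=5, seed=0):
        self.rng = random.Random(seed)
        self.models = [OnlinePerceptron() for _ in range(n_models)]

    def predict(self, text):
        tokens = tokenize(text)
        votes = sum(m.score(tokens) > 0 for m in self.models)
        return 2 * votes > len(self.models)  # majority vote; True means spam

    def update(self, text, is_spam):
        tokens = tokenize(text)
        for m in self.models:
            for _ in range(poisson1(self.rng)):
                m.update(tokens, is_spam)


def prequential_accuracy(ensemble, stream):
    """Test-then-train: classify each message first, then learn from its label."""
    correct = 0
    for text, is_spam in stream:
        correct += ensemble.predict(text) == is_spam
        ensemble.update(text, is_spam)
    return correct / len(stream)


if __name__ == "__main__":
    # Invented toy stream for illustration only.
    stream = [
        ("WIN cash now!!!", True),
        ("Meeting agenda for Monday", False),
        ("Claim your FREE prize", True),
        ("Team lunch on Friday?", False),
    ] * 30
    ens = OnlineBagging(n_models=5, seed=42)
    print("prequential accuracy:", prequential_accuracy(ens, stream))
    print(ens.predict("Win cash now: claim your free prize"))
    print(ens.predict("Monday meeting agenda for the team"))
```

Perceptrons keep each per-message update cheap, which suits the streaming setting, and the Poisson resampling gives the base models different views of the same stream without storing any data. A different base learner could be substituted by changing only `OnlinePerceptron`.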