
Incremental Learning Approach in Spam Detection

Ghanbari, Elham | 2010

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 40678 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Beygi, Hamid
  7. Abstract:
  8. Studies show that a large proportion of sent e-mails are spam. Spam is one of the major problems facing e-mail users, wasting both time and money. Of the various ways to address this problem, one of the most effective is detecting spam based on message content. Separating legitimate e-mails from spam by their content is a text classification task, and because machine-learning approaches are widely applied to text classification, machine-learning algorithms can also be used for spam classification. In most of these algorithms, however, training is performed in batch mode, whereas incremental learning algorithms are preferred in many applications, especially spam detection. These algorithms let an agent extract new knowledge from new training data and add it to its previous knowledge. Among incremental learning algorithms, ensemble-based algorithms produce robust and precise results. Accordingly, this thesis presents an incremental learning algorithm for spam detection that uses ensemble-based learning. The proposed algorithm assumes that the probability distribution function of the training data is uniform. Its ultimate goal is for the incrementally learned model to be very close to the model obtained by batch learning. When a new dataset arrives, the algorithm generates multiple classifiers for that dataset and then combines them with the previous classifiers by weighted majority voting. Experimental results show that the proposed algorithm is indeed an incremental learner. In addition, it outperforms incremental Bayesian and Learn++.



  9. Keywords:
  10. Spam ; Machine Learning ; Ensemble Learning ; Incremental Learning ; Text Categorization ; Filtering
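
The combining step described in the abstract (train classifiers on each newly arrived dataset, then merge them with the earlier ones by weighted majority voting) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the class names TinyNaiveBayes and IncrementalEnsemble are hypothetical, only one classifier is trained per batch rather than several, the base learner is a small Naive Bayes chosen for brevity, and the log-odds weight computed from training error is an assumed, Learn++-style choice that the abstract does not specify.

```python
import math
from collections import Counter


class TinyNaiveBayes:
    """Hypothetical base learner: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())

        def log_score(c):
            total = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / n_docs)
            for word in text.lower().split():
                score += math.log(
                    (self.word_counts[c][word] + 1) / (total + len(self.vocab))
                )
            return score

        return max(self.classes, key=log_score)


class IncrementalEnsemble:
    """Keeps every earlier classifier and adds a new one per incoming batch."""

    def __init__(self):
        self.members = []  # list of (classifier, voting weight)

    def partial_fit(self, texts, labels):
        # Train only on the new batch; earlier batches are never revisited.
        clf = TinyNaiveBayes().fit(texts, labels)
        correct = sum(clf.predict(t) == y for t, y in zip(texts, labels))
        error = 1 - correct / len(texts)
        # Clamp so the log-odds weight stays finite and positive (assumption).
        error = min(max(error, 1e-3), 0.5 - 1e-3)
        self.members.append((clf, math.log((1 - error) / error)))

    def predict(self, text):
        # Weighted majority vote over all classifiers trained so far.
        votes = Counter()
        for clf, weight in self.members:
            votes[clf.predict(text)] += weight
        return votes.most_common(1)[0][0]


ensemble = IncrementalEnsemble()
ensemble.partial_fit(
    ["win free money now", "cheap pills free offer",
     "meeting agenda for monday", "project report attached"],
    ["spam", "spam", "ham", "ham"],
)
ensemble.partial_fit(
    ["claim your lottery prize", "lottery winner claim prize today",
     "lunch on friday", "see you at lunch friday"],
    ["spam", "spam", "ham", "ham"],
)
print(ensemble.predict("free lottery prize"))
print(ensemble.predict("lunch meeting on monday"))
```

Measuring error on the training batch itself gives optimistic weights; a held-out portion of each batch would be a more careful choice in practice.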
