Feature Ranking in Text Classification

Sadeghi, Sabereh | 2010

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 41183 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Beigy, Hamid
  7. Abstract:
  8. Text classification is one of the broadest and most important applications of data mining. Because these applications involve a very large number of features, dimensionality reduction is needed before a classification algorithm is applied. Numerous methods for dimensionality reduction and feature selection have been proposed. Feature selection based on feature ranking has received much attention from researchers, mainly because ranking methods are scalable, easy to use, and fast to compute. Feature ranking methods fall into different categories and use different measures to score features. Recently, ensemble methods have entered the field of ranking and have achieved higher accuracy than other approaches. Accordingly, this thesis proposes a heterogeneous ensemble-based algorithm for feature ranking. The base ranking methods in the ensemble are chosen from different categories, such as information-theoretic, distance-based, and statistical methods. Their results are fused into a final feature subset by means of a genetic algorithm. The proposed algorithm also reduces the burden on the user of determining an appropriate threshold. The performance of the algorithm is evaluated on three different text datasets. The results show that it outperforms all five feature ranking methods it is compared against. Furthermore, the performance of the proposed algorithm does not depend on the classification method.
  9. Keywords:
  10. Text Categorization ; Ensemble Learning ; Dimensionality Reduction ; Features Ranking
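
The heterogeneous ensemble idea described in the abstract can be illustrated with a small sketch. This is not the thesis's algorithm: the thesis fuses the base rankings with a genetic algorithm, whereas the sketch below substitutes a much simpler rank-sum (Borda-style) fusion to stay short. The three base scorers (information gain, chi-square, and a difference of class-conditional term frequencies), the function names, the toy data, and the restriction to binary features and binary labels are all assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in bits, skipping zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(column, labels):
    """Information-theoretic scorer: entropy reduction from knowing the term."""
    n = len(labels)
    p1 = sum(labels) / n
    gain = entropy([p1, 1 - p1])
    for value in (0, 1):
        subset = [y for x, y in zip(column, labels) if x == value]
        if subset:
            q = sum(subset) / len(subset)
            gain -= len(subset) / n * entropy([q, 1 - q])
    return gain

def chi_square(column, labels):
    """Statistical scorer: chi-square of the 2x2 term/class contingency table."""
    n = len(labels)
    a = sum(1 for x, y in zip(column, labels) if x and y)
    b = sum(1 for x, y in zip(column, labels) if x and not y)
    c = sum(1 for x, y in zip(column, labels) if not x and y)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

def mean_difference(column, labels):
    """Distance-based scorer: gap between class-conditional term frequencies."""
    pos = [x for x, y in zip(column, labels) if y]
    neg = [x for x, y in zip(column, labels) if not y]
    if not pos or not neg:
        return 0.0
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

def ranks(scores):
    """Map scores to rank positions (0 = best); ties broken by feature index."""
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    rank = [0] * len(scores)
    for position, j in enumerate(order):
        rank[j] = position
    return rank

def ensemble_ranking(X, labels,
                     scorers=(information_gain, chi_square, mean_difference)):
    """Fuse base rankings by summing rank positions (assumed stand-in for the GA)."""
    columns = list(zip(*X))
    total = [0] * len(columns)
    for scorer in scorers:
        base = ranks([scorer(col, labels) for col in columns])
        total = [t + r for t, r in zip(total, base)]
    return sorted(range(len(columns)), key=lambda j: (total[j], j))

# Toy data: rows are documents, columns are binary term indicators.
# Column 0 is weakly related to the class, column 1 is constant,
# and column 2 matches the class label exactly.
X = [[1, 1, 1], [0, 1, 1], [1, 1, 1], [0, 1, 0], [1, 1, 0], [0, 1, 0]]
labels = [1, 1, 1, 0, 0, 0]
ranking = ensemble_ranking(X, labels)
print(ranking)
```

The returned list orders feature indices from most to least useful according to the fused ranking. A rank sum is only one way to combine heterogeneous rankers; it sidesteps the differing score scales of the base methods but, unlike the genetic algorithm fusion the abstract describes, it does not select a final subset or remove the need for a cutoff.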
