Intelligent classification of web pages using contextual and visual features

Ahmadi, A ; Sharif University of Technology

  1. Type of Document: Article
  2. DOI: 10.1016/j.asoc.2010.05.003
  3. Abstract:
  4. In this paper we address the classification of Web content, and in particular its application to the detection of pornographic Web pages. Filtering of undesirable Web content is mainly achieved either by blocking a specific Web address after looking it up in a reference blacklist of URLs, or by a plain contextual analysis of the page that searches the text for special keywords. The main problems with current filtering methods are the need for immediate updates of the URL list and the high rate of over-blocking of ordinary pages. In this paper, we propose an intelligent approach based on textual, profile, and visual features in a hierarchical classifier structure. Textual features contain information about keywords, black-words, etc., and profile features contain structural information such as the number of links, meta-tags, pictures, etc. For the visual features, we employ a set of global and local indicative features, including topological and shape-based characteristics, which are extracted from the skin region. The algorithm was applied to a training set of 1295 Web pages, comprising 700 porn pages (with text, images, or both) in English and Persian and 595 non-porn pages covering topics such as medicine, health, and sports. On a test dataset of 290 Web pages, an accuracy rate of 95% was obtained.
  5. Keywords:
  6. Adult image detection ; Content based filtering ; Porn image detection ; Skin color detection ; Web-pages classification ; Adult images ; Image detection ; Statistical tests ; Topology ; Websites ; Medical imaging
  7. Source: Applied Soft Computing Journal ; Volume 11, Issue 2, 2011, Pages 1638-1647 ; 15684946 (ISSN)
  8. URL: http://www.sciencedirect.com/science/article/pii/S1568494610001006
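
The abstract describes profile features as structural information such as the number of links, meta-tags, and pictures on a page. As a rough illustration only, the sketch below counts those three elements using Python's standard `html.parser` module. The class name `ProfileFeatureParser`, the function `profile_features`, and the choice of exactly these three counts are assumptions made for this example; the article's actual feature set and extraction method are not given in this record.

```python
from html.parser import HTMLParser


class ProfileFeatureParser(HTMLParser):
    """Hypothetical helper: counts a few structural elements of an HTML page."""

    def __init__(self):
        super().__init__()
        # Assumed feature set: only the three examples named in the abstract.
        self.counts = {"links": 0, "meta_tags": 0, "images": 0}

    def handle_starttag(self, tag, attrs):
        # Tag names are lowercased by HTMLParser before this method is called.
        if tag == "a":
            self.counts["links"] += 1
        elif tag == "meta":
            self.counts["meta_tags"] += 1
        elif tag == "img":
            self.counts["images"] += 1


def profile_features(html):
    """Return a dict of structural counts for the given HTML string."""
    parser = ProfileFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.counts
```

Counts like these could serve as one input vector among several in a multi-stage classifier of the kind the abstract outlines, alongside separately computed textual and visual features.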