An Outlier Detection and Cleaning Algorithm in Classification Applications

Kasaeian, Mojtaba | 2013

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 44345 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Beigy, Hamid
  7. Abstract:
  8. The growing volume of real-world information calls for dedicated tools for storing, cleaning, and processing data. Data cleaning is an important step in machine learning applications and includes procedures such as duplicate detection, filling in missing values, and outlier detection. An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism. Much research in machine learning has addressed outlier detection, which has real-world applications such as intrusion detection for network security, credit card fraud detection, fault detection in critical systems, and disease detection from medical images. Depending on the application, such data go by different names, including anomaly, incompatible data, noise, and novelty. This thesis proposes two new reference-based algorithms for detecting outliers. They introduce concepts such as left and right proximity and the average of left and right proximity to prune normal data, which reduces run time without affecting the accuracy of the algorithms as the size and number of attributes of the dataset grow. Experimental results on several real datasets show that 80 to 95 percent of the data is pruned by this method. The analysis also shows a significant improvement in speed and time complexity for the proposed algorithms compared with other methods in classification applications.
  9. Keywords:
  10. Outliers ; Anomaly Data ; Reference-Based Methods ; Left and Right Proximity
