- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 39485 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Abolhassani, Hassan
- Abstract:
- Data streams arise in many sources, such as Web click streams, Web pages, and data generated by sensors and satellites, and have attracted considerable attention in recent years. A data stream is an ordered sequence of points that must be accessed in order and can be read only once or a small number of times. Mining such data therefore requires one-pass processing with limited memory usage. Data stream clustering has also received considerable attention, and numerous algorithms have been developed in this field. None of them, however, addresses feature selection, which strongly affects clustering quality, especially when the data is high-dimensional. This thesis takes that factor into account and introduces FBSClu, a feature-based data stream clustering algorithm. For each feature, FBSClu incrementally clusters the stream with that feature left out and evaluates the resulting clustering using a combination of Compactness and Separation measures. From these evaluations it ranks the features by importance. It then uses an appropriate algorithm to identify improper and unimportant features and removes them from the feature set. Because FBSClu performs a relatively large amount of processing per point, it may miss some points in high-speed, high-dimensional streams. To overcome this problem, FFBSClu, a fast feature-based data stream clustering algorithm, is also introduced.
- Keywords:
- Feature Selection; Clustering; Data Mining; Data Stream; Incremental Feature Selection
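The leave-one-feature-out idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's FBSClu implementation: the one-pass "leader" clustering, the `radius` parameter, the ratio used to combine separation and compactness, and all function names are assumptions chosen only to make the idea concrete.

```python
import math
from itertools import combinations


def online_cluster(stream, radius):
    """One-pass leader clustering (an assumed stand-in for the thesis's
    incremental clusterer): a point joins the nearest centroid within
    `radius`, otherwise it starts a new cluster."""
    centroids, counts, labels = [], [], []
    for x in stream:
        best, best_d = None, radius
        for i, c in enumerate(centroids):
            d = math.dist(x, c)
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            centroids.append(list(x))
            counts.append(1)
            best = len(centroids) - 1
        else:
            counts[best] += 1
            n = counts[best]
            # Incremental mean update, so no past points need to be stored.
            centroids[best] = [c + (v - c) / n for c, v in zip(centroids[best], x)]
        labels.append(best)
    return centroids, labels


def quality(points, centroids, labels):
    """Assumed combined score (higher is better): mean distance between
    centroids divided by mean distance of points to their final centroid."""
    if len(centroids) < 2:
        return 0.0
    compactness = sum(
        math.dist(p, centroids[l]) for p, l in zip(points, labels)
    ) / len(points)
    pairs = list(combinations(centroids, 2))
    separation = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return separation / (compactness + 1e-12)


def rank_features(points, radius):
    """Cluster once per left-out feature and rank features so that the one
    whose removal gives the best clustering (least useful) comes first."""
    scores = {}
    for f in range(len(points[0])):
        reduced = [[v for j, v in enumerate(p) if j != f] for p in points]
        centroids, labels = online_cluster(reduced, radius)
        scores[f] = quality(reduced, centroids, labels)
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy data: features 0 and 1 separate two groups, feature 2 is spread-out noise.
points = []
for i in range(20):
    base = 0.0 if i % 2 == 0 else 10.0
    jitter = (i % 3) * 0.1
    points.append([base + jitter, base - jitter, float((i * 7) % 11)])

ranking, scores = rank_features(points, radius=4.0)
print(ranking, scores)
```

On this toy data the noisy third feature should rank as least useful, since leaving it out yields two tight, well-separated clusters. For simplicity the sketch evaluates quality over a stored batch; a streaming version would have to maintain the compactness and separation statistics incrementally.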