- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 43625 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Mirian Hosseinabadi, Hassan
- Abstract:
- A data stream is a continuous sequence of data. In recent years, its management and processing have been widely studied in several areas of computer science, such as distributed systems. Although clustering a single data stream has many uses, applications whose main requirement is clustering multiple data streams are emerging rapidly, which reveals the need for further research in this area. The sources that generate data streams have limited memory and processing power. On the other hand, information about the entire data stream is needed, and it must be sent to a central site to produce the final data mining result. Consequently, the memory required and the amount of information exchanged between the sources and the central site should be reduced as much as possible while preserving the quality of the clustering result. Data stream applications also require that the clustering result be updated continuously, so that a user can view the current result at any time. This thesis proposes a two-phase (online and offline) algorithm for clustering data generated by multiple data streams. The algorithm reduces memory usage and communication cost by using micro-clusters and by constructing a hierarchical structure of streams. Experimental results show that the clustering quality is high and that results are produced in a short time. In addition, since the clustering algorithm is density-based, clusters of arbitrary shape can be detected.
- Keywords:
- Clustering ; Data Stream ; Microcluster ; Designated Streams ; Normal Streams
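
The abstract does not give the structure of the thesis's micro-clusters, so the following is only a minimal sketch of the general idea, assuming the commonly used additive summary (point count, per-dimension linear sum, per-dimension squared sum). The class name `MicroCluster` and its methods are hypothetical and are not taken from the thesis. It illustrates why such summaries can lower memory and communication cost: a stream keeps and sends a few numbers instead of raw points, and a parent in a hierarchy can merge its children's summaries exactly.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Hypothetical additive summary: count, linear sum, squared sum."""
    n: int = 0
    ls: List[float] = field(default_factory=list)  # per-dimension sum of x
    ss: List[float] = field(default_factory=list)  # per-dimension sum of x*x

    def insert(self, point: Sequence[float]) -> None:
        """Absorb one point (online phase at a stream source)."""
        if self.n == 0:
            self.ls = [0.0] * len(point)
            self.ss = [0.0] * len(point)
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def merge(self, other: "MicroCluster") -> None:
        """Combine with another summary, e.g. one received from a child stream."""
        if other.n == 0:
            return
        if self.n == 0:
            self.ls = [0.0] * len(other.ls)
            self.ss = [0.0] * len(other.ss)
        self.n += other.n
        for i in range(len(self.ls)):
            self.ls[i] += other.ls[i]
            self.ss[i] += other.ss[i]

    def centroid(self) -> List[float]:
        if self.n == 0:
            raise ValueError("empty micro-cluster has no centroid")
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        """Root-mean-square distance of the summarized points from the centroid."""
        if self.n == 0:
            raise ValueError("empty micro-cluster has no radius")
        var = sum(self.ss[i] / self.n - (self.ls[i] / self.n) ** 2
                  for i in range(len(self.ls)))
        return sqrt(max(var, 0.0))  # clamp tiny negatives from rounding


# Two streams summarize their own points; only the summaries are combined.
a = MicroCluster()
for p in [(0.0, 0.0), (2.0, 0.0)]:
    a.insert(p)

b = MicroCluster()
for p in [(4.0, 0.0), (6.0, 0.0)]:
    b.insert(p)

a.merge(b)
```

After the merge, `a` describes all four points without either stream having transmitted them individually. A density-based offline phase, as the abstract describes, would then operate on such summaries rather than on raw data; that step is not sketched here.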
