
Clustering Massive Data Streams

Bahrami, Mohammad Saleh | 2024

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 57666 (02)
  4. University: Sharif University of Technology
  5. Department: Mathematical Sciences
  6. Advisor(s): Zarei, Alireza
  7. Abstract:
  8. With the advancement of technologies such as the Internet of Things, the number of devices connected to the Internet is increasing rapidly, and these devices continuously generate data whose volume grows day by day. The data generation rate can be so high that existing hardware is practically unable to store and process all of it appropriately. In this situation, streaming models become more important. In stream computing models, these data are processed online with O(1) memory complexity. Stream algorithms are algorithms whose input arrives as a stream: data items appear and are removed from the input, and are not stored in the computer's memory beforehand. Clustering methods are among the best-known unsupervised learning methods, and clustering is usually one of the first steps toward better understanding data and extracting knowledge from it. In this research, we first review well-known clustering algorithms and try to understand the mechanisms they use, and then try to improve existing stream clustering algorithms. In particular, we focus on density-grid-based algorithms. The weakness of grid-based algorithms is their inefficient memory consumption on high-dimensional data, and in this research we try to mitigate this weakness to some extent. Our idea is to use a grid data structure whose cells have different precisions. We also propose mechanisms for pruning cells that receive little data. Finally, we try to combine well-known algorithms in this field, such as D-Stream and AMR, to design an algorithm that improves on previous ones in time complexity, memory complexity, and clustering quality.
  9. Keywords:
  10. Clustering ; Massive Data ; Data Stream ; Data Stream Clustering
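The density-grid approach described in the abstract can be illustrated with a minimal Python sketch written in the general spirit of D-Stream. It is not the thesis's algorithm and it does not implement variable-precision cells or AMR. The sketch assumes fixed-width cells, exponential decay of each cell's density by a factor `decay` per arriving point, and illustrative `dense_threshold` and `sparse_threshold` values. The class name `DensityGrid` and all parameter names are hypothetical. The grid stores only occupied cells, which is the memory concern the abstract raises for high dimensions. Cells whose decayed density drops below the sparse threshold are pruned, and face-adjacent dense cells are merged into clusters.

```python
import math


class DensityGrid:
    """Hypothetical sketch of a decayed-density grid for stream clustering."""

    def __init__(self, cell_width, decay=0.998, dense_threshold=3.0, sparse_threshold=0.8):
        self.cell_width = cell_width
        self.decay = decay                      # assumed factor in (0, 1): older points weigh less
        self.dense_threshold = dense_threshold  # assumed cutoff for a "dense" cell
        self.sparse_threshold = sparse_threshold  # assumed cutoff below which a cell is pruned
        # Only occupied cells are stored, so memory grows with the number of
        # non-empty cells rather than with (range / cell_width) ** dimensions.
        self.cells = {}                         # cell key -> (density, time of last update)
        self.t = 0                              # logical clock: one tick per arriving point

    def _key(self, point):
        # Map a point to the integer coordinates of its grid cell.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def _decayed(self, key):
        # Density of a cell, decayed from its last update to the current time.
        density, last = self.cells[key]
        return density * self.decay ** (self.t - last)

    def insert(self, point):
        # Process one stream item: decay the cell's old density, then add 1.
        self.t += 1
        key = self._key(point)
        density = self._decayed(key) if key in self.cells else 0.0
        self.cells[key] = (density + 1.0, self.t)

    def prune(self):
        # Drop cells that received little recent data (decayed density too low).
        sparse = [k for k in self.cells if self._decayed(k) < self.sparse_threshold]
        for key in sparse:
            del self.cells[key]

    def clusters(self):
        # Label connected components of face-adjacent dense cells (iterative DFS).
        dense = {k for k in self.cells if self._decayed(k) >= self.dense_threshold}
        label, next_label = {}, 0
        for start in dense:
            if start in label:
                continue
            label[start] = next_label
            stack = [start]
            while stack:
                cell = stack.pop()
                for dim in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                        if nb in dense and nb not in label:
                            label[nb] = next_label
                            stack.append(nb)
            next_label += 1
        return label
```

The usage below streams one isolated point, then two adjacent groups, then a distant group. After pruning, the isolated point's cell is gone. The two adjacent cells share one cluster label, and the distant cell receives another:

```python
g = DensityGrid(cell_width=1.0, decay=0.99)
g.insert((9.5, 9.5))
for p in [(0.5, 0.5)] * 10 + [(1.5, 0.5)] * 10 + [(5.5, 5.5)] * 10:
    g.insert(p)
g.prune()
print(g.clusters())
```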
