- Type of Document: Ph.D. Dissertation
- Language: Farsi
- Document No: 56644 (05)
- University: Sharif University of Technology
- Department: Electrical Engineering
- Advisor(s): Bagheri Shouraki, Saeed
- Abstract:
- The pace of information generation, processing, and decision-making keeps increasing, and stream data processing has become one of the main needs in data management. New methods are required to handle and analyze such data. Two of the most challenging aspects of data streams are (i) concept drift, i.e. the evolution of the stream over time, which demands timely decisions despite the high arrival rate of new data; and (ii) limited memory, which makes it impractical to store the stream given its volume. Clustering is a common way to process data streams without prior knowledge of the data. In recent years, various methods capable of clustering data streams have been proposed. Their main limitation is the use of parameters set from expert knowledge, which undermines the real-time nature of stream processing. In this work, we propose a novel, fully-online, density-based method for clustering evolving data streams. We determine the parameter values using statistical theory, with no need for additional information or expert knowledge. Here, a stream does not necessarily mean data ordered along the time axis: any type of stream data, as well as data that cannot be processed all at once because of the limits of processing techniques and must instead be processed one part at a time, falls within the scope of this thesis. At the start of the learning process, nothing is known about the phenomenon. To deal with this, we use a two-stage strategy that first reaches a preliminary knowledge by examining the data. Depending on how satisfactory this initial knowledge is, we then return to the data to make it more accurate and precise. In other words, the framework aims for accuracy and quality at the same time, remaining sensitive to changes while being indifferent to noise and disturbances.
The proposed framework has two main loops. First, it collects data. It then determines the cause-and-effect relationships between input and output (the initial guesses) with the help of a new clustering method called KIDS. In the next step, the corresponding relations between input and output are established using another new clustering method called CFCP. These two stages together constitute our model. The proposed framework can identify clusters of arbitrary shape. It is robust to noise and offers high accuracy and efficiency in both low and high dimensions. The experimental results show that the method clusters data at high speed without loss of quality compared to state-of-the-art algorithms.
- Keywords:
- Incremental Learning ; Data Stream Clustering ; Learning Framework ; K-Dimensional Data ; Fully-Online Clustering ; Stream Data Processing
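To make the two stream challenges named in the abstract concrete, here is a minimal Python sketch of a conventional one-pass micro-cluster scheme. It is not KIDS or CFCP, whose details this record does not give. The class name `MicroClusterStream` and the parameters `radius`, `decay`, and `min_weight` are assumptions made for illustration; they are exactly the kind of hand-set, expert-chosen values that the thesis says it replaces with statistically derived ones.

```python
import math


class MicroClusterStream:
    """Toy one-pass clustering: each point is seen once, then discarded.

    Illustrative only. `radius`, `decay` and `min_weight` are hand-set
    (hypothetical) parameters, not values taken from the dissertation.
    """

    def __init__(self, radius, decay=0.9, min_weight=0.1):
        self.radius = radius          # max distance for joining a cluster
        self.decay = decay            # per-step fading factor (concept drift)
        self.min_weight = min_weight  # clusters below this weight are dropped
        self.clusters = []            # each: {"center": [...], "weight": float}

    def update(self, point):
        # Fade every cluster so that stale concepts lose influence over time.
        for c in self.clusters:
            c["weight"] *= self.decay
        # Drop faded clusters; only summaries are kept, never raw points.
        self.clusters = [c for c in self.clusters
                         if c["weight"] >= self.min_weight]

        # Find the nearest surviving cluster center.
        best, best_d = None, math.inf
        for c in self.clusters:
            d = math.dist(c["center"], point)
            if d < best_d:
                best, best_d = c, d

        if best is not None and best_d <= self.radius:
            # Absorb the point: weighted running mean of the center.
            w = best["weight"]
            best["center"] = [(w * ci + pi) / (w + 1.0)
                              for ci, pi in zip(best["center"], point)]
            best["weight"] = w + 1.0
        else:
            # Start a new micro-cluster at this point.
            best = {"center": list(point), "weight": 1.0}
            self.clusters.append(best)
        return best


model = MicroClusterStream(radius=1.0)
for p in [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]:
    model.update(p)
```

The fading step addresses concept drift, and keeping only a center and a weight per cluster addresses the memory limit. The quality of the result still depends on how `radius` and `decay` are chosen, which is the limitation of existing methods that the abstract points out.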