Search for:
data-stream
Total 57 records
Detection of evolving concepts in non-stationary data streams: A multiple kernel learning approach
, Article Expert Systems with Applications ; Volume 91 , 2018 , Pages 187-197 ; 09574174 (ISSN) ; Zare Moodi, P ; Beigy, H ; Sharif University of Technology
Elsevier Ltd
2018
Abstract
Due to the unprecedented speed and volume of raw data generated in most applications, data stream mining has attracted a lot of attention recently. Methods for solving these problems should address challenges in this area such as infinite length, concept-drift, recurring concepts, and concept-evolution. Moreover, due to the intrinsic speed of data streams, the time and space complexity of the methods are extremely important. This paper proposes a novel method based on multiple kernels for classifying non-stationary data streams, which addresses the mentioned challenges with special attention to the space complexity. By learning multiple kernels and specifying the boundaries of classes in...
Evolving data stream clustering based on constant false clustering probability
, Article Information Sciences ; Volume 614 , 2022 , Pages 1-18 ; 00200255 (ISSN) ; Bagheri Shouraki, S ; Norouzi, Y ; Sharif University of Technology
Elsevier Inc
2022
Abstract
Today's world needs new methods to handle and analyze the ever-increasing data streams being generated. Two of the most challenging aspects of data streams are (i) concept drift, i.e., the evolution of the data stream over time, which requires the ability to make timely decisions despite the high speed at which new data arrives; and (ii) limited memory storage and the impracticality of using memory due to the large amount of data. Clustering is one of the common methods to process data streams. In this paper, we propose a novel, fully-online, density-based method for clustering evolving data streams. In recent years, a number of methods have been proposed, which also have the ability to cluster data streams....
HB2DS: a behavior-driven high-bandwidth network mining system
, Article Journal of Systems and Software ; Volume 127 , 2017 , Pages 266-277 ; 01641212 (ISSN) ; Jalili, R ; Sharif University of Technology
Elsevier Inc
2017
Abstract
This paper proposes a behavior detection system, HB2DS, to address the behavior-detection challenges in high-bandwidth networks. In HB2DS, a summarization of network traffic is represented through some meta-events. The relationships amongst meta-events are used to mine end-user behaviors. HB2DS satisfies the main constraints that exist in analyzing high-bandwidth networks, namely online learning and outlier handling, as well as one-pass processing, delay, and memory limitations. Our evaluation indicates significant improvement in big data stream analysis in terms of accuracy and efficiency. © 2016 Elsevier Inc
A New Approach for Data Stream Clustering of Arbitrary Shaped Cluster
, M.Sc. Thesis Sharif University of Technology ; Abolhassani, Hassan (Supervisor)
Abstract
Recently, data streams have become popular in many contexts, such as click sequences in web pages, data obtained from sensor networks, and satellite data. A data stream is an ordered list of points that should be used once. Clustering these kinds of data is one of the most difficult issues in data mining. Due to the high amount of data, traditional clustering algorithms are not suitable for this family of problems. Many data stream clustering algorithms proposed in recent years have considered the scalability (largeness) of data, but most of them did not attend to the following issues. • The quality of clustering can be bad over time. • Some of the algorithms cannot handle arbitrary...
Approximation Algorithms for Clustering Points in the Data Stream Model
, M.Sc. Thesis Sharif University of Technology ; Zarrabi Zadeh, Hamid (Supervisor)
Abstract
The k-center problem (covering a set of points using k congruent balls with minimum radius) is a well-known clustering model in computer science with a wide range of applications. The k-center is a well-known NP-hard problem. In this thesis, we focus on the k-center problem with outliers in high-dimensional data streams. Due to the increase in data size, we focus on the data stream model of the problem. Moreover, in real-world applications, where input points are noisy, it is very important to consider outliers. In this thesis, we study 1-center and 2-center with outliers in high-dimensional data streams in Euclidean space. We provide a 1.7-approximation streaming algorithm for 1-center with z...
New ensemble method for classification of data streams
, Article 2011 1st International eConference on Computer and Knowledge Engineering, ICCKE 2011, Mashhad, 13 October 2011 through 14 October 2011 ; 2011 , Pages 264-269 ; 9781467357135 (ISBN) ; Beigy, H ; Sharif University of Technology
Abstract
Classification of data streams has become an important area of data mining, as the number of applications facing these challenges increases. In this paper, we propose a new ensemble learning method for data stream classification in the presence of concept drift. Our method is capable of detecting changes and adapting to new concepts that appear in the stream.
A streaming algorithm for 2-center with outliers in high dimensions
, Article Computational Geometry: Theory and Applications ; Volume 60 , 2017 , Pages 26-36 ; 09257721 (ISSN) ; Zarrabi Zadeh, H ; Sharif University of Technology
Abstract
We study the 2-center problem with outliers in high-dimensional data streams. Given a stream of points in arbitrary d dimensions, the goal is to find two congruent balls of minimum radius covering all but at most z points. We present a (1.8+ε)-approximation streaming algorithm, improving over the previous (4+ε)-approximation algorithm available for the problem. The space complexity and update time of our algorithm are poly(d,z,1/ε), independent of the size of the stream. © 2016 Elsevier B.V
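The objective stated in this abstract can be made concrete with a small sketch. The Python below is not the paper's streaming algorithm; it only evaluates the cost of one candidate solution, assuming the two centers are already given. The function name `two_center_radius_with_outliers` and the sample points are hypothetical and chosen for illustration.

```python
import math

def two_center_radius_with_outliers(points, c1, c2, z):
    """Smallest common radius r such that the balls of radius r around
    c1 and c2 cover all but at most z of the given points."""
    # Each point is served by its nearer center.
    dists = sorted(min(math.dist(p, c1), math.dist(p, c2)) for p in points)
    # Treat the z farthest points as outliers and drop them.
    kept = dists[:max(len(dists) - z, 0)]
    # The farthest remaining point determines the radius.
    return kept[-1] if kept else 0.0

# Hypothetical data: two tight groups plus one far-away noisy point.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (100, 100)]
print(two_center_radius_with_outliers(points, (0.5, 0), (10.5, 0), z=1))
```

With `z=1` the far-away point is discarded and the radius is set by the two tight groups; with `z=0` the same centers need a far larger radius, which is why allowing outliers matters for noisy inputs.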
Clustering Data Streams using Core-Sets
, M.Sc. Thesis Sharif University of Technology ; Zarei, Alireza (Supervisor)
Abstract
We design a new algorithm for clustering data streams in any fixed dimension that uses the core-set framework to summarize data in order to reduce the complexity of computation. Clustering separates data into distinct sets called clusters, such that objects in the same cluster are the most similar and objects in different clusters are the least similar. This problem has many applications in areas such as machine learning, image processing, and financial and stock transactions. Data streams have recently emerged as an important concept because in many applications, data is inherently streaming over a network or the database is extremely large and sequential access is way faster...
An ensemble of cluster-based classifiers for semi-supervised classification of non-stationary data streams
, Article Knowledge and Information Systems ; Volume 46, Issue 3 , 2016 , Pages 567-597 ; 02191377 (ISSN) ; Gholipour, A ; Beigy, H ; Sharif University of Technology
Springer-Verlag London Ltd
Abstract
Recent advances in storage and processing have provided the possibility of automatic gathering of information, which in turn leads to fast and continuous flows of data. The data which are produced and stored in this way are called data streams. Data streams are produced in large size and with much dynamism, and have some unique properties which make them applicable to model many real data mining applications. The main challenge of streaming data is the occurrence of concept drift. In addition, regarding the costs of labeling of instances, it is often assumed that only a small fraction of instances are labeled. In this paper, we propose an ensemble algorithm to classify instances of non-stationary...
Concept-evolution detection in non-stationary data streams: a fuzzy clustering approach
, Article Knowledge and Information Systems ; 2018 ; 02191377 (ISSN) ; Kamali Siahroudi, S ; Beigy, H ; Sharif University of Technology
Springer London
2018
Abstract
We have entered the era of networked communications where concepts such as big data and social networks are emerging. The explosion and profusion of available data in a broad range of application domains cause data streams to become an inevitable part of most real-world applications. In the classification of data streams, there are four major challenges: infinite length, concept drift, recurring concepts, and evolving concepts. This paper proposes a novel method to address the mentioned challenges with a focus on the last one. Unlike the existing methods for detection of evolving concepts, we cast joint classification and detection of evolving concepts into optimizing an objective function by...
Concept-evolution detection in non-stationary data streams: a fuzzy clustering approach
, Article Knowledge and Information Systems ; Volume 60, Issue 3 , 2019 , Pages 1329-1352 ; 02191377 (ISSN) ; Kamali Siahroudi, S ; Beigy, H ; Sharif University of Technology
Springer London
2019
Abstract
We have entered the era of networked communications where concepts such as big data and social networks are emerging. The explosion and profusion of available data in a broad range of application domains cause data streams to become an inevitable part of most real-world applications. In the classification of data streams, there are four major challenges: infinite length, concept drift, recurring concepts, and evolving concepts. This paper proposes a novel method to address the mentioned challenges with a focus on the last one. Unlike the existing methods for detection of evolving concepts, we cast joint classification and detection of evolving concepts into optimizing an objective function by...
Security Policy Enforcement on Heavy Network Traffic
, M.Sc. Thesis Sharif University of Technology ; Jalili, Rasool (Supervisor)
Abstract
Today’s large networks, such as global enterprise networks, carry heavy network traffic from a wide range of diverse protocols. Scalable and accurate classification of network traffic is of the utmost importance to security policy enforcement in large networks. The complexity of current network traffic, along with high-speed links, makes traffic classification more difficult. The dynamicity of heavy network traffic has necessitated traffic classification algorithms that are adaptable to new concepts. The changes in traffic characteristics over time lead to concept drift, which is an important challenge in this domain. Data stream classification methods have been introduced to...
An adaptive regression tree for non-stationary data streams
, Article Proceedings of the ACM Symposium on Applied Computing ; March , 2013 , Pages 815-816 ; 9781450316569 (ISBN) ; Hosseini, M. J ; Beigy, H ; Sharif University of Technology
2013
Abstract
Data streams are endless flows of data produced at high speed, in large size, and usually in non-stationary environments. The main property of these streams is the occurrence of concept drifts. Using decision trees has been shown to be a powerful approach for accurate and fast learning of data streams. In this paper, we present an incremental regression tree that can predict the target variable of newly incoming instances. When concept drifts occur, the tree is updated either by altering its structure or by updating its embedded models. Experimental results show the effectiveness of our algorithm in terms of speed and accuracy in comparison to the best state-of-the-art methods.
DSCLU: A new data stream CLUstring algorithm for multi density environments
, Article Proceedings - 13th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, SNPD 2012 ; 2012 , Pages 83-88 ; 9780769547619 (ISBN) ; Esfandani, G ; Sharif University of Technology
2012
Abstract
Recently, data streams have become popular in many contexts of data mining. Due to the high amount of incoming data, traditional clustering algorithms are not suitable for this family of problems. Many data stream clustering algorithms proposed in recent years considered the scalability of data, but most of them did not attend to the following issues: (1) The quality of clustering can become dramatically low over time. (2) Some of the algorithms cannot handle arbitrary shapes of data stream and consequently the results are limited to specific regions. (3) Most of the algorithms have not been evaluated in multi-density environments. Identifying appropriate clusters for data stream by handling the...
Feature-based data stream clustering
, Article Proceedings of the 2009 8th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2009, 1 June 2009 through 3 June 2009, Shanghai ; 2009 , Pages 363-368 ; 9780769536415 (ISBN) ; Abolhassani, H ; IEEE Computer Society; International Association for; Computer and Information Science, ACIS ; Sharif University of Technology
2009
Abstract
Data stream clustering has attracted considerable attention in recent years. Many one-pass and evolving algorithms have been developed in this field, but feature selection and its influence on the clustering solution have not been addressed by these algorithms. In this paper we explain a feature-based clustering method for streaming data. Our method establishes a ranking between features based on their appropriateness in terms of clustering compactness and separateness. Then, it uses an automatic algorithm to identify unimportant features and remove them from the feature set. These two steps take place continuously during the lifetime of the clustering task. © 2009 IEEE
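The ranking step described in this abstract can be illustrated with a minimal batch sketch. The abstract does not give the paper's actual measure, so the score used here (between-cluster scatter divided by within-cluster scatter along each feature) is an assumption standing in for "compactness and separateness". The function name `rank_features` and the toy data are hypothetical.

```python
def rank_features(points, labels):
    """Return feature indices ordered from most to least useful for the
    given clustering, scored by between-cluster scatter (separateness)
    divided by within-cluster scatter (compactness) along each feature."""
    n_features = len(points[0])
    # Group the points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for j in range(n_features):
        overall_mean = sum(p[j] for p in points) / len(points)
        within = 0.0
        between = 0.0
        for members in clusters.values():
            mean_j = sum(p[j] for p in members) / len(members)
            within += sum((p[j] - mean_j) ** 2 for p in members)
            between += len(members) * (mean_j - overall_mean) ** 2
        # Small constant avoids division by zero for perfectly tight clusters.
        scores.append(between / (within + 1e-12))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

# Hypothetical data: feature 0 separates the two clusters, feature 1 does not.
points = [(0.0, 5.0), (0.1, 1.0), (10.0, 4.0), (10.1, 2.0)]
labels = [0, 0, 1, 1]
print(rank_features(points, labels))
```

In a streaming setting the per-cluster sums would be maintained incrementally rather than recomputed from stored points, and low-ranked features would be candidates for removal from the feature set.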
DSCA: an inline and adaptive application identification approach in encrypted network traffic
, Article 3rd International Conference on Cryptography, Security and Privacy, ICCSP 2019 with Workshop 2019 the 4th International Conference on Multimedia and Image Processing, ICMIP 2019, 19 January 2019 through 21 January 2019 ; 2019 , Pages 39-43 ; 9781450366182 (ISBN) ; Noferesti, M ; Jalili, R ; Sharif University of Technology
Association for Computing Machinery
2019
Abstract
Adaptive application detection in today's high-bandwidth networks is resource-consuming and inaccurate due to the high volume, velocity, and variety characteristics of network traffic. To generate a robust classifier for identifying applications over encrypted traffic, we propose DSCA, a DPI-based Stream Classification Algorithm. DSCA utilizes applications detected by the Deep Packet Inspection (DPI) technique as ground truth data and updates the classification model accordingly. To reduce the classification algorithm's overhead without reducing accuracy, a feature selection method, named CfsSubsetEval, is deployed in DSCA. The proposed approach is implemented via the MOA tool and...
Predicting Novelty Concepts in Data Streams
, M.Sc. Thesis Sharif University of Technology ; Beigy, Hamid (Supervisor)
Abstract
Many real-world environment challenges are not considered in laboratory-controlled models. Although different and powerful models have been developed for object detection and classification in diverse applications, many fail in the real world. One of the most important challenges is dealing with unknown data at inference time. The second challenge is the change in the characteristics of the data distribution over time, known as concept drift. These two important challenges are explored in the data stream environment, along with many of the events that a model may face in the real world. To address the challenges of learning in a data stream environment, this thesis first designs a...
Data Stream Whole Clustering
, M.Sc. Thesis Sharif University of Technology ; Abolhassani, Hassan (Supervisor)
Abstract
Due to the application of data streams in various data sources such as Web click streams, Web pages, and data generated by sensors and satellites, data streams have attracted considerable attention recently. A data stream is an ordered sequence of points that must be accessed in order and can be read only once or a small number of times. For mining such data, the ability to process in one pass along with limited memory usage is very important. Data stream clustering has also received considerable attention in recent years, and numerous algorithms have been developed in this field. None of them has paid attention to the feature selection problem as an effective factor in clustering quality especially when the...
Clustering Multiple Data Stream
, M.Sc. Thesis Sharif University of Technology ; Mirian Hosseinabadi, Hassan (Supervisor)
Abstract
A data stream is a continuous sequence of data whose management and processing have, in recent years, been widely studied in different research areas of computer science, such as distributed systems. Although clustering a single data stream has many uses, the rapidly increasing emergence of systems and applications whose main requirement is the clustering of multiple data streams reveals the need for further research in this area. Sources that generate data streams have limited memory and processing power. On the other hand, we need information about the entire data stream, which requires that the information be sent to a central site to produce the final result of data mining. As a...
A Semi-Supervised Ensemble Learning Algorithm for Nonstationary Data Streams Classification
, M.Sc. Thesis Sharif University of Technology ; Beigy, Hamid (Supervisor)
Abstract
Recent advances in storage and processing have provided the ability to gather information automatically, which in turn leads to fast and continuous flows of data. The data which are produced and stored in this way are named data streams. Data streams have many applications, such as processing financial transactions, the recorded data of various sensors, or the data collected by web services. Data streams are produced with high speed, large size, and much dynamism and have some unique properties which make them applicable in precise modeling of many real data mining applications. The main challenge of data streams is the occurrence of concept drift, which can be of four types: sudden, gradual,...