    Evaluation of Deduplication Techniques in Data Storage Systems

    M.Sc. Thesis, Sharif University of Technology; Bazazzadegan, Mohammad Hossein (Author); Asadi, Hossein (Supervisor)
    Abstract
    Deduplication is a data reduction technique that eliminates redundant data by storing only a single copy of each file or block, together with references to that unique copy. It reduces the storage space and bandwidth requirements of data storage systems and becomes more effective when applied across multiple users. Rather than comparing input data byte by byte against all previously stored data, all deduplication techniques use cryptographic hash algorithms to detect duplicates. The data stream is divided into non-overlapping chunks, and the hash value of each chunk then serves as a unique identifier for recognizing duplicates. In this thesis,...
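
The chunk-and-hash idea in the abstract above can be illustrated with a minimal sketch. It assumes fixed-size chunks and SHA-256 as the hash; the abstract names neither, and the function names `deduplicate` and `restore` are hypothetical, chosen only for this illustration. It is not the method evaluated in the thesis.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4):
    """Split data into non-overlapping fixed-size chunks and store each unique chunk once.

    Assumptions (not from the thesis): fixed-size chunking and SHA-256 digests.
    Returns (store, recipe): store maps digest -> chunk bytes, and recipe is
    the ordered list of digests needed to rebuild the original data.
    """
    store = {}
    recipe = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        # Keep only the first copy of a chunk; later duplicates become references.
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original byte string by following the references in order."""
    return b"".join(store[digest] for digest in recipe)
```

For example, `deduplicate(b"ABCDABCDWXYZABCD", 4)` keeps two unique chunks while the recipe holds four references, and `restore` returns the original bytes.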

    New drift detection method for data streams

    Article, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Volume 6943 LNAI, 2011, Pages 88-97; 03029743 (ISSN); 9783642238567 (ISBN); Sobhani, P; Beigy, H; Sharif University of Technology
    Abstract
    Correctly detecting the position where a concept begins to drift is important in mining data streams. In this paper, we propose a new method for detecting concept drift. The proposed method, which can detect different types of drift, processes data chunk by chunk and uses the difference between two consecutive batches as a drift indicator. To evaluate the proposed method, we measure its performance on a set of artificial datasets with different levels of drift severity and speed. The experimental results show that the proposed method is capable of detecting drifts and can approximately locate where concept drift occurs.
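
The chunk-by-chunk comparison described in the abstract above can be sketched as follows. The abstract does not say how the difference between consecutive batches is measured, so this sketch assumes a simple stand-in: the absolute difference of batch means compared against a fixed threshold. The name `detect_drift` and its parameters are hypothetical, and this is not the authors' detector.

```python
from statistics import mean


def detect_drift(stream, chunk_size, threshold):
    """Flag positions where two consecutive chunks differ by more than threshold.

    Assumption (not from the paper): the difference measure is the absolute
    difference between chunk means. A trailing partial chunk is ignored.
    Returns the start indices of chunks at which drift is flagged.
    """
    chunks = [
        stream[start:start + chunk_size]
        for start in range(0, len(stream) - chunk_size + 1, chunk_size)
    ]
    drift_points = []
    for k in range(1, len(chunks)):
        difference = abs(mean(chunks[k]) - mean(chunks[k - 1]))
        if difference > threshold:
            # Report where the newer chunk begins as the approximate drift location.
            drift_points.append(k * chunk_size)
    return drift_points
```

With a stream of twenty values of 0.0 followed by twenty values of 5.0, a chunk size of 10, and a threshold of 1.0, the only flagged position is index 20, where the change actually occurs. The location is approximate in general, since it can only be resolved to a chunk boundary.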