
Fault Tolerance in Cloud Storage Systems Using Erasure Codes

Safaei, Bardia | 2016

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 48691 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Miremadi, Ghassem
  7. Abstract: International Data Corporation (IDC) has reported that, by the end of 2020, the total amount of digital data stored worldwide will reach 40,000 exabytes. The idea of accessing this volume of data anywhere, at any time, on commodity hardware led to the introduction of cloud storage. The high rate and variety of failures in the equipment used in cloud storage systems has placed fault tolerance at the top of the challenges in these systems. The HDFS layer in Hadoop provides the cloud with reliable storage. Replication is the conventional method for protecting data against failures in HDFS, but its storage overhead is substantial, so designers are turning toward erasure codes. Despite the growing number of studies on overcoming the various challenges of erasure codes (including decoding time), research on the fault tolerance of these codes remains scarce. In this thesis, focusing on the decoding time of erasure codes, we propose a model of the decoding procedure to identify the parameters that explicitly affect decoding and implicitly affect fault tolerance. These parameters have an identical impact on the decoding time and the encoding time of a stripe in MDS erasure codes. Since the encoding time and the bandwidth consumed by encoding and decoding operations can affect decoding time, the influence of the proposed parameters is also evaluated on encoding time and on network-related parameters such as bandwidth consumption and network latency. The evaluations were carried out by embedding erasure codes in Hadoop via the HDFS-RAID module. Our implementations show that using smaller block sizes and fewer stripes reduces the decoding time as well as the probability of multiple simultaneous failures in the cluster, which are a threat to fault tolerance. In addition, very large (1024 MB) and very small (16 MB) blocks, despite their drawbacks, reduce the encoding time and the bandwidth consumption. Among the fault-tolerance methods in HDFS-RAID, RS(10,4) codes impose more network latency and bandwidth consumption than XOR codes (a minimal XOR-stripe sketch follows this record).
  8. Keywords: Fault Tolerance ; Hadoop ; Erasure Code ; Cloud Storage ; Decoding Algorithm ; Coding
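To make the striping terminology in the abstract concrete, below is a minimal sketch (not the thesis' implementation) of the XOR code that HDFS-RAID offers alongside RS(10,4): each stripe of k data blocks gets one parity block, and any single lost block is rebuilt by XOR-ing the surviving blocks. The stripe width and block contents are illustrative assumptions only.

    # Minimal sketch of XOR-parity striping, the simplest erasure code in
    # HDFS-RAID. Stripe width and block contents are illustrative
    # assumptions, not the thesis' experimental settings (real HDFS
    # blocks are tens to hundreds of megabytes).
    from functools import reduce

    def xor_blocks(blocks):
        """XOR equal-sized blocks byte-by-byte; serves for both encode and decode."""
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

    # Encode: one parity block protects a stripe of k data blocks.
    stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
    parity = xor_blocks(stripe)

    # Decode: if any single block is lost, XOR-ing the survivors (data and
    # parity together) reconstructs it, since every other byte cancels pairwise.
    survivors = [stripe[0], stripe[2], parity]
    assert xor_blocks(survivors) == stripe[1]

RS(10,4) generalizes the same stripe layout: 10 data blocks plus 4 parity blocks computed over a Galois field, tolerating any 4 simultaneous block failures at 1.4x storage overhead versus 3x for HDFS's default triple replication. That extra coding and repair traffic is the trade-off behind the latency and bandwidth comparison with XOR in the abstract.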
