Network-aware Key Partitioner for Efficient MapReduce Computation
Nasehi Basharzad, Saeed | 2017
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 50239 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Goudarzi, Maziar
- Abstract:
- MapReduce and its open-source implementation, Hadoop, are the prevailing platforms for big data processing. MapReduce is a simple programming model for solving large computational problems on large-scale distributed systems. The model consists of two major phases, Map and Reduce. Between them sits the partitioner, which distributes the keys produced by Map tasks among Reduce tasks. When the volume of keys and their associated values (the intermediate data) is large, the partitioner has a significant impact on the execution time of Reduce tasks and, consequently, on job completion time. In this work, we present a network- and resource-aware key partitioner that decreases the execution time of MapReduce jobs. Our algorithm first uses sampling to estimate the distribution of keys in the intermediate data. Then, taking into account this distribution, the amount of each key on each machine, the placement of Reduce tasks on machines, and the network bandwidth between machines, it assigns keys to Reduce tasks so as to decrease the total execution time of the job. We implemented this approach in Hadoop, and experiments show that it improves the completion time of the Reduce phase by up to 51% compared with Hadoop's default partitioner and finds a solution within 9% of an ideal partitioner.
- Keywords:
- Big Data ; Hadoop ; Big Data Processing ; Locality Awareness ; Network Awareness ; Map Reduce Processing
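The abstract describes the key-assignment step only at a high level. The following is a simplified stand-in, not the thesis's actual algorithm. It assumes that sampled map output arrives as hypothetical `(machine, key, size)` records, that a reducer pulls non-local data at the pairwise bandwidth with transfers from different sources simply adding up, and that every reducer processes data at the same uniform rate. Keys are placed largest first (an LPT-style greedy heuristic) on whichever reducer would finish earliest once that key's transfer and processing cost is added. The function names `estimate_key_sizes`, `transfer_time`, and `network_aware_partition` are illustrative only.

```python
from collections import defaultdict


def estimate_key_sizes(sampled_records, sample_fraction):
    """Scale sampled (machine, key, size) records up to estimated full sizes.

    Returns {key: {machine: estimated_bytes}}. Assumes uniform sampling.
    """
    sizes = defaultdict(lambda: defaultdict(float))
    for machine, key, size in sampled_records:
        sizes[key][machine] += size / sample_fraction
    return sizes


def transfer_time(size_by_machine, dst, bandwidth):
    """Time to pull one key's data to machine dst; data already on dst is free.

    Assumption: transfers from different source machines are serialized.
    """
    total = 0.0
    for src, size in size_by_machine.items():
        if src != dst:
            total += size / bandwidth[src][dst]
    return total


def network_aware_partition(key_sizes, reducer_machine, bandwidth, process_rate):
    """Greedily assign keys to reducers, keeping estimated finish times low.

    key_sizes:       {key: {machine: bytes}}
    reducer_machine: reducer_machine[r] is the machine hosting reducer r
    bandwidth:       {src: {dst: bytes_per_second}}
    process_rate:    bytes per second any reducer processes (assumed uniform)
    """
    load = [0.0] * len(reducer_machine)  # estimated finish time per reducer
    assignment = {}
    # Largest keys first, so big keys are placed while reducers are still empty.
    order = sorted(key_sizes, key=lambda k: sum(key_sizes[k].values()), reverse=True)
    for key in order:
        per_machine = key_sizes[key]
        volume = sum(per_machine.values())
        best_r, best_finish = None, float("inf")
        for r, machine in enumerate(reducer_machine):
            cost = transfer_time(per_machine, machine, bandwidth) + volume / process_rate
            finish = load[r] + cost
            if finish < best_finish:
                best_r, best_finish = r, finish
        assignment[key] = best_r
        load[best_r] = best_finish
    return assignment, load


if __name__ == "__main__":
    # Two machines, one reducer on each; a 50% sample of the map output.
    sample = [("m0", "a", 40), ("m1", "b", 30), ("m0", "c", 5), ("m1", "c", 5)]
    sizes = estimate_key_sizes(sample, 0.5)
    bw = {"m0": {"m1": 10.0}, "m1": {"m0": 10.0}}
    assignment, load = network_aware_partition(sizes, ["m0", "m1"], bw, 100.0)
    print(assignment)  # {'a': 0, 'b': 1, 'c': 1}
```

In this toy run the two large keys stay on the machine that already holds their data, and the key split across both machines goes to the reducer that is currently less loaded. In Hadoop itself, a precomputed key-to-reducer mapping like this would typically be applied through a custom subclass of `org.apache.hadoop.mapreduce.Partitioner` that overrides `getPartition(key, value, numPartitions)` and is registered with `Job.setPartitionerClass`; the default `HashPartitioner` instead sends each key to its hash modulo the number of reduce tasks, ignoring data location and bandwidth.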