
A novel key partitioning schema for efficient execution of MapReduce applications

Nasehi Basharzad, S ; Sharif University of Technology | 2018

  1. Type of Document: Article
  2. DOI: 10.1109/CADS.2017.8310681
  3. Publisher: Institute of Electrical and Electronics Engineers Inc., 2018
  4. Abstract:
  5. MapReduce and its open-source implementation, Hadoop, are the prevailing platforms for big data processing. MapReduce is a simple programming model for solving large computational problems on large-scale distributed systems. The model consists of two major phases: Map and Reduce. Between these two phases sits a partitioner, which distributes the keys produced by Map tasks among Reduce tasks. When the volume of keys and their associated values, together called intermediate data, is large, the partitioner has a significant impact on the execution time of Reduce tasks and, consequently, on the completion time of jobs. In this paper, we present a network- and resource-aware key partitioner that decreases the execution time of MapReduce jobs. Using sampling, our algorithm estimates the distribution of keys in the intermediate data. Then, considering this distribution, the amount of each key on each machine, the placement of Reduce tasks on machines, and the network bandwidth between machines, our algorithm assigns keys to Reduce tasks so as to decrease the total execution time of the job. Our experiments show that our approach improves the completion time of the Reduce phase and the job execution time by up to 52% and 31%, respectively, compared with the default Hadoop partitioner, and finds a solution within 8% of the ideal partitioner. © 2017 IEEE
  6. Keywords:
  7. MapReduce ; Computer architecture ; Data handling ; Multiprocessing systems ; Network architecture ; Open source software ; Computational problem ; Hadoop ; Large-scale distributed system ; Map-reduce ; Open source implementation ; Partitioner ; Performance ; Programming models ; Big data
  8. Source: 19th International Symposium on Computer Architecture and Digital Systems, CADS 2017, 21 December 2017 through 22 December 2017 ; Volume 2018-January, March 2018, Pages 1-6 ; 9781538643792 (ISBN)
  9. URL: https://ieeexplore.ieee.org/document/8310681
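
The abstract contrasts hash-based default partitioning with an assignment that accounts for where each key's data resides and how fast it can be moved. The sketch below illustrates that contrast only; it is not the algorithm from the paper, which this record does not describe in detail. The greedy heuristic, the cost model (transfer time plus processing time, accumulated serially per reducer), and all names (`hash_partition`, `greedy_assign`, `key_sizes`, `reducer_host`, `bandwidth`, `process_rate`) are assumptions made for illustration. The per-machine key sizes are taken as given, standing in for the sampled distribution the abstract mentions.

```python
from collections import defaultdict
import zlib


def hash_partition(key: str, num_reducers: int) -> int:
    """Hash-based baseline: reducer index = hash(key) mod R.

    zlib.crc32 is used here only as a deterministic stand-in hash.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def transfer_time(sizes_by_machine, dst, bandwidth):
    """Estimated seconds to move one key's data to machine `dst`.

    Data already on `dst` is treated as free to read (an assumption).
    """
    return sum(
        size / bandwidth[(src, dst)]
        for src, size in sizes_by_machine.items()
        if src != dst
    )


def greedy_assign(key_sizes, reducer_host, bandwidth, process_rate):
    """Hypothetical greedy heuristic, not the paper's method.

    key_sizes:    {key: {machine: bytes of that key on the machine}}
    reducer_host: {reducer id: machine the reducer runs on}
    bandwidth:    {(src machine, dst machine): bytes per second}
    process_rate: bytes per second a reducer can process (assumed uniform)

    Keys are visited from largest to smallest; each goes to the reducer
    whose estimated finish time after taking the key is lowest.
    """
    load = defaultdict(float)  # estimated busy time per reducer, in seconds
    assignment = {}

    def total_size(key):
        return sum(key_sizes[key].values())

    def finish_time(key, reducer):
        move = transfer_time(key_sizes[key], reducer_host[reducer], bandwidth)
        return load[reducer] + move + total_size(key) / process_rate

    for key in sorted(key_sizes, key=total_size, reverse=True):
        best = min(reducer_host, key=lambda r: finish_time(key, r))
        load[best] = finish_time(key, best)
        assignment[key] = best

    return assignment, dict(load)


# Toy input: two machines, one reducer on each, symmetric 100 B/s links.
key_sizes = {
    "x": {"A": 800, "B": 200},
    "y": {"A": 100, "B": 500},
    "z": {"A": 300, "B": 100},
}
reducer_host = {0: "A", 1: "B"}
bandwidth = {("A", "B"): 100.0, ("B", "A"): 100.0}

assignment, load = greedy_assign(key_sizes, reducer_host, bandwidth, 100.0)
print(assignment, load)
```

On this toy input the heuristic places key `x` on the reducer hosted on machine A, where most of its data already resides, and keys `y` and `z` on the reducer on machine B, giving estimated busy times of 12 and 14 seconds. A hash-based assignment ignores data location and bandwidth entirely, which is the gap the abstract targets.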