A Task-Based Greedy Scheduling Algorithm for Minimizing Energy of MapReduce Jobs

Yousefi, M.H.N ; Sharif University of Technology | 2018

  1. Type of Document: Article
  2. DOI: 10.1007/s10723-018-9464-0
  3. Publisher: Springer Netherlands, 2018
  4. Abstract:
  5. MapReduce and its open-source implementation, Hadoop, have gained widespread adoption for parallel processing of big data jobs. Since the number of such jobs is also rising rapidly, reducing their energy consumption is increasingly important for lowering both environmental impact and operational costs. Prior work by Mashayekhy et al. (IEEE Trans. Parallel Distributed Syst. 26, 2720–2733, 2016) has tackled the problem of energy-aware scheduling of a single MapReduce job, but we provide a far more efficient heuristic in this paper. We first model the problem as an Integer Linear Program (ILP) so that the optimal solution can be found with ILP solvers. We then present a task-based greedy scheduling algorithm, TGSAVE, which selects a slot for each task so as to minimize the total energy consumption of a MapReduce job for big data applications in heterogeneous environments, without significant performance loss and while satisfying the service level agreement (SLA). To evaluate the proposed algorithm, we perform several experiments on a Hadoop cluster to measure the characteristics of tasks for nine different applications. The results show that the total energy consumption of MapReduce jobs obtained by TGSAVE is up to 35% less than that achieved by EMRSA, its closest rival, proposed in Mashayekhy et al. (IEEE Trans. Parallel Distributed Syst. 26, 2720–2733, 2016), for the same workloads. In addition, TGSAVE is capable of finding a solution, in the same order of time, for deadlines up to 74% tighter than the tightest deadline for which EMRSA can find a feasible solution. On average, the TGSAVE solution is within approximately 1.4% of the optimal solution, and it can meet deadlines as tight as 12%, on average, above the energy-oblivious minimum makespan in the benchmarks we examined. © 2018, Springer Nature B.V.
  6. Keywords:
  7. Big data ; Energy-aware ; MapReduce ; Clustering algorithms ; Data handling ; Energy utilization ; Environmental impact ; Integer programming ; Job shop scheduling ; Multiprocessing systems ; Optimal systems ; Power management ; Scheduling ; Scheduling algorithms ; Energy aware ; Greedy scheduling algorithms ; Heterogeneous environments ; Heterogeneous systems ; Map-reduce ; Open source implementation ; Service Level Agreement (SLA) ; Total energy consumption
  8. Source: Journal of Grid Computing ; Volume 16, Issue 4, 2018, Pages 535-551 ; 15707873 (ISSN)
  9. URL: https://link.springer.com/article/10.1007/s10723-018-9464-0
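
The abstract describes selecting a slot for each task so that total energy is minimized while a deadline is met. The following is only a minimal sketch of that general idea, not the TGSAVE algorithm from the paper, whose details are not given in this record. The function name `greedy_assign`, the cost layout, the task ordering, and the sample numbers are all assumptions made for illustration, and the sketch ignores the separation between map and reduce phases.

```python
from typing import List, Optional, Tuple

# (processing_time, energy) of one task on one slot; units are arbitrary.
Cost = Tuple[float, float]


def greedy_assign(
    costs: List[List[Cost]], deadline: float
) -> Optional[Tuple[List[int], float]]:
    """Assign each task to one slot, greedily preferring low energy.

    costs[t][s] is the (time, energy) of task t on slot s (hypothetical layout).
    Returns (slot index per task, total energy), or None if some task cannot
    be placed on any slot without pushing that slot past the deadline.
    """
    if not costs:
        return [], 0.0
    num_slots = len(costs[0])
    load = [0.0] * num_slots        # busy time already committed to each slot
    assignment = [-1] * len(costs)
    total_energy = 0.0

    # Assumed ordering: tasks with the longest best-case time are placed first.
    order = sorted(range(len(costs)), key=lambda t: -min(c[0] for c in costs[t]))

    for t in order:
        # Slots that can still finish this task before the deadline.
        feasible = [
            s for s in range(num_slots) if load[s] + costs[t][s][0] <= deadline
        ]
        if not feasible:
            return None
        # Lowest energy wins; ties go to the less loaded slot.
        best = min(feasible, key=lambda s: (costs[t][s][1], load[s]))
        assignment[t] = best
        load[best] += costs[t][best][0]
        total_energy += costs[t][best][1]

    return assignment, total_energy


# Made-up example: slot 0 is slow but frugal, slot 1 is fast but power-hungry.
example_costs = [[(4.0, 2.0), (2.0, 5.0)] for _ in range(3)]
loose = greedy_assign(example_costs, deadline=8.0)
tight = greedy_assign(example_costs, deadline=4.0)
```

With the looser deadline the sketch keeps two tasks on the frugal slot, whereas the tighter deadline forces more work onto the power-hungry slot and raises total energy. This illustrates the energy versus deadline trade-off the abstract refers to, under the stated assumptions only.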