Big Data Application Performance Prediction and Heterogeneous Resource Recommendation in Cloud

Aseman Manzar, Mohammd Mohsen | 2020

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 53350 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Movaghar Rahim Abadi, Ali
  7. Abstract:
  8. In recent years, rapid growth in data generation has increased the need for data analysis and big data technologies. Meanwhile, cloud computing has become very popular and is now one of the best platforms for big data analytics thanks to its flexibility and scalability. However, there is no obvious method for choosing the right cluster configuration for running big data jobs in the cloud, because different big data applications, workloads, and cloud configurations can lead to different costs. Moreover, a heterogeneous cluster can exploit the distinct properties of each machine to reduce the cost and execution time of a big data application. The purpose of this thesis is to propose a way to choose the best cluster configuration for a given budget with the help of a cost function. Making a sound decision requires searching the configuration space, which is not possible without a good performance model for each candidate cluster. In this thesis, we therefore first propose two gray-box, heterogeneity-aware performance models based on linear programming for DAG-based big data applications. We validate the accuracy of these two models with the well-known TPC-DS benchmark and achieve a prediction accuracy of 95.82. We then propose two approximation methods for the first model to manage its complexity for larger clusters; the second model has polynomial complexity and does not need approximations. Finally, we propose two algorithms that recommend cloud resources based on the proposed performance models: one minimizes the execution time for a given budget, and the other aims to minimize the cost
  9. Keywords:
  10. Cloud Environment ; Heterogeneous Resources ; Apache Spark ; Big Data ; Directed Acyclic Graph (DAG)-Based Applications ; Resource Recommendation ; Big Data Application Performance Prediction
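
The budget-constrained recommendation described in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm or its linear-programming performance model: the machine types, prices, speeds, the `predict_hours` stand-in model, and the exhaustive search are all hypothetical assumptions chosen only to show the shape of the problem (search the configuration space, predict the run time of each candidate cluster, keep the fastest one whose cost fits the budget).

```python
from itertools import product

# Hypothetical machine types: (name, price per hour, relative speed).
# These values are illustrative assumptions, not data from the thesis.
MACHINE_TYPES = [
    ("small", 0.10, 1.0),
    ("large", 0.40, 3.0),
]


def predict_hours(counts, work_units=12.0):
    """Toy stand-in for a performance model: total work / total speed.

    The thesis proposes gray-box LP-based models; this simple formula is
    only a placeholder so that the search below has something to call.
    """
    total_speed = sum(n * speed for n, (_, _, speed) in zip(counts, MACHINE_TYPES))
    return work_units / total_speed


def cluster_cost(counts, hours):
    """Cost function: hourly price of the whole cluster times run time."""
    hourly = sum(n * price for n, (_, price, _) in zip(counts, MACHINE_TYPES))
    return hourly * hours


def recommend(budget, max_per_type=4):
    """Return (counts, hours, cost) of the fastest cluster within budget.

    Exhaustively enumerates heterogeneous configurations (how many
    machines of each type) and returns None if nothing fits the budget.
    """
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(MACHINE_TYPES)):
        if sum(counts) == 0:
            continue  # an empty cluster cannot run the job
        hours = predict_hours(counts)
        cost = cluster_cost(counts, hours)
        if cost <= budget and (best is None or hours < best[1]):
            best = (counts, hours, cost)
    return best


if __name__ == "__main__":
    for budget in (1.5, 100.0):
        print(budget, recommend(budget))
```

Exhaustive enumeration is used here only because the toy configuration space is tiny; for larger clusters a smarter search would be needed, which is the kind of complexity concern the abstract raises.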
