Performance Modeling and Evaluation of MapReduce Applications
Karimian Aliabadi, Soroush | 2021
- Type of Document: Ph.D. Dissertation
- Language: Farsi
- Document No: 54745 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Movaghar Rahimabadi, Ali; Entezari Maleki, Reza
- Abstract:
- Businesses depend on mining their Big Data more than ever, and configuring clusters and frameworks for the best performance remains a challenge. Accurate performance prediction for a Big Data application helps reduce costs and SLA violations through better tuning of configuration parameters. Among Big Data frameworks, Hadoop, Tez, and Apache Spark are widely used, with MapReduce and graph-based workflows, usually running on top of a YARN cluster. Although many attempts have been made to predict the execution time of Big Data applications, to the best of our knowledge none has considered multiple simultaneous YARN queues and users in the underlying layer, which play a major role in cluster performance. We presented a set of models at different levels of detail, built with various formalisms and tools. Where scalability is the main concern, we applied lumping and fixed-point iteration techniques to reduce the solution time of the model; where building the model is difficult because the workflow graph changes, we used hierarchical modeling. The formalisms used in this thesis include Queuing Networks, Stochastic Well-formed Nets, Stochastic Activity Networks, and Stochastic Reward Nets. Model accuracy and applicability were validated through real-world experiments on the TPC-DS benchmark. We compared the job execution times predicted by the proposed models with measurements and reported the average error. The results show acceptable accuracy and solution time. We then demonstrated the practicality of the proposed models in example scenarios including per-stage analysis, makespan optimization, and performance-cost tradeoffs.
While most of the thesis focuses on homogeneous infrastructures, the final part proposes a learning approach to predict application execution time in heterogeneous environments with the help of the Linear Programming technique.
- Keywords:
- Analytical Modeling ; Performance Evaluation ; Stochastic Activity Networks ; Big Data ; Big Data Analytics ; Fixed Point Index ; Map-Reduce Algorithm
Table of Contents
- Abstract
- List of Tables
- List of Figures
- Chapter 1 - Introduction
- Chapter 2 - Background
- Chapter 3 - Review of Related Work
- Chapter 4 - Modeling and Performance Evaluation of MapReduce Applications
- Chapter 5 - Modeling and Performance Evaluation of DAG-Based Applications
- Chapter 6 - Modeling and Performance Evaluation of YARN Queues
- Chapter 7 - Applications of the Analytical Models
- Chapter 8 - Conclusion and Future Work
- Abbreviations
- Glossary
- References
