Fixed-point iteration approach to Spark scalable performance modeling and evaluation

Karimian Aliabadi, S ; Sharif University of Technology | 2021

  1. Type of Document: Article
  2. DOI: 10.1109/TCC.2021.3119943
  3. Publisher: Institute of Electrical and Electronics Engineers Inc , 2021
  4. Abstract:
  5. Companies depend on mining data to grow their business more than ever. To achieve optimal performance of Big Data analytics workloads, a careful configuration of the cluster and the employed software framework is required. The lack of flexible and accurate performance models, however, renders this a challenging task. This paper fills this gap by presenting accurate performance prediction models based on Stochastic Activity Networks (SANs). In contrast to existing work, the presented models consider multiple work queues, a critical feature to achieve high accuracy in realistic usage scenarios. We first introduce a monolithic analytical model for a multi-queue YARN cluster running DAG-based Big Data applications that models each queue individually. To overcome the limited scalability of the monolithic model, we then present a fixed-point model that iteratively computes the throughput of a single queue with respect to the rest of the system until a fixed point is reached. The models are evaluated on a real-world cluster running the widely used Apache Spark framework and the YARN scheduler. Experiments with the common transaction-based TPC-DS benchmark show that the proposed models achieve an average error of only 5.6% in predicting the execution time of the Spark jobs.
  6. Keywords:
  7. Analytical models ; Big data ; Iterative methods ; Job analysis ; Queueing theory ; Stochastic models ; Stochastic systems ; Wool ; Activity network ; Apache spark ; Approximation techniques ; Big data framework ; Computational modelling ; Data framework ; Fixed-point iteration methods ; Performances evaluation ; State-space explosion ; Stochastic activity network ; Stochastics ; Task analysis ; Yarn
  8. Source: IEEE Transactions on Cloud Computing ; 2021 ; 21687161 (ISSN)
  9. URL: https://ieeexplore.ieee.org/document
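
The fixed-point idea summarized in the abstract (solve one queue at a time against the rest of the system, then repeat until the throughputs stop changing) can be illustrated with a minimal sketch. This is not the paper's SAN model: the per-queue solver below is a hypothetical stand-in in which each queue is guaranteed a share of the cluster and may borrow capacity the other queues leave unused. All names (`solve_single_queue`, `fixed_point_throughputs`, `demands`, `shares`, `damping`) are assumptions made for illustration only.

```python
from typing import List, Tuple


def solve_single_queue(demand: float, guaranteed: float,
                       capacity: float, others_throughput: float) -> float:
    """Toy stand-in for a per-queue model (hypothetical, not the paper's SAN).

    The queue may use whatever the other queues leave free, but never
    less than its guaranteed capacity, and never more than it demands.
    """
    available = max(guaranteed, capacity - others_throughput)
    return min(demand, available)


def fixed_point_throughputs(demands: List[float], shares: List[float],
                            capacity: float, tol: float = 1e-9,
                            max_iter: int = 1000,
                            damping: float = 0.5) -> Tuple[List[float], int]:
    """Iterate the per-queue solver until the throughputs stop changing."""
    # Initial guess: every queue runs exactly at its guaranteed share.
    x = [share * capacity for share in shares]
    for iteration in range(1, max_iter + 1):
        total = sum(x)
        new_x = []
        for i, demand in enumerate(demands):
            # Treat the other queues' current throughput as fixed input.
            others = total - x[i]
            target = solve_single_queue(demand, shares[i] * capacity,
                                        capacity, others)
            # Damped update: move only part of the way toward the target.
            new_x.append((1.0 - damping) * x[i] + damping * target)
        delta = max(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if delta < tol:
            return x, iteration
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    throughputs, iterations = fixed_point_throughputs(
        demands=[30.0, 80.0], shares=[0.5, 0.5], capacity=100.0)
    print(throughputs, iterations)
```

In this toy setting with two queues of equal share on a cluster of capacity 100, a lightly loaded queue (demand 30) and a heavily loaded one (demand 80) settle at throughputs approaching 30 and 70: the busy queue absorbs the capacity the idle one does not use. The damping factor is a design choice of this sketch to keep the updates from oscillating; the source does not state whether the authors use damping.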