SAIR: significance-aware approach to improve QoR of big data processing in case of budget constraint

Ahmadvand, H ; Sharif University of Technology | 2019

  1. Type of Document: Article
  2. DOI: 10.1007/s11227-019-02797-7
  3. Publisher: Springer New York LLC, 2019
  4. Abstract:
  5. Enterprises today face big data processing in many domains, such as transaction operations, business calculations and analytical computations. Large-scale computing is one approach to big data processing. Because large-scale computing is costly and enterprise budgets are limited, it is hardly possible to process all the input data, and the Quality of Result (QoR) may therefore be affected. SAIR is an approach that improves the QoR of big data processing for aggregative usages, based on significance variety, when there is a budget constraint. In this approach, the most significant data portions are assigned to the most efficient resources in terms of time and cost. If budget remains, the other data portions are assigned to the remaining resources. Statistical methods and a sampling technique with a 95% confidence level and a 5% margin of error are used to identify the most and least significant data portions. With this method, users can improve QoR with respect to the budget constraint and the preferred finishing time. In the evaluation phase, applications from different domains, such as documents and text, transaction data and system logs, are used. The results indicate that SAIR improves QoR while meeting the budget constraint for the considered usages. The approach improves QoR by up to 15% compared with the state of the art. © 2019, Springer Science+Business Media, LLC, part of Springer Nature
  6. Keywords:
  7. Data variety ; Quality of Result ; Big data ; Budget control ; Sampling ; Analytical computations ; Budget constraint ; Business calculations ; Confidence interval ; Large-scale computing ; Quality of results ; Significance ; Data handling
  8. Source: Journal of Supercomputing ; Volume 75, Issue 9, 2019, Pages 5760-5781 ; 09208542 (ISSN)
  9. URL: https://link.springer.com/article/10.1007/s11227-019-02797-7
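
The abstract describes two steps: sampling at a 95% confidence level with a 5% margin of error, and assigning the most significant data portions to the most efficient resources while budget remains. The Python sketch below illustrates both under stated assumptions and is not the authors' implementation. The abstract does not say how the sample size is computed, so Cochran's formula with a finite population correction is assumed here. The function names, the tuple layouts, and the use of cost per unit as the only measure of resource efficiency are likewise hypothetical simplifications.

```python
import math


def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Sample size for a given confidence level and margin of error.

    Assumption: Cochran's formula with finite population correction.
    z=1.96 corresponds to a 95% confidence level; p=0.5 is the most
    conservative choice of the expected proportion.
    """
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    return math.ceil(n0 / (1 + (n0 - 1) / population))


def assign_portions(portions, resources, budget):
    """Greedy assignment under a budget (hypothetical simplification).

    portions:  list of (name, significance, size)
    resources: list of (name, cost_per_unit, capacity)
    Returns the list of (portion, resource) pairs and the total cost.
    """
    # Most significant portions are considered first.
    ranked_portions = sorted(portions, key=lambda p: p[1], reverse=True)
    # Cheapest resource per unit of data is tried first.
    ranked_resources = sorted(resources, key=lambda r: r[1])
    remaining = {name: capacity for name, _, capacity in ranked_resources}

    plan, spent = [], 0.0
    for portion_name, _, size in ranked_portions:
        for resource_name, cost_per_unit, _ in ranked_resources:
            price = cost_per_unit * size
            fits = remaining[resource_name] >= size
            if fits and spent + price <= budget:
                plan.append((portion_name, resource_name))
                remaining[resource_name] -= size
                spent += price
                break  # portion placed; move on to the next one
    return plan, spent


if __name__ == "__main__":
    portions = [("a", 0.9, 10), ("b", 0.5, 10), ("c", 0.1, 10)]
    resources = [("fast", 1.0, 10), ("slow", 2.0, 20)]
    print(sample_size(10_000))
    print(assign_portions(portions, resources, budget=35))
```

In this toy run the least significant portion is left unprocessed once the budget is exhausted, which is the trade-off the abstract describes: result quality is protected by spending the budget on the data that matters most.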