Using Data Variety in Progressive Processing of Big Data in Cloud Environment

Ahmadvand, Hossein | 2019

  1. Type of Document: Ph.D. Dissertation
  2. Language: Farsi
  3. Document No: 51935 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Goudarzi, Maziar
  7. Abstract: Nowadays, many companies face Big Data processing. Because of limited budget or time, it may not be possible to process all of the input data. Data variety is one of the main features of Big Data, and taking it into account can help address limited resource capacity. In this research, we focus on the impact of data variety on the performance of Big Data processing. In the first part, we offer a solution for increasing the performance of progressive Big Data processing. We provide a simple, low-overhead mechanism to quickly assess the significance of each data portion and show its effectiveness in finding the best ranking of data portions. We then demonstrate how this ranking is used in resource allocation to improve time and cost by up to 24% and 9%, respectively. In the second part, we extend this solution and offer a framework to overcome budget and processing-time constraints. In this framework, we assign the more significant parts of the data to the more efficient resources and increase the quality of result (QoR) subject to the budget constraint and the preferred finishing time. This approach improves the QoR by up to 15% compared with the state of the art. In the third part, we present a framework for approximation in Big Data processing that performs well in the presence of data variety and data skew. The experimental results show that our approach surpasses the state of the art and improves processing time by up to 17X compared to ApproxHadoop and 8X compared to Sapprox when the user can tolerate an error bound of 5% with 95% confidence.
  8. Keywords: Big Data ; Progressive Processing ; Data Variety ; Data Significance ; Resources Allocation ; Cloud Environment
