
Approximation MapReduce Algorithms for Some Geometric Problems

Aghamolaei, Sepideh | 2025

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 57911 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Ghodsi, Mohammad
  7. Abstract:
  8. The challenge of designing massively parallel data structures is to create geometric data structures, possibly with approximate queries and with size sublinear in the input size, that can be built in parallel and used to answer a large number of simultaneous queries efficiently. In the MapReduce model for big data analysis, a number of machines independently process the data in synchronous rounds and communicate one-way after each round. The efficiency of algorithms in theoretical models for MapReduce is measured by the number of machines (L), the memory of each machine (m), and the number of rounds (R). The constraints of the MapReduce Class (MRC) and Massively Parallel Computation (MPC) models, which are the most commonly used in cloud computing, are as follows: the number of machines is strongly sublinear, each machine has sublinear memory, and the number of rounds is polylogarithmic. Most existing methods in these models are based on summarization, for example locality-sensitive hashing, vector sketches, core-sets, and random sampling. Many of these methods are also useful in the streaming model. We review the most important results of this thesis:
     1) We introduce the problem of heat map sorting for reducing the number of dimensions and outliers, and show that the problem is NP-hard in general. We then give a polynomial-time algorithm for a special case of the problem, show that the hardness of connectivity in MapReduce extends to this special case, and give fixed-parameter (using the number of clusters as the parameter) and approximation algorithms for it. For summarizing points, this method has advantages over core-sets, such as dimensionality reduction and preserving clusters.
     2) We break the problem of density-based clustering into several sub-problems. We solve the problem of approximate nearest neighbor counting range query with threshold in a constant number of rounds in MapReduce, assuming that the threshold is constant. For the problem of computing the connected components of a unit disk graph, we give a constant-round MapReduce algorithm for the special case where the number of circles required to cover the clusters fits in the memory of one machine and the radius can be increased to 3 times the input radius. In general, if the edges of a graph are distributed so that, for the set of vertices in a machine, the edges of a connected subgraph on those vertices (if such a subgraph exists) are in the same machine, then the problem of graph connectivity in MapReduce can be solved in a constant number of rounds.
     3) We improve the time required to compute the threshold range query for the length of a polyline from linear to logarithmic, use it to solve the popular places problem, and give a MapReduce algorithm for a special case of that problem.
     4) Using approximation algorithms, we reduce the memory required to solve range nearest neighbor queries for each point in the computation of the Yao graph geometric spanner from linear to sublinear, and we present a constant-round MapReduce algorithm for the problem.
  9. Keywords:
  10. Core Sets ; Approximation Methods ; Map-Reduce Algorithm ; Streaming Algorithm ; Computational Geometry ; Data Stream Processing
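
For readers unfamiliar with the Yao graph mentioned in result 4 of the abstract, the following is a minimal sequential sketch of its standard definition: around each point the plane is split into k equal cones, and the point is connected to its nearest neighbor in each cone. This is a brute-force baseline with linear work per point, not the thesis's sublinear-memory or MapReduce algorithm; the function name `yao_graph` and the parameter `k` are hypothetical names chosen for illustration.

```python
import math


def yao_graph(points, k=6):
    """Brute-force Yao graph (illustrative baseline, not the thesis's method).

    points: list of (x, y) tuples.
    k: number of equal-angle cones around each point (assumed parameter).
    Returns a set of directed edges (i, j) over point indices.
    """
    edges = set()
    cone_angle = 2 * math.pi / k
    for i, (px, py) in enumerate(points):
        nearest = {}  # cone index -> (distance, neighbor index)
        for j, (qx, qy) in enumerate(points):
            if i == j:
                continue
            dx, dy = qx - px, qy - py
            # Direction of q as seen from p, normalised to [0, 2*pi).
            angle = math.atan2(dy, dx) % (2 * math.pi)
            # Clamp guards against rounding pushing the index up to k.
            cone = min(int(angle / cone_angle), k - 1)
            dist = math.hypot(dx, dy)
            if cone not in nearest or dist < nearest[cone][0]:
                nearest[cone] = (dist, j)
        for _, j in nearest.values():
            edges.add((i, j))
    return edges


if __name__ == "__main__":
    # Three collinear points: each point links only to its closest
    # neighbor in each occupied cone.
    print(sorted(yao_graph([(0, 0), (1, 0), (2, 0)], k=4)))
```

Each point here scans all other points, which is the linear per-point cost that the abstract says is reduced to sublinear memory by using approximation.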
