Design and Implementation of Distributed Dimensionality Reduction Algorithms under Communication Constraints

Rahmani, Mohammad Reza | 2021

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 54537 (05)
  4. University: Sharif University of Technology
  5. Department: Electrical Engineering
  6. Advisor(s): Maddah Ali, Mohammad Ali; Salehkaleybar, Saber
  7. Abstract:
  8. Nowadays, machine learning appears in a wide range of applications. One of the key problems in data science and machine learning is dimensionality reduction: finding a mapping that embeds samples into a lower-dimensional space so that the relationships between the samples and their properties are preserved in the secondary space as much as possible. Obtaining such a mapping is essential in today's high-dimensional settings. Moreover, because of the large volume of data and the high dimensionality of the samples, it is often infeasible or insecure to process and store all the data on a single machine; as a result, the data must be processed in a distributed manner.

In this dissertation, we aim to devise dimensionality reduction algorithms in a distributed setting. We assume that the data is distributed among several machines, each holding a subset of the dimensions of all the samples, and that a central server tries to find the dimensionality reduction mapping. The communication budget is limited: each machine can send at most B bits. The natural question is: under these assumptions, what is the optimal algorithm in terms of estimation error for a fixed communication budget? To address this question, we first run experiments on several proposed algorithms for distributed dimensionality reduction and compare their performance. These algorithms all follow the same idea: the machines first send a small portion of their data to the central server, which computes an initial dimensionality reduction mapping; in subsequent iterations, the machines send additional information that the server uses to adjust its mapping.

Next, we show that an upper bound on the covariance matrix estimation error also upper-bounds the error of the above-mentioned problem. In several important cases, we study the problem of estimating the covariance matrix and provide theoretical bounds on the estimation error. Finally, we propose an algorithm for distributed covariance matrix estimation when the communication budget is limited to B bits. We show that the estimation error of the proposed algorithm is on the order of 1/√B, and we discuss the optimality of this error rate.
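The setting the abstract describes can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the thesis's algorithm: features are partitioned across machines, each machine sends a uniformly scalar-quantized copy of its feature block to the server (a stand-in for the B-bit budget), and the server estimates the covariance matrix from the reassembled data. The function names and the clipping range are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(x, bits, lo=-4.0, hi=4.0):
    # Uniform scalar quantizer: clip to [lo, hi], round to 2**bits levels.
    # This is a simple stand-in for the per-machine communication budget.
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    return lo + np.round((np.clip(x, lo, hi) - lo) / step) * step

# n samples of dimension d; the d features are split across m machines.
n, d, m = 2000, 12, 3
X = rng.standard_normal((n, d))
true_cov = np.cov(X, rowvar=False)        # covariance from the full data

errs = {}
for bits in (2, 4, 8):
    blocks = np.array_split(X, m, axis=1)            # each machine's feature block
    sent = [quantize(b, bits) for b in blocks]       # machines transmit quantized blocks
    Xq = np.hstack(sent)                             # server reassembles the samples
    est_cov = np.cov(Xq, rowvar=False)               # server's covariance estimate
    errs[bits] = np.linalg.norm(est_cov - true_cov, "fro")
    print(f"{bits} bits/entry -> Frobenius error {errs[bits]:.4f}")
```

Increasing the per-entry bit budget shrinks the quantization noise and hence the covariance estimation error, mirroring the trade-off between the budget B and the error rate studied in the thesis.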
  9. Keywords:
  10. Machine Learning ; Distributed Algorithm ; Dimensionality Reduction ; Covariance Matrix Estimation ; Communication Overhead
