
Coded Computing for Distributed Machine Learning

Jahaninezhad, Tayyebeh | 2022

  1. Type of Document: Ph.D. Dissertation
  2. Language: Farsi
  3. Document No: 55521 (05)
  4. University: Sharif University of Technology
  5. Department: Electrical Engineering
  6. Advisor(s): Maddah Ali, Mohammad Ali
  7. Abstract:
  8. Nowadays, we are forced to use distributed computing due to the growth of data, the challenge of storing and processing it, and the emergence of new problems in machine learning together with the increasing complexity of the models. In distributed computing, the computation is performed by a distributed system consisting of several worker nodes: the main task is divided into several smaller tasks, each assigned to a worker node, and the worker nodes then cooperate to accomplish the main task. Although distributed systems are effective in dealing with the mentioned challenges, they are vulnerable to the presence of stragglers, adversarial worker nodes, high communication loads, and privacy concerns. Coding theory has been introduced as a way to deal with the non-ideality of some system components and to ensure the output is correct.

     In this dissertation, we specifically address the problem of coded computing for distributed machine learning. Due to the nature of machine learning algorithms, we consider the computation from two perspectives: exact computation and approximate computation. For exact computation, we first address the problem of gradient coding in a heterogeneous distributed system and propose a method that achieves the optimal communication load in the presence of stragglers and adversarial worker nodes; we also present a trade-off between communication and computation load in heterogeneous gradient coding (a minimal numerical illustration of the basic gradient-coding idea is sketched after the keyword list below). Next, we address a similar problem in federated learning, where each user has its own private, local data-set and tries to learn a global model in cooperation with other users without revealing information about its private data-set. We present a scheme that significantly reduces the communication overhead: the server communication load is within a factor of 1 of the cut-set lower bound, while the scheme achieves worst-case information-theoretic security against a curious server and semi-honest users. As the first problem in the approximate computation part of this dissertation, we address the large-scale matrix multiplication problem, which is one of the basic and important operations in machine learning. Using ideas from approximation theory and coding theory, we propose a method that reduces the total number of worker nodes required compared to exact computation; the number of required worker nodes is determined by the accuracy of the final result and the precision guarantee. Finally, we go beyond polynomial computations in coded distributed computing and propose a scheme for a general problem, namely the approximate computation of an arbitrary function over a data-set in a distributed system. Unlike most existing coded computing schemes, the proposed scheme is numerically stable over the real numbers and has low computational complexity. The proposed scheme is used to train a deep learning model, and implementation results show its performance in terms of the rate of convergence.
  9. Keywords:
  10. Distributed Computing ; Coding Theory ; Approximate Computing ; Federated Learning ; Distributed Machine Learning
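
 Illustration of the gradient-coding idea referenced in the abstract. The abstract does not spell out the dissertation's heterogeneous scheme, so the sketch below uses the classic homogeneous fractional-repetition gradient code (Tandon et al.) only to show how redundant data assignment lets a master recover the exact full gradient despite stragglers; the worker counts, straggler set, loss function, and data are hypothetical.

    # Minimal sketch: fractional-repetition gradient coding tolerating s stragglers.
    import numpy as np

    n, s = 6, 2            # n workers, tolerate up to s stragglers; (s + 1) divides n
    k = n                  # one data partition per worker index
    rng = np.random.default_rng(0)

    # Toy least-squares problem: gradient of partition i is X_i^T (X_i w - y_i).
    d = 4
    X = [rng.normal(size=(5, d)) for _ in range(k)]
    y = [rng.normal(size=5) for _ in range(k)]
    w = rng.normal(size=d)
    grad = lambda i: X[i].T @ (X[i] @ w - y[i])

    # Fractional repetition: split the n workers into n/(s+1) groups; every worker
    # in group g stores the same s+1 partitions and sends the SUM of their gradients.
    groups = [list(range(g * (s + 1), (g + 1) * (s + 1))) for g in range(n // (s + 1))]
    worker_msg = {}
    for g, parts in enumerate(groups):
        coded = sum(grad(i) for i in parts)        # coded (summed) partial gradient
        for wid in parts:                           # each worker in the group sends it
            worker_msg[wid] = (g, coded)

    # Master side: any s workers may straggle; one survivor per group suffices.
    stragglers = {1, 4}                             # hypothetical straggling workers
    received = {wid: msg for wid, msg in worker_msg.items() if wid not in stragglers}
    recovered, seen_groups = np.zeros(d), set()
    for g, coded in received.values():
        if g not in seen_groups:
            seen_groups.add(g)
            recovered += coded

    exact = sum(grad(i) for i in range(k))
    assert np.allclose(recovered, exact)            # full gradient recovered despite stragglers

 Each partition is stored by s + 1 workers, so any pattern of up to s stragglers still leaves one copy of every group's coded message; this replication-versus-resilience trade-off is the communication/computation tension the abstract refers to.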
