- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 58520 (02)
- University: Sharif University of Technology
- Department: Mathematical Sciences
- Advisor(s): Mahdavi Amiri, Nezamoddin
- Abstract:
- Learning is the process of transforming experience into expertise. Machine learning is the automation of this process by a machine: data is received as input and learned expertise is produced as output. This becomes especially important when the volume of input data is so large that humans cannot detect the patterns within it. In this thesis, we study a precise mathematical model of learning, known as Probably Approximately Correct (PAC) learning, first introduced by Valiant, who received the Turing Award for his research in this area. The PAC model formalizes the notion of learnability of a problem, and we observe that not all problems are learnable, and that even learnable problems are not necessarily solvable efficiently. Along the way, we examine the important class of convex learning problems, which encompasses most efficiently learnable problems, owing to the existence of efficient algorithms for convex optimization. We introduce two important subclasses of convex learning, Lipschitz and smooth problems, and demonstrate their learnability, which relies on the existence of a stable learner that minimizes the regularized empirical risk with the Tikhonov regularizer. Having established the learnability of these two central subclasses, we turn to two randomized algorithms well suited to solving the associated convex optimization problems. The first is stochastic gradient descent (SGD), which can be seen as a randomized version of gradient descent. Gradient descent is a greedy algorithm that locally reduces the value of a function by moving in the negative gradient direction, eventually reaching a local or global minimizer. In its stochastic counterpart, the update direction is a random vector whose expectation aligns with the negative gradient, or more generally, with the negative of a vector in the subdifferential of the function.
We show convergence of this algorithm for strongly convex objective functions, which in turn implies convergence for the optimization problems arising in Lipschitz and smooth convex learning. The second algorithm, closely related to SGD, is stochastic dual coordinate ascent (SDCA). In this method, the primal optimization problem is replaced by its Fenchel dual, which is then solved via a randomized version of coordinate ascent. We present an analysis of the convergence of the duality gap for this algorithm and show that it enjoys strong theoretical guarantees, competitive with those of stochastic gradient descent. To validate these theoretical results, we conduct experiments on suitable benchmark problems, evaluate the performance of the algorithms, and compare them against each other.
- Keywords:
- Machine Learning ; Learning Theory ; Convex Optimization ; Nonlinear Optimization ; Randomized Algorithm ; Stochastic Gradient Descent ; Stochastic Dual Coordinate Ascent ; Fenchel Duality ; Learnability
