Combinatorial Optimization with Reinforcement Learning

Hosseini, Amir; Saleh Kaleybar, Saber

Hosseini, Amir | 2021

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 54634 (05)
University: Sharif University of Technology
Department: Electrical Engineering
Advisor(s): Saleh Kaleybar, Saber
Abstract:
Combinatorial optimization is one of the central problem classes in mathematical optimization. For many continuous optimization problems, an optimal solution can be found in a feasible amount of time. In combinatorial optimization, by contrast, the optimal solution must be found over a finite set of candidates. Many of these problems are NP-hard, and no polynomial-time algorithm is known for them. In practice, therefore, NP-hard problems are usually solved with heuristic methods. Many heuristics exist, and choosing the best one for a given situation can be challenging. In recent years, with the advances in deep neural networks, researchers have studied using these networks to solve combinatorial optimization problems. Because of the dynamic nature of these problems, classic machine learning methods such as supervised learning are not very effective, and reinforcement learning methods are usually preferred. In this thesis, we propose models for solving these problems with two different approaches. In the first approach, we solve problems with machine learning directly. We first propose a model based on the transformer architecture to solve the capacitated vehicle routing problem with time windows (CVRPTW) and train it with the REINFORCE algorithm. The model produces good-quality solutions in a short time. We then propose a model that iteratively improves an initial solution and use it to solve the capacitated vehicle routing problem (CVRP). The results of this model are better than those of heuristic methods. In the second approach, we use machine learning in a subroutine of exact algorithms, which considerably reduces their run time. We first propose a model, similar to a recently proposed one, that learns the branching step of the branch-and-bound algorithm, and we train it with generative adversarial imitation learning. We then analyze the performance of a model that learns the cut-selection step of the branch-and-cut algorithm on the traveling salesman problem with time windows (TSPTW). Changing the reward function of this model yields better performance on this problem.
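As a rough illustration of what "improving an initial solution iteratively" means for CVRP, the sketch below starts from one route per customer and repeatedly applies the single best relocation of a customer that keeps every route within capacity. It is not the thesis model: there a learned policy chooses the improvement step, whereas here a greedy rule is swapped in as a stand-in. The function names, the relocation move, and the toy instance are all assumptions made for this example.

```python
import math

def route_cost(route, coords):
    """Length of depot -> customers -> depot, with the depot at index 0."""
    path = [0] + route + [0]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))

def total_cost(routes, coords):
    return sum(route_cost(r, coords) for r in routes)

def improve(routes, coords, demand, capacity):
    """Greedy stand-in for a learned policy (an assumption of this sketch):
    apply the best capacity-feasible relocation until none reduces the cost."""
    routes = [list(r) for r in routes]
    while True:
        best_gain, best_move = 1e-9, None
        for i, src in enumerate(routes):
            for pos, c in enumerate(src):
                rest = src[:pos] + src[pos + 1:]  # source route without c
                for j, dst in enumerate(routes):
                    if j != i and sum(demand[k] for k in dst) + demand[c] > capacity:
                        continue  # moving c into dst would exceed capacity
                    base = rest if j == i else dst
                    for ins in range(len(base) + 1):
                        cand = base[:ins] + [c] + base[ins:]
                        if j == i:
                            gain = route_cost(src, coords) - route_cost(cand, coords)
                        else:
                            gain = (route_cost(src, coords) + route_cost(dst, coords)
                                    - route_cost(rest, coords) - route_cost(cand, coords))
                        if gain > best_gain:
                            best_gain, best_move = gain, (i, j, rest, cand)
        if best_move is None:
            return [r for r in routes if r]  # drop routes that became empty
        i, j, rest, cand = best_move
        if i == j:
            routes[i] = cand
        else:
            routes[i], routes[j] = rest, cand

# Hypothetical toy instance: depot at the origin, four unit-demand customers.
coords = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)]
demand = [0, 1, 1, 1, 1]
capacity = 2
initial = [[1], [2], [3], [4]]
improved = improve(initial, coords, demand, capacity)
print(total_cost(initial, coords), total_cost(improved, coords))
```

Replacing the greedy choice inside `improve` with a policy that scores candidate moves is where a learned model would plug in; the capacity check and the cost bookkeeping would stay the same.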
Keywords:
Combinatorial Optimization ; Deep Neural Networks ; Reinforcement Learning ; Vehicle Routing Problem ; Traveling Salesman Problem ; Mixed Integer Linear Programming ; Time Window
