Some Model-free Discrete Reinforcement Learning Algorithms

Yousefizadeh, Hossein | 2021

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 54227 (02)
University: Sharif University of Technology
Department: Mathematical Sciences
Advisor(s): Daneshgar, Amir
Abstract:
In this thesis, we review some methods for model-free discrete reinforcement learning and their corresponding algorithms. Our main goal is to present existing methods in an integrated and formal setup, without compromising their mathematical accuracy or comprehensibility. We have done our best to resolve the inconsistencies in notation and definitions that appear across different areas of the vast literature. We discuss dynamic programming methods, including policy iteration and value iteration, and temporal difference methods, as well as policy-based methods such as policy gradient, advantage actor-critic, TRPO, and PPO. Among value-based methods, we discuss Q-learning and C51; we also review some intermediate methods that combine ideas from both approaches, such as DDPG and SAC. We use examples where necessary to clarify the concepts and methods. Finally, to summarize, we provide a comparative analysis of the methods discussed.
Keywords:
Reinforcement Learning; Deep Learning; Machine Learning; Dynamic Programming; Discrete Reinforcement Learning