Estimating Stopping Time Using Function Approximation Algorithms in Reinforcement Learning

Daei Naby, Ali | 2022

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 55257 (02)
University: Sharif University of Technology
Department: Mathematical Sciences
Advisor(s): Alishahi, Kasra; Haji Mirsadeghi, Mir Omid
Abstract:
We study the expected value of stopping times in stochastic processes. Because no exact solution for the stopping time is available in many processes, our approach is to estimate it with well-known methods from the reinforcement learning literature. The primary method in this research is the temporal difference algorithm. With some modifications, it also lets us study the role that individual state features play in determining the stopping time. Moreover, without complicated mathematical analysis, we can find functions that closely approximate the target function. We also compare the proposed algorithm with well-known regression methods and discuss its advantages and disadvantages. The primary Markov process in this study is the voter model on various graphs. For some graph structures, we derive the exact solution for the stopping time and use it to evaluate how well the algorithms recover the target function. For more complicated structures, where no exact solution is available, we simulate the process to identify the essential features and estimate the stopping time.
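As an illustration of the approach described in the abstract, the following is a minimal sketch and not the thesis's actual algorithm. It assumes a voter model with two opinions on a complete graph of n nodes, in which the number k of nodes holding opinion 1 summarises the state, and it estimates the expected number of steps to consensus with TD(0), using a cost of 1 per step, no discounting, and a linear value estimate w·φ(k). The function names, the one-hot feature map, and all parameter values are hypothetical choices made for this example. A small linear solve gives the exact expected time on this graph so that the estimate can be checked.

```python
import numpy as np


def one_hot(k, n):
    """Hypothetical feature map: indicator vector of the state k."""
    phi = np.zeros(n + 1)
    phi[k] = 1.0
    return phi


def td0_consensus_time(n=6, episodes=5000, alpha=0.02, seed=0, features=one_hot):
    """TD(0) estimate of the expected steps to consensus (assumed setup)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(len(features(0, n)))
    for _ in range(episodes):
        k = int(rng.integers(1, n))          # start with 1..n-1 ones
        opinions = np.zeros(n, dtype=int)
        opinions[:k] = 1
        while 0 < k < n:
            i = int(rng.integers(n))                  # node that updates
            j = (i + int(rng.integers(1, n))) % n     # a different node, uniform
            k_next = k + int(opinions[j]) - int(opinions[i])
            opinions[i] = opinions[j]                 # i copies j's opinion
            phi = features(k, n)
            # The value of a consensus state is 0 by definition.
            v_next = 0.0 if k_next in (0, n) else w @ features(k_next, n)
            delta = 1.0 + v_next - w @ phi            # TD error, cost 1 per step
            w += alpha * delta * phi
            k = k_next
    return w


def exact_consensus_time(n):
    """Solve V(k) = 1 + p_k V(k+1) + p_k V(k-1) + (1 - 2 p_k) V(k), V(0)=V(n)=0."""
    ks = np.arange(1, n)
    p = ks * (n - ks) / (n * (n - 1))        # probability of k -> k+1 (and k -> k-1)
    A = np.diag(2 * p) - np.diag(p[:-1], 1) - np.diag(p[1:], -1)
    return np.linalg.solve(A, np.ones(n - 1))
```

Calling `td0_consensus_time(n=6)` returns a weight vector whose entries 1 to 5 can be compared with `exact_consensus_time(6)`. Replacing `one_hot` with a lower-dimensional feature map is one way to probe which state features matter, although with such features the estimate is only an approximation and the step size may need tuning.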
Keywords:
Stochastic Process; Markov Chain; Reinforcement Learning; Estimation; Feature Vector; Stopping Time; Value Function; Voter Model
