- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 55171 (02)
- University: Sharif University of Technology
- Department: Mathematical Sciences
- Advisor(s): Alishahi, Kasra
- Abstract:
- Reinforcement learning (RL) is a subfield of machine learning concerned with learning optimal actions in a wide range of unknown environments. Reinforcement learning problems are often phrased in terms of Markov decision processes (MDPs). While restricting attention to Markov environments with finite state spaces is not an unreasonable assumption, the main challenge is to pose these problems over as large a class of environments as possible, one that includes any challenge an agent may face in the real world. Such agents are able to learn to play chess, wash dishes, invest in financial markets, and perform many other tasks that an intelligent human can learn and do. In this thesis we go beyond Markov decision processes and consider reinforcement learning in non-Markovian, non-ergodic, and partially observable environments. Such environments are called general environments, and the field that studies these problems is called general reinforcement learning. Our focus is not on practical algorithms but on the fundamental underlying questions: How can we measure intelligence? How do we balance exploration and exploitation? How do we explore optimally? When is an agent optimal? In the first part we present the theory of the sequence prediction problem, which aims at learning from data that is not independent and identically distributed. We collect theorems from artificial intelligence and algorithmic information theory and place them in the context of reinforcement learning to demonstrate how an agent can learn the value of its policy. In the next part we introduce intelligent agents, foremost among them the Bayesian reinforcement learning agent AIXI (its standard definition is reproduced after the keywords below), together with other agents, each of which attempts to remedy some weakness of AIXI. We then discuss the optimality of these agents and what optimality means, and turn to the negative results obtained for Bayesian agents, especially AIXI: we show that unlucky or adversarial choices of the prior cause the agent to misbehave drastically. However, other Bayesian methods for general reinforcement learning guarantee optimality in other senses, such as asymptotic optimality. To this end, we show that Thompson sampling is asymptotically optimal in stochastic environments, in the sense that the value of its policy converges to the optimal value (a minimal sketch of Thompson sampling also follows below).
- Keywords:
- Thompson Sampling ; General Reinforcement Learning ; Asymptotic Optimality ; Legg-Hutter Intelligence ; Markov Decision Process ; AIXI Agent
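
For orientation, the following is the standard definition of AIXI from the general reinforcement learning literature (Hutter's formulation; the thesis itself is in Farsi and may use slightly different notation). AIXI combines expectimax planning up to a horizon m with a Bayesian mixture over a class M of environments, weighted by Kolmogorov complexity K(ν):

```latex
a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \;\cdots\; \max_{a_m} \sum_{o_m r_m}
\bigl[\, r_t + \cdots + r_m \,\bigr]
\sum_{\nu \in \mathcal{M}} 2^{-K(\nu)}\,
\nu(o_1 r_1 \ldots o_m r_m \mid a_1 \ldots a_m)
```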
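The thesis proves asymptotic optimality of Thompson sampling over general environment classes; as an orientation aid only, here is a minimal Python sketch of the same idea in its simplest special case, a Bernoulli multi-armed bandit. The arm means, horizon, and Beta(1, 1) prior below are illustrative assumptions, not taken from the thesis.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Thompson sampling for a Bernoulli bandit (illustrative sketch).

    Keep a Beta posterior per arm, sample a plausible mean from each
    posterior, and pull the arm whose sample is largest. Posteriors
    concentrate on the true means, so the policy's average reward
    approaches the best arm's mean -- the bandit analogue of the
    asymptotic optimality discussed in the abstract.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # Beta(1, 1) prior: one pseudo-success per arm
    beta = [1] * k   # ... and one pseudo-failure per arm
    total_reward = 0
    for _ in range(horizon):
        # Sample one plausible mean per arm from its current posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_means[arm] else 0
        total_reward += reward
        # Bayesian posterior update for the pulled arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return total_reward / horizon

# Example: average reward should approach max(true_means) = 0.8.
print(thompson_sampling([0.3, 0.5, 0.8], horizon=10_000))
```

In the general setting studied in the thesis, the posterior is maintained over a countable class of environments rather than arm means, and the agent follows the sampled environment's optimal policy for an effective horizon before resampling.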