Automatic Skill Learning Using Community Detection Approach

Ghafoorian, Mohsen; Beigy, Hamid

Ghafoorian, Mohsen | 2012

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 43885 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Beigy, Hamid
Abstract:
Reinforcement learning is a learning method that relies on reward and penalty feedback, with no information about the correct action. The agent observes the state of the environment and selects an action from its permissible set of actions according to its policy and the current state. The environment responds to the agent's action with an evaluation, in the form of a reinforcement signal and a change of state. The agent then updates its policy based on the received signal in order to maximize its long-term reward. Reinforcement learning converges rapidly to the optimal solution only when there are few states and actions; many domains have so many states and actions that convergence becomes very slow.
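The interaction loop described above can be illustrated with a minimal sketch of tabular Q-learning. This is a standard textbook method, not necessarily the learner used in the thesis; the corridor environment, the function name `q_learning`, and all parameter values are assumptions made for illustration.

```python
import random

def q_learning(n_states=6, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    The agent starts in state 0; the rightmost state is the goal.
    Reaching the goal yields reward 1, every other step yields 0.
    """
    rng = random.Random(seed)
    moves = (-1, +1)                       # action 0 = left, action 1 = right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Policy: epsilon-greedy over the current value estimates.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            # Environment response: a new state and a reinforcement signal.
            next_state = min(max(state + moves[action], 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            # Policy update toward the long-term (discounted) reward.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
```

Even in this small example the reward must propagate backward one state per visit, which hints at why convergence slows as the number of states and actions grows.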
Temporal abstraction can address this problem in large-scale environments and make convergence much faster than conventional methods. Temporal abstraction is realized through the acquisition and use of skills. Briefly, a skill can be defined as a sequence of primitive actions that can be applied to reach a suitable state in the environment. From another perspective, if the state transitions of the environment are modeled as a graph, then the boundary points between communities of this graph can be regarded as sub-goals that the agent needs to pass through in order to reach the goal state.
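As a small illustration of the graph view, the sketch below marks as sub-goal candidates those states that have a neighbor in a different community. The two-room graph, the pre-assigned community labels, and the helper name `boundary_states` are assumptions for this example; the abstract does not specify how communities are represented.

```python
def boundary_states(edges, community):
    """Return the states that have at least one neighbor in another community."""
    boundary = set()
    for u, v in edges:
        if community[u] != community[v]:
            boundary.update((u, v))
    return boundary

# Hypothetical two-room world: states 0-3 form room A, states 4-7 form room B,
# and the only connection between the rooms is the "door" edge (3, 4).
edges = [(0, 1), (0, 2), (1, 3), (2, 3),
         (3, 4),
         (4, 5), (4, 6), (5, 7), (6, 7)]
community = {s: ("A" if s < 4 else "B") for s in range(8)}

subgoals = boundary_states(edges, community)
print(sorted(subgoals))
```

Any path from room A to room B has to cross the door, so the two states on either side of it are natural sub-goals for a skill.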
In this thesis, an algorithm is presented that uses ant colony optimization to identify sub-goal states. First, ants generate several paths from the initial state to the goal state, and the variation over time of the pheromone deposited by the ants on the edges of the shortest path is analyzed. Edges whose pheromone distribution over time differs from the others are then separated and labeled as bottleneck edges. Next, communities consisting of fragments of the shortest path are detected. Finally, useful skills are learned on each detected community using the options framework. To evaluate the proposed method, its performance is compared with several other skill learning methods on four standard benchmarks: the Grid world, Taxi, Playroom and Hanoi environments. The experimental results show improved performance in several environments.
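The pheromone bookkeeping behind this kind of analysis can be sketched roughly as follows. This is a generic ant colony simulation written for illustration, not the algorithm of the thesis: the function name `run_ants`, the deposit rule (1 / path length per ant), the evaporation rate `rho`, and the two-room graph are all assumptions.

```python
import random

def run_ants(adjacency, start, goal, n_iters=50, n_ants=5, rho=0.1, seed=0):
    """Simulate ants walking from start to goal; return pheromone per edge over time."""
    rng = random.Random(seed)
    pheromone = {frozenset((u, v)): 1.0
                 for u, nbrs in adjacency.items() for v in nbrs}
    history = []
    for _ in range(n_iters):
        deposits = {edge: 0.0 for edge in pheromone}
        for _ in range(n_ants):
            path = [start]
            while path[-1] != goal:
                u = path[-1]
                nbrs = adjacency[u]
                # Choose the next state with probability proportional to pheromone.
                weights = [pheromone[frozenset((u, v))] for v in nbrs]
                path.append(rng.choices(nbrs, weights=weights)[0])
            # Each ant deposits once per distinct edge; shorter paths deposit more.
            for edge in {frozenset(step) for step in zip(path, path[1:])}:
                deposits[edge] += 1.0 / (len(path) - 1)
        # Evaporate old pheromone, then add the new deposits.
        for edge in pheromone:
            pheromone[edge] = (1 - rho) * pheromone[edge] + deposits[edge]
        history.append(dict(pheromone))   # snapshot for analysis over time
    return history

# Hypothetical two-room graph; the edge between states 3 and 4 is the only door.
adjacency = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4],
             4: [3, 5, 6], 5: [4, 7], 6: [4, 7], 7: [5, 6]}
history = run_ants(adjacency, start=0, goal=7)
door = frozenset((3, 4))
final = history[-1]
```

In this toy setting every ant must cross the door edge, so it receives pheromone in every iteration while the edges inside each room share the traffic. The per-edge snapshots in `history` are the kind of time series in which such a bottleneck edge would stand out.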
Keywords:
Reinforcement Learning ; Skill Acquisition ; Subgoal Discovery ; Options Framework ; Ant Colony Optimization (ACO) ; Community Detection
