Evaluation of NeuroEvolution of Augmenting Topologies in Cooperative Multi-Agent Learning

Iravanian, Sina; Mahdavi Amiri, Nezameddin; Beigy, Hamid

Iravanian, Sina | 2011

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 41780 (02)
University: Sharif University of Technology
Department: Mathematical Sciences
Advisor(s): Mahdavi Amiri, Nezameddin; Beigy, Hamid
Abstract:
Multi-agent systems (MAS) research studies the collective behavior of autonomous agents and the complexities arising from their interactions, and exploits them to solve complex real-world problems. Machine learning methods are frequently used for problem solving in MAS, because the complexity of these systems prevents a programmer from fully specifying the agents' behaviors and the rules governing them. Reinforcement learning (RL) is one of the most commonly used learning methods for intelligent agents, because it does not need a model of the environment and learns agents' policies through trial and error. Conventional RL algorithms store and update utilities for every possible state in a table. One condition for these algorithms to converge is that every state be visited infinitely often. Satisfying this condition, and more generally storing such a table, is not possible for large or continuous state spaces, especially in multi-agent systems, where the size of the state space grows exponentially with the number of agents. For this reason, an approximation of the true table is often maintained instead. This thesis studies and evaluates the application of a family of methods called NeuroEvolution of Augmenting Topologies (NEAT) to cooperative MAS in which non-communicating agents decide independently. The evolved neural networks are used as function approximators in the agents' RL algorithms, and both the topology and the connection weights of the networks are evolved through NEAT. The algorithms are evaluated in two test-beds: predator-prey and grid-world soccer. Empirical results in the predator-prey environment indicate that neural network controllers evolved by NEAT through cooperative co-evolutionary learning reach the optimal policy faster than the other methods, while HyperNEAT team learning with a 3D substrate demonstrates more reliable team behavior in on-line scenarios.
Two methods are proposed for the grid-world soccer environment, both of which exploit geometrical properties of the game to learn team strategies. One of them, which uses a 4D substrate to represent team strategies, learns the optimal team policies in a short time and achieves a significantly higher goal difference. Another advantage of this method is that it scales to a larger environment, more players, and different team formations with no further learning required.
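The abstract's point about exponential state-space growth can be made concrete with a small calculation. The following is a minimal sketch, not taken from the thesis: the 10x10 grid, the five actions per agent, and the function names are all illustrative assumptions, and agents are allowed to share a cell for simplicity.

```python
# Illustrative sketch only: grid size, action count, and names are assumptions,
# not values or code from the thesis.

def joint_state_count(cells: int, agents: int) -> int:
    """Joint position configurations when each agent may occupy any cell.

    Assumes agents may share a cell, so the count is simply cells ** agents.
    """
    return cells ** agents


def q_table_entries(cells: int, agents: int, actions: int) -> int:
    """Entries in one agent's tabular Q-function over the joint state.

    One value is stored per (joint state, own action) pair.
    """
    return joint_state_count(cells, agents) * actions


if __name__ == "__main__":
    # Hypothetical 10x10 grid with 5 actions per agent.
    for n in range(1, 6):
        print(n, q_table_entries(cells=100, agents=n, actions=5))
```

Under these assumptions each additional agent multiplies the table size by the number of cells, which is the kind of growth that motivates replacing the table with a function approximator such as an evolved neural network.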
Keywords:
Multiagent System; Learning Coordination; Multi-Agent Reinforcement Learning; Neuro-Evolutionary Methods

Thesis Contents

Introduction
- Multi-agent systems
- Multi-agent learning
- Common approaches to learning in cooperative agents
- The credit assignment problem
- Summary and thesis structure
Reinforcement Learning
- Introduction
- Reinforcement learning
  - Single-agent reinforcement learning
  - Multi-agent reinforcement learning
  - Game theory and its application to multi-agent reinforcement learning
- Challenges of reinforcement learning in multi-agent environments
  - Benefits of multi-agent reinforcement learning
  - Challenges of multi-agent reinforcement learning
- Choosing the learning goal
- The curse of dimensionality
  - Modular reinforcement learning
  - A brief introduction to feed-forward neural networks
  - Q-learning with a feed-forward neural network
- Multi-agent reinforcement learning algorithms
  - Classification of multi-agent reinforcement learning algorithms
- Conclusion
Evolving Neural Networks with Neuro-Evolutionary Methods
- Introduction
- Evolutionary algorithms
- NEAT
  - Structure of NEAT chromosomes
  - Mutation operators in NEAT
  - The crossover operator in NEAT
  - Speciation
  - The NEAT algorithm
- HyperNEAT
  - Compositional pattern producing networks
  - Mapping spatial patterns to connectivity patterns
  - Substrate configuration
  - Substrate resolution
  - The HyperNEAT algorithm
- Conclusion
Learning Coordination with the NEAT Family of Algorithms
- Introduction
- Simulation environments
  - The predator-prey environment
  - The grid-world soccer environment
- Cooperative co-evolutionary learning methods
- Evolving a multi-layer controller network
  - Input and output configuration in the predator-prey environment
  - Using cooperative co-evolutionary learning
  - Placing the controller networks side by side in a single substrate
- Evolving a geometric controller network
  - A network matching the environment layout
- Using substrates of three or more dimensions
  - Learning a 3D substrate with a 6D CPPN in the predator-prey environment
  - Learning a 4D substrate with an 8D CPPN in the soccer environment
- Summary
Experiments
- Introduction
- The predator-prey environment
  - Parameters used
  - Evolving NEAT networks with cooperative co-evolutionary learning
  - Evolving a multi-layer controller substrate through HyperNEAT and cooperative co-evolutionary learning
  - Placing the agents' controller networks side by side in a single substrate
  - Team learning with a 3D substrate
  - Summary
- The grid-world soccer simulator environment
  - Parameters used
  - Evolving a substrate matching the environment layout
  - Evolving a 4D substrate matching the environment layout and the player roles
  - Summary
Conclusion and Future Work
- Introduction
- Contributions of the thesis
- Suggestions and future work
The Grid-World Soccer Simulator
- Introduction
- Components of the simulation environment
  - Environment model simulator
  - Monitor
  - Control panel
  - Network interface
- Simulator rules
  - Resolving conflicts in player positioning
  - Determining the ball owner
  - Resolving conflicts in passing
- Communication protocols between the players and the simulator
