Using Deep Neural Networks in Reinforcement Learning

Sahaf Naeini, Alireza; Soleymani Baghshah, Mahdieh; Rabiei, Hamidreza

Sahaf Naeini, Alireza | 2017

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 50968 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Soleymani Baghshah, Mahdieh; Rabiei, Hamidreza
Abstract:
Reinforcement learning is a field of machine learning that closely resembles the way humans learn. It uses reward signals to train an agent to act in an environment. Deep neural networks improve the agent's ability to perceive and act in complex environments. Most previous work has addressed model-free agents, which ignore the dynamics of the environment even though modeling them can lead to better results. Humans, on the other hand, use a model-based approach in their decision making: they use their knowledge to predict the future and choose the action that leads to a better state. To combine the benefits of model-based and model-free designs, we propose a compound network that predicts both rewards and video frames in order to estimate a model of the environment. We use this model to predict the future from the current state and the agent's intended action. We show that our approach models the environment with lower error than existing model-based approaches in the Atari environment, paving the way for future work on model-based agents.
Keywords:
Reinforcement Learning ; Deep Neural Networks ; Feature Extraction ; Video Prediction
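Illustration:
The model-based idea described in the abstract (predict the next frame and the reward for a candidate action, then choose the action with the best predicted outcome) can be sketched roughly as follows. This is a minimal sketch and not the thesis's implementation: the names `lookahead_action`, `predict_next`, `predict_reward`, and `value` are hypothetical, and simple hand-written functions on a toy one-dimensional state stand in for the neural networks that would predict video frames, estimate rewards, and supply model-free value estimates.

```python
from typing import Callable, Sequence


def lookahead_action(
    state: int,
    actions: Sequence[int],
    predict_next: Callable[[int, int], int],
    predict_reward: Callable[[int, int], float],
    value: Callable[[int], float],
    gamma: float = 0.99,
) -> int:
    """Return the action with the best predicted one-step outcome.

    Each action is scored as: predicted reward + gamma * value of the
    predicted next state. All three predictors are assumed stand-ins.
    """
    def score(action: int) -> float:
        next_state = predict_next(state, action)
        return predict_reward(state, action) + gamma * value(next_state)

    return max(actions, key=score)


# Toy stand-ins (assumptions): the state is a position on a line, goal at 3.
GOAL = 3
ACTIONS = (-1, 0, 1)


def toy_predict_next(state: int, action: int) -> int:
    # Stand-in for a frame prediction network.
    return state + action


def toy_predict_reward(state: int, action: int) -> float:
    # Stand-in for a reward prediction network.
    return 1.0 if state + action == GOAL else 0.0


def toy_value(state: int) -> float:
    # Stand-in for a model-free value estimate.
    return -float(abs(GOAL - state))


best = lookahead_action(0, ACTIONS, toy_predict_next, toy_predict_reward, toy_value)
print(best)
```

In this toy setting the agent moves toward the goal and stays there once it arrives. Swapping the stand-ins for learned predictors would leave the selection logic unchanged, which is the sense in which a learned environment model can complement a model-free value estimate.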

Table of Contents:
List of Figures
List of Tables
Introduction
- Problem Definition
- Importance
- Applications
- Challenges
- Evaluation Criteria
- Contributions of This Research
- Summary and Thesis Structure
Basic Concepts
- Reinforcement Learning
- Components of the Reinforcement Learning Problem
  - Reward Signal
  - Agent and Environment
  - States and History
- Exploitation and Exploration
- Methods for Solving Reinforcement Learning Problems
  - Value Iteration
  - Policy Iteration
- Function Approximation
- Deep Neural Networks
  - Types of Deep Networks
- Summary
Previous Work
- Previous Approaches to Using Neural Networks in Reinforcement Learning
- Previous Image Prediction Methods
- Using Image-Prediction-Based Methods in Reinforcement Learning
- Summary
Proposed Approach
- Environment and Datasets Used
- Pre-training the DQN Network
- Forward-Looking Agent
  - Image Prediction Network
  - Reward Estimation Network
  - Model-Free Network
  - Forward-Looking Agent Network
- Summary
Experiments
- Datasets
  - Simulator
  - OpenAI Gym
  - The Freeway Game
  - The Breakout Game
  - Data Collected from the Games
- Evaluation Metrics
  - Evaluation Metric Used for the Agents
  - Evaluation Metric Used for the Reward Prediction Network
- Results of Applying the Proposed Method
  - DQN Network Pre-training Method
  - Training Results of the Image Prediction Network
  - Training Results of the Reward Prediction Network
  - Classifier Network
  - Reward Labeling Network
- Analysis of Results
- Summary
Conclusion and Future Work
- Conclusion
- Future Work
  - Improving the Agent's Exploration Performance
  - Generating Data and Reducing the Need for Interaction with the Environment
  - Using Other Game Features
  - Predicting Image Representations
  - Agent Generality
  - Using Planning Methods
References
