Improving Data Efficiency in Predictive Reinforcement Learning in Non-stationary Environments

Rimaz, Mohammad Sadra; Nobakhti, Amin

Rimaz, Mohammad Sadra | 2024

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 57570 (05)
University: Sharif University of Technology
Department: Electrical Engineering
Advisor(s): Nobakhti, Amin
Abstract:
One class of non-stationary environments studied in reinforcement learning consists of environments in which exactly one model from a finite set is valid at each time step. In such environments, the weighted mixture policy method uses predictions of upcoming model changes to increase cumulative reward. After a prediction is received and before the model changes, the method builds a new policy as a weighted combination of the optimal policy of the current model and the optimal policy of the future model. Obtaining the optimal policy for each model is time-consuming and requires a large amount of data. This thesis reviews existing reinforcement learning methods for non-stationary environments. It shows that the weighted mixture policy can be trained with non-optimal model policies and that data efficiency can be improved by training the policies simultaneously. To make simultaneous training possible, a new method for allocating training data among the policies is proposed. The method is applied to the reference tracking problem in the cartpole system and in the Van der Pol system, and its results are compared statistically with those of previous methods. The comparison shows that the proposed method increases data efficiency.
Keywords:
Reinforcement Learning; Data Efficiency; Predictive Policy; Weighted Mixture Policy; Non-Stationary Environments

Contents

Introduction
- Preface
- Problem Definition
- Review of Previous Research
- Significance of the Topic
- Contributions
- Thesis Structure
Preliminaries
- Mathematical Framework of Reinforcement Learning
  - Definitions
- Methods Used in Reinforcement Learning Problems
- Non-Stationary Environments
- Weighted Mixture Policy
  - Problem Definition of the Weighted Mixture Policy
  - Structure of the Weighted Mixture Policy
  - Implementation
Algorithm for Training the Weighted Mixture Policy Before Obtaining the Models' Optimal Policies
- Simulation Preliminaries
  - Simulation Problem
  - Simulation Framework
  - Policy Training Algorithm
  - Evaluation Method
  - Exploration by Adding an Entropy Term
  - Systems Used
- Starting WMP Training Before Obtaining the Models' Optimal Policies
- Why Training the WMP Before Obtaining the Models' Optimal Policies Is Feasible
  - Dependence of WMP Training on Model Policies with a Convergent Response
  - Improving the Performance of the Reference-Tracking Policy with a Quadratic Reward Function
- Proposed Algorithm for Allocating Training Data to Policies
- Simulations
  - Simulation 1: Cart Control in the Inverted Pendulum System
  - Simulation 2: Van der Pol System with Identical Dynamics
  - Simulation 3: Van der Pol System with Different Dynamics
- Summary
Conclusion
References
Glossary
Statistical t-Test
