Improving Data Efficiency in Predictive Reinforcement Learning in Non-stationary Environments
Rimaz, Mohammad Sadra | 2024
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 57570 (05)
- University: Sharif University of Technology
- Department: Electrical Engineering
- Advisor(s): Nobakhti, Amin
- Abstract:
- One type of non-stationary environment studied in reinforcement learning is one in which, at each time step, only one model from a limited set is valid. In such environments, the weighted mixture policy method uses predictions of future model changes to increase cumulative reward. After a prediction is received and before the model actually changes, the method forms a new policy by taking a weighted combination of the optimal policy of the current model and that of the predicted future model. Obtaining the optimal policy for each model is time-consuming and requires a large amount of data. This thesis reviews existing reinforcement learning methods for non-stationary environments, demonstrates that the weighted mixture policy can also be trained with non-optimal policies, and shows that data efficiency can be improved through simultaneous training. To achieve simultaneous training, a new method for allocating training data among the policies is proposed. The proposed method is then applied to the reference-tracking problem on the cart-pole and Van der Pol systems, and its results are statistically compared with those of previous methods. The comparison shows that the proposed method improves data efficiency.
- Keywords:
- Reinforcement Learning ; Data Efficiency ; Predictive Policy ; Weighted Mixture Policy ; Non-Stationary Environments
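The weighted mixture described in the abstract — blending the current model's policy with the predicted future model's policy in the window between a change prediction and the actual change — can be sketched minimally as follows. This is an illustrative reconstruction, not the thesis's implementation: the policies here are hypothetical discrete action distributions, and the linear weight schedule tied to the predicted time-to-change is an assumption for demonstration only.

```python
import numpy as np

def weighted_mixture_policy(pi_current, pi_next, w):
    """Blend two action distributions, with weight w on the current model's policy.

    pi_current, pi_next: probability vectors over the same discrete action set.
    w: mixing weight in [0, 1]; w=1 follows the current model, w=0 the future one.
    """
    mix = w * pi_current + (1.0 - w) * pi_next
    return mix / mix.sum()  # renormalize to guard against numerical drift

# Hypothetical optimal policies of two models over 3 discrete actions.
pi_a = np.array([0.7, 0.2, 0.1])  # current model's policy
pi_b = np.array([0.1, 0.3, 0.6])  # predicted future model's policy

# Assumed linear schedule: as the predicted model change approaches,
# shift weight from the current model's policy toward the future one.
for steps_to_change in (5, 3, 1):
    w = steps_to_change / 5.0
    print(steps_to_change, weighted_mixture_policy(pi_a, pi_b, w))
```

At `w = 1` the mixture reduces to the current policy and at `w = 0` to the future one, so the agent transitions smoothly rather than switching abruptly at the model change.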
- Contents:
- Introduction
- Preliminaries
- Algorithm for Training the Weighted Mixture Policy Before Obtaining the Models' Optimal Policies
- Conclusion
- References
- Glossary
- Statistical t-Test