Improving Data Efficiency in Predictive Reinforcement Learning in Non-stationary Environments
Rimaz, Mohammad Sadra | 2024
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 57570 (05)
- University: Sharif University of Technology
- Department: Electrical Engineering
- Advisor(s): Nobakhti, Amin
- Abstract:
- One type of non-stationary environment studied in reinforcement learning is one in which, at each time step, only one model from a limited set is valid. In such environments, the weighted mixture policy method uses predictions of future model changes to increase cumulative reward. After a prediction is received and before the model actually changes, the method forms a new policy by taking a weighted combination of the optimal policy of the current model and that of the predicted future model. Obtaining the optimal policy for each model is time-consuming and requires a large amount of data. This thesis reviews existing reinforcement learning methods for non-stationary environments, demonstrates that the weighted mixture policy can also be trained with non-optimal policies, and shows that data efficiency can be improved through simultaneous training. To achieve simultaneous training, a new method for allocating training data among the policies is proposed. The proposed method is then applied to the reference-tracking problem on the cart-pole and Van der Pol systems, and its results are statistically compared with those of previous methods. The comparison shows that the proposed method improves data efficiency.
- Keywords:
- Reinforcement Learning ; Data Efficiency ; Predictive Policy ; Weighted Mixture Policy ; Non-Stationary Environments
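The weighted mixture described in the abstract — blending the current model's policy with the predicted future model's policy in the window between a change prediction and the actual change — can be sketched minimally as follows. This is an illustrative reconstruction, not the thesis's implementation: the policies here are hypothetical discrete action distributions, and the linear weight schedule tied to the predicted time-to-change is an assumption for demonstration only.

```python
import numpy as np

def weighted_mixture_policy(pi_current, pi_next, w):
    """Blend two action distributions, with weight w on the current model's policy.

    pi_current, pi_next: probability vectors over the same discrete action set.
    w: mixing weight in [0, 1]; w=1 follows the current model, w=0 the future one.
    """
    mix = w * pi_current + (1.0 - w) * pi_next
    return mix / mix.sum()  # renormalize to guard against numerical drift

# Hypothetical optimal policies of two models over 3 discrete actions.
pi_a = np.array([0.7, 0.2, 0.1])  # current model's policy
pi_b = np.array([0.1, 0.3, 0.6])  # predicted future model's policy

# Assumed linear schedule: as the predicted model change approaches,
# shift weight from the current model's policy toward the future one.
for steps_to_change in (5, 3, 1):
    w = steps_to_change / 5.0
    print(steps_to_change, weighted_mixture_policy(pi_a, pi_b, w))
```

At `w = 1` the mixture reduces to the current policy and at `w = 0` to the future one, so the agent transitions smoothly rather than switching abruptly at the model change.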
- Contents:
- Introduction
- Preliminaries
- Algorithm for Training the Weighted Mixture Policy Before Obtaining the Models' Optimal Policies
- Conclusion
- References
- Glossary
- Statistical t-Test