Locomotion Control of Bipedal Robot Using Reinforcement Learning Based on Model Predictive Control

Dehghani, Mohsen | 2024

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 57356 (08)
  4. University: Sharif University of Technology
  5. Department: Mechanical Engineering
  6. Advisor(s): Taheri, Alireza
  7. Abstract:
  8. Research on bipedal robot locomotion is motivated primarily by social and commercial needs, such as replacing humans in hazardous occupations (e.g., mining, nuclear power plant inspection, military operations) and advancing dynamic control for applications like robot-assisted rehabilitation and nerve stimulation. A high number of degrees of freedom, intricate nonlinear dynamics, and persistent challenges in modeling ground interactions pose significant obstacles to the development of control strategies for bipedal systems. As robots are increasingly integrated into sectors such as education and therapy, there is a growing demand to give them human-like capabilities, including proficient bipedal movement. In this study, an algorithm called "MPC based TD3" was developed by combining reinforcement learning with model predictive control. The algorithm was used to control the movement of a seven-link bipedal robot. A comparison with established reinforcement learning algorithms such as DDPG, SAC, and TD3 showed notably superior results in controlling bipedal robot movement. The reward received under the policy trained with the developed algorithm increased by 5% on a smooth path without obstacles and by 67% on a path with obstacles. Furthermore, to validate the efficacy of the developed algorithm, its performance was assessed on two additional problems: controlling a two-link arm to reach a specified target and guiding a car along a mountainous path. According to the results, the proposed algorithm outperformed the benchmark algorithm TD3 in controlling the two-link arm, achieving the maximum reward in half the number of training steps and reaching an optimal policy.
Additionally, although TD3 was unable to solve the problem of controlling the movement of a car on a mountainous path, the proposed algorithm reached an optimal policy for moving the car to the top of the mountain after 100 iterations.
  9. Keywords:
  10. Nonlinear Control; Hybrid Dynamical System; Control Barrier Function; Safe Policies; Robust Model Predictive Control; Reinforcement Learning; Bipedal Robot; Predictive Control
