Locomotion Control of Bipedal Robot Using Reinforcement Learning Based on Model Predictive Control

Dehghani, Mohsen; Taheri, Alireza



Dehghani, Mohsen | 2024


Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 57356 (08)
University: Sharif University of Technology
Department: Mechanical Engineering
Advisor(s): Taheri, Alireza
Abstract:
The study of bipedal robot locomotion is motivated primarily by a range of social and commercial needs, such as replacing humans in hazardous occupations (e.g., mining, nuclear power plant inspection, and military operations) and advancing dynamic control for applications such as robot-assisted rehabilitation and nerve stimulation. The large number of degrees of freedom, the complex nonlinear dynamics, and the persistent difficulty of modeling ground interaction pose significant obstacles to developing control strategies for bipedal systems. As robots are increasingly integrated into sectors such as education and therapy, there is a growing demand to give them human-like capabilities, including proficient bipedal walking. In this study, an algorithm called "MPC based TD3" was developed by combining reinforcement learning with model predictive control. This algorithm was used to control the locomotion of a seven-link bipedal robot. A comparison with established reinforcement learning algorithms, namely DDPG, SAC, and TD3, showed notably better results in controlling bipedal locomotion. The reward obtained under the policy trained with the developed algorithm increased by 5% on a smooth path without obstacles and by 67% on a path with obstacles. To further validate the developed algorithm, its performance was assessed on two additional problems: controlling a two-link arm to reach a specified target and driving a car along a mountainous path. In the two-link arm problem, the proposed algorithm outperformed the benchmark TD3 algorithm, achieving the maximum reward in half as many training steps and reaching an optimal policy.
Additionally, whereas TD3, a strong baseline, was unable to solve the problem of driving a car along a mountainous path, the proposed algorithm reached an optimal policy for moving the car to the top of the mountain after 100 iterations.
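The abstract describes "MPC based TD3" only at a high level, and the contents list names three steps: data collection under the current policy, model training and predictive control, and TD3-based policy training. The following is not the thesis's implementation; it is a minimal sketch, under our own assumptions, of two standard building blocks such a combination could use. The first is a random-shooting MPC that plans through a dynamics model fitted by least squares (the linear model class, horizon, sample count, and cost are hypothetical choices). The second is the TD3 critic target with target policy smoothing and clipped double-Q learning. The function names `fit_linear_model`, `mpc_random_shooting`, and `td3_target`, and the toy 1-D plant, are illustrative only.

```python
import numpy as np


# Step 2 (assumed form): fit a dynamics model to transitions collected in Step 1.
def fit_linear_model(states, actions, next_states):
    """Least-squares fit of s' ~= A s + B a + c (a hypothetical, deliberately simple model class)."""
    n = states.shape[0]
    X = np.hstack([states, actions, np.ones((n, 1))])
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)
    ds, da = states.shape[1], actions.shape[1]
    return W[:ds].T, W[ds:ds + da].T, W[-1]  # A, B, c


def mpc_random_shooting(s0, model, cost_fn, horizon=10, n_samples=500, a_max=1.0, rng=None):
    """Random-shooting MPC: roll sampled action sequences through the model and
    return the first action of the lowest-cost sequence (receding horizon)."""
    rng = np.random.default_rng() if rng is None else rng
    A, B, c = model
    seqs = rng.uniform(-a_max, a_max, size=(n_samples, horizon, B.shape[1]))
    s = np.tile(s0, (n_samples, 1))  # one copy of the start state per candidate sequence
    total = np.zeros(n_samples)
    for t in range(horizon):
        s = s @ A.T + seqs[:, t] @ B.T + c  # predicted next states for all candidates
        total += cost_fn(s, seqs[:, t])
    return seqs[np.argmin(total), 0]


# Step 3: TD3 critic target (target policy smoothing + clipped double-Q).
def td3_target(r, s_next, done, actor_targ, q1_targ, q2_targ,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, a_max=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    a_next = actor_targ(s_next)
    # Smooth the target action with clipped Gaussian noise, then keep it in bounds.
    noise = np.clip(rng.normal(0.0, noise_std, size=a_next.shape), -noise_clip, noise_clip)
    a_next = np.clip(a_next + noise, -a_max, a_max)
    # Use the smaller of the two target critics to curb overestimation.
    q_min = np.minimum(q1_targ(s_next, a_next), q2_targ(s_next, a_next))
    return r + gamma * (1.0 - done) * q_min


def quadratic_cost(s, a):
    """Hypothetical cost: drive the state to zero with a small action penalty."""
    return (s ** 2).sum(axis=1) + 0.01 * (a ** 2).sum(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 1-D plant s' = 0.9 s + 0.5 a (hypothetical stand-in for Step 1 data).
    S = rng.normal(size=(200, 1))
    U = rng.uniform(-1, 1, size=(200, 1))
    model = fit_linear_model(S, U, 0.9 * S + 0.5 * U)
    print(mpc_random_shooting(np.array([2.0]), model, quadratic_cost, rng=rng))
```

How the MPC actions feed into TD3 training (for example, as additional transitions in the replay buffer or as guidance for the actor) is not specified in the abstract and is left open here. The remaining parts of TD3, delayed actor updates and soft target-network updates, are also omitted for brevity.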
Keywords:
Nonlinear Control ; Hybrid Dynamical System ; Control Barrier Function ; Safe Policies ; Robust Model Predictive Control ; Reinforcement Learning ; Bipedal Robot ; Predictive Control


Contents


Introduction
- Problem Definition
- Significance of the Topic
- Literature Review
- Research Objectives
- Thesis Structure
Preliminary Concepts
- Locomotion
- Locomotion Dynamics
- Constraints Governing Locomotion
- Reinforcement Learning
- Model Predictive Control
Previous Work
- Reference Trajectory Generation
  - Reference Trajectory Generation from Human Motion Data
  - Trajectory Generation Using Polynomial Functions
  - Trajectory Generation Using the Zero Moment Point Method
- Model-Based Walking Control Methods for Bipedal Robots
  - Zero Moment Point and the Linear Inverted Pendulum Model
  - Nonlinear Inverted Pendulum Model
  - Spring-Loaded Inverted Pendulum
- Hybrid Dynamical Systems
  - Passive Dynamic Walking
  - Discrete Foot-Impact Dynamics
- Hybrid Zero Dynamics
- Reinforcement Learning-Based Control
- Summary
Control Method
- Prominent Reinforcement Learning Algorithms
  - Deep Deterministic Policy Gradient (DDPG) Algorithm
  - Twin Delayed Deep Deterministic Policy Gradient (TD3) Algorithm
  - Soft Actor-Critic (SAC) Algorithm
- The Proposed Algorithm
  - Step 1: Data Collection Under the Current Policy
  - Step 2: Model Training and Predictive Control
  - Step 3: Policy Training Based on the TD3 Algorithm
  - Improvements to the Proposed Algorithm
Simulation Results
- The Bipedal Robot Walking Environment
- State Space
- Action Space
- Reward Function
- Results of the Reinforcement Learning Algorithms
  - Results of the DDPG Algorithm
  - Results of the SAC Algorithm
  - Results of the TD3 Algorithm
  - Comparison of the DDPG, SAC, and TD3 Results
- Results of the Proposed Algorithm
- Comparison of the Proposed Algorithm with TD3
- Algorithm Validation
  - Problem 1: Two-Link Arm Control
  - Problem 2: Controlling a Car on a Mountain-Shaped Path
Conclusions and Recommendations
- Conclusions
- Contributions
- Recommendations
References
