Optimal Control of a Robotic System Using Deep Reinforcement Learning

Khadem Haqiqiyan, Behrad; Sayyaadi, Hassan

Khadem Haqiqiyan, Behrad | 2024

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 57178 (08)
University: Sharif University of Technology
Department: Mechanical Engineering
Advisor(s): Sayyaadi, Hassan
Abstract:
Robots are designed to assist humans with repetitive or dangerous tasks. Classical control methods, such as PID controllers, adapt poorly to complex tasks. Deep reinforcement learning (DRL) is a machine learning approach that trains an agent to act optimally through trial and error. This research applies DRL algorithms to a pick-and-place task performed by a robotic arm mounted on a moving platform. The study uses state-of-the-art methods, including the Truncated Quantile Critics (TQC) algorithm and Hindsight Experience Replay (HER), to train an agent in a simulated environment. The work describes the robotic environment, the task, and the training agent, and presents the results obtained. The findings show that these methods enable the agent to learn and successfully execute the manipulation task. The research also highlights how the choice of reward function improves the sample efficiency of training. The work concludes with proposed future directions, including non-holonomic bases for the mobile platform and agents built on recurrent neural networks for improved performance.
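The record does not reproduce the thesis's reward function, network sizes, or hyperparameters, so the following is only a minimal, standard-library Python sketch of the two techniques the abstract names, built on stated assumptions: a sparse goal reward (0 when the achieved goal lies within a 0.05 distance threshold of the desired goal, otherwise -1, as in common goal-conditioned manipulation benchmarks), HER with the "future" relabeling strategy, and the truncated target computation from TQC. The names `sparse_reward`, `her_relabel`, and `tqc_target`, the transition-dictionary layout, and defaults such as `k=4` and `drop_per_critic=2` are hypothetical illustrations, not the thesis's code.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical transition record (assumption, not the thesis's format):
# observation, action, next observation, the goal actually achieved after
# the step, and the episode's desired goal.
Transition = Dict[str, Sequence[float]]


def sparse_reward(achieved: Sequence[float], desired: Sequence[float],
                  threshold: float = 0.05) -> float:
    """Assumed sparse goal reward: 0.0 on success, -1.0 otherwise."""
    return 0.0 if math.dist(achieved, desired) < threshold else -1.0


def her_relabel(episode: List[Transition], k: int = 4,
                rng: Optional[random.Random] = None) -> List[Tuple]:
    """HER with the 'future' strategy.

    Every transition is stored once with its original goal and k more times
    with the goal replaced by the goal achieved at the same or a later step
    of the same episode; the reward is recomputed for each substituted goal.
    Returns (obs, action, goal, reward, next_obs) tuples.
    """
    rng = rng or random.Random(0)
    out = []
    last = len(episode) - 1
    for t, tr in enumerate(episode):
        goal = tr["desired"]
        out.append((tr["obs"], tr["action"], goal,
                    sparse_reward(tr["achieved"], goal), tr["next_obs"]))
        for _ in range(k):
            future = rng.randint(t, last)  # inclusive on both ends
            goal = episode[future]["achieved"]
            out.append((tr["obs"], tr["action"], goal,
                        sparse_reward(tr["achieved"], goal), tr["next_obs"]))
    return out


def tqc_target(next_quantiles: List[List[float]], reward: float, done: float,
               gamma: float = 0.99, drop_per_critic: int = 2,
               alpha: float = 0.0, next_log_prob: float = 0.0) -> List[float]:
    """Truncated distributional target in the style of TQC.

    next_quantiles holds N lists of M quantile estimates of Z(s', a') from
    N target critics, with a' sampled from the current policy. All N*M atoms
    are pooled and sorted, the largest drop_per_critic * N are discarded to
    curb overestimation, and each remaining atom z yields
    r + gamma * (1 - done) * (z - alpha * log pi(a'|s')).
    """
    n = len(next_quantiles)
    pooled = sorted(z for q in next_quantiles for z in q)
    if drop_per_critic * n >= len(pooled):
        raise ValueError("truncation would discard every atom")
    kept = pooled[:len(pooled) - drop_per_critic * n]
    return [reward + gamma * (1.0 - done) * (z - alpha * next_log_prob)
            for z in kept]


if __name__ == "__main__":
    # Toy episode: the object drifts along x but never reaches the goal.
    episode = [
        {"obs": (float(i),), "action": (0.1,), "next_obs": (float(i + 1),),
         "achieved": (0.1 * i, 0.0, 0.0), "desired": (1.0, 1.0, 1.0)}
        for i in range(3)
    ]
    batch = her_relabel(episode, k=2)
    print(len(batch), [r for (_, _, _, r, _) in batch])
    print(tqc_target([[1, 2, 3, 4], [5, 6, 7, 8]], reward=1.0, done=0.0,
                     gamma=0.5, drop_per_critic=1))
    # -> [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
```

In this sketch, relabeling turns an episode that never reached its desired goal into transitions that did reach substitute goals, which gives a sparse-reward learner a non-trivial signal; truncation removes the most optimistic atoms of the pooled critic distribution before they enter the target.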
Keywords:
Deep Reinforcement Learning; Robotics; Intelligent Robotics; Artificial Intelligence (AI) in Robotics; Robotic Systems

Contents:

- Declaration
- Thesis (final revision)
  - 1. Introduction
    - 1-1. Robotics and Intelligence
      - 1-1-1. An Introduction to Robotics
      - 1-1-2. Intelligent Robots
      - 1-1-3. Deep Learning (DL)
    - 1-2. Reinforcement Learning
      - 1-2-1. Deep Reinforcement Learning (DRL)
      - 1-2-2. Reinforcement Learning Methods
    - 1-3. Challenges of Reinforcement Learning in Robotics
    - 1-4. Proposed Research
  - 2. Basic Concepts
    - 2-1. Finite Markov Decision Process
      - 2-1-1. The Markov Property
      - 2-1-2. Markov Chain (Markov Process)
      - 2-1-3. Rewards and Outcomes
      - 2-1-4. Markov Reward Process
      - 2-1-5. Value Function and Policy Function
      - 2-1-6. Bellman Equation for the Value Function
      - 2-1-7. Markov Decision Process
      - 2-1-8. Summary
    - 2-2. Overview of Key Deep Reinforcement Learning Algorithms
      - 2-2-1. The Deep Deterministic Policy Gradient (DDPG) Family of Algorithms
      - 2-2-2. The Soft Actor-Critic (SAC) Algorithm
      - 2-2-3. The Truncated Quantile Critics (TQC) Algorithm
      - 2-2-4. The Hindsight Experience Replay (HER) Algorithm
    - 2-3. Neural Networks
      - 2-3-1. Introduction
      - 2-3-2. General Operation of Neural Networks
      - 2-3-3. The ADAM Optimizer
  - 3. Literature Review
    - 3-1. Introduction
    - 3-2. Applications of Deep Reinforcement Learning in Robotics
    - 3-3. Summary
  - 4. Modeling the Environment and the Reinforcement Learning Agent
    - 4-1. Introduction
    - 4-2. The Reinforcement Learning Environment
      - 4-2-1. Robot Definition in URDF
      - 4-2-2. The Panda Robot
      - 4-2-3. The Bullet Physics Simulator (PyBullet Library)
      - 4-2-4. The OpenAI Gym Library
      - 4-2-5. The Robot Defined for This Research
      - 4-2-6. Scenario and Reward Function Definition
      - 4-2-7. Summary
    - 4-3. The Reinforcement Learning Agent
  - 5. Conclusions
    - 5-1. Introduction
    - 5-2. Results Obtained
      - 5-2-1. Mean Reward
      - 5-2-2. Success Rate
      - 5-2-3. Reward Function Performance
      - 5-2-4. Analysis of the Agent's Output
      - 5-2-5. Weaknesses
    - 5-3. Suggestions for Future Research
  - 6. References
- Accompanying paper
  - rl-paper-004
    - I. Introduction
    - II. Preliminaries
      - A. Reinforcement Learning (RL)
      - B. Deep Reinforcement Learning (DRL)
    - III. Environment
      - A. Environment Properties
      - B. Task and Reward
    - IV. Algorithm
      - A. Truncated Quantile Critics (TQC)
      - B. Hindsight Experience Replay (HER)
      - C. Implementation
    - V. Results
    - VI. Discussion
    - VII. Conclusions and Future Works
    - References
