Developing a Vision-Based Continuous Iranian Sign Language Translation System

Ghadami, Ali; Taheri, Alireza; Meghdari, Ali

Ghadami, Ali | 2023

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 56238 (08)
University: Sharif University of Technology
Department: Mechanical Engineering
Advisor(s): Taheri, Alireza; Meghdari, Ali
Abstract:
Sign language is the primary language and an essential means of communication for millions of people around the world. However, most communication tools and technologies are designed for spoken and written languages, which creates barriers for the deaf community. A sign language recognition system can bridge this communication gap and enable people who use sign language as their primary mode of expression to communicate more easily with other people and with their surroundings. Such a system can improve the quality of education and health services, ease public interactions, and create more equal opportunities for the deaf community. This research aims at continuous recognition of Iranian Sign Language using recent machine learning tools such as transformer networks. The first step is to collect Iranian Sign Language data at the word and sentence levels, which is valuable in its own right because such data are scarce. The translation and recognition of sign language sentences is investigated through two approaches. The first recognizes a sentence by combining a single-word recognition model with an adaptive windowing technique; a genetic algorithm is used to find the optimal architecture, and a fuzzy controller adjusts the window length. The second recognizes the sentence directly, as a whole. Implementing and training the models yielded 90.2% accuracy for single-word recognition and acceptable performance in sentence recognition with both the windowing and direct methods: 17 of 20 test sentences (85%) with the windowing method and 115 of 150 test sentences (about 77%) with the direct method were recognized either completely correctly or with only one incorrect word.
Finally, a sign language training software is introduced that uses the developed models to give users real-time feedback. This software, and this research in general, is an important step toward the practical use of sign language recognition models in the real world, which can greatly help the deaf.
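The sentence-level results reported in the abstract count a sentence as a success when it is recognized completely correctly or with only one wrong word. A minimal Python sketch of how such a count might be computed is given here, assuming that "one mistake" means a word-level edit distance of at most one (a single substituted, missing, or extra word); the record does not state the exact criterion. The function names and the English example sentences are hypothetical and are not taken from the thesis or its dataset.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two sentences, counted in whole words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def count_near_correct(pairs, max_errors=1):
    """Count sentences recognized with at most `max_errors` word errors."""
    return sum(1 for ref, hyp in pairs
               if word_edit_distance(ref, hyp) <= max_errors)


def word_error_rate(pairs):
    """Total word errors divided by total number of reference words."""
    total_errors = sum(word_edit_distance(ref, hyp) for ref, hyp in pairs)
    total_words = sum(len(ref.split()) for ref, _ in pairs)
    return total_errors / total_words


# Hypothetical (reference, recognized) sentence pairs, for illustration only.
example_pairs = [
    ("i go to school", "i go to school"),   # no errors
    ("she reads a book", "she reads book"), # one missing word
    ("we eat lunch today", "we lunch"),     # two missing words
]

near_correct = count_near_correct(example_pairs)
wer = word_error_rate(example_pairs)
print(near_correct, "of", len(example_pairs), "sentences have at most one word error")
print("WER =", wer)
```

Summing the same distance over all sentences and dividing by the total number of reference words gives the word error rate (WER), one of the evaluation metrics listed in the thesis outline.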
Keywords:
Deep Learning ; Artificial Intelligence ; Machine Vision ; Genetic Algorithm ; Sequence to Sequence Translation ; Persian Sign Language

Table of Contents:
Abstract
List of Tables
List of Figures
Chapter 1 Introduction
- 1-1 Importance of the Topic and the Research
- 1-2 Research Objectives
- 1-3 Structure of the Report
Chapter 2 Basic Concepts
- 2-1 Definition of Gesture
- 2-2 Classification of Gesture Recognition by Data Type
  - 2-2-1 Sensor-Based Recognition
  - 2-2-2 Image-Based Recognition
  - 2-2-3 Hybrid Recognition
- 2-3 Types of Data Preprocessing
  - 2-3-1 Normalization and Filtering
  - 2-3-2 Feature Extraction
  - 2-3-3 Feature Selection
- 2-4 Networks and Architectures Used
  - 2-4-1 Multilayer Perceptron Networks
  - 2-4-2 Convolutional Neural Network
    - 2-4-2-1 Convolution Layer
    - 2-4-2-2 Pooling Layer
  - 2-4-3 Recurrent Networks
    - 2-4-3-1 Long Short-Term Memory Architecture
    - 2-4-3-2 Gated Recurrent Unit Architecture
  - 2-4-4 Transformer Network
  - 2-4-5 Generative Adversarial Network
  - 2-4-6 Vision Transformer Network
  - 2-4-7 CTC Loss Function
- 2-5 Fuzzy Logic
  - 2-5-1 Fuzzy Membership Functions
  - 2-5-2 Fuzzy Rule Base
- 2-6 Genetic Algorithm
- 2-7 Evaluation Metrics
  - 2-7-1 WER
  - 2-7-2 BLEU
Chapter 3 Literature Review
- 3-1 Introduction
- 3-2 Types of Sign Language Translation
  - 3-2-1 Isolated Sign Language Translation
  - 3-2-2 Continuous Sign Language Translation
- 3-3 Structures Used for Sign Language Recognition and Modeling
  - 3-3-1 Machine Learning
    - 3-3-1-1 Support Vector Machine
    - 3-3-1-2 Principal Component Analysis
  - 3-3-2 Hidden Markov Models
  - 3-3-3 Deep Learning
    - 3-3-3-1 Deep Belief Networks
    - 3-3-3-2 Convolutional Neural Networks
    - 3-3-3-3 Recurrent Networks
    - 3-3-3-4 Convolutional Recurrent Neural Networks
    - 3-3-3-5 PCANet
    - 3-3-3-6 SubUNet
    - 3-3-3-7 Transformer Networks
  - 3-3-4 Hybrid Models
- 3-4 Review of Previous Work on Iranian Sign Language
- 3-5 Summary
Chapter 4 Datasets Used
- 4-1 Word-Level Sign Language Dataset
- 4-2 Sentence-Level Sign Language Dataset
Chapter 5 Sentence Recognition via Single-Word Recognition
- 5-1 Introduction
- 5-2 Preprocessing
  - 5-2-1 Hand and Face Detection in Video
  - 5-2-2 Hand Keypoint Extraction
  - 5-2-3 Lip Point Extraction
  - 5-2-4 Feature Extraction from Hand Positions
- 5-3 Proposed Model for Single-Word Recognition
  - 5-3-1 Data Structure
  - 5-3-2 Late Fusion Model
    - 5-3-2-1 Model Parameters
  - 5-3-3 Early Fusion Model
    - 5-3-3-1 Model Parameters
  - 5-3-4 Final Combined Model
- 5-4 Optimizing the Final Model with the Genetic Algorithm
  - 5-4-1 Optimization Parameters
  - 5-4-2 Objective Function
  - 5-4-3 Algorithm Parameters
    - 5-4-3-1 Initial Population
    - 5-4-3-2 Chromosome Structure
    - 5-4-3-3 Parent Selection Process
    - 5-4-3-4 Producing the New Generation
    - 5-4-3-5 Mutation
    - 5-4-3-6 Search Process
    - 5-4-3-7 Number of Generations
- 5-5 Sentence Recognition Module
  - 5-5-1 Windowing Technique
  - 5-5-2 Determining the Window Length with a Fuzzy Controller
- 5-6 Results
  - 5-6-1 Single-Word Recognition
  - 5-6-2 Sentence Recognition
- 5-7 Summary
Chapter 6 Direct Sentence Recognition
- 6-1 Introduction
- 6-2 Preprocessing
- 6-3 Proposed Model
  - 6-3-1 Model Parameters
- 6-4 Results
- 6-5 Summary
Chapter 7 Implementation of the Initial Version of the Iranian Sign Language Training Software
- 7-1 Introduction
- 7-2 Implementation Details
- 7-3 Summary
Chapter 8 Summary and Conclusion
Chapter 9 Limitations and Suggested Actions
References
Appendix 1