Context-based Persian Grapheme-to-Phoneme Conversion using Sequence-to-Sequence Models

Rahmati, Elnaz; Sameti, Hossein

Rahmati, Elnaz | 2022

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 56283 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Sameti, Hossein
Abstract:
Many Text-to-Speech (TTS) systems, particularly in low-resource environments, struggle to produce natural and intelligible speech from grapheme sequences. One solution is Grapheme-to-Phoneme (G2P) conversion, which enriches the information in the input sequence and improves the TTS output. However, current G2P systems are neither accurate nor efficient enough for Persian text, owing to the language’s complexity and the absence of short vowels in Persian grapheme sequences. In this study, we aimed to improve resources for the Persian language. To this end, we introduced two new G2P training datasets, one manually labeled and the other machine-generated, containing over five million sentences and their corresponding phoneme sequences. Additionally, we proposed two new evaluation datasets for Persian sub-tasks such as Kasre-Ezafe detection, homograph disambiguation, and the handling of out-of-vocabulary words. Finally, we developed a new sentence-level end-to-end model to address the challenges of the Persian language. This model was trained using a two-step method, introduced in this thesis, to maximize the impact of the manually labeled data. On the data and metrics proposed in this work, our model outperformed the state of the art by 0.04% in phoneme error rate (PER), 1.86% in word error rate (WER), 4.03% in Kasre-Ezafe recall, and 3.42% in homograph disambiguation accuracy.
Keywords:
Semi-Supervised Learning ; Converter ; Grapheme to Phoneme Transform ; End-to-End Modeling ; Text-to-Speech Converter ; Kasre-e-Ezafe

Contents:
Introduction
- Problem Definition
- Significance of the Topic
- Literature Overview
- Research Objectives
- Thesis Structure
Preliminaries
- Introduction
- The Persian Language in Grapheme-to-Phoneme Conversion
  - Persian Homographs
  - Kasre-Ezafe
- Architectures Used for Grapheme-to-Phoneme Conversion
  - Recurrent Neural Network
  - Convolutional Neural Network
  - Sequence-to-Sequence Model
  - Transformer
  - Large Language Model
- Datasets Used in This Research
  - The "Naab" Text Corpus
  - The "Miras" Text Corpus
  - The "Peykare" Text Corpus
  - The "FarsDat" Dataset
  - The "Jahanbakhsh" Dataset
- Evaluation Methods
- Summary
Previous Work
- Introduction
- Rule-Based Methods
- Probabilistic Models
- Recurrent Neural Networks
- Convolutional Neural Networks
- Transformers
- Data Augmentation
- Transfer and Multilingual Learning
- Improving Model Efficiency
- Context-Based Models
- The Persian Language
  - Kasre-Ezafe Detection
  - Out-of-Vocabulary Words
  - End-to-End Grapheme-to-Phoneme Systems
- Summary
Proposed Approach
- Introduction
- Shortcomings of Previous Work
- Data
  - Correcting the Large FarsDat Data
  - Automatic Generation of Persian Grapheme-to-Phoneme Data
  - Designing Evaluation Data Suited to the Needs of Persian
- Models Examined in This Research
  - The Proposed End-to-End Model
  - The Multi-Module Persian Grapheme-to-Phoneme Baseline Model
  - The Multilingual Out-of-Vocabulary Baseline Model
- The Two-Step Training Introduced in This Research
- Persian-Specific Evaluation Methods
- Summary
Experiments and New Results
- Introduction
- Preliminary Experiments
  - Experiments on Grapheme-to-Phoneme Training Data
  - Experiments on Possible Architectures for the ByT5 Transformer Model
  - Experiments on the Effect of Beam Size on Output Generation
- Final Experiment on the Proposed End-to-End Model
- Evaluation Results and Comparison of the Proposed Approach with the Baseline Models
  - Comparison with the "Multi-Module" Baseline Model
  - Comparison with the "Multilingual" Baseline Model
- Grapheme-to-Phoneme Conversion as a Product
- Summary
Conclusion and Future Work
- Summary of Work Done and Conclusions
- Suggestions for Future Work
References
Glossary
Supplementary Material
