Normalization of Non-standard Texts for Persian Language Using Neural Networks
Seyyedi, Javad | 2017
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 49425 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Sameti, Hossein
- Abstract:
- The purpose of this research is to normalize non-standard Persian texts. We propose a method that converts texts with any non-standard structure into a formal, standard form. One of the major difficulties of text normalization is the large variety of non-standard structures, which cannot be captured by a single structural pattern. Furthermore, text normalization is defined differently in different settings, and each setting needs its own normalization method. Supervised learning methods are not well suited to normalization because of the variety of both standard and non-standard texts and the absence of a parallel corpus for such texts. In this work, we introduce a normalization method that is capable of translating different texts under various normalization criteria. The method does not rely on a parallel corpus, as it is based on unsupervised learning. This technique, which is based on Semantic Distance, achieves higher accuracy than similar methods while using only a few translated words. Our experiments suggest that the proposed method can make other prominent techniques, such as Semantic Similarity, nearly irrelevant. The method achieves an accuracy of 73% in normalizing non-standard words when producing a list of 5 equivalent standard words, which is a 38% improvement over the previous state-of-the-art method.
- Keywords:
- Unsupervised Learning ; Natural Language Processing ; Persian Text Processing ; Neural Networks ; Deep Networks ; Persian Text Normalization ; Non-standard Texts
Contents
- 1 Introduction
- 2 Previous Work
- 3 Proposed Method for Normalizing Non-standard Texts
- 4 Experiments and Results
- 4.1 Introduction
- 4.2 Evaluation Method
- 4.3 Evaluation of Baseline Methods
- 4.4 The Proposed Method of This Research
- 4.4.1 Implementation of Word Representations
- 4.4.2 Initial Proposed Method
- 4.4.3 Using the Continuous Bag-of-Words Method
- 4.4.4 Using the One-to-Many Technique
- 4.4.5 Normalization Using Phrases
- 4.4.6 Normalization Using a Corpus of Standard Texts
- 4.4.7 Computing Optimal Values for the Scoring Coefficients
- 4.4.8 Computing a Threshold to Determine When Translation Is Not Needed
- 4.4.9 Using Clustering to Determine When Translation Is Not Needed
- 4.5 Using the Proposed Techniques
- 4.6 Summary
- 5 Conclusion and Future Work
- References
- Persian-to-English Glossary
- English-to-Persian Glossary
