
Conversion of Persian Colloquial Texts into Official Texts using Unsupervised Learning Methods

Akhavan Azari, Karim | 2022

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 56213 (31)
  4. University: Sharif University of Technology
  5. Department: Languages and Linguistics Center
  6. Advisor(s): Sameti, Hossein
  7. Abstract: Today, the production of colloquial text in messengers, search engines, and question-answering systems has increased significantly, while text documents in other fields keep a formal tone and style. There is therefore a need for a system that converts such text from colloquial to formal style. This need has recently been recognized for languages other than Persian as well, but at the time of writing no efficient system had been proposed, and the problem requires more work in Persian than in languages such as English. In general, converting text from one style to another is a natural language processing task known as "style transfer". This research designs and implements a system that translates Persian colloquial text into formal text using unsupervised deep learning networks, in particular Transformer models. Because of the nature of the task and the lack of suitable data, the system is trained on non-parallel data. Several evaluation methods are used, including the semantic similarity between the input and output text, recognition of the style of the generated text, and the BLEU and F1 metrics. Further evaluation methods cover three areas: semantic accuracy, style transfer accuracy, and text fluency. Finally, a two-step method combining rewriting models and language models is presented. It yields a system that converts the input text style from colloquial to formal without extensive changes to meaning or sentence structure, together with a complete evaluation framework. In total, results are reported for six unsupervised models and one supervised model, and the best unsupervised methods reach style transfer accuracies of 40% and 65%.
  8. Keywords: Persian Language Generation; Unsupervised Learning; Transformers; Text Generation; Text Paraphrasing; Text Conversion; Colloquial Texts
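The abstract names BLEU, F1, and recognition of the output style as evaluation criteria but does not fix their exact formulation. The following is a minimal sketch under stated assumptions, not the thesis's implementation. It assumes whitespace tokenization, a single reference sentence, add-one smoothing of the BLEU n-gram precisions, and a bag-of-tokens F1. The function `style_transfer_accuracy` is hypothetical: it assumes style labels come from a separately trained style classifier that is not shown.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference, with add-one smoothing.

    Assumption: both strings are tokenized by splitting on whitespace.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: a candidate n-gram counts at most as often as it
        # appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(len(cand) - n + 1, 0)
        # Add-one smoothing keeps every precision above zero, so short
        # sentences do not collapse to a BLEU of exactly 0.
        log_prec += math.log((overlap + 1) / (total + 1))
    # Brevity penalty: below 1 only when the candidate is shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_prec / max_n)


def token_f1(candidate, reference):
    """Bag-of-tokens F1 between a candidate and a reference sentence."""
    cand, ref = candidate.split(), reference.split()
    common = Counter(cand) & Counter(ref)
    matched = sum(common.values())
    if matched == 0:
        return 0.0
    precision = matched / len(cand)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


def style_transfer_accuracy(predicted_labels, target="formal"):
    """Hypothetical helper: the share of outputs that a style classifier
    (assumed to exist elsewhere) labeled with the target style."""
    if not predicted_labels:
        return 0.0
    return sum(1 for label in predicted_labels if label == target) / len(predicted_labels)
```

Whitespace splitting is a weak assumption for Persian. The zero-width non-joiner (ZWNJ) in forms such as "می‌خواهم" means colloquial and formal spellings of the same word may tokenize differently. For reported numbers, an established BLEU implementation such as sacrebleu is preferable, because smoothing and tokenization choices change the scores.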
