
Normalization of Non-standard Texts for Persian language Using Neural
Networks

Seyyedi, Javad | 2017

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 49425 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Sameti, Hossein
  7. Abstract:
  8. The purpose of this research is to normalize non-standard Persian texts. We propose a method to transform texts with any non-standard structure into a formal, standard form. One of the major complications of text normalization is the large variety of non-standard structures, and the fact that these diverse forms cannot be classified under a single structural pattern. Furthermore, the concept of text normalization has different definitions in different settings, and each setting requires a distinct normalization method. Supervised learning methods are not suitable for normalization, due to the variety of both standard and non-standard texts as well as the absence of a parallel corpus for such texts. In this work, a normalization method is introduced that can handle different texts under various normalization criteria. The proposed method does not rely on a parallel corpus, as it is based on unsupervised learning. This technique, built on Semantic Distance, achieves higher accuracy than comparable methods while using only a few translated words. Our experiments suggest that the proposed method can render other prominent techniques, such as Semantic Similarity, largely unnecessary. Our method achieves an accuracy of 73% in normalizing non-standard words when producing a list of 5 equivalent standard words, a 38% improvement over the previous state-of-the-art method.
  9. Keywords:
  10. Unsupervised Learning ; Natural Language Processing ; Persian Text Processing ; Neural Networks ; Deep Networks ; Normalization of Persian Texts ; Non-standard Texts
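
The Semantic Distance idea described in the abstract can be sketched roughly as follows: embed both non-standard and standard words in a shared vector space, then rank standard-lexicon words by distance to a non-standard word's vector and return the top-k candidates. This is a minimal illustrative sketch, not the thesis's actual method; the toy vectors, Romanized Persian words, and the `normalize` helper are all assumptions for demonstration (real vectors would come from a neural model trained on Persian text).

```python
import math

# Toy embedding table (hypothetical values, for illustration only).
# In practice these vectors would be learned by a neural network.
EMBEDDINGS = {
    "midam":   [0.90, 0.10, 0.20],  # colloquial (non-standard) form
    "midaham": [0.88, 0.12, 0.19],  # standard form
    "kitab":   [0.12, 0.88, 0.29],  # colloquial spelling variant
    "ketab":   [0.10, 0.90, 0.30],  # standard form
    "salam":   [0.30, 0.20, 0.90],
}

# Standard-language vocabulary to draw candidates from.
STANDARD_LEXICON = ["midaham", "ketab", "salam"]

def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def normalize(word, k=5):
    """Return the k standard words semantically closest to `word`."""
    vec = EMBEDDINGS[word]
    ranked = sorted(STANDARD_LEXICON,
                    key=lambda w: cosine_distance(vec, EMBEDDINGS[w]))
    return ranked[:k]

print(normalize("midam", k=2))
```

Because ranking needs no aligned (non-standard, standard) word pairs, only embeddings and a standard lexicon, this style of matching fits the unsupervised, parallel-corpus-free setting the abstract describes.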
