Designing a deep neural network model for finding semantic similarity between short Persian texts using a parallel corpus

Hosseini Moghadam Emami, Z. S ; Sharif University of Technology | 2021

  1. Type of Document: Article
  2. DOI: 10.1109/ICWR51868.2021.9443108
  3. Publisher: Institute of Electrical and Electronics Engineers Inc., 2021
  4. Abstract:
  5. Text processing, one of the main areas of artificial intelligence, has received considerable attention in recent decades. Numerous methods and algorithms have been proposed for semantic textual similarity, one of its sub-branches. Because of the special features of the Persian language and its non-standard writing system, finding semantic similarity is even more challenging in Persian. Producing a suitable corpus for training a semantic similarity model is also of great importance. The main purpose of this study is to propose a method for measuring the semantic similarity between short Persian texts. To do so, we first build an appropriate corpus and then propose an efficient approach based on neural networks. The proposed method involves three steps. The first step is data collection and the construction of a parallel corpus. In the second step, pre-processing, the data is normalized. Finally, semantic similarity recognition is performed by the neural network using vector representations of the words. The model is built on the resulting corpus, which consists of movie and TV show subtitles and contains 35,266 sentence pairs. The F-measure of the proposed approach on PAN2016 is 75.98% with 4 tags and 98.87% with 2 tags. We also achieved an F-measure of 98.86% for our model tested on the parallel corpus with 2 tags. © 2021 IEEE
  6. Keywords:
  7. Deep neural networks ; Semantic Web ; Semantics ; Text processing ; Neural network model ; Parallel corpora ; Persian languages ; Pre-processing step ; Semantic similarity ; Textual similarities ; Vector representations ; Writing systems ; Neural networks
  8. Source: 7th International Conference on Web Research, ICWR 2021, 19 May 2021 through 20 May 2021 ; 2021 , Pages 91-96 ; 9781665404266 (ISBN)
  9. URL: https://ieeexplore.ieee.org/document/9443108
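
The abstract says the pre-processing step normalizes the data but does not list the rules. As a rough illustration only, the sketch below applies a few character-level normalizations commonly used for Persian text: mapping Arabic letter variants to their Persian forms, removing diacritics and tatweel, and collapsing whitespace. The function name `normalize_persian` and the choice of rules are assumptions made for this example. They are not the authors' pipeline.

```python
import re

# Assumed mappings; the paper's actual normalization rules are not stated in the abstract.
_CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # Arabic yeh -> Persian yeh
    "\u0649": "\u06CC",  # Arabic alef maksura -> Persian yeh
    "\u0643": "\u06A9",  # Arabic kaf -> Persian kaf
    "\u0640": None,      # tatweel (kashida) is dropped
})
_DIACRITICS = re.compile("[\u064B-\u0652]")  # fathatan through sukun
_WHITESPACE = re.compile(r"\s+")


def normalize_persian(text: str) -> str:
    """Apply a small, hypothetical set of Persian normalization rules."""
    text = text.translate(_CHAR_MAP)       # unify letter variants
    text = _DIACRITICS.sub("", text)       # strip short-vowel marks
    return _WHITESPACE.sub(" ", text).strip()  # collapse runs of whitespace


if __name__ == "__main__":
    print(normalize_persian("\u0643\u062A\u0627\u0628  \u0639\u0644\u064A"))
```

The zero-width non-joiner (U+200C) is deliberately left untouched here, since it carries orthographic meaning in Persian. A real pipeline would need an explicit policy for it.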