Generating Text from Abstract Meaning Representation in Persian

Kakaei, Farokh | 2020

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 52718 (31)
  4. University: Sharif University of Technology
  5. Department: Languages and Linguistics Center
  6. Advisor(s): Rahimi, Saeed; Bahrani, Mohammad
  7. Abstract:
  8. This research mainly aims to propose, for the first time, a way of generating text from Abstract Meaning Representation (AMR) in Persian. AMR is a relatively new way of representing the meaning of natural language sentences that captures the various semantic components in a rooted, directed, acyclic graph. Generating text from AMR is a challenging task in natural language processing because some syntactic constructs are abstracted away from the representation, so a single AMR can have multiple surface realizations. Given the many applications of generating text from meaning representations in natural language processing, designing methods for converting such representations to text seems inevitable. Several statistical methods have been proposed to generate text from AMR in English, and more recent methods based on deep neural networks have shown higher performance. Inspired by them, this research proposes three deep neural models for generating Persian text from Persian AMRs: a recurrent sequence-to-sequence model, a Transformer sequence-to-sequence model, and a graph-to-sequence model with a Graph Convolutional Network (GCN) encoder. Two linearization methods, a BFS traversal and a DFS traversal, are used to linearize the input graphs for the sequence-to-sequence models. The neural models require sufficiently large AMR training datasets to generalize well to unseen data. Given the lack of such a large AMR dataset in Persian, a rule-based model that generates text recursively from (sub)graphs in a DFS traversal was also developed. Additionally, a simple, novel algorithm for automatically augmenting Persian AMR datasets was devised and applied to the main dataset in two ways: first to the training partition only, and then to the entire dataset. Two datasets were thus derived from the main dataset to study the impact of augmentation on the neural models' performance.
The results showed that the neural models' performance was understandably low due to the small size of the training dataset, and was not comparable with the results of previous models in English. The rule-based model, however, achieved 15.30 BLEU, 46.07 METEOR, and 48.10 CHRF++ scores on the main dataset. Applying the augmentation algorithm to the training partition slightly improved the results of some neural models, but applying it to the entire dataset significantly improved all neural models' performance, showing that with sufficient data, neural models could achieve the best performance in generating text from AMR in Persian.
  9. Keywords:
  10. Deep Neural Networks ; Natural Language Generation System ; Abstract Meaning Representation ; Persian Language Generation ; Data Augmentation