- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 51295 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Ghassem-Sani, Gholamreza
- Abstract:
- Grammar induction is an important area of natural language processing. There are two general approaches to recognizing syntactic structure: constituency parsing and dependency parsing. In dependency parsing, word order does not affect the syntactic structure of the sentence, which makes it an appropriate option for parsing free-word-order languages such as Persian. In this thesis, dependency-based methods are used to parse Persian sentences. Manual induction of a grammar is a time-consuming and tedious task, but machine learning algorithms have greatly facilitated it. Deep neural networks are among the most effective methods in this field, and they are used in this thesis for feature extraction. The thesis takes advantage of two specific characteristics of Persian to improve parsing accuracy. First, Persian is a morphologically rich language, which makes it hard to create word embeddings capable of identifying semantic relationships between words. To mitigate this issue, subword information is used to create the Persian word embeddings. Second, Persian sentences contain a considerable number of long dependencies, and the accuracy of transition-based parsers decreases as the number of long dependencies increases. Thus, a new transition system is proposed that is capable of identifying non-projective trees and directly attaching long dependencies. Finally, the method is evaluated on two Persian dependency datasets: Dadegan and UPDT. The UAS and LAS achieved on Dadegan are 90.66% and 87.55%, respectively, which are higher than the accuracy of several other methods. The UAS and LAS achieved on UPDT are 83.08% and 79.13%, respectively, which are lower than the accuracy of other existing parsers.
- Keywords:
- Natural Language Processing ; Deep Learning ; Grammar Induction ; Parsing Algorithms ; Transition System
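
The abstract reports its results as attachment scores. As a minimal sketch of how these two metrics are conventionally computed (not the thesis's own evaluation code), the hypothetical helper `attachment_scores` below takes gold and predicted arcs as `(head, label)` pairs, one per token. UAS counts tokens whose predicted head is correct; LAS additionally requires the dependency label to match. The toy sentence and its labels are invented for illustration only.

```python
from typing import List, Tuple

# One arc per token: (index of the head token, dependency label).
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages for one or more sentences."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted arcs must cover the same tokens")
    if not gold:
        raise ValueError("at least one token is required")

    # Unlabeled hit: the predicted head matches the gold head.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # Labeled hit: both the head and the label match.
    labeled_hits = sum(g == p for g, p in zip(gold, pred))

    n = len(gold)
    return 100.0 * head_hits / n, 100.0 * labeled_hits / n


# Invented four-token example (head index 0 denotes the artificial root).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (2, "amod")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}%, LAS = {las:.2f}%")
```

Because every labeled hit is also an unlabeled hit, LAS can never exceed UAS on the same data.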
