Semantic Clustering of Persian Verbs

Aminian, Maryam; Sameti, Hossein

Aminian, Maryam | 2012

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 43312 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Sameti, Hossein
Abstract:
Semantic classification of words with unsupervised learning methods is a challenging problem in computational lexical semantics. The goal is to identify words that belong to the same semantic class, i.e., words that can take the same set of arguments. Among all word categories, the verb is regarded as one of the most important, and certain linguistic theories, such as case grammar and dependency grammar, treat it as the central part of the sentence. Following Levin’s idea, diathesis alternations and the similarities between them serve as clues for the semantic classification of verbs. This idea has been verified for languages such as English and German with promising results. However, the task faces many challenges, including verb inflection, training data sparsity, and ambiguity in verb complement recognition. These challenges make the task very hard, and the difficulties are known to be greater for Persian because of the lack of sufficient training data and the large number of verb inflections. This thesis first discusses the challenges of semantic classification of Persian verbs and reviews previous work in the field. It then tests methods for the semantic clustering of Persian verbs using the Persian dependency treebank. Among the methods tested, the spectral clustering algorithm outperformed the KMeans algorithm by 3 F-score points. In addition to syntactic information, the semantic information in FarsNet is also used in the experiments. Since no evaluation data existed, a manual classification of 265 common Persian verbs into 30 classes was built based on Levin’s idea. Furthermore, a language model was built from the results of the semantic clustering of Persian verbs; it is designed to improve an ASR system for Persian and achieves lower perplexity than the baseline language model. The evaluation shows that the language model with the lowest perplexity is obtained with the spectral clustering algorithm.
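
The abstract reports clustering quality as an F-score against a manual gold classification but does not say which variant of the measure was used. As an illustration only, the sketch below computes a pairwise F-score, one common choice for comparing a clustering with gold classes: a pair of verbs counts as correct when both the clustering and the gold standard place the two verbs together. The function names, the verb list, and the class labels are hypothetical toy values, not data or results from the thesis.

```python
from itertools import combinations


def same_label_pairs(assignment):
    """Return the set of unordered item pairs that share a label."""
    pairs = set()
    # Sorting makes every pair appear in one canonical order.
    for a, b in combinations(sorted(assignment), 2):
        if assignment[a] == assignment[b]:
            pairs.add((a, b))
    return pairs


def pairwise_f_score(predicted, gold):
    """Pairwise precision, recall and F1 of a clustering against gold classes."""
    # Only items present in both assignments are evaluated.
    items = set(predicted) & set(gold)
    pred_pairs = same_label_pairs({v: predicted[v] for v in items})
    gold_pairs = same_label_pairs({v: gold[v] for v in items})
    correct = len(pred_pairs & gold_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(gold_pairs) if gold_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example (assumed, not from the thesis): five transliterated verbs,
# two hypothetical gold classes, and a made-up clustering output.
gold = {
    "raftan": "motion",
    "amadan": "motion",
    "davidan": "motion",
    "goftan": "speech",
    "porsidan": "speech",
}
predicted = {"raftan": 0, "amadan": 0, "davidan": 1, "goftan": 1, "porsidan": 1}

precision, recall, f1 = pairwise_f_score(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under this measure, two clustering outputs (for example, one from spectral clustering and one from KMeans) could be compared by scoring each against the same gold classification; whether the thesis uses this exact measure is an assumption.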
Keywords:
Persian Language; Unsupervised Learning; Semantic Classification; Language Model; Lexical Semantics; Persian Verbs; Levin Theory