- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 53994 (31)
- University: Sharif University of Technology
- Department: Languages and Linguistics Center
- Advisor(s): Rahimi, Saeed; Bahrani, Mohammad
- Abstract:
- The emergence of lexical knowledge bases such as WordNet and FarsNet has highlighted the importance of semantic annotation of words in natural language processing and corpus linguistics. These knowledge bases rely on semantic relations and dictionary definitions of the words they cover. Another efficient approach to semantic annotation is to classify the lexicon of a language semantically within a taxonomy. In this research, we build a semantic annotation system for tagging Persian texts. The system can support natural language processing tools for applications such as text summarization, plagiarism detection, and conceptual and lexical information retrieval. The USAS taxonomy contains 21 coarse-grained and 232 fine-grained semantic fields. Because this taxonomy is based on semantic information, it can serve as common ground between two or more languages, and it has so far been used to construct lexical knowledge bases, and in some cases annotation systems, for 12 languages other than English. In this research, its categories are used as semantic tags for both the Persian lexical knowledge base and the Persian semantic annotation system. The knowledge base contains 5,389 words annotated by a human expert. In another part of this research, the USAS categories and tags are mapped to the categories and category codes of The Persian Thesaurus in order to connect the two lexical resources. In addition to the tagged words in the lexical knowledge base, 315 sentences containing 3,199 word types were manually annotated in context to serve as input for the semantic annotation system, which is based on a bi-LSTM deep neural network model. The model was evaluated using k-fold cross-validation and achieved an accuracy of 65%.
- Keywords:
- Natural Language Processing ; Semantic Annotation ; Semantic Role Labeling ; Lexical Taxonomy ; USAS Tagger ; Semantic Tagger ; Lexical Knowledge Base
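
The abstract reports an accuracy obtained by k-fold cross-validation. As a rough illustration of that evaluation procedure only, the sketch below splits a set of tagged sentences into k folds and averages token-level accuracy across them. It does not reproduce the thesis's bi-LSTM model: a most-frequent-tag baseline stands in for the tagger, and the function names, the `UNK` fallback tag, and the toy tags `T1` and `T2` are all hypothetical.

```python
from collections import Counter, defaultdict


def k_fold_indices(n_items, k):
    """Yield (train_idx, test_idx) pairs; each item is held out exactly once.

    Assumes 1 <= k <= n_items so that no fold is empty.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds)
                     if f != held_out for i in fold]
        yield train_idx, test_idx


def train_baseline(sentences):
    """Stand-in tagger (not a bi-LSTM): map each word to its most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def cross_validate(sentences, k, fallback_tag="UNK"):
    """Return the mean token-level accuracy over k folds.

    `fallback_tag` is a hypothetical tag assigned to words unseen in training.
    """
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(sentences), k):
        model = train_baseline([sentences[i] for i in train_idx])
        correct = total = 0
        for i in test_idx:
            for word, gold in sentences[i]:
                correct += model.get(word, fallback_tag) == gold
                total += 1
        accuracies.append(correct / total)
    return sum(accuracies) / len(accuracies)


# Toy data: each sentence is a list of (word, tag) pairs with made-up tags.
toy_sentences = [
    [("a", "T1")],
    [("a", "T1")],
    [("b", "T2")],
    [("b", "T2")],
]
mean_accuracy = cross_validate(toy_sentences, k=2)
```

Averaging per-fold accuracy is one common convention; pooling all held-out tokens before dividing is another, and the abstract does not say which was used or what value of k was chosen.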