- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 43972 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Ghassem Sani, Gholamreza
- Abstract:
- Event extraction is an important task in Natural Language Processing (NLP). Many NLP applications, such as question answering, information extraction, and summarization, need some knowledge about the events in their input documents. There are several definitions of events in NLP. In this dissertation, an event is viewed as an element in a network of temporal information. The project is therefore based on the ISO-TimeML specification language, which is the standard scheme for temporal information processing in natural-language text. Event extraction based on ISO-TimeML has been performed for a number of languages, including English, French, Spanish, and Korean. For Persian, however, there has been no prior effort. In this project, we developed a system for event extraction from Persian text based on the ISO-TimeML scheme. For this purpose, we produced a corpus of events; before developing the corpus, we adapted and translated the ISO-TimeML guidelines for Persian. Our system includes two methods for detecting events and identifying their semantic class, which is the most important attribute of an event: a rule-based method and a learning-based method. Considering that these methods are the first effort toward event extraction in Persian, the experimental results show that they are comparable to successful methods in other languages. We achieved an F1-measure of 83.1% with the learning-based method in event detection.
- Keywords:
- Persian Language ; Event Extraction ; Classification ; Natural Language Processing ; Temporal Information Extraction ; Time Markup Language