Towards Unsupervised Temporal Relation Extraction Between Events
Mirroshandel, Abolghasem | 2012
- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 43768 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Ghassem-Sani, Gholamreza
- Abstract:
- Temporal relation classification is a challenging current task in natural language processing, with applications such as question answering, summarization, and language-specific information retrieval. Methods for this task fall into three groups according to the training data they require: supervised, semi-supervised, and unsupervised. This thesis has two main goals: first, to improve the accuracy of temporal relation learning, and second, to reduce the supervision the algorithm requires as much as possible. We pursue these goals in three steps. In the first step, we propose an improved algorithm for classifying temporal relations using support vector machines (SVMs). Along with the gold-standard corpus features, the method exploits automatically generated syntactic features to improve classification accuracy. To this end, a number of kernel functions are introduced and evaluated for temporal relation classification. The experiments show that adding syntactic features yields a notable improvement over the state-of-the-art method, which employs only gold-standard features. The second step comprises two semi-supervised methods: a bootstrapped cross-document classifier and an active learning strategy. The bootstrapped cross-document algorithm is a weakly supervised machine learning approach to classifying temporal relations between events. It first learns a general classifier from an annotated corpus. It then applies the hypothesis of “one type of temporal relation per discourse” and expands the scope of “discourse” from a single document to a cluster of topically related documents. By combining global information from the cluster with local decisions from the general classifier, our novel bootstrapping generative classifier outperforms the state-of-the-art method. In the active learning strategy, we reduce annotation effort by selecting the most informative samples for labeling, and we present novel SVM-based active learning strategies for temporal relation classification. Extensive empirical comparisons of different active learning algorithms and SVM kernel functions show that the proposed active learning strategies are effective for this task. Finally, in the third step, we propose a fully generative model for temporal relation extraction based on the expectation maximization (EM) algorithm. Within the EM algorithm, we use techniques such as greedy best-first search and integer linear programming to remove temporal inconsistencies. Our experiments show that the performance of the proposed algorithm, which is the first unsupervised algorithm for temporal relation learning, is notable.
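The “one type of temporal relation per discourse” idea from the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `propagate_cluster_labels`, the record fields (`cluster`, `pair`, `label`, `confidence`), and the confidence-weighted voting rule are all assumptions made for illustration. The sketch takes local decisions from some general classifier and relabels each event pair with the relation that dominates its cluster of topically related documents.

```python
from collections import defaultdict


def propagate_cluster_labels(predictions):
    """Relabel each prediction with the dominant relation for its event pair
    within its document cluster (hypothetical, confidence-weighted voting).

    predictions: list of dicts with the assumed keys
        "cluster"    - id of the topically related document cluster
        "pair"       - tuple identifying the two events
        "label"      - relation predicted by the local classifier
        "confidence" - non-negative score of that local decision
    """
    # Sum the confidence of every local decision per (cluster, pair, label).
    votes = defaultdict(lambda: defaultdict(float))
    for p in predictions:
        votes[(p["cluster"], p["pair"])][p["label"]] += p["confidence"]

    # Replace each local label with the highest-scoring label of its group.
    relabeled = []
    for p in predictions:
        tally = votes[(p["cluster"], p["pair"])]
        best_label = max(tally, key=tally.get)
        relabeled.append({**p, "label": best_label})
    return relabeled


if __name__ == "__main__":
    local_decisions = [
        {"cluster": "c1", "pair": ("attack", "die"), "label": "BEFORE", "confidence": 0.9},
        {"cluster": "c1", "pair": ("attack", "die"), "label": "AFTER", "confidence": 0.7},
        {"cluster": "c1", "pair": ("attack", "die"), "label": "BEFORE", "confidence": 0.6},
    ]
    for row in propagate_cluster_labels(local_decisions):
        print(row["pair"], row["label"])
```

In a bootstrapping loop of the kind the abstract describes, relabeled instances like these could be added back to the training set before retraining; how the thesis selects and weights them is not stated here.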
- Keywords:
- Information Retrieval ; Active Learning ; Expectation Maximization Algorithm ; Bootstrapping ; Support Vector Machine (SVM) ; Temporal Relation Classification ; Automatic Extraction ; Event Extraction