
Persian Sentence Compression Using Constituency Analysis of Sentences

Tavakoli, Mohammad | 2018

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 50797 (31)
  4. University: Sharif University of Technology
  5. Department: Languages and Linguistics Center
  6. Advisor(s): Izadi, Mohammad
  7. Abstract:
  8. The ever-growing volume of generated data increases the need for automatic text processing methods. Sentence compression and text summarization are two ways of managing this volume of data. Sentence compression is also useful as a component of other text processing systems, such as text summarization, news condensation, information extraction, and question answering. A sentence compression system takes a long sentence as input and returns a shorter form that preserves its important information and its grammaticality. In this thesis, we devise a rule-based extractive sentence compression method. The method extracts deletion rules from the constituency parse trees of sentences. After extracting the deletion rules, we order them within the system so that it produces the best possible results. As the final outcome, we present a Persian sentence compression system that takes any standard Persian sentence as input and produces its compressed form with acceptable accuracy. Because the system works by finding deletion rules in the constituency parse of the sentence, the user can choose the target compression rate; however, the chosen rate is not always achievable. After evaluating and comparing the results, we conclude that a purely rule-based method can feasibly achieve ideal results in a syntactically rich language like Persian.
  9. Keywords:
  10. Syntax Analysis ; Sentence Compression ; Sentence Constituency Analysis ; Rule-Based Sentence Compression ; Extractive Compression
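
The abstract describes compression as applying an ordered list of deletion rules to a constituency parse tree until a user-chosen compression rate is reached, where that rate may not always be reachable. The following is a minimal sketch of that general idea, not the thesis's actual system. The `Node` class, the rule format (parent label, child label), the three rules in `RULES`, the English toy sentence, and the definition of compression rate as the fraction of words kept are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A constituency tree node; leaves carry a word, inner nodes carry children."""
    label: str
    children: List["Node"] = field(default_factory=list)
    word: str = ""

    def words(self) -> List[str]:
        if not self.children:
            return [self.word] if self.word else []
        out: List[str] = []
        for child in self.children:
            out.extend(child.words())
        return out


# Hypothetical deletion rules in priority order:
# (parent label, label of the child constituent to delete).
RULES: List[Tuple[str, str]] = [("NP", "PP"), ("VP", "ADVP"), ("NP", "ADJP")]


def apply_rule(node: Node, rule: Tuple[str, str]) -> bool:
    """Delete the first matching constituent (depth-first); return True if one was deleted."""
    parent, child = rule
    if node.label == parent:
        for i, c in enumerate(node.children):
            if c.label == child:
                del node.children[i]
                return True
    return any(apply_rule(c, rule) for c in node.children)


def compress(tree: Node, target_rate: float, rules=RULES) -> List[str]:
    """Apply rules in order until kept/original <= target_rate or no rule matches."""
    original = len(tree.words())
    for rule in rules:
        while len(tree.words()) / original > target_rate:
            if not apply_rule(tree, rule):
                break  # this rule no longer matches; try the next one
    return tree.words()


def build_example() -> Node:
    """Toy parse of 'The student from Tehran quickly read the long book'."""
    leaf = lambda label, word: Node(label, word=word)
    subject = Node("NP", [leaf("DT", "The"), leaf("NN", "student"),
                          Node("PP", [leaf("IN", "from"), leaf("NNP", "Tehran")])])
    obj = Node("NP", [leaf("DT", "the"), Node("ADJP", [leaf("JJ", "long")]),
                      leaf("NN", "book")])
    verb_phrase = Node("VP", [Node("ADVP", [leaf("RB", "quickly")]),
                              leaf("VBD", "read"), obj])
    return Node("S", [subject, verb_phrase])


print(" ".join(compress(build_example(), target_rate=0.6)))
```

In this sketch, a target rate of 0.1 returns the same five-word sentence as a target of 0.6, because the rule list is exhausted before the target is met. This mirrors the abstract's remark that the requested compression rate is not always achievable.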
