
Automatic Headline Generation for Persian News Texts

Afrasiabi, Shayan | 2013

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 45205 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Ghassem-Sani, Gholamreza
  7. Abstract:
  8. A news headline should represent the main and most important topics of its story. Selecting an appropriate headline for a news story is mainly done by journalists. The goal of this project was to design and implement a system that automates this task, that is, one that generates headlines for Persian news stories. Various methods exist for automatic headline generation in English and some other languages, but no work has yet been done for Persian. We have therefore adopted some ideas from those methods and developed the rest ourselves. Our proposed method consists of three main parts: keyword extraction, selection of the most important sentence, and sentence editing. Keyword extraction is the most important of these subtasks and is performed with the SVM and kNN learning methods. Machine learning methods require a corpus containing a large number of documents (news stories) and their corresponding headlines, so we built a corpus of about 6,000 documents and trained our system on it. After extracting keywords, we use them to identify the most important sentence. Finally, we edit the chosen sentence to shorten it as much as possible. The generated headlines were evaluated to determine how accurately the system performs its task: we compared the system's output against the original headlines written by journalists, using the BLEU and ROUGE metrics. Tests on about 1,000 new documents show that the SVM-based method generated more acceptable headlines than the other implemented methods.
  9. Keywords:
  10. Natural Language Processing ; Summarization ; Support Vector Machine (SVM) ; Automatic Headline Generation ; Keyword Extraction
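
The sentence-selection and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes the keywords are already available (in the thesis they come from the SVM or kNN classifier, while here they are supplied by hand), that text is tokenized on whitespace (real Persian text would need a proper tokenizer), and that evaluation uses a simplified unigram ROUGE recall. The function names select_sentence and rouge1_recall are hypothetical.

```python
from collections import Counter


def select_sentence(sentences, keywords):
    """Return the sentence that contains the most distinct keywords.

    Assumption: whitespace tokenization; ties go to the earliest sentence.
    """
    keyword_set = set(keywords)

    def score(sentence):
        # Number of distinct keywords that occur in the sentence.
        return len(keyword_set & set(sentence.split()))

    return max(sentences, key=score)


def rouge1_recall(candidate, reference):
    """Simplified ROUGE-1 recall: share of reference unigrams that also
    appear in the candidate, with clipped counts."""
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
    return overlap / total


# Toy English data, used only to keep the example readable.
sentences = [
    "the council met on monday",
    "the council approved the new budget",
    "residents attended the meeting",
]
keywords = ["council", "budget", "approved"]

chosen = select_sentence(sentences, keywords)
recall = rouge1_recall("council approved budget", "council approves new budget")
```

With this toy input, the second sentence is chosen because it contains all three keywords. The sentence-editing step (shortening the chosen sentence) is not shown, since the abstract does not describe how it is done.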
