- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 57907 (52)
- University: Sharif University of Technology, International Campus, Kish Island
- Department: Science and Engineering
- Advisor(s): Hemmatyar, Ali Mohammad Afshin; Ghafourian Ghahramani, Amir Ali
- Abstract:
Text summarization and text classification are two important tasks in natural language processing. Summarization condenses a text into its main points, making it easier to understand, whereas classification assigns a text to one of a set of predefined categories based on its content. Summarization can be achieved through various methods, such as extractive summarization, in which key sentences or phrases are extracted from the original text. The present research aims to classify and summarize news texts in the Persian Daily News dataset. This objective is carried out in two stages. First, the texts in this dataset are classified using the ParsBERT model. Then, the SBERT model ranks the content of these texts so that an extractive summary can be produced. According to the obtained results, the ParsBERT model achieves an accuracy of over 98% in news classification. Additionally, ROUGE scores for the SBERT-based summarizer show that the summarized text and the original text overlap substantially in all four news categories (scientific, economic, political, and sports), which indicates the effectiveness of the model used. Results on the Pasokh dataset, which serves as the baseline for testing, show an improvement in extractive summarization performance: compared to previous works, the F1-score increases by approximately 9% for ROUGE-1 and 12% for ROUGE-2. It is also notable that, on the Pasokh dataset, the share of sentences common to the reference summaries and the predicted summaries ranges from 69% to 89% across the four categories.
- Keywords:
- Extractive Summarization; Bidirectional Encoder Representations from Transformers (BERT) Model; Pasokh Dataset; Persian Daily News; ParsBERT Model; News Classification
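
The ROUGE-1 and ROUGE-2 F1-scores cited in the abstract measure n-gram overlap between a predicted summary and a reference summary. The following is a minimal sketch of that computation, not the evaluation code used in the thesis. It assumes plain whitespace tokenization (Persian text would normally need a dedicated normalizer and tokenizer), and the function names `ngram_counts` and `rouge_n` are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N precision, recall and F1 for two strings.

    Assumption: tokens are split on whitespace; no stemming or
    language-specific normalization is applied.
    """
    cand = ngram_counts(candidate.split(), n)
    ref = ngram_counts(reference.split(), n)

    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())

    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    predicted = "the cat sat on the mat"
    reference = "the cat lay on the mat"
    print(rouge_n(predicted, reference, n=1))
    print(rouge_n(predicted, reference, n=2))
```

In this toy pair, five of six unigrams and three of five bigrams are shared, so ROUGE-2 is lower than ROUGE-1, which is the usual pattern because bigram matches also require the word order to agree.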
