Multidocument Keyphrase Extraction Using Recurrent Neural Networks

Doostmohammadi, Ehsan; Sameti, Hossein; Bokaei, Mohammad Hadi

Doostmohammadi, Ehsan | 2019

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 51848 (31)
University: Sharif University of Technology
Department: Languages and Linguistics Center
Advisor(s): Sameti, Hossein; Bokaei, Mohammad Hadi
Abstract:
Keyphrase extraction, an important open problem in Natural Language Processing (NLP), is useful both as a stand-alone task in Information Extraction and as an upstream task for Information Retrieval, text summarization, classification, and similar applications. In this study, motivated by the needs of Persian NLP, artificial neural networks are used to extract keyphrases from single documents, and a graph-based re-scoring method is proposed for multidocument keyphrase extraction. The proposed multidocument method consists of two steps: (1) extracting the keyphrases of each document in a cluster using a sequence-to-sequence model with attention, and (2) re-scoring the extracted keyphrases with an unsupervised graph-based method so that keyphrases related to all of the documents score higher. The main problem with neural networks is their need for a large amount of training data, which is addressed here by using relatively high-quality keyphrases from news websites and agencies. A separate corpus of 101 news clusters is additionally labeled to measure performance in the multidocument phase. Since sequence-to-sequence models are able to produce absent keyphrases, the problem of keyphrase generation is addressed in this research as well. In the single-document phase, the deep model obtained an F1-score of 50.59%, while the best baseline model achieved only 21.73%. The deep model also performed well on the task of keyphrase generation. The proposed re-scoring method resulted in a 4.1% increase in F1-score in the multidocument phase at k = 10.
Keywords:
Multidocument Keyphrase Extraction ; Keyphrase Extraction ; Keyphrase Generation ; Recurrent Neural Networks ; Sequence to Sequence Learning ; Deep Learning
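
The second step described in the abstract, graph-based re-scoring of per-document keyphrases, can be illustrated with a small sketch. The record does not specify how the thesis builds its graph, so everything here is an assumption made for illustration: nodes are keyphrases, edges are weighted by word overlap (Jaccard similarity), the prior is the sum of per-document model scores, and the names `jaccard`, `rescore`, and `top_k` are hypothetical. It is a generic personalized PageRank, not the author's implementation.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two phrases (an illustrative choice)."""
    wa, wb = set(a.split()), set(b.split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def rescore(doc_keyphrases, damping=0.85, iterations=100):
    """Re-score a cluster's keyphrases with weighted, personalized PageRank.

    doc_keyphrases: one dict per document, mapping keyphrase -> model score.
    Returns a dict mapping keyphrase -> PageRank score (scores sum to 1).
    """
    # Prior (assumption): sum of per-document scores, so a phrase produced
    # for several documents of the cluster starts with more weight.
    prior = {}
    for doc in doc_keyphrases:
        for phrase, score in doc.items():
            prior[phrase] = prior.get(phrase, 0.0) + score
    total = sum(prior.values())
    if total == 0:
        return {}
    prior = {p: s / total for p, s in prior.items()}

    # Undirected edges (assumption): weight = word overlap between phrases.
    nodes = sorted(prior)
    weights = {p: {} for p in nodes}
    for p, q in combinations(nodes, 2):
        w = jaccard(p, q)
        if w > 0:
            weights[p][q] = w
            weights[q][p] = w
    out_weight = {p: sum(weights[p].values()) for p in nodes}

    # Power iteration. Nodes without edges hand their mass back to the prior.
    rank = dict(prior)
    for _ in range(iterations):
        dangling = sum(rank[p] for p in nodes if out_weight[p] == 0)
        new_rank = {}
        for p in nodes:
            incoming = sum(rank[q] * w / out_weight[q]
                           for q, w in weights[p].items())
            new_rank[p] = ((1 - damping) * prior[p]
                           + damping * (incoming + dangling * prior[p]))
        rank = new_rank
    return rank


def top_k(rank, k=10):
    """Return the k highest-scoring keyphrases."""
    return sorted(rank, key=rank.get, reverse=True)[:k]


if __name__ == "__main__":
    # Made-up scores for a three-document cluster.
    cluster = [
        {"nuclear deal": 0.9, "oil price": 0.5},
        {"nuclear deal": 0.8, "nuclear talks": 0.6},
        {"nuclear talks": 0.7, "football": 0.9},
    ]
    print(top_k(rescore(cluster), k=2))
```

In this toy cluster, "football" has a high score in one document but no connection to the rest of the cluster, so it falls below the two related phrases after re-scoring. The abstract describes the same effect: keyphrases related to all of the documents score higher.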

Table of Contents

Introduction
- Keyphrase extraction and generation
- Problem statement
- Research objectives and new ideas
- Challenges ahead
- Thesis chapters
- Chapter summary
Related work and theoretical background
- Preface
- Keyphrase extraction
  - Supervised approaches
  - Unsupervised approaches
- Theoretical background on neural networks
  - Fully connected neural networks
  - Word embeddings
  - Network training
  - Recurrent neural networks
  - Encoder-decoder model
  - Attention mechanism
- Keyphrase extraction and generation using neural networks
- Introduction to the baseline methods
- Keyphrase extraction in Persian
- Chapter summary
Proposed method
- Preface
- Single-document extraction and generation
  - How data is fed to the network in this study
- Multidocument extraction and generation
  - Re-scoring the generated keyphrases
  - The PageRank algorithm
  - Building the re-scoring graph
  - Removing duplicate news items and keyphrases
- A brief note on implementation
- Chapter summary
Data analysis
- Preface
- Single-document extraction data
  - Corpus preparation and cleaning
  - Human quality assessment
  - Statistical description of the subset data
- Multidocument extraction data
- Chapter summary
Experiments and results
- Preface
- Evaluation metric
- Baseline results on the full data
  - Baseline methods
  - Results on present and absent keyphrases
  - Reasons for continuing with the subset data
- Single-document extraction and generation results on the subset data
  - Neural network settings
  - Results in terms of precision, recall, and F1-score
  - Shortcomings of this evaluation metric
  - Results with the ROUGE metric
- Multidocument extraction and generation results
- Chapter summary
Conclusion and suggestions
- Preface
- Summary of the current work
  - Summary of the work and research hypotheses
  - Conclusion
- Suggestions for future work
Bibliography
Persian-to-English glossary
English-to-Persian glossary
Appendix: Detailed results
