- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 52770 (19)
- University: Sharif University of Technology
- Department: Computer Engineering
- Advisor(s): Soleymani, Mahdieh
- Abstract:
- With the improvement of machine learning methods, especially deep learning, over the last decade, these methods have seen expanding use in language modeling. Because language modeling is such a fundamental task, huge networks have recently been pretrained with a language-modeling objective and then fine-tuned on target tasks such as question answering and sentiment analysis, a promising sign of its importance for other NLP tasks as well. However, the task still has severe problems. Methods based on teacher forcing suffer from the so-called exposure bias problem, which stems from the discrepancy between the training and test procedures. Some solutions have been introduced, such as reinforcement learning (which has high variance) or other approximate approaches. On the side of models with a latent space, the decoder's tendency to ignore the latent space has been reported. The more practical task of conditional text generation, whether determining the tense of the output sentence or satisfying more complex conditions such as context or topic, is of great importance. The use of these models is not restricted to text generation: they can also be applied to creating drug molecules with specific characteristics, music in specific genres, or graphs with specific features. In conditional text generation there is an additional problem: the mismatch between the generated sentence and the desired condition. In this project, two latent-space perspectives on conditional text generation with discrete conditions are discussed. In the first, the latent space is completely determined by the condition value. The proposed method is a latent-space model that controls the latent-space-ignorance problem and partitions the prior distribution over the latent space with respect to the condition values. By incorporating normalizing-flow networks, which have recently attracted attention, the latent distribution of each condition is learned.
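The train/test discrepancy behind exposure bias, mentioned above, can be illustrated with a toy sketch (not from the thesis; the bigram "model" and token names are invented for illustration): during teacher forcing each step conditions on the ground-truth prefix, while at generation time each step conditions on the model's own previous predictions.

```python
# Toy illustration of the teacher-forcing train/test discrepancy
# (exposure bias). The "model" here is a hypothetical deterministic
# bigram rule table standing in for a learned next-token predictor.

def toy_next_token(prefix):
    # stand-in for a learned model: look up the last token
    rules = {"<s>": "the", "the": "cat", "cat": "sat", "sat": "</s>"}
    return rules.get(prefix[-1], "</s>")

target = ["<s>", "the", "cat", "sat", "</s>"]

# Teacher forcing (training): every step sees the gold prefix,
# so an early mistake cannot propagate to later steps.
tf_preds = [toy_next_token(target[:t + 1]) for t in range(len(target) - 1)]

# Free-running decoding (test time): every step sees the model's
# own output, so errors compound along the sequence.
seq = ["<s>"]
while seq[-1] != "</s>" and len(seq) < 10:
    seq.append(toy_next_token(seq))
```

With a perfect model the two regimes agree; exposure bias arises precisely because a trained model is imperfect, so the prefixes it sees at test time drift away from the gold prefixes it was trained on.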
- In the other perspective, the latent space can be independent of the condition. As a result it contains only the content of the sentences, not the condition, so the latent space is not separated with respect to condition values. Finally, the baseline methods and the proposed methods are evaluated on various metrics, including quality, diversity, and the percentage of generated text matching the desired condition. The first proposed method outperforms the other methods in condition-match percentage while keeping the quality and diversity of its samples close to the baselines. The second method obtains results similar to the latent-space baseline while incorporating a new training approach, and achieves superior quality and diversity on some datasets.
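The first perspective, where the condition determines a region of the latent space, can be sketched in miniature. The following is not the thesis implementation: it replaces the learned normalizing-flow network with the simplest possible flow, a per-condition affine map applied to a standard-normal base, just to show how each discrete condition value gets its own latent distribution with a tractable change-of-variables density. All names and parameter values are illustrative.

```python
import numpy as np

class AffineFlow:
    """Minimal one-layer flow: z = mu + exp(log_scale) * eps,
    where eps ~ N(0, I). A real model would stack learned
    coupling/planar layers instead of a fixed affine map."""

    def __init__(self, mu, log_scale):
        self.mu = np.asarray(mu, dtype=float)
        self.log_scale = np.asarray(log_scale, dtype=float)

    def forward(self, eps):
        # push a base sample through the flow to get a latent code
        return self.mu + np.exp(self.log_scale) * eps

    def log_prob(self, z):
        # change of variables: log p(z) = log p_base(eps) - log|det J|
        eps = (z - self.mu) * np.exp(-self.log_scale)
        log_base = -0.5 * np.sum(eps ** 2 + np.log(2 * np.pi), axis=-1)
        return log_base - np.sum(self.log_scale)

# One flow per discrete condition value (e.g. a "tense" condition),
# partitioning the prior over the latent space by condition.
flows = {
    "past": AffineFlow([2.0, 0.0], [0.1, 0.1]),
    "present": AffineFlow([-2.0, 0.0], [0.1, 0.1]),
}

rng = np.random.default_rng(0)
eps = rng.standard_normal((5, 2))
z_past = flows["past"].forward(eps)  # latents for condition "past"
```

Sampling a sentence with a desired condition then amounts to sampling from that condition's flow and decoding the resulting latent code; because the per-condition densities have little overlap, the decoder receives an unambiguous signal about the condition.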
- Keywords:
- Neural Networks ; Deep Learning ; Generative Models ; Conditional Text Generation ; Generative Models with Latent Space
Table of Contents
- 1 Introduction
- 1-1 Importance and Applications
- 1-2 General Approaches to Language Models and Problem Definition
- 1-3 Comparison of Language Models with and without a Latent Space
- 1-4 Training Approaches
- 1-5 Challenges
- 1-5-1 Mean-Seeking Behavior of the Maximum-Likelihood Method
- 1-5-2 Instability and Problems of Generative Adversarial Networks
- 1-5-3 Ignoring the Latent Space
- 1-5-4 Passing Gradients through Sampling and Argmax
- 1-5-5 Mismatch between the Condition and the Generated Sentence
- 1-6 Research Goal
- 1-7 Thesis Structure
- 2 Related Work
- 2-1 Introduction
- 2-1-1 Prerequisites
- 2-2 Language Models without a Latent Space
- 2-2-1 Basic Language Model (Teacher Forcing)
- 2-2-2 Language Models Using Generative Adversarial Networks
- 2-3 Language Models with a Latent Space
- 2-3-1 Variational Autoencoder
- 2-3-2 Models Proposed to Address KL Vanishing
- 2-4 Conditional Text Generation
- 2-4-1 Conditional Autoregressive Neural Networks
- 2-4-2 Models Based on the Variational Autoencoder
- 2-4-3 Models Based on Generative Adversarial Networks
- 2-5 Some Other Concepts
- 2-5-1 Normalizing-Flow Networks
- 2-5-2 The Transformer Architecture
- 3 Proposed Method
- 3-1 Introduction
- 3-1-1 Observations on the Behavior of Previous Models
- 3-1-2 Autoencoder Training
- 3-2 Proposed Model 1: The Condition Fully Determines the Latent Space
- 3-2-1 Training the Wasserstein Autoencoder
- 3-2-2 Training the Conditional Generator
- 3-3 Proposed Model 2: The Latent Space and the Condition Are Independent
- 4 Implementation, Experiments, and Evaluation
- 4-1 Introduction
- 4-2 Training Datasets
- 4-3 Evaluation Metrics
- 4-3-1
- 4-3-2
- 4-3-3
- 4-3-4 Condition-Match Percentage
- 4-4 Model Training
- 4-4-1 Different Encoder and Decoder Architectures
- 4-5 Results and Comparison with Other Models
- 5 Conclusion and Future Work
- A Network Training Plots
- References
- Persian-to-English Glossary
- English-to-Persian Glossary