A Solution to Exploration/Exploitation Trade-off in Recommender Systems

Feyzabadi Sani, Mohammad Javad; Rabiee, Hamid Reza; Hosseini, Abbas

Feyzabadi Sani, Mohammad Javad | 2021

Type of Document: M.Sc. Thesis
Language: Farsi
Document No: 54588 (19)
University: Sharif University of Technology
Department: Computer Engineering
Advisor(s): Rabiee, Hamid Reza; Hosseini, Abbas
Abstract:
The growing use of the Internet has led to the creation of new businesses around it, and traditional businesses have to use the Internet in order to remain competitive. One of the most important strategies for growing sales on the Internet is the proper use of recommendation systems. A recommendation system should exploit its knowledge about users' preferences and explore their new preferences at the same time. Establishing a balance between exploring users' new interests and exploiting their known interests is key to building a good recommendation system. An important challenge is that the data available for training recommendation systems are biased towards the recommendation policies that gathered them. In addition, most previous work does not treat recommendation systems as interactive systems and instead models them in the supervised learning paradigm. In this thesis, we formulate the problem in the contextual multi-armed bandit framework and propose a solution to the exploration/exploitation trade-off using uniformly collected data and Bayesian neural networks. Finally, we show our method's superiority over similar baseline methods through various experiments on synthetic and real data, using AUC and IPS as evaluation metrics. Across different experiments we observed a 2-3% increase in AUC and about a 30% increase in cumulative IPS.
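To make the IPS metric named in the abstract more concrete, here is a minimal Python sketch of an inverse propensity scoring estimate computed on logs collected by a uniformly random policy. It is an illustration under stated assumptions, not the thesis's implementation: the function names `ips_value` and `make_uniform_logs`, the log tuple layout, and the synthetic reward rule are all hypothetical.

```python
import random


def ips_value(logs, policy):
    """IPS estimate of the average reward of a deterministic policy.

    Each log entry is (context, action, reward, propensity), where
    propensity is the probability with which the logging policy chose
    the logged action. (Hypothetical layout, chosen for illustration.)
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        # Only entries where the evaluated policy agrees with the logged
        # action contribute, reweighted by 1 / propensity.
        if policy(context) == action:
            total += reward / propensity
    return total / len(logs)


def make_uniform_logs(n, n_actions, seed=0):
    """Synthetic logs from a uniformly random logging policy.

    Assumption for this sketch only: the reward is 1 when the action
    index equals the context index, and 0 otherwise.
    """
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        context = rng.randrange(n_actions)
        action = rng.randrange(n_actions)  # uniform exploration
        reward = 1.0 if action == context else 0.0
        logs.append((context, action, reward, 1.0 / n_actions))
    return logs


logs = make_uniform_logs(20000, 4)
matching_policy = lambda context: context  # always picks the rewarded action
constant_policy = lambda context: 0        # ignores the context

print("matching policy:", ips_value(logs, matching_policy))
print("constant policy:", ips_value(logs, constant_policy))
```

Because the logging policy here is uniform, every propensity is simply one over the number of actions, which is one reason uniformly collected data is convenient for offline evaluation. In this synthetic setup the matching policy should score close to 1 and the constant policy close to 0.25.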
Keywords:
Recommender System ; Reinforcement ; Reinforcement Learning ; Multi-Armed Bandit Problem ; Exploration/Exploitation Trade-Off ; Bayesian Neural Networks

Table of Contents

Chapter: Introduction
- Problem Definition
  - Preliminary Definitions
  - Problem Formulation
  - Multi-Armed Bandit
  - Contextual Bandit
- Data and Evaluation Method
  - Evaluation Method
  - Data
- Solution Approach
  - Main Idea
  - Proposed Model
- Research Objective
- Contributions of the Thesis
- Structure of the Thesis
Chapter: Related Work
- Introduction
- Bandit Theory
  - History
  - Formulation of the Bandit Problem
- Existing Approaches to the Problem
  - ε-greedy-Based Approach
  - Uncertainty-Modeling-Based Approach
  - Sampling-Based Approach
- Sampling-Based Approach: Previous Work
  - Bayesian Neural Networks
  - Recommender Systems Based on Bayesian Neural Networks
  - A Suitable Loss Function for Recommender Systems
- Evaluation
  - Theoretical Evaluation
  - Practical Evaluation
- Summary
Chapter: Proposed Model
- Introduction
- Target Variable
- The Bias Problem in the Dataset
- Proposed Model
  - Exploitation Mechanism
  - Exploration Mechanism
- Comparison with Previous Work
- Summary
Chapter: Experiments
- Introduction
  - Models Used in the Experiments
  - Metrics Used
- Experiments on Synthetic Data
  - Model Learning Experiment
  - Model Exploration Experiment
- Experiments on Real Data
  - Model Learning Experiment
  - Experiment under Real-World Conditions
- Summary
Chapter: Conclusion
- Conclusions
- Future Work
  - Experiments in an Online Environment
  - Comparison with Closely Related Methods
  - Types of Information Transfer
  - Dataset
Appendix: Basic Algorithms for the Bandit Problem
- Stochastic Bandits
- Basic Algorithms for Stochastic Bandits with a Finite Number of Actions
  - Explore-Then-Commit
  - Upper Confidence Bound Algorithms
  - Asymptotically optimal UCB
  - Minimax Optimal Strategy in the Stochastic case (MOSS)
- Contextual Bandits
  - Stochastic Contextual Bandits
- Stochastic Linear Bandits
- Thompson Sampling
  - A Bayesian View of the Bandit Problem
  - Thompson Sampling
  - Thompson Sampling for Linear Bandits
References
Persian-to-English Glossary
English-to-Persian Glossary
