- Type of Document: M.Sc. Thesis
- Language: Farsi
- Document No: 55770 (02)
- University: Sharif University of Technology
- Department: Mathematical Sciences
- Advisor(s): Alishahi, Kasra
- Abstract:
- The multi-armed bandit is a simple framework for modeling sequential decision-making problems. At every time step a learner chooses among several arms and receives the reward of the chosen arm. Since the environment is unknown, the learner must balance staying with the option that gave the highest payoffs in the past against exploring new options that might give higher payoffs in the future, a tension known as the exploration vs. exploitation dilemma. The goal is to find a policy that minimizes regret, a performance measure of the learner's policy. One can make assumptions on how the rewards are generated, such as a stationary stochastic model; here we abandon almost all of them and consider the adversarial bandit model, in which an adversary chooses the rewards. In this thesis we prove a regret upper bound of Õ(d^2.5 √n) for the case where the loss functions are convex (equivalently, the rewards are concave), where d is the dimension of the action set and n is the time horizon. The convex bandit generalizes linear and finite-armed bandits, for which the known upper and lower bounds on regret are tight up to logarithmic factors in n and d. The best known lower bound for the convex bandit is Ω(√n), which holds even when the function class is restricted to linear functions. (A toy sketch of adversarial-bandit regret appears after the keyword list below.)
- Keywords:
- Regret Minimization ; Multi-Armed Bandit Problem ; Exploration/Exploitation Dilemma ; Bayesian Bandit ; Adversarial Bandit ; Convex Bandit
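To make the regret notion concrete, here is a minimal Python sketch of Exp3, a standard algorithm for the adversarial finite-armed bandit. It is not the convex-bandit method of the thesis, and the function name, step size, and toy loss matrix are all illustrative assumptions; regret is measured against the best fixed arm in hindsight.

```python
import numpy as np

def exp3(loss_matrix, eta, seed=0):
    """Exp3 on a fixed adversarial loss sequence.

    loss_matrix: (n, K) array of losses in [0, 1] chosen by the adversary.
    eta: learning rate; the standard tuning is sqrt(2 log K / (n K)).
    Returns realized regret against the best fixed arm in hindsight.
    """
    n, K = loss_matrix.shape
    log_w = np.zeros(K)                  # log-weights, kept in log space for stability
    rng = np.random.default_rng(seed)
    total_loss = 0.0
    for t in range(n):
        p = np.exp(log_w - log_w.max())  # exponential weights -> sampling distribution
        p /= p.sum()
        arm = rng.choice(K, p=p)         # randomization drives exploration
        loss = loss_matrix[t, arm]
        total_loss += loss
        est = np.zeros(K)                # importance-weighted estimate:
        est[arm] = loss / p[arm]         # unbiased for every arm's loss
        log_w -= eta * est
    best_fixed = loss_matrix.sum(axis=0).min()
    return total_loss - best_fixed

# Hypothetical toy run: 2 arms, arm 1 slightly better on average.
n, K = 10_000, 2
rng = np.random.default_rng(1)
losses = rng.uniform(size=(n, K))
losses[:, 1] *= 0.9
eta = np.sqrt(2 * np.log(K) / (n * K))
print(f"regret = {exp3(losses, eta):.1f}, "
      f"sqrt(2 n K log K) = {np.sqrt(2 * n * K * np.log(K)):.1f}")
```

The importance-weighted estimator is what lets the learner update every arm from bandit feedback on a single arm; with the stated step size, Exp3's expected regret is O(√(n K log K)), the finite-armed analogue of the √n dependence in the convex-bandit bound above.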