Beyond UCB: statistical complexity and optimal algorithm for non-linear ridge bandits

Yanjun Han, MIT
E18-304

Abstract: Much of the existing literature on bandits and reinforcement learning assumes a linear reward/value function, but what happens if the reward is non-linear? Two curious phenomena arise for non-linear bandits: first, in addition to the "learning phase" with a standard \Theta(\sqrt{T}) regret, there is an "initialization phase" with a fixed cost determined by the reward function; second, achieving the smallest cost of the initialization phase requires new learning algorithms other than traditional ones such as UCB. For a special family of…


MIT Statistics + Data Science Center
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764