Stochastics and Statistics Seminar Series Yanjun Han
Beyond UCB: statistical complexity and optimal algorithm for non-linear ridge bandits
Abstract: Much of the existing literature on bandits and reinforcement learning assumes a linear reward/value function, but what happens if the reward is non-linear? Two curious phenomena arise for non-linear bandits: first, in addition to the "learning phase" with a standard \Theta(\sqrt{T}) regret, there is an "initialization phase" with a fixed cost determined by the reward function;…