Variational methods in reinforcement learning

Martin Wainwright (MIT)
E18-304

Abstract: Reinforcement learning is the study of models and procedures for optimal sequential decision-making under uncertainty.  At its heart lies the Bellman optimality operator, whose unique fixed point specifies an optimal policy and value function.  In this talk, we discuss two classes of variational methods that can be used to obtain approximate solutions with accompanying error guarantees.  For policy evaluation problems based on on-line data, we present Krylov-Bellman boosting, which combines ideas from Krylov methods with non-parametric boosting.  For policy optimization problems based on…


MIT Institute for Data, Systems, and Society
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764