MIT Stochastics & Statistics Seminar: Stefan Wager
Title: Causal Inference with Random Forests
Abstract: Many scientific and engineering challenges, ranging from personalized medicine to customized marketing recommendations, require an understanding of treatment heterogeneity. We develop a non-parametric causal forest for estimating heterogeneous treatment effects that is closely inspired by Breiman’s widely used random forest algorithm. In the potential outcomes framework with unconfoundedness, we show that causal forests are pointwise consistent for the true treatment effect and have an asymptotically Gaussian and centered sampling distribution. We also propose a practical estimator for the asymptotic variance of causal forests. In both simulations and an empirical application, we find causal forests to be substantially more powerful than classical methods based on nearest-neighbor matching, especially as the number of covariates increases. Our theoretical results rely on a generic asymptotic normality theory for a large family of random forest algorithms. To our knowledge, this is the first set of results that allows random forests, including classification and regression forests, to be used for valid statistical inference.
This talk is based on joint work with Susan Athey, Bradley Efron, and Trevor Hastie.
Bio: Stefan is a fifth-year PhD student in the Stanford Statistics Department, advised by Professors Brad Efron and Guenther Walther. He received a B.S. in mathematics from Stanford in 2011, with undergraduate advisers Persi Diaconis and Ravi Vakil. In 2013 he did a summer internship as a statistician on Google’s Ads Quality team. He is the recipient of a BC and EJ Eaves Stanford Graduate Fellowship. Stefan’s work is concentrated at the intersection of theoretical and applied statistics. He is interested in non-parametric statistics, uses of subsampling for data analysis, empirical Bayes methods, and extreme value theory. Stefan particularly enjoys casting real-world problems into a statistical framework and figuring out how techniques developed from a non-statistical point of view can be described and analyzed using classical methods.