Optimal testing for calibration of predictive models

Name: Optimal testing for calibration of predictive models
Start: 2022-03-04T11:00:00-05:00
End: 2022-03-04T12:00:00-05:00
Location: E18-304

March 4, 2022 @ 11:00 am - 12:00 pm

Edgar Dobriban (University of Pennsylvania)

Abstract: The prediction accuracy of machine learning methods is steadily increasing, but the calibration of their uncertainty predictions poses a significant challenge. Numerous works focus on obtaining well-calibrated predictive models, but less is known about reliably assessing model calibration. This limits our ability to know when algorithms for improving calibration have a real effect, and when their improvements are merely artifacts due to random noise in finite datasets. In this work, we consider the problem of detecting mis-calibration of predictive models using a finite validation dataset. Due to the randomness in the data, plug-in measures of calibration need to be compared against a proper background distribution to reliably assess calibration. Thus, detecting mis-calibration in a classification setting can be formulated as a statistical hypothesis testing problem. The null hypothesis is that the model is perfectly calibrated, while the alternative hypothesis is that the deviation from calibration is sufficiently large.
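One common way to write down the testing problem described above is sketched here. The notation is an assumption made for illustration and is not taken from the abstract: $f$ is the predictor, $Y$ the one-hot label, $\varepsilon > 0$ a separation threshold, and the $\ell_2$-ECE is defined in the usual way as the root-mean-square gap between the predictions and the conditional class probabilities given the predictions.

```latex
% Assumed notation: f(X) is the predicted probability vector, Y the one-hot label.
\[
  \ell_2\text{-ECE}(f)
  = \Big( \mathbb{E}\,\big\| \mathbb{E}[\,Y \mid f(X)\,] - f(X) \big\|_2^2 \Big)^{1/2}
\]
% Null: perfect calibration. Alternative: calibration error at least \varepsilon.
\[
  H_0:\ \mathbb{E}[\,Y \mid f(X)\,] = f(X) \ \text{almost surely}
  \qquad \text{vs.} \qquad
  H_1:\ \ell_2\text{-ECE}(f) \ge \varepsilon
\]
```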

We find that detecting mis-calibration is only possible when the conditional probabilities of the classes are sufficiently smooth functions of the predictions. When the conditional class probabilities are Hölder continuous, we propose a minimax optimal test for calibration based on a debiased plug-in estimator of the $\ell_2$-Expected Calibration Error (ECE). We further propose a version that is adaptive to unknown smoothness. We verify our theoretical findings with a broad range of experiments, including with several popular deep neural net architectures and several standard post-hoc calibration methods. Our algorithm is a general-purpose tool, which, combined with classical tests for calibration of discrete-valued predictors, can be used to test the calibration of virtually any classification method.
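To make the idea of a debiased plug-in statistic concrete, here is a minimal Python sketch for the binary case. It is not the speaker's procedure: the equal-width binning, the fixed `n_bins`, the function names `debiased_l2_ece` and `calibration_test`, and the Monte Carlo null (resampling labels from the predicted probabilities) are all assumptions chosen for illustration. The debiasing step drops the diagonal terms of the squared within-bin residual sum, so the statistic has mean zero when the model is perfectly calibrated.

```python
import numpy as np


def debiased_l2_ece(probs, labels, n_bins=10):
    """Binned, debiased estimate of the squared l2 calibration error (binary case).

    Hypothetical helper: equal-width bins on [0, 1] are an illustrative choice.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n = probs.size
    resid = labels - probs
    # Assign each prediction to an equal-width bin; p == 1.0 goes to the last bin.
    bin_idx = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    counts = np.bincount(bin_idx, minlength=n_bins)
    sum_r = np.bincount(bin_idx, weights=resid, minlength=n_bins)
    sum_r2 = np.bincount(bin_idx, weights=resid**2, minlength=n_bins)
    ok = counts >= 2  # need at least two points to form off-diagonal pairs
    # Sum over i != j of r_i * r_j equals (sum r)^2 - sum r^2: the diagonal
    # terms, which cause the upward bias of the plain plug-in, are removed.
    cross = sum_r[ok] ** 2 - sum_r2[ok]
    # Bin weight n_B / n times the pairwise average cross / (n_B * (n_B - 1)).
    return float(np.sum(cross / (counts[ok] - 1)) / n)


def calibration_test(probs, labels, n_bins=10, n_sim=1000, seed=0):
    """Monte Carlo p-value for the null of perfect calibration.

    Assumption: the null distribution is simulated by redrawing labels as
    Bernoulli(probs), which is exactly what perfect calibration implies.
    """
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    observed = debiased_l2_ece(probs, labels, n_bins)
    null_stats = np.array([
        debiased_l2_ece(probs, rng.random(probs.size) < probs, n_bins)
        for _ in range(n_sim)
    ])
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null_stats >= observed)) / (n_sim + 1)
    return observed, float(p_value)
```

A small p-value from `calibration_test` suggests mis-calibration at the resolution of the chosen bins. How finely to bin is exactly where smoothness assumptions matter, and this sketch makes no attempt to choose the bin count optimally or adaptively.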

—

Bio: Edgar Dobriban is an assistant professor of statistics & computer science at the University of Pennsylvania. He obtained a PhD in statistics from Stanford University in 2017, and a BA in Mathematics from Princeton University in 2012. His research interests include the statistical analysis of large datasets, and the theoretical analysis of machine learning. He has received a Theodore W. Anderson award for the best PhD in theoretical statistics from Stanford University, and an NSF CAREER award. More information is available at his website: https://statistics.wharton.upenn.edu/profile/dobriban/

—

A full schedule for Spring 2022 Stochastics and Statistics Seminars can be found here: https://stat.mit.edu/seminars/upcoming/
