Besting Good-Turing for probability estimation over large domains

Yihong Wu (Yale University)
E18-304

Abstract: When faced with a small sample from a large universe of possible outcomes, scientists often turn to the venerable Good-Turing estimator. Despite its pedigree, however, this estimator comes with considerable drawbacks, such as the need to hand-tune smoothing parameters and the lack of a precise optimality guarantee. We introduce a tuning-parameter-free estimator that bests Good-Turing in both theory and practice. Our method marries two classic ideas, namely Robbins' empirical Bayes and Kiefer-Wolfowitz's nonparametric maximum likelihood, to learn an implicit…

How to Use Synthetic Data for Improved Statistical Inference?

Edgar Dobriban (University of Pennsylvania - Wharton School)
E18-304

Abstract: The rapid proliferation of high-quality synthetic data -- generated by advanced AI models or collected as auxiliary data from related tasks -- presents both opportunities and challenges for statistical inference. Here, we introduce the GEneral Synthetic-Powered Inference (GESPI) framework that wraps around any statistical inference procedure to safely enhance sample efficiency by combining synthetic and real data. Our framework leverages high-quality synthetic data to boost statistical power, yet adaptively defaults to the standard inference method using only real data…

Formal Models of Language Generation

Jon Kleinberg (Cornell University)
E18-304

Abstract: The emergence of large language models has prompted a surge of interest in theoretical models that might give us insight into both their successes and their shortcomings. We'll give an overview of recent work in this direction, focusing on a surprising line of positive results showing that it is possible to give guarantees for language-generation algorithms even in the absence of any probabilistic assumptions, in a framework known as "language generation in the limit". These results suggest interesting notions…

MIT Institute for Data, Systems, and Society
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764