How to Use Synthetic Data for Improved Statistical Inference?

Edgar Dobriban (University of Pennsylvania - Wharton School)
E18-304

Abstract: The rapid proliferation of high-quality synthetic data -- generated by advanced AI models or collected as auxiliary data from related tasks -- presents both opportunities and challenges for statistical inference. Here, we introduce the GEneral Synthetic-Powered Inference (GESPI) framework that wraps around any statistical inference procedure to safely enhance sample efficiency by combining synthetic and real data. Our framework leverages high-quality synthetic data to boost statistical power, yet adaptively defaults to the standard inference method using only real data…

Formal Models of Language Generation

Jon Kleinberg (Cornell University)
E18-304

Abstract: The emergence of large language models has prompted a surge of interest in theoretical models that might give us insight into both their successes and their shortcomings. We'll give an overview of recent work in this direction, focusing on a surprising line of positive results that shows it is possible to give guarantees for language-generation algorithms even in the absence of any probabilistic assumptions, in a framework known as "language generation in the limit". These results suggest interesting notions…


MIT Institute for Data, Systems, and Society
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764