Calendar of Events

Upcoming events:

Statistics and Data Science Seminar Series: Yuejie Chi
Statistics and Data Science Seminar Series: Weijie Su
Statistics and Data Science Seminar Series: Navid Azizan
IDSS Distinguished Seminar Series: Emily Black
Statistics and Data Science Seminar Series: Stefan Wager
IDSS Academic Programs: Fotini Christia, Jessika Trancik
Statistics and Data Science Seminar Series: Vardan Papyan

Transformers Learn Generalizable Chain-of-Thought Reasoning via Gradient Descent

Yuejie Chi (Yale University)
E18-304

Abstract: Transformers have demonstrated remarkable chain-of-thought reasoning capabilities, yet the underlying mechanisms by which they acquire and extrapolate these capabilities remain poorly understood. This talk presents a theoretical analysis of transformers trained via gradient descent for symbolic reasoning and state tracking tasks with increasing problem complexity. Our analysis reveals the coordination of multi-head attention to solve…
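As a rough, self-contained sketch (not the speaker's actual setup), the snippet below builds a toy symbolic state-tracking task in which chain-of-thought supervision writes out every intermediate state, while a direct-answer format supervises only the final state. The function name make_example and all task details are illustrative assumptions; problem complexity can be scaled by increasing num_steps.

```python
# Minimal sketch (not the speaker's setup): a toy symbolic state-tracking
# task where chain-of-thought supervision writes out every intermediate
# state, while the "direct" format supervises only the final answer.
import random

def make_example(num_steps=4, num_positions=3, with_cot=True):
    state = list(range(num_positions))          # initial symbolic state
    prompt = [f"init {state}"]
    cot = []
    for _ in range(num_steps):
        i, j = random.sample(range(num_positions), 2)
        prompt.append(f"swap {i} {j}")          # one symbolic operation
        state[i], state[j] = state[j], state[i]
        cot.append(f"state {state}")            # intermediate reasoning step
    answer = f"final {state}"
    target = cot + [answer] if with_cot else [answer]
    return " ; ".join(prompt), " ; ".join(target)

if __name__ == "__main__":
    random.seed(0)
    x, y = make_example()
    print("input :", x)
    print("target:", y)
```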


Do Large Language Models (Really) Need Statistical Foundations?

Weijie Su (University of Pennsylvania)
E18-304

Abstract: In this talk, we advocate for developing statistical foundations for large language models (LLMs). We begin by examining two key characteristics that necessitate statistical perspectives for LLMs: (1) the probabilistic, autoregressive nature of next-token prediction, and (2) the inherent complexity and black-box nature of Transformer architectures. To demonstrate how statistical insights can advance…
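The autoregressive view the abstract refers to treats a sequence's probability as a product of next-token conditionals, so that log p(x_1..x_T) = sum_t log p(x_t | x_<t). The snippet below is a minimal numerical illustration with made-up per-token probabilities; it is not tied to any particular model.

```python
# Illustration of the autoregressive factorization behind next-token
# prediction: log p(x_1..x_T) = sum_t log p(x_t | x_<t).  The per-token
# conditional probabilities below are made up for the example.
import math

cond_probs = [0.40, 0.10, 0.65, 0.22, 0.90]   # p(x_t | x_<t) for each position t

log_likelihood = sum(math.log(p) for p in cond_probs)
perplexity = math.exp(-log_likelihood / len(cond_probs))

print(f"sequence log-likelihood: {log_likelihood:.3f}")
print(f"perplexity             : {perplexity:.3f}")
```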


Hard-Constrained Neural Networks

Navid Azizan (MIT)
E18-304

Abstract: Incorporating prior knowledge and domain-specific input-output requirements, such as safety or stability, as hard constraints into neural networks is a key enabler for their deployment in high-stakes applications. However, existing methods often rely on soft penalties, which are insufficient, especially on out-of-distribution samples. In this talk, I will introduce hard-constrained neural networks (HardNet), a…
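For concreteness, the sketch below contrasts a soft penalty added to the training loss with a hard constraint enforced by projecting the network's output onto the feasible set. The box constraint, layer sizes, and names are illustrative assumptions, not the HardNet construction presented in the talk.

```python
# Generic contrast (not the HardNet construction itself): enforcing an
# output box constraint y in [lo, hi] as a soft penalty vs. as a hard
# projection applied on every forward pass.
import torch
import torch.nn as nn

lo, hi = -1.0, 1.0
net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 1))

def soft_penalty_loss(x, y_true, weight=10.0):
    # Penalizes, but does not prevent, constraint violations.
    y = net(x)
    violation = torch.relu(y - hi) + torch.relu(lo - y)   # zero when y is feasible
    return nn.functional.mse_loss(y, y_true) + weight * violation.mean()

class HardProjection(nn.Module):
    """Project outputs onto [lo, hi]; the constraint holds even out of distribution."""
    def forward(self, y):
        return torch.clamp(y, lo, hi)

constrained_net = nn.Sequential(net, HardProjection())

x = torch.randn(8, 4)
loss = soft_penalty_loss(x, torch.randn(8, 1))            # soft version
y = constrained_net(x)                                    # hard version
print(float(y.min()), float(y.max()))                     # always within [lo, hi]
```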


On Generative AI Harms: Evaluating them, and Relevant Law

Emily Black (New York University)
45-102

Abstract: In this talk, I’ll present my recent work on technical pitfalls and legal tensions around the evaluation of GenAI harms. Through four case studies, I’ll show how misalignment between regulatory goals and fairness testing techniques can lead to regulation that admits discriminatory behavior. For example, I’ll show how different forms of GenAI evaluation instability---for…
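As a toy numerical illustration of evaluation instability (with simulated data, not results from the talk), the snippet below shows how the estimated gap between two groups' favorable-outcome rates can change, and even flip sign, across random evaluation subsets drawn from the same model's outputs.

```python
# Toy illustration (made-up data) of evaluation instability: the estimated
# gap between two groups' favorable-outcome rates can flip sign across
# random evaluation subsets of the same model's outputs.
import numpy as np

rng = np.random.default_rng(0)
# Simulated per-prompt outcomes (1 = favorable) for two groups, same model.
group_a = rng.binomial(1, 0.52, size=2000)
group_b = rng.binomial(1, 0.50, size=2000)

gaps = []
for _ in range(5):
    idx_a = rng.choice(len(group_a), size=100, replace=False)
    idx_b = rng.choice(len(group_b), size=100, replace=False)
    gaps.append(group_a[idx_a].mean() - group_b[idx_b].mean())

print("estimated group gaps across evaluation runs:", np.round(gaps, 3))
```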


Learning to Price Electricity for Optimal Demand Response

Stefan Wager (Stanford University)
E18-304

Abstract: The time at which renewable (e.g., solar or wind) energy resources produce electricity cannot generally be controlled. In many settings, however, consumers have some flexibility in their energy consumption needs, and there is growing interest in demand-response programs that leverage this flexibility to shift energy consumption to better match renewable production — thus enabling…
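As a toy illustration of the demand-response idea (made-up prices and loads, not the talk's method), the sketch below schedules a fixed amount of flexible consumption into the lowest-price periods, where low prices stand in for hours of abundant renewable supply.

```python
# Toy illustration (made-up numbers, not the speaker's method): a consumer
# with a fixed daily energy need shifts consumption toward low-price hours.
import numpy as np

prices = np.array([0.30, 0.28, 0.12, 0.08, 0.10, 0.25])  # $/kWh over 6 periods
total_need = 12.0     # kWh of flexible demand to schedule
capacity = 4.0        # max kWh deliverable per period

# Greedy schedule: fill the cheapest periods first, up to the per-period cap.
schedule = np.zeros_like(prices)
remaining = total_need
for t in np.argsort(prices):
    schedule[t] = min(capacity, remaining)
    remaining -= schedule[t]
    if remaining <= 0:
        break

print("consumption by period:", schedule)
print("total cost: $%.2f" % float(prices @ schedule))
```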


SES Admissions Q&A

Fotini Christia, Jessika Trancik (IDSS)
Zoom

Learn about the Social and Engineering Systems Doctoral Program by attending one of SES’s 2026 Admissions Q&A sessions. These are virtual question & answer sessions hosted by a member of the IDSS faculty as a follow-up to the pre-recorded SES Admissions Webinar. The SES Admissions Webinar (33 mins) should be viewed prior to attending the…


Attention Sinks: A ‘Catch, Tag, Release’ Mechanism for Embeddings

Vardan Papyan (University of Toronto)
E18-304

Abstract: Large language models (LLMs) often concentrate their attention on a small set of tokens—referred to as attention sinks. Common examples include the first token, a prompt-independent sink, and punctuation tokens, which are prompt-dependent. Although these tokens often lack inherent semantic meaning, their presence is critical for model performance, particularly under model compression and KV-caching.…
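As a minimal sketch of the quantity in question (random toy values, so it demonstrates the measurement rather than the phenomenon itself), the snippet below computes single-head causal softmax attention and the fraction of each query's attention mass that lands on the first token, the prompt-independent sink mentioned in the abstract.

```python
# Minimal sketch (random toy values): single-head causal softmax attention
# and the fraction of each query's attention mass that lands on token 0,
# i.e., the prompt-independent "sink" position.
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 8, 16
Q = rng.normal(size=(seq_len, d))
K = rng.normal(size=(seq_len, d))

scores = Q @ K.T / np.sqrt(d)                       # scaled dot-product scores
causal = np.tril(np.ones((seq_len, seq_len))) == 1  # each query sees past keys only
scores = np.where(causal, scores, -np.inf)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys

sink_mass = weights[:, 0]                           # attention paid to token 0 by each query
print("attention mass on the first token:", np.round(sink_mass, 3))
```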



MIT Institute for Data, Systems, and Society
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764