Calendar of Events

Statistics and Data Science Seminar Series

Transformers Learn Generalizable Chain-of-Thought Reasoning via Gradient Descent

Yuejie Chi (Yale University)
E18-304

Abstract: Transformers have demonstrated remarkable chain-of-thought reasoning capabilities, yet the underlying mechanisms by which they acquire and extrapolate these capabilities remain poorly understood. This talk presents a theoretical analysis of transformers trained via gradient descent for symbolic reasoning and state tracking tasks with increasing problem complexity. Our analysis reveals the coordination of multi-head attention to solve…


Do Large Language Models (Really) Need Statistical Foundations?

Weijie Su (University of Pennsylvania)
E18-304

Abstract: In this talk, we advocate for developing statistical foundations for large language models (LLMs). We begin by examining two key characteristics that necessitate statistical perspectives for LLMs: (1) the probabilistic, autoregressive nature of next-token prediction, and (2) the inherent complexity and black-box nature of Transformer architectures. To demonstrate how statistical insights can advance…
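The first characteristic the abstract names, the autoregressive factorization of sequence probability, can be illustrated with a toy sketch. Everything here (the two-token vocabulary and the conditional distribution) is made up for illustration and is not from the talk:

```python
import math

# An autoregressive model assigns a sequence probability as a product
# of conditionals: p(x_1..x_n) = prod_t p(x_t | x_<t).

def sequence_log_prob(tokens, cond_prob):
    """Sum log p(x_t | x_<t) over the sequence."""
    total = 0.0
    for t in range(len(tokens)):
        prefix = tuple(tokens[:t])
        total += math.log(cond_prob(prefix, tokens[t]))
    return total

# A hypothetical language model over the vocabulary {"a", "b"}:
# uniform on the first token, then 0.9 on repeating the previous
# token and 0.1 on switching.
def cond_prob(prefix, token):
    if not prefix:
        return 0.5
    return 0.9 if token == prefix[-1] else 0.1

p = math.exp(sequence_log_prob(["a", "a", "b"], cond_prob))
print(p)  # 0.5 * 0.9 * 0.1 = 0.045
```

The product structure is what makes the model's output a well-defined probability distribution over sequences, which is the entry point for the statistical questions the talk raises.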


Hard-Constrained Neural Networks

Navid Azizan (MIT)
E18-304

Abstract: Incorporating prior knowledge and domain-specific input-output requirements, such as safety or stability, as hard constraints into neural networks is a key enabler for their deployment in high-stakes applications. However, existing methods often rely on soft penalties, which are insufficient, especially on out-of-distribution samples. In this talk, I will introduce hard-constrained neural networks (HardNet), a…
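One standard way to make an output requirement a hard constraint rather than a soft penalty is to append a projection onto the feasible set as the network's final step, so every output satisfies the constraint by construction. The sketch below uses a box constraint and a made-up linear "network"; it is a generic illustration of this idea, not the HardNet method itself:

```python
import numpy as np

lo, hi = -1.0, 1.0  # hypothetical box constraint on the output

def network(x, W):
    # Stand-in for an unconstrained network: a linear map.
    return W @ x

def project_box(y, lo, hi):
    # Euclidean projection onto the box [lo, hi]^d is a clip.
    return np.clip(y, lo, hi)

W = np.array([[2.0, 0.0],
              [0.0, 0.5]])
x = np.array([1.0, 1.0])

raw = network(x, W)                  # [2.0, 0.5] violates the box
safe = project_box(raw, lo, hi)      # [1.0, 0.5] satisfies it
print(safe)
```

A soft penalty would merely discourage the violation during training; the projection guarantees feasibility for every input, including out-of-distribution ones, which is the failure mode the abstract highlights.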


Learning to Price Electricity for Optimal Demand Response

Stefan Wager (Stanford University)
E18-304

Abstract: The time at which renewable (e.g., solar or wind) energy resources produce electricity cannot generally be controlled. In many settings, however, consumers have some flexibility in their energy consumption needs, and there is growing interest in demand-response programs that leverage this flexibility to shift energy consumption to better match renewable production — thus enabling…


Attention Sinks: A ‘Catch, Tag, Release’ Mechanism for Embeddings

Vardan Papyan (University of Toronto)
E18-304

Abstract: Large language models (LLMs) often concentrate their attention on a small set of tokens—referred to as attention sinks. Common examples include the first token, a prompt-independent sink, and punctuation tokens, which are prompt-dependent. Although these tokens often lack inherent semantic meaning, their presence is critical for model performance, particularly under model compression and KV-caching…
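The notion of attention mass concentrating on a single token can be illustrated with a toy softmax-attention sketch. The dimensions, random queries and keys, and the way the sink is induced are all invented for illustration and are not the speaker's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8   # hypothetical head dimension
n = 6   # hypothetical sequence length

queries = rng.normal(size=(n, d))
keys = rng.normal(size=(n, d))
# Induce a sink at token 0 by giving it a key correlated with the
# average query, so most queries score highly against it.
keys[0] = queries.mean(axis=0) * 3.0

# Scaled dot-product attention with a row-wise softmax.
scores = queries @ keys.T / np.sqrt(d)
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Average attention mass every position places on token 0; a sink
# shows up as this exceeding the uniform share 1/n.
sink_mass = attn[:, 0].mean()
print(sink_mass)
```

Measuring column mass like this is how sink tokens are typically spotted in practice: the sink column of the attention matrix absorbs far more than its uniform share.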



MIT Institute for Data, Systems, and Society
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764