Calendar of Events

Statistics and Data Science Seminar Series

Attention Sinks: A ‘Catch, Tag, Release’ Mechanism for Embeddings

Vardan Papyan (University of Toronto)
E18-304

Abstract: Large language models (LLMs) often concentrate their attention on a small set of tokens—referred to as attention sinks. Common examples include the first token, a prompt-independent sink, and punctuation tokens, which are prompt-dependent. Although these tokens often lack inherent semantic meaning, their presence is critical for model performance, particularly under model compression and KV-caching.…

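A convenient way to make the sink notion concrete is to measure the total softmax attention mass that queries place on a single key. The numpy sketch below does this on synthetic scores; the +4 bias on token 0 is an artificial stand-in for whatever a trained LLM learns, not the mechanism from the talk.

import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
T, d = 8, 16                            # sequence length, head dimension
Q = rng.normal(size=(T, d))
K = rng.normal(size=(T, d))

scores = (Q @ K.T) / np.sqrt(d)
scores[:, 0] += 4.0                     # synthetic bias: token 0 acts as a sink
causal = np.tril(np.ones((T, T), dtype=bool))
scores = np.where(causal, scores, -np.inf)
A = softmax(scores)                     # A[i, j]: attention of query i on key j

print("mean attention mass on token 0:", round(float(A[:, 0].mean()), 2))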

Back to the future – data efficient language modeling

Tatsunori Hashimoto (Stanford University)
E18-304

Abstract: Compute scaling has dominated the conversation with modern language models, leading to an impressive array of algorithms that optimize performance for a given training (and sometimes inference) compute budget. But as compute has grown cheaper and more abundant, data is starting to become a bottleneck, and our ability to exchange computing for data efficiency…

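For background on the compute-versus-data tradeoff the abstract describes, the standard compute-optimal scaling picture (the Chinchilla analysis of Hoffmann et al.) models loss as a power law in parameters N and training tokens D under a budget C of roughly 6ND. The sketch below uses invented constants purely to illustrate why capped data becomes a bottleneck; none of it is from the talk.

# Illustrative Chinchilla-style loss: L(N, D) = E + A/N**alpha + B/D**beta
E, A, B, alpha, beta = 1.7, 400.0, 410.0, 0.34, 0.28   # made-up constants

def loss(N, D):
    return E + A / N**alpha + B / D**beta

C = 1e21                           # fixed training compute budget, C ~ 6*N*D
for N in (1e8, 1e9, 1e10):
    D = C / (6 * N)                # the budget fixes the token count
    print(f"N={N:.0e}  D={D:.0e}  L={loss(N, D):.3f}")

# If D is instead capped by available data, growing N stops paying off:
D_max = 1e10
print(loss(1e11, D_max), loss(1e12, D_max))   # losses barely differ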

Private statistical estimation via robustness and stability

Sewoong Oh (University of Washington)
E18-304

Abstract: Privacy enhancing technologies, such as differentially private stochastic gradient descent (DP-SGD), allow us to access private data without worrying about leaking sensitive information. This is crucial in the modern era of data-centric AI, where all public data has been exhausted and the next frontier models rely on access to high-quality data. A central component…

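The DP-SGD mentioned in the abstract has a compact core: clip every per-example gradient so that no single record can move the model too much, then add Gaussian noise calibrated to that clip norm. A minimal numpy sketch on toy logistic regression follows; the clip norm and noise multiplier are placeholder values, and a real deployment would also track the cumulative privacy budget.

import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, sigma=1.0, rng=None):
    # One DP-SGD step: per-example gradients are clipped to L2 norm <= clip,
    # summed, perturbed with Gaussian noise, and averaged.
    rng = rng if rng is not None else np.random.default_rng()
    total = np.zeros_like(w)
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + np.exp(-(xi @ w)))              # predicted probability
        g = (p - yi) * xi                                # per-example gradient
        total += g / max(1.0, np.linalg.norm(g) / clip)  # clip
    noise = rng.normal(0.0, sigma * clip, size=w.shape)
    return w - lr * (total + noise) / len(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))
y = (X[:, 0] > 0).astype(float)        # label depends only on feature 0
w = np.zeros(5)
for _ in range(100):
    w = dp_sgd_step(w, X, y, rng=rng)
print(w.round(2))                      # feature 0 should carry the largest weight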

The Implicit Geometry of Deep Representations: Insights From Log-Bilinear Softmax Models

Christos Thrampoulidis (University of British Columbia)
E18-304

Abstract: Training data determines what neural networks can learn—but can we predict the geometry of learned representations directly from data statistics? We present a framework that addresses this question for sufficiently large, well-trained neural networks. The key idea is a coarse but predictive abstraction of such networks as log-bilinear softmax models, whose implicit regularization we…

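"Log-bilinear softmax model" refers to a model whose class probabilities take the form p(y | x) proportional to exp(<u_x, v_y>), an inner product of an input embedding and a class embedding. The toy sketch below trains such a model with cross-entropy; it only illustrates the model class, not the framework from the talk.

import numpy as np

rng = np.random.default_rng(0)
n, k, d = 40, 4, 3                       # inputs, classes, embedding dim
U = 0.1 * rng.normal(size=(n, d))        # one embedding per input
V = 0.1 * rng.normal(size=(k, d))        # one embedding per class
y = rng.integers(0, k, size=n)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

lr = 0.5
for _ in range(2000):
    P = softmax(U @ V.T)                 # logits are bilinear: <u_x, v_y>
    P[np.arange(n), y] -= 1.0            # gradient of cross-entropy w.r.t. logits
    U, V = U - lr * (P @ V) / n, V - lr * (P.T @ U) / n

# The learned geometry (norms and angles among the rows of U and V) is the
# kind of structure the abstract says can be predicted from data statistics.
print(np.round(V @ V.T, 2))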

