Calendar of Events

Statistics and Data Science Seminar Series: Vardan Papyan

Statistics and Data Science Seminar Series: Tatsunori Hashimoto

Attention Sinks: A ‘Catch, Tag, Release’ Mechanism for Embeddings

Vardan Papyan (University of Toronto)
E18-304

Abstract: Large language models (LLMs) often concentrate their attention on a small set of tokens—referred to as attention sinks. Common examples include the first token, a prompt-independent sink, and punctuation tokens, which are prompt-dependent. Although these tokens often lack inherent semantic meaning, their presence is critical for model performance, particularly under model compression and KV-caching.…

Back to the future – data efficient language modeling

Tatsunori Hashimoto (Stanford University)
E18-304

Abstract: Compute scaling has dominated the conversation with modern language models, leading to an impressive array of algorithms that optimize performance for a given training (and sometimes inference) compute budget. But as compute has grown cheaper and more abundant, data is starting to become a bottleneck, and our ability to exchange computing for data efficiency…


MIT Institute for Data, Systems, and Society
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764