Statistics and Data Science Seminar Series
Speaker: Vardan Papyan
Attention Sinks: A ‘Catch, Tag, Release’ Mechanism for Embeddings
Abstract: Large language models (LLMs) often concentrate their attention on a small set of tokens, referred to as attention sinks. Common examples include the first token, a prompt-independent sink, and punctuation tokens, which are prompt-dependent. Although these tokens often lack inherent semantic meaning, their presence is critical for model performance, particularly under model compression and KV-caching.…
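The sink phenomenon described in the abstract is easy to observe empirically. Below is a minimal sketch, not part of the talk, that measures how much attention each layer of a small causal LM assigns to the first token (the prompt-independent sink); the choice of `gpt2` and the Hugging Face `transformers` API are illustrative assumptions, not the speaker's setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice; any causal LM with eager attention works.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# "eager" attention is needed so the model can return attention weights.
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")
model.eval()

prompt = "Attention sinks absorb attention mass without carrying meaning."
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions is a tuple with one tensor per layer,
# each shaped (batch, heads, query_len, key_len).
for layer_idx, attn in enumerate(out.attentions):
    # Mean over batch, heads, and query positions of attention on key 0.
    sink_mass = attn[..., 0].mean().item()
    print(f"layer {layer_idx:2d}: mean attention on first token = {sink_mass:.3f}")
```

In a typical run, many layers place a large fraction of their attention mass on position 0, which is the signature the abstract calls a prompt-independent attention sink.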