Hard-Constrained Neural Networks

Navid Azizan (MIT)
E18-304

Abstract: Incorporating prior knowledge and domain-specific input-output requirements, such as safety or stability, as hard constraints into neural networks is a key enabler for their deployment in high-stakes applications. However, existing methods often rely on soft penalties, which do not guarantee constraint satisfaction, especially on out-of-distribution samples. In this talk, I will introduce hard-constrained neural networks (HardNet), a general framework for enforcing hard, input-dependent constraints by appending a differentiable enforcement layer to any neural network. This approach enables end-to-end training and, crucially, is…
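As a rough sketch of the idea (not the HardNet construction itself, which the abstract does not detail), an enforcement layer can map unconstrained activations into an input-dependent feasible set, so the constraint holds by construction while gradients still flow end to end. Everything below, including the box-constraint form, the sigmoid squashing, and the toy bounds, is an illustrative assumption:

```python
import torch
import torch.nn as nn

class BoxEnforcementLayer(nn.Module):
    """Differentiable layer that squashes unconstrained activations into an
    input-dependent box [lower(x), upper(x)], so the constraint holds by
    construction on every input, including out-of-distribution ones.
    (Illustrative only; the actual HardNet layer may differ.)"""

    def __init__(self, lower, upper):
        super().__init__()
        self.lower = lower  # callable: x -> per-sample lower bound
        self.upper = upper  # callable: x -> per-sample upper bound

    def forward(self, x, z):
        lo, hi = self.lower(x), self.upper(x)
        return lo + (hi - lo) * torch.sigmoid(z)  # smooth map into [lo, hi]

class ConstrainedNet(nn.Module):
    def __init__(self, lower, upper):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
        self.enforce = BoxEnforcementLayer(lower, upper)

    def forward(self, x):
        return self.enforce(x, self.backbone(x))  # end-to-end differentiable

# Hypothetical input-dependent bounds: |output| <= mean(|x|) per sample.
net = ConstrainedNet(lambda x: -x.abs().mean(dim=1, keepdim=True),
                     lambda x: x.abs().mean(dim=1, keepdim=True))
y = net(torch.randn(8, 4))  # outputs satisfy the bounds for any input
```

Because the squashing is smooth, standard optimizers train the whole network with the constraint satisfied at every step, which is what distinguishes this from a soft penalty.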

SES Admissions Q&A

Fotini Christia, Jessika Trancik (IDSS)
Zoom

Learn about the Social and Engineering Systems Doctoral Program by attending one of SES’s 2026 Admissions Q&A sessions. These virtual question-and-answer sessions, each hosted by a member of the IDSS faculty, follow up on the pre-recorded SES Admissions Webinar (33 minutes), which should be viewed before attending. Register!

Attention Sinks: A ‘Catch, Tag, Release’ Mechanism for Embeddings

Vardan Papyan (University of Toronto)
E18-304

Abstract: Large language models (LLMs) often concentrate their attention on a small set of tokens—referred to as attention sinks. Common examples include the first token, a prompt-independent sink, and punctuation tokens, which are prompt-dependent. Although these tokens often lack inherent semantic meaning, their presence is critical for model performance, particularly under model compression and KV-caching. Yet, the function, semantic role, and origin of attention sinks—especially those beyond the first token—remain poorly understood. In this talk, I’ll present a comprehensive investigation…
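The first-token sink described here is easy to observe directly. Below is a minimal sketch, assuming the Hugging Face transformers library and GPT-2 (neither is named in the abstract), that prints the average attention mass each layer places on token 0:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

inputs = tok("Attention sinks catch, tag, and release embeddings.",
             return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

for layer, attn in enumerate(out.attentions):
    # attn has shape (batch, heads, query_pos, key_pos); key column 0 is
    # the first token. Skip query row 0, which attends to itself
    # trivially under the causal mask.
    sink_mass = attn[0, :, 1:, 0].mean().item()
    print(f"layer {layer:2d}: mean attention on token 0 = {sink_mass:.2f}")
```

In line with the abstract's point about KV-caching, the disproportionate mass this measurement reveals is why evicting sink tokens from the cache is known to hurt generation quality.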

Back to the future – data-efficient language modeling

Tatsunori Hashimoto (Stanford University)
E18-304

Abstract: Compute scaling has dominated the conversation around modern language models, leading to an impressive array of algorithms that optimize performance for a given training (and sometimes inference) compute budget. But as compute has grown cheaper and more abundant, data is starting to become a bottleneck, and our ability to exchange compute for data efficiency may be crucial to future model scaling. In this talk, I will discuss some of our recent work on synthetic data and algorithmic approaches to…
