When do spectral gradient updates help in deep learning?

Dmitriy Drusvyatskiy (University of California, San Diego)
E18-304

Abstract: Spectral gradient methods, such as the recently proposed Muon optimizer, are a promising alternative to standard gradient descent for training deep neural networks and transformers. Yet, it remains unclear in which regimes these spectral methods are expected to perform better. In this talk, I will present a simple condition that predicts when a spectral update yields a larger decrease in the loss than a standard gradient step. Informally, this criterion holds when, on the one hand, the gradient of the…

WiDS Cambridge 2026

For the tenth year, MIT and Microsoft New England are proud to collaborate with Women in Data Science (WiDS) Worldwide to bring the WiDS regional conference to Cambridge, Massachusetts. This one-day conference will feature an all-female lineup of speakers and panelists from academia and industry, who will talk about the latest data science research in a number of domains. Attendees will learn how leading-edge researchers and companies are leveraging data science for success.


MIT Institute for Data, Systems, and Society
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764