Statistics and Data Science Seminar Series

Transformers Learn Generalizable Chain-of-Thought Reasoning via Gradient Descent

October 3, 2025 @ 11:00 am - 12:00 pm

Yuejie Chi (Yale University)

E18-304

Abstract:
Transformers have demonstrated remarkable chain-of-thought reasoning capabilities, yet our understanding of the underlying mechanisms by which they acquire and extrapolate these capabilities remains limited. This talk presents a theoretical analysis of transformers trained via gradient descent for symbolic reasoning and state tracking tasks with increasing problem complexity. Our analysis reveals the coordination of multi-head attention to solve multiple subtasks in a single autoregressive path, and the bootstrapping of inherently sequential reasoning through a recursive self-training curriculum. Our optimization-based guarantees demonstrate that even shallow multi-head transformers, with chain-of-thought, can be trained to effectively solve problems that would otherwise require deeper architectures.

Biography:
Dr. Yuejie Chi is the Charles C. and Dorothea S. Dilley Professor of Statistics and Data Science at Yale University, with a secondary appointment in Computer Science. She received her Ph.D. and M.A. from Princeton University, and B. Eng. (Hon.) from Tsinghua University, all in Electrical Engineering. Her research interests lie in the theoretical and algorithmic foundations of data science, generative AI, reinforcement learning, and signal processing, motivated by applications in scientific and engineering domains. Among other honors, Dr. Chi has received the Presidential Early Career Award for Scientists and Engineers (PECASE), the SIAM Activity Group on Imaging Science Best Paper Prize, the IEEE Signal Processing Society Young Author Best Paper Award, and the inaugural IEEE Signal Processing Society Early Career Technical Achievement Award for contributions to high-dimensional structured signal processing. She is an IEEE Fellow (Class of 2023) for contributions to statistical signal processing with low-dimensional structures.


MIT Institute for Data, Systems, and Society
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764