IDSS Academic Programs

SES + Stats Dissertation Defense

April 23, 2026 @ 10:00 am - 12:00 pm

Xinyi Wu (IDSS)

E18-304


Understanding and Improving Modern Learning Systems Through Graphs

ABSTRACT

Modern learning systems—from graph neural networks (GNNs) powering recommender platforms to transformer-based large language models (LLMs)—learn from high-dimensional, structured data. Yet their empirical success is accompanied by fragile emergent behaviors, such as oversmoothing in GNNs, rank collapse in attention layers, and position bias in transformers, whose mechanisms remain only partially understood. This thesis develops a unified graph-theoretic perspective on these phenomena, drawing on tools from dynamical systems, graph theory, and probability to model, explain, and ultimately improve modern learning systems.

The first part of the thesis studies oversmoothing in GNNs through the lens of random graph models and linearized dynamics. GNNs are neural architectures for learning from relational data, where entities are represented as nodes, their relationships as edges, and information is propagated through message-passing. We introduce a non-asymptotic framework that decomposes the effects of message-passing into a desirable denoising effect within classes and an undesirable mixing effect across classes, showing that oversmoothing arises precisely when the latter dominates and quantifying the depth at which this transition occurs. In addition, we prove that residual connections and normalization layers prevent a complete collapse of node representations, constraining the limiting embedding space to a higher-dimensional subspace and explaining why deep GNN architectures remain trainable in practice.
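As a rough intuition for this collapse, the following toy sketch in Python applies row-normalized message-passing to random features on a random graph and tracks how quickly node representations contract toward a common value. It is an illustration of the linearized dynamics only, not the thesis's non-asymptotic framework; the graph, feature dimension, and depths are arbitrary choices for the demo.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20
    A = (rng.random((n, n)) < 0.3).astype(float)
    A = np.triu(A, 1)
    A = A + A.T                              # illustrative random undirected graph (assumed connected)
    P = A + np.eye(n)                        # self-loops keep the dynamics aperiodic
    P = P / P.sum(axis=1, keepdims=True)     # row-normalized propagation matrix
    X = rng.standard_normal((n, 8))          # random node features

    for depth in range(1, 33):
        X = P @ X                            # one round of (linearized) message-passing
        spread = np.linalg.norm(X - X.mean(axis=0), "fro")
        if depth in (1, 2, 4, 8, 16, 32):
            # spread -> 0: node representations collapse toward a common vector
            print(f"depth {depth:2d}: deviation from consensus = {spread:.4f}")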

The second part extends these ideas to attention-based architectures. Attention, the core mechanism underlying transformers and LLMs, enables a model to adaptively weight and aggregate information from different parts of the input, rather than relying on a fixed connectivity pattern. Viewing attention-based GNNs and transformers as nonlinear, time-varying dynamical systems, we show that attention does not eliminate oversmoothing: attention-based GNNs still lose expressive power exponentially with depth, despite their adaptive aggregation. We then characterize how attention masks and LayerNorm jointly govern rank collapse in transformers, proving that masked self-attention alone still collapses to a rank-one subspace, while the inclusion of LayerNorm can generate a rich family of equilibria with ranks ranging from one to full, revealing a far more expressive long-term behavior than previously believed.
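The rank-collapse phenomenon can likewise be seen in a toy simulation. The sketch below assumes the simplest setting where collapse appears: full (unmasked) single-head self-attention with identity query, key, and value maps, and no LayerNorm or residual connections. It is a demo under those assumptions, not the thesis's general nonlinear, time-varying analysis.

    import numpy as np

    def softmax(Z):
        Z = Z - Z.max(axis=-1, keepdims=True)  # shift for numerical stability
        E = np.exp(Z)
        return E / E.sum(axis=-1, keepdims=True)

    rng = np.random.default_rng(1)
    n, d = 16, 8
    X = rng.standard_normal((n, d))            # token representations

    for layer in range(1, 21):
        W = softmax(X @ X.T / np.sqrt(d))      # attention weights (rows sum to 1)
        X = W @ X                              # aggregation step of one attention layer
        s = np.linalg.svd(X, compute_uv=False)
        if layer in (1, 5, 10, 20):
            # ratio of 2nd to 1st singular value -> 0 as X approaches rank one
            print(f"layer {layer:2d}: sigma2/sigma1 = {s[1] / s[0]:.2e}")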

The final part turns these theoretical insights into design principles for scalable and reliable systems. We show that modularity-based bipartite graph clustering provides a coarse but efficient proxy for message-passing in large-scale recommender systems, enabling our proposed method, GraphHash, to drastically reduce embedding table sizes while preserving recommendation quality. In parallel, we develop a graph-theoretic framework for analyzing position bias in transformers, explaining how causal masks and relative positional encodings jointly induce systematic preferences for particular regions of the input sequence. Beyond analyzing these effects, this framework also reveals how different combinations of attention masks, positional encodings, and training data can amplify or mitigate position bias, thereby offering principled guidance for designing attention architectures whose inductive biases are better aligned with the structure of the data and the task.
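A minimal sketch of the mask-induced part of this bias, assuming uniform attention under a causal mask (a deliberate simplification; the framework above covers general masks, positional encodings, and trained weights): because each token can only attend to earlier positions, repeated layers concentrate influence on the beginning of the sequence.

    import numpy as np

    n = 12
    W = np.tril(np.ones((n, n)))             # causal mask: token i sees tokens 0..i
    W = W / W.sum(axis=1, keepdims=True)     # uniform attention over visible tokens

    for depth in (1, 4, 16):
        # column sums of W^depth = total influence of each input position
        infl = np.linalg.matrix_power(W, depth).sum(axis=0)
        print(f"{depth:2d} layers: influence of token 0 = {infl[0]:.2f}, "
              f"token {n - 1} = {infl[-1]:.2f}")

With depth, the first token's influence grows toward the sequence length while the last token's shrinks toward zero, illustrating how the causal mask alone induces a systematic positional preference.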

Together, these results show that graphs offer a powerful language for modeling both data and computation in modern learning systems, yielding principled explanations for emergent behaviors and guiding the design of more robust, efficient, and controllable architectures.

BIOGRAPHY

Xinyi Wu is a final-year PhD student at MIT in Social and Engineering Systems (SES) and the Interdisciplinary Doctoral Program in Statistics (IDPS), advised by Ali Jadbabaie. She is a recipient of the Michael Hammer Fellowship. Before joining MIT, she earned her bachelor’s degree in mathematics from Washington University in St. Louis. Her research lies at the intersection of dynamical systems, network science, and machine learning on relational data. Her recent work studies attention mechanisms in graph neural networks and transformers through a graph-theoretic lens, and develops scalable graph-based methods for large-scale recommender systems.

COMMITTEE

Ali Jadbabaie (advisor), Navid Azizan, Stefanie Jegelka

EVENT INFORMATION

Hybrid event. To attend virtually, please contact the IDSS Academic Office (idss_academic_office@mit.edu) for connection information.

