Stochastics and Statistics Seminar Series

Structured Topic Modeling: Leveraging Sparsity and Graphs for Improved Inference

March 21, 2025 @ 11:00 am - 12:00 pm

Claire Donnat (University of Chicago)

E18-304

Abstract:
Classical topic modeling approaches, such as Latent Dirichlet Allocation (LDA) and probabilistic Latent Semantic Indexing (pLSI), decompose a document-term matrix into a mixture of topics, offering a powerful tool for uncovering latent thematic structure in document corpora or, more generally, in compositional data. However, these methods generally assume that documents are independent, overlooking relationships or additional structural information that could improve inference, especially when documents are short or vocabularies are large.
In this talk, we will consider two new structured approaches to topic modeling that enhance inference. The first extends pLSI by incorporating weak sparsity to manage large vocabularies effectively. The second leverages document-level relationships encoded as a graph, introducing a graph-aligned singular value decomposition of the empirical frequency matrix to improve the estimation of document-topic and topic-word matrices. This method is especially advantageous in applications where document similarities are well-defined, such as spatial transcriptomics, microbiome studies, and scientific abstract analysis.
By establishing high-probability error bounds for estimating topic proportions and word distributions, our work takes a first step toward bridging the gap between topic modeling and structured inference. Our examples demonstrate that this flexible, theoretically grounded framework can be applied effectively across diverse data modalities.
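For readers unfamiliar with the setup, the following is a minimal illustrative sketch of the classical pLSI-style model the abstract starts from, not the speaker's method. It simulates an empirical frequency matrix from a hypothetical document-topic matrix `W` and topic-word matrix `A`, then takes a plain rank-K truncated SVD. All names and sizes are assumptions chosen for illustration, and neither the sparsity-based nor the graph-aligned estimator from the talk is implemented here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes, chosen only for illustration.
n_docs, n_words, n_topics, doc_len = 50, 30, 3, 200

# Assumed ground truth: each row of W holds one document's topic proportions,
# each row of A holds one topic's distribution over the vocabulary.
W = rng.dirichlet(np.ones(n_topics), size=n_docs)    # shape (n_docs, n_topics)
A = rng.dirichlet(np.ones(n_words), size=n_topics)   # shape (n_topics, n_words)

# Expected word frequencies: every document is a mixture of topics.
P = W @ A                                            # rows sum to 1

# Draw doc_len words per document and normalise the counts.
counts = np.vstack([rng.multinomial(doc_len, p) for p in P])
X = counts / doc_len                                 # empirical frequency matrix

# Plain rank-K truncated SVD of X (no sparsity or graph structure used).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_hat = (U[:, :n_topics] * s[:n_topics]) @ Vt[:n_topics]
```

With short documents (small `doc_len`) or a large vocabulary (large `n_words`), `X` becomes a noisy estimate of `P`, which is the regime in which the abstract argues that additional structure helps.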

Bio:
Claire Donnat is an Assistant Professor of Statistics at the University of Chicago, specializing in high-dimensional data analysis, network-based methods, and structured inference. Her work sits at the intersection of theory and applications, focusing on statistical frameworks for multimodal data integration and unsupervised learning. She applies these methods to problems in neuroscience, spatial transcriptomics, and plant microbiology.

 


MIT Institute for Data, Systems, and Society
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge, MA 02139-4307
617-253-1764