Stochastics and Statistics Seminar Series: Past Events

The MIT Statistics and Data Science Center hosts guest lecturers from around the world in this weekly seminar.

September 2020

Causal Inference and Overparameterized Autoencoders in the Light of Drug Repurposing for SARS-CoV-2

September 18, 2020 @ 11:00 am - 12:00 pm

Caroline Uhler (MIT)


Abstract: Massive data collection holds the promise of a better understanding of complex phenomena and ultimately, of better decisions. An exciting opportunity in this regard stems from the growing availability of perturbation / intervention data (drugs, knockouts, overexpression, etc.) in biology. In order to obtain mechanistic insights from such data, a major challenge is the development of a framework that integrates observational and interventional data and allows predicting the effect of yet unseen interventions or transporting the effect of interventions…

Stein’s method for multivariate continuous distributions and applications

September 11, 2020 @ 11:00 am - 12:00 pm

Gesine Reinert (University of Oxford)


Abstract: Stein’s method is a key method for assessing distributional distance, mainly for one-dimensional distributions. In this talk we provide a general approach to Stein’s method for multivariate continuous distributions. Among the applications we consider is the Wasserstein distance between two continuous probability distributions under the assumption of existence of a Poincaré constant. This is joint work with Guillaume Mijoule (INRIA Paris) and Yvik Swan (Liège). – Bio: Gesine Reinert is a Research Professor of the Department of Statistics and…

May 2020

Naive Feature Selection: Sparsity in Naive Bayes

May 1, 2020 @ 11:00 am - 12:00 pm

Alexandre d'Aspremont (ENS, CNRS)


Abstract: Due to its linear complexity, naive Bayes classification remains an attractive supervised learning method, especially in very large-scale settings. We propose a sparse version of naive Bayes, which can be used for feature selection. This leads to a combinatorial maximum-likelihood problem, for which we provide an exact solution in the case of binary data, or a bound in the multinomial case. We prove that our bound becomes tight as the marginal contribution of additional features decreases. Both binary and…

April 2020

On Using Graph Distances to Estimate Euclidean and Related Distances

April 17, 2020 @ 11:00 am - 12:00 pm

Ery Arias-Castro (University of California, San Diego)


Abstract: Graph distances have proven quite useful in machine learning/statistics, particularly in the estimation of Euclidean or geodesic distances. The talk will include a partial review of the literature, and then present more recent developments on the estimation of curvature-constrained distances on a surface, as well as on the estimation of Euclidean distances based on an unweighted and noisy neighborhood graph. – About the Speaker: Ery Arias-Castro received his Ph.D. in Statistics from Stanford University in 2004. He then took…

February 2020

Tales of Random Projections

February 28, 2020 @ 11:00 am - 12:00 pm

Kavita Ramanan (Brown University)


Abstract: Properties of random projections of high-dimensional probability measures are of interest in a variety of fields, including asymptotic convex geometry, and potential applications to high-dimensional statistics and data analysis. A particular question of interest is to identify what properties of the high-dimensional measure are captured by its lower-dimensional projections. While fluctuations of these projections have been well studied over the past decade, we describe more recent work on the tail behavior of such projections, and various implications. This talk…

Predictive Inference with the Jackknife+

February 21, 2020 @ 11:00 am - 12:00 pm

Rina Foygel Barber (University of Chicago)


Abstract: We introduce the jackknife+, a novel method for constructing predictive confidence intervals that is robust to the distribution of the data. The jackknife+ modifies the well-known jackknife (leave-one-out cross-validation) to account for the variability in the fitted regression function when we subsample the training data. Assuming exchangeable training samples, we prove that the jackknife+ permits rigorous coverage guarantees regardless of the distribution of the data points, for any algorithm that treats the training points symmetrically (in contrast, such guarantees…

Diffusion K-means Clustering on Manifolds: provable exact recovery via semidefinite relaxations

February 14, 2020 @ 11:00 am - 12:00 pm

Xiaohui Chen (University of Illinois at Urbana-Champaign)


Abstract: We introduce the diffusion K-means clustering method on Riemannian submanifolds, which maximizes the within-cluster connectedness based on the diffusion distance. The diffusion K-means constructs a random walk on the similarity graph with vertices as data points randomly sampled on the manifolds and edges as similarities given by a kernel that captures the local geometry of manifolds. Thus the diffusion K-means is a multi-scale clustering tool that is suitable for data with non-linear and non-Euclidean geometric features in mixed dimensions. Given…

Gaussian Differential Privacy, with Applications to Deep Learning

February 7, 2020 @ 11:00 am - 12:00 pm

Weijie Su (University of Pennsylvania)


Abstract: Privacy-preserving data analysis has been put on a firm mathematical foundation since the introduction of differential privacy (DP) in 2006. This privacy definition, however, has some well-known weaknesses: notably, it does not tightly handle composition. This weakness has inspired several recent relaxations of differential privacy based on the Rényi divergences. We propose an alternative relaxation we term “f-DP”, which has a number of nice properties and avoids some of the difficulties associated with divergence-based relaxations. First, f-DP preserves…

December 2019

Inferring the Evolutionary History of Tumors

December 6, 2019 @ 11:00 am - 12:00 pm

Simon Tavaré (Columbia University)


Abstract: Bulk sequencing of tumor DNA is a popular strategy for uncovering information about the spectrum of mutations arising in the tumor, and is often supplemented by multi-region sequencing, which provides a view of tumor heterogeneity. The statistical issues arise from the fact that bulk sequencing makes the determination of sub-clonal frequencies, and other quantities of interest, difficult. In this talk I will discuss this problem, beginning with its setting in population genetics. The data provide an estimate of the…

November 2019

Automated Data Summarization for Scalability in Bayesian Inference

November 22, 2019 @ 11:00 am - 12:00 pm

Tamara Broderick (MIT)


Abstract: Many algorithms take prohibitively long to run on modern, large data sets. But even in complex data sets, many data points may be at least partially redundant for some task of interest. So one might instead construct and use a weighted subset of the data (called a “coreset”) that is much smaller than the original dataset. Typically running algorithms on a much smaller data set will take much less computing time, but it remains to understand whether the output…

© MIT Institute for Data, Systems, and Society | 77 Massachusetts Avenue | Cambridge, MA 02139-4307 | 617-253-1764