Abstract:
The problem of transfer and domain adaptation is ubiquitous in machine learning. It concerns situations where predictive technologies, trained on a given source dataset, have to be transferred to a new, somewhat related target domain. An example is transferring voice recognition trained on American English accents to Scottish accents with minimal retraining. A first challenge is to understand how to properly model the ‘distance’ between source and target domains, viewed as probability distributions over a feature space.
In this talk we will argue that various existing notions of distance between distributions turn out to be pessimistic, i.e., these distances might appear high in many situations where transfer is possible, even at fast rates. Instead we show that some new notions of distance tightly capture a continuum from easy to hard transfer, and furthermore can be adapted to, i.e., they do not need to be estimated in order to perform near-optimal transfer. Finally we will discuss near-optimal approaches to minimizing sampling of target data (e.g., sampling Scottish speech) when one already has access to a given amount of source data (e.g., American speech).
This talk is based on some joint work with G. Martinet, and ongoing work with S. Hanneke.
Biography:
Samory Kpotufe is an Associate Professor in Statistics at Columbia University. He works in machine learning, with an emphasis on nonparametric methods and high-dimensional statistics. More generally, his interests are in understanding basic learning scenarios under practical constraints from modern application domains. He has previously held positions at the Max Planck Institute in Germany, the Toyota Technological Institute at Chicago, and Princeton University.
The MIT Statistics and Data Science Center hosts guest lecturers from around the world in this weekly seminar.