University of Washington
Ramya Korlakai Vinayak is a postdoctoral researcher at the Paul G. Allen School of Computer Science and Engineering at the University of Washington, working with Sham Kakade. Her research interests broadly span machine learning, statistical inference, and crowdsourcing. She received her Ph.D. from Caltech, where she was advised by Babak Hassibi. She held the Schlumberger Foundation Faculty for the Future fellowship from 2013 to 2015. She obtained her Master's degree from Caltech and her Bachelor's degree from IIT Madras.
Optimal Learning from Sparse Data
In this era of big data, the promise of machine learning and data science is to find patterns that enable novel scientific discoveries, aid policy making, and predict user behavior. However, big data comes with big caveats. While data is plentiful, in many cases it is sparse and heterogeneous. Not accounting for these issues can lead to inaccurate estimates and false conclusions. Another challenge is that the available data is often unlabeled. Labels are generally obtained via crowdsourcing, which is often noisy and expensive. To address these issues, my work has focused on both statistical inference and optimization-based approaches for the design and analysis of algorithms for unsupervised learning and crowdsourcing. I envision a collaborative system where humans and machine learning algorithms enable each other to leverage large datasets and scientific knowledge, leading to novel discoveries and informed policy making.
In many scientific domains, such as epidemiology, psychology, healthcare, and the social sciences, the number of individuals in the population under study is often very large, while the number of observations available per individual is very limited (sparse). In this sparse data regime, the key question is: how accurately can we estimate the distribution of parameters over the population? For the Binomial observation setting, the maximum likelihood estimator (MLE) was proposed in the 1960s, but it was not known how accurately it recovers the true distribution. Our work closes this gap: we show that the MLE is minimax optimal in the sparse regime. Our analysis involves novel bounds on the coefficients of Bernstein polynomials approximating 1-Lipschitz functions.
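To make the estimation problem concrete, here is a minimal sketch of one common way to approximate a maximum likelihood estimate of this kind. It is an illustration under stated assumptions, not the procedure or analysis from the talk. It assumes each individual i has an unknown parameter p_i drawn from a population distribution on [0, 1], and that we observe x_i ~ Binomial(t, p_i) for a small t. The estimate is restricted to a fixed grid on [0, 1] and the grid weights are fitted with EM updates; the grid, the EM solver, the iteration count, and the function names (`binomial_pmf`, `log_likelihood`, `grid_mle`) are all hypothetical choices made for this example.

```python
import math


def binomial_pmf(x, t, p):
    """Probability of x successes in t trials with success probability p."""
    return math.comb(t, x) * p**x * (1 - p) ** (t - x)


def log_likelihood(observations, t, grid, weights):
    """Log-likelihood of the data under a discrete mixing distribution."""
    total = 0.0
    for x in observations:
        mix = sum(w * binomial_pmf(x, t, p) for w, p in zip(weights, grid))
        total += math.log(mix)
    return total


def grid_mle(observations, t, grid_size=101, iterations=500):
    """Approximate the MLE of the population distribution on a fixed grid.

    Assumption for this sketch: the distribution is supported on an
    evenly spaced grid, and the weights are fitted with EM updates.
    """
    grid = [j / (grid_size - 1) for j in range(grid_size)]

    # With t trials, each observation is in {0, ..., t}, so the data
    # are fully summarized by how often each value occurs.
    counts = [0] * (t + 1)
    for x in observations:
        counts[x] += 1
    n = len(observations)

    # lik[s][j] = P(X = s | p = grid[j]); computed once up front.
    lik = [[binomial_pmf(s, t, p) for p in grid] for s in range(t + 1)]

    weights = [1.0 / grid_size] * grid_size  # uniform starting point
    for _ in range(iterations):
        new_weights = [0.0] * grid_size
        for s in range(t + 1):
            if counts[s] == 0:
                continue
            mix = sum(w * l for w, l in zip(weights, lik[s]))
            for j in range(grid_size):
                # Posterior mass of grid point j given observation s,
                # accumulated over all individuals who produced s.
                new_weights[j] += counts[s] * weights[j] * lik[s][j] / mix
        weights = [w / n for w in new_weights]
    return grid, weights
```

For example, `grid_mle([0, 1, 1, 2, 3, 0, 2, 5, 4, 1], t=5)` returns the grid together with a weight vector that sums to one. Each EM update cannot decrease the likelihood, so running more iterations gives a fit at least as good on the chosen grid. In the sparse regime many different distributions explain the data almost equally well, which is why the accuracy question raised above is a subtle one.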