This comes from a Statistics PhD student of ours who is also a Data Scientist for a computing company in Research Park. Here are his thoughts on what students can do to prepare.
“I would recommend taking Java courses, since Java is relatively simple and widely used these days, and then perhaps some training in Hadoop and Hive. There are also some good books, such as Thinking in Java, Hadoop: The Definitive Guide, and Programming Hive. There may also be some open source projects that use “Big Data”; we can usually learn a lot from others’ design and code.”
One of the most notable activities in our department for graduate students is the Bohrer Workshop. A subset of the graduate students is selected to give a quality presentation of their current research in statistics. This is an excellent opportunity for undergraduate students to get a peek at what might lie ahead if they too choose to pursue an advanced degree. You don’t have to be there the whole day. Just pick one or two talks that sound interesting. The schedule is below.
Bohrer Workshop Schedule
A post from one of your fellow Statistics students…
I thought it was very relevant and interesting, as it shows the power of numbers. I think some of our grad students (and even undergrads) would be interested in it, so feel free to pass this on. I have followed that website for the last 6 months, and it was really cool to see how his numbers and projections changed. He sure did stand his ground with his methods.
This week we are hosting Prof. John Lafferty (http://www.cs.cmu.edu/~lafferty/). He’ll deliver a talk at the AIIS seminar (http://cogcomp.cs.illinois.edu/sites/aiis/). Please note that the talk venue has been moved from 3405 SC to 2405 SC, as we are expecting a larger audience. The details are as follows:
Nov 9, Friday. 4 pm.
2405, Siebel Center
Graphical Model Estimation
The graphical model has proven to be a useful abstraction in statistics and machine learning. The starting point is the graph of a distribution. While often the graph is assumed given, we have been studying the problem of estimating the graph from data. In this talk we present several nonparametric and semi-parametric methods for graph estimation. One approach is a nonparametric extension of the Gaussian graphical model that allows arbitrary graphs. For the discrete Gaussian (Ising model), we use parallel neighborhood selection with L1-regularized logistic regression. Alternatively, we can restrict the family of graphs to spanning forests, enabling the use of fully nonparametric density estimation in high dimensions. When additional covariates are available, we propose a framework for graph-valued regression. The resulting methods are easy to understand and use, theoretically well supported, and effective for modeling and exploring high-dimensional data. Joint work with Han Liu, Pradeep Ravikumar, Martin Wainwright, and Larry Wasserman.
John Lafferty is the Louis Block Professor in the Departments of Statistics, Computer Science, and the College at The University of Chicago. His research area is machine learning, with a focus on computational and statistical aspects of nonparametric methods, high-dimensional data, graphical models, and applications. An associate editor of the Journal of Machine Learning Research, Dr. Lafferty served as program co-chair and general co-chair of the Neural Information Processing Systems Foundation conferences in 2009 and 2010. Dr. Lafferty received his doctoral degree in mathematics from Princeton University, where he was a member of the Program in Applied and Computational Mathematics. Prior to joining the University of Chicago in 2011, he was Professor of Computer Science, Machine Learning, and Statistics at Carnegie Mellon University, where he is currently an Adjunct Professor.