This comes from a Statistics PhD student of ours who is also a Data Scientist for a computing company in Research Park. Here are his thoughts on what students can do to prepare.
“I would recommend taking Java courses, since Java is relatively simple and widely used these days, and then maybe some training in Hadoop and Hive. There are also some good books, such as Thinking in Java, Hadoop: The Definitive Guide, and Programming Hive. There may also be some open source projects that use “Big Data”; we can usually learn a lot from other people’s design and code.”