Opening Plenary

Dr. Andy Feng

VP Architecture, Yahoo!

Large-Scale Machine Learning: Use Cases, Datasets and Technology

Venue: NCSA Auditorium
Time: Feb 17th, 4:30-5:30pm

Joint event with
Urbana-Champaign Hadoop User Group

**There will be a reception outside the auditorium after the talk. Food will be provided.**

Abstract

Recently, Yahoo has brought the big data ecosystem and machine learning (ML) together to discover mathematical models for search ranking, online advertising, content recommendation and mobile applications. We use distributed computing clusters with CPUs and GPUs to learn these models from hundreds of petabytes of data.

A collection of distributed algorithms has been developed to achieve 10-1000x scale-up and speed-up compared with alternative solutions. Our algorithms construct regression/classification models and semantic vectors within hours, even for billions of training examples and parameters.

To advance academic research on large-scale ML, Yahoo recently released the largest-ever ML dataset through our Webscope program. It contains three months of interaction data from 20 million anonymized users. Yahoo also plans to make our distributed deep learning solution, CaffeOnSpark, available as open source.

In this talk, we illustrate Yahoo use cases and datasets, and explain the evolution of the big-data technology stack. We walk through two algorithms (word2vec and deep neural networks) to highlight algorithm and system challenges, and give an overview of our solutions based on open source technologies. We discuss how you could leverage the datasets and software released by Yahoo for large-scale machine learning.

Speaker Bio

Dr. Andy Feng is VP Architecture at Yahoo, where he leads the architecture and design of big data and machine learning initiatives. He architected major platforms for personalization, ad serving, NoSQL, and cloud infrastructure. Andy is a PPMC member and committer of the Apache Storm project and a contributor to the Apache Spark project. He served as a track chair and program committee member at Hadoop Summit (2013-2016) and Spark Summit (2013-2014). Prior to Yahoo, he was a Chief Architect at AOL/Netscape and a Principal Scientist at Xerox.