Category Archives: Department of Statistics

SAS Day – February 5

University of Illinois WebStore, SAS, and State Farm invite you to SAS Day to explore SAS technologies and real-world applications.
Schedule of Events:
SAS DAY 2013
February 5 | Illini Union Ballroom A
(It is actually not necessary to register for any events. Just feel free to show up to the sessions you wish to attend.)

10-10:45am | State Farm Research & Development Center
presented by Scott Farris and Bill Messner – State Farm Insurance
The State Farm Research & Development Center employs approximately 90 interns in its facility on the U of I campus. Our presentation will give an overview of two of its functions: the actuarial research function and the MAGNet function.
The Actuarial Function of the State Farm Research and Development Center (RDC) focuses on four objectives: providing meaningful research and development for the organization, intern development, collaboration on research and analytic efforts, and recruiting strong candidates to the organization. Central to these objectives is developing skills that allow interns to become successful analytic professionals. We will present a high-level look at State Farm’s undergraduate program for developing well-rounded analytic professionals in the areas of analytic training, statistical programming and development, leadership training and technical presentation.
11am-11:45am | Reinsurance Pricing Application in SAS
presented by Blake Konrardy – State Farm Insurance
State Farm interns at the Research and Development Center on the U of I campus have partnered with the SAS Institute to test the functionality of a new SAS statistical procedure called PROC SEVERITY. This procedure was integrated into past and current work on loss distribution modeling and reinsurance pricing. Blake will discuss the underlying methodology of a reinsurance project that includes the following steps: data extraction and transformation, loss development of current claims, trending, distributional modeling and loss simulation. Particular attention will be paid to spliced and finite mixture modeling techniques in SAS.
1-1:45pm | SAS Global Academic Program: Helping Students Succeed with SAS
presented by Julie Petlick – SAS Institute
SAS is used in virtually all sectors of the US economy. SAS professionals are always in demand, the work is interesting, and the disciplines in which SAS is used are always evolving to better solve the business problems companies face as they strive to be successful. This presentation will provide an overview of the programs and initiatives available to students through SAS. Helping students succeed in their academic and career pursuits is the focus of SAS Student Programs. Opportunities for students to learn SAS, use SAS to conduct research, get certified, attend SAS conferences, showcase their work, and network with industry professionals will be presented.
2-2:45pm | Economic Capital Model in SAS
presented by Alan Kessler and Carol Frigo – State Farm Insurance
Interns at the Research and Development Center on the U of I campus are responsible for investigating new methods and techniques used to build Economic Capital Models. Alan Kessler has led teams over the past several semesters building ARIMA models, distribution models and simulation processes to support that work. Carol and Alan will discuss their recent work on this topic, which was presented in an exhibition at the SAS analytics conference 2012 in Las Vegas.
The discussion will contain an overview of the overall process and highlight major components of their work. They will discuss the forecasting process, distribution modeling process and simulation methods used to create the model.
Please share this invitation with anyone who might be interested in attending SAS Day 2013.
Presenter Biographies:
Scott Farris
Research Manager
P & C Actuarial Department
State Farm Research and Development Center
Scott Farris is the Research Manager for the Actuarial function at the State Farm Research & Development Center on the University of Illinois campus at Urbana-Champaign. He supervises undergraduate and graduate research related directly to P&C Actuarial work. His current staff includes 3 actuaries and 25 interns. Research projects include predictive modeling, pricing & reserving models, research on new techniques and methods, and dynamic risk modeling.
Scott has 30 years of experience at State Farm. His early work includes personal lines and commercial lines pricing work. Later he moved on to supervise the Statistical Analysis and Research Unit in the Fire Actuarial department. There, he co-authored two papers on catastrophe ratemaking and oversaw the implementation of State Farm’s ratemaking warehouse and many of the current fire ratemaking applications. More recently, he guided a system-related effort in cost optimization of catastrophe claim systems.
Bill Messner
Statistical Research Analyst
Strategic Resources
State Farm Research and Development Center
Bill Messner manages the Modeling and Analytics Graduate Network (MAGNet) program at the State Farm Research & Development Center at the University of Illinois at Urbana-Champaign. He supervises graduate-level predictive modeling and statistical research projects involving several areas of the enterprise: Strategic Resources, P&C Actuarial, SF Bank, and Claims, to name a few.
After working five years as a trade operations analyst at Chicago Equity Partners, Bill pursued master’s degrees in statistics and quantitative psychology at Illinois. While a graduate student, Bill interned at the State Farm Research & Development Center. Bill joined the Strategic Resources Predictive Analytics team in 2011 and is responsible for supporting advanced analytic efforts across a variety of internal business areas. He is one of the company’s emerging leaders in the area of text analytics, and enjoys predictive modeling and optimization problems. In 2012, Bill assumed leadership of the MAGNet intern program, directly leading much of the students’ work & development.
Blake Konrardy
Actuarial Intern – graduate in statistics
State Farm Research and Development Center
Blake is currently a graduate intern at the research and development center. Blake recently completed his undergraduate degree in Statistics and will attend law school next year. Blake leads a team of interns investigating loss distributions for reinsurance pricing. His team builds data extraction and transformation programs, loss development models and distribution models as part of their pricing work. Blake’s group worked extensively with SAS PROC SEVERITY, a new distribution modeling procedure developed by SAS. As part of his work, Blake interacts with SAS developers and statisticians regarding SAS procedures and their performance in insurance pricing. Blake’s work was presented in an exhibit last fall at the SAS analytics 2012 conference in Las Vegas.
Julie Petlick, Ph.D.
Student Programs Manager, SAS
Dr. Julie H. Petlick is the student programs manager for SAS, where she specializes in facilitating student knowledge, skills and opportunities related to SAS technologies. She regularly attends conferences, gives presentations, develops learning assets, and leverages social media to help students continually develop their SAS skills. She also works with students to help connect them with potential employers looking for new SAS talent. She previously worked as an analytical consultant for SAS, where she provided training to faculty, staff and students for integrating SAS into their coursework and research.
Dr. Petlick earned a Ph.D. in Experimental Psychology from North Carolina State University where she studied learning and cognition. She was an assistant professor in the Graphic Communications Program at North Carolina State University where her research interests included visualization, learning and cognitive processes. She was a co-developer for the standards based Visualization in Technology Education (VisTE) curriculum. She has given numerous presentations and workshops as well as authored or co-authored numerous papers in various journals and conference proceedings such as Brain Research Bulletin, American Society for Engineering Education, Extreme Programming, Animation World Network, and SAS Global Forum.
Alan Kessler
Actuarial Intern – majoring in actuarial science
State Farm Research and Development Center
Alan is a senior at the University of Illinois majoring in Actuarial Science and a current intern at the research and development center. Alan leads the Economic Capital Modeling team, which supports the P&C Underwriting model. His team develops data extract and transformation processes and builds ARIMA forecast models, distributional models and Monte Carlo simulation models. Alan’s work was presented in an exhibit last fall at the SAS analytics 2012 conference in Las Vegas, where he won first prize for his work. This summer Alan will be working at the SAS Institute in Cary, North Carolina under a joint State Farm/SAS internship. Alan will start work full time next fall at State Farm headquarters in Bloomington.
Carol Frigo
Actuarial Intern – majoring in statistics and economics
State Farm Research and Development Center
Carol is a junior at the University of Illinois majoring in statistics and economics. She is a current intern at the research and development center. Carol was instrumental in developing a loss reserve application that is used to set reserves at State Farm. Recently, Carol joined Alan’s team and performed much of the distributional modeling for the ARIMA forecast simulations in our Economic Capital Model. Carol was invited to present her work in an exhibit at the SAS 2012 analytic conference in Las Vegas.

Midwest Statistics Research Colloquium

The sixth Midwest Statistics Research Colloquium will take place on March 15-16, at Madison, Wisconsin. The conference program is now available at
This colloquium is intended to give visibility to quality research by students and faculty at all levels in statistics departments in the Midwest and also to give everyone a chance to form new friendships, start new collaborations, and obtain new ideas without having to travel a significant distance. We also want participants to enjoy a day of relaxation in a scholarly environment.
Students are encouraged to present a poster at the colloquium.  To apply to present a poster, please email PDF(s) containing a CV, a well-written abstract, and a one-page outline of the results to Yuguo Chen at  The deadline for applications is February 15, 2013. Some of the best posters will receive a cash award of $100.00 each.
Given the space constraint, the number of participants is capped at 100. Please register as soon as possible.

Statistics Seminar Series

Remember that the Department of Statistics hosts many seminars on emerging topics and research in statistics. The full schedule is available here.

Additionally, we are welcoming candidates for prospective faculty positions, and they will also be sharing their research with us. Each of these seminars will take place from 4:00pm – 4:50pm at the locations listed below. A list of seminar titles and abstracts is available here.

Tue 1/22/2013   Gongjun Xu  (Columbia U)  — Location: 165 Everitt

Thu 1/24/2013   Avishek Chakraborty  (Texas A&M) — Location: 269 Everitt

Tue 1/29/2013   Pavel Krivitsky (Penn State) — Location: 151 Everitt

Thu 1/31/2013   Xiaochui Chen  (U. of Chicago)  — Location: 165 Everitt

Tue 2/5/2013    Mladen Kolar (CMU)  — Location: 165 Everitt

Thu 2/7/2013    Runlong Tang (Princeton)  — Location: 165 Everitt

Tue 2/12/2013   Jeremy Gaskins (U. of Florida)  — Location: 165 Everitt

Tue 2/19/2013   Georgios Fellouris   (U. of Southern California)  — Location: 165 Everitt


Department of Statistics weekly seminar – 2 next week

Please note that for the next few weeks we are having two seminars per week (Tuesday and Thursday). Below is the information for next week’s two seminars. Please also note the location of each seminar.
Statistical Inference for Diagnostic Classification Models, Gongjun Xu (Columbia University)
Speaker       Gongjun Xu, Columbia University
Date            January 22, 2013
Time            4:00 pm – 4:50 pm 
Location     165 Everitt
Sponsor      Department of Statistics
Event type   Seminar
Diagnostic classification models (DCM) are an important recent development in psychological/educational testing. Instead of an overall test score, a diagnostic test provides each subject with a profile detailing the concepts and skills (often called “attributes”) that he/she has mastered. Central to many DCMs is the so-called Q-matrix, an incidence matrix specifying the item-attribute relationship.  It is common practice for the Q-matrix to be specified by experts when items are written, rather than through data-driven calibration. Such a non-empirical approach may lead to misspecification of the Q-matrix and substantial lack of model fit, resulting in erroneous interpretation of testing results. This talk is concerned with data-driven construction (estimation) of the Q-matrix and related statistical issues of DCMs. I will first give an introduction to DCMs and an overview of recent developments, followed by a discussion of key issues and challenges. I will then present some fundamental results on the learnability of the Q-matrix, including sufficient and necessary conditions for it to be identifiable from data. I will also present a data-driven construction of the Q-matrix and estimation of other model parameters, and show that they are consistent under certain identifiability conditions.
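To make the role of the Q-matrix in the abstract above concrete, here is a minimal sketch of how an incidence matrix links items to attributes. The 4-item, 3-attribute Q-matrix, the subject profile, and the slip and guess values are all hypothetical. The conjunctive rule and the DINA response probabilities are one common DCM chosen for illustration; the abstract does not say which DCM the talk uses.

```python
# Hypothetical Q-matrix: Q[j][k] == 1 means item j requires attribute k.
Q = [
    [1, 0, 0],  # item 0 requires attribute 0
    [0, 1, 0],  # item 1 requires attribute 1
    [1, 1, 0],  # item 2 requires attributes 0 and 1
    [0, 1, 1],  # item 3 requires attributes 1 and 2
]


def ideal_response(profile, q_row):
    """Return 1 if the subject has mastered every attribute the item requires.

    This is the conjunctive rule, an assumption made for this illustration.
    """
    return int(all(mastered >= required for mastered, required in zip(profile, q_row)))


def dina_probabilities(profile, q_matrix, slip, guess):
    """Probability of a correct answer per item under the DINA model.

    A subject with all required attributes answers correctly unless they slip;
    a subject missing a required attribute answers correctly only by guessing.
    """
    probs = []
    for q_row, s, g in zip(q_matrix, slip, guess):
        probs.append(1 - s if ideal_response(profile, q_row) else g)
    return probs


# Hypothetical subject who has mastered attributes 0 and 1 but not attribute 2.
profile = [1, 1, 0]
slip = [0.1, 0.1, 0.1, 0.1]
guess = [0.2, 0.2, 0.2, 0.2]

ideal = [ideal_response(profile, row) for row in Q]
probs = dina_probabilities(profile, Q, slip, guess)
print("ideal responses:", ideal)
print("P(correct):", probs)
```

If an expert wrongly recorded item 3 as requiring only attribute 1, the same subject would be expected to answer it correctly. This is the kind of misspecification the abstract warns can distort the interpretation of test results.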
Exploring Spatial Heterogeneity in Species Prevalence, Avishek Chakraborty (Texas A&M University)
Speaker       Avishek Chakraborty, Texas A&M University
Date            January 24, 2013
Time            4:00 pm – 4:50 pm 
Location     269 Everitt
Sponsor      Department of Statistics
Event type   Seminar
In present-day ecological studies, complex models for analyzing species abundance patterns are gaining importance because of their multiple utilities: to understand the response of the species to climate variation, to predict its prevalence in remote areas, to measure the impact of human activities and, most crucially, to design effective strategies for conservation. The Cape Floristic Region (CFR) in South Africa, a well-known biodiversity hotspot, provides a rich class of such species data for analysis. However, any usual regression model would be inadequate due to multiple sources of bias and imprecision in the data: irregular sampling effort, unobserved explanatory features, removal of forest cover, missing a presence and over/under reporting of abundance concentration. Additionally, depending on how the data have been recorded, the response can be a complete collection of abundance counts (presence-absence data) or just a set of locations where the species has been observed (presence-only data). In this talk, I am going to present flexible spatial models based on a hierarchical Bayesian approach that can account for these challenges, leading to meaningful inference. Given the fine-scale resolution of the observed variation over a typically huge study region, implementation of such models can be resource- and time-expensive. I shall discuss ways of efficient parallelization or approximation within the estimation scheme, in the context of these models, that make it feasible to carry out the computation in reasonable time. Further analysis of such datasets presents significant research opportunities in developing models for flexible correlated processes as well as in addressing questions relevant to ecology and environmental science.

Big Data Articles

Why Data Scientists Matter: Explore how the convergence of Big Data, computer science and statistics is leading to a revolution and a demand for people who can make sense of it all.

Food Analytics?: Illustrating that the Big Data trend is leaving no area untouched, researcher Lada Adamic has developed an algorithm to predict how successful a recipe will turn out. She can predict with 80% accuracy how many stars a recipe will receive on a popular recipe site. Until she develops a technique to predict the likelihood of an inept chef being able to successfully prepare a recipe, some of us may want to just stick to admiring her research papers and leave the cooking to the pros.

Nate Silver Answers Your Questions on Reddit: Everyone’s favorite prognosticator took to the popular website Reddit to answer reader-submitted questions. Read this article to find out his thoughts on teacher evaluations and why politics is harder to forecast than baseball.



2013 – The International Year of Statistics

Welcome to 2013, the International Year of Statistics!

After months of anticipation, the International Year of Statistics (Statistics2013) is finally here! This worldwide celebration is supported by nearly 1,400 organizations in 105 countries.

Visit the Statistics2013 website to learn how you can get involved and watch the Statistics2013 video. Share it with your coworkers, local college statistics departments, and high schools; promote it to the public, media, and other groups; show it at your organization’s meetings; and post it to your organization’s website.

Learn more and spread the word at

The International Year of Statistics is a worldwide celebration and recognition of the contributions of statistical science. Through the combined energies of organizations worldwide, Statistics2013 will promote the importance of statistics to the broader scientific community, business and government data users, the media, policymakers, employers, students, and the general public.




Dept of Statistics Seminar (Tomorrow)

The Department of Statistics will be hosting Chenlei Leng (National University of Singapore) tomorrow. Please see information below…
Speaker        Chenlei Leng (National University of Singapore)
Date               January 8, 2013
Time               4:00 pm – 4:50 pm 
Location        122 Illini Hall
Sponsor        Department of Statistics
Event type    Seminar

Department of Statistics Seminars

We will have two seminars the week of December 10 – 14 and a seminar on December 17. Below are the abstracts for these seminars. Please check the dates and locations, as they vary.
Nonparametric Estimation of Spatial Covariance Function and Its Applications, Bo Li (Purdue University)
Speaker        Bo Li, Purdue University
Date               December 11, 2012
Time               4:00 pm – 4:50 pm 
Location        165 Noyes Lab
Sponsor        Department of Statistics
Event type    Seminar
Covariance structure modeling plays a key role in spatial data analysis. Various parametric models have been developed to accommodate the idiosyncratic features of a given data set. However, parametric models may impose unjustified restrictions on the covariance structure, and the procedure for choosing a specific model is often ad hoc. To avoid the choice of parametric forms, we propose a nonparametric covariance estimator for spatial data, as well as its extension to spatio-temporal data based on the class of space-time covariance models developed by Gneiting (2002). Our estimator is obtained via a nonparametric approximation of completely monotone functions. It is easy to implement, and our simulation shows it outperforms the parametric models when there is no clear information on model specification. Two real data examples are analyzed to illustrate our approach and provide further comparison between the nonparametric estimator and parametric models. Finally, in this talk I will also discuss the potential application of our nonparametric covariance estimator to paleoclimatology.
Feature Screening for Ultrahigh Dimensional Data, Runze Li (Pennsylvania State University)
Speaker        Runze Li, Pennsylvania State University
Date               December 13, 2012
Time               4:00 pm – 4:50 pm
Location        156 Henry
Sponsor        Department of Statistics
Event type    Seminar
Ultrahigh dimensional data analysis has become increasingly important in diverse scientific fields. Fan and Lv (2008) proposed the sure independence screening (SIS) procedure based on the Pearson correlation and established the sure screening property for the SIS based on Gaussian linear models. In subsequent works by Fan and colleagues, various model-based feature screening procedures have been developed for pre-specified regression models with ultrahigh dimensional predictors. In this talk, I plan to introduce model free feature screening procedures for ultrahigh dimensional data. By model free, we mean that the implementation of such feature screening procedures does not require pre-specifying a model structure for the regression functions. Thus, the model free feature screening procedures are particularly appealing for ultrahigh-dimensional models, where there are a huge number of candidate predictors but little information about the actual model forms. I will focus on a sure independence screening procedure based on the distance correlation (DC-SIS, for short). The DC-SIS is a model free screening procedure and can be implemented as easily as the SIS. However, the DC-SIS can significantly improve on the SIS. In particular, the sure screening property is valid for the DC-SIS under more general settings, including linear models. Moreover, the DC-SIS can be used directly to screen grouped predictor variables and for multivariate response variables. We establish the sure screening property for the DC-SIS, and conduct simulations to examine its finite sample performance. Numerical comparison indicates that the DC-SIS performs much better than the SIS in various models. We also illustrate the DC-SIS through a real data example.
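The screening idea in the abstract above can be sketched in a few lines: score each predictor by its sample distance correlation with the response and keep the top-ranked ones. This is only an illustrative sketch, not the speaker's implementation. The function names, the simulated data, and the choice to keep two predictors are assumptions made here, and the example handles only univariate predictors and responses.

```python
import math
import random


def centered_distances(values):
    """Double-centered matrix of pairwise absolute differences of a 1-D sample."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[dist[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def mean_product(a, b):
    """Average of the elementwise products of two square matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation between two 1-D samples of equal length."""
    a, b = centered_distances(x), centered_distances(y)
    dcov2 = mean_product(a, b)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


def dc_screen(columns, response, keep):
    """Rank predictors by distance correlation with the response; keep the top ones."""
    scores = [distance_correlation(col, response) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:keep], scores


# Hypothetical simulated data: the response depends on predictor 0 through a
# square (a nonlinear effect) and on predictor 3 linearly; the rest are noise.
rng = random.Random(0)
n, p = 200, 6
columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
response = [columns[0][i] ** 2 + columns[3][i] + 0.1 * rng.gauss(0, 1)
            for i in range(n)]

selected, scores = dc_screen(columns, response, keep=2)
print("selected predictors:", sorted(selected))
print("scores:", [round(s, 3) for s in scores])
```

The quadratic effect of predictor 0 is the kind of dependence a Pearson-correlation ranking may miss, because the linear correlation between a symmetric variable and its square is close to zero. Distance correlation is zero only under independence, which is why no regression model has to be specified before screening.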
Statistical Significance of Clustering for High Dimensional Data, Yufeng Liu (University of North Carolina-Chapel Hill)
Speaker        Yufeng Liu, University of North Carolina-Chapel Hill
Date               December 17, 2012
Time               4:00 pm – 4:50 pm 
Location        1027 Lincoln Hall
Sponsor        Department of Statistics
Event type    Seminar
Clustering methods provide a powerful tool for the exploratory analysis of high dimensional datasets, such as gene expression microarray data. A fundamental statistical issue in clustering is which clusters are “really there,” as opposed to being artifacts of the natural sampling variation. In this talk, I will present Statistical Significance of Clustering (SigClust) as a cluster evaluation tool. In particular, we define a cluster as data coming from a single Gaussian distribution and formulate the problem of assessing statistical significance of clustering as a testing procedure. Under this hypothesis testing framework, the cornerstone of our SigClust analysis is accurate estimation of those eigenvalues of the covariance matrix of the null multivariate Gaussian distribution. A likelihood based soft thresholding approach is proposed for the estimation of the covariance matrix eigenvalues. Our theoretical work and simulation studies show that our proposed SigClust procedure works remarkably well. Applications to some cancer microarray data examples demonstrate the usefulness of SigClust.