Winona State University hosts MUDAC 2016, April 2-3, 2016

The Department of Mathematics and Statistics at Winona State University will host the Midwest Undergraduate Data Analytics Competition (MUDAC) 2016 on April 2-3, 2016.  MUDAC consists of teams of undergraduate students from around the Midwest who are given 24 hours to solve an analytics problem motivated by our corporate sponsor.  Teams share their recommendations with representatives from our corporate sponsor and other professional data analysts.  MUDAC 2016 will provide professional development and networking opportunities between working professionals in attendance and your students.

Faculty mentors from each participating institution are strongly encouraged to attend.   Teams will be able to seek advice from their on-site faculty advisors throughout the competition.  Over $1,000 in cash prizes will be given away again this year. Students and faculty alike find this to be a wonderful learning experience.  Please encourage your students to consider participating in MUDAC 2016 on April 2-3, 2016 at Winona State University in Winona, MN. 

To learn more about the competition, please visit our website:  www.MUDAC.ORG.  Registration information will be available in the coming weeks.   If you have any questions, please feel free to contact me or any of the other conference organizers.

Christopher J. Malone, PhD
Director, MUDAC
Department of Mathematics and Statistics
Winona State University

2016 Summer Undergraduate Research Fellowship – Abroad

The Office of Undergraduate Research (OUR) sponsors a competitive Summer Undergraduate Research Fellowship for undergraduate students at UIUC who will be conducting research abroad. This initiative provides students with funds designed to subsidize travel and housing costs associated with the research.

Eligibility Requirements:

  • Students must be current UIUC undergraduate students at the time the research will be conducted, and be in good academic standing. GPA will be considered during the selection process.
  • Funding is provided for students who are either pursuing their own research projects or working on a faculty-led project.
  • The research must be conducted at a location outside of the United States and in conjunction with a university or research institute in that location.

 Additional Requirements:

  • A letter of support must be provided by the supervising UIUC faculty member. The letter need not be long, but should contain the following elements:
    • The title of the project and a brief description of the activities the student will engage in, the projected length of the project effort (e.g., 4 weeks, 5-6 weeks, etc.), and the hoped-for outcomes.
    • The name of the co-supervising individual(s) at the host institution, title(s), core expertise, and the current state of the collaboration (e.g., new, growing as of the last year, long-standing with several personnel exchanged, etc.)   The idea of this description is simply to ensure that the applicant will enter a hospitable academic environment that is conducive to the proposed research, and wherein both the UIUC supervisor and the host co-supervisor have truly connected.
    • An explicit statement that the host co-supervisor has agreed to supervise the student on this particular project.
  • If the fruits of students' work result in a presentation at a conference or campus event, the poster or paper must acknowledge OUR for its support.
  • Students must indicate any other sources of funds either applied for or received, including department support or support from the UIUC supervisor or host co-supervisor. Failure to disclose additional support from another source will result in a denial of the request or the revocation of the award.
  • Students who receive awards must enroll in an appropriate UIUC course. Many departments, schools and colleges offer courses with the title “Undergraduate Research Abroad.” If such an option is available, students should enroll in that course for the credit hours agreed upon between the student and the supervising UIUC faculty member. If such a course is not available, the student should consult with the advisor about which UIUC course number to use.

 Funding Restrictions:

  • Funding will normally be given directly to the student as a lump sum. Alternatively, arrangements can be made for the Undergraduate Research Office to reimburse expenses for travel.


Students are encouraged to consult their college or department advisors and individual faculty members for information about research abroad opportunities. OUR does not maintain a list of such opportunities.



Fifth Floor Illini Union Bookstore

807 S. Wright Street, M/C 317

Urbana, IL 61801

Office phone: (217) 300-5453 


Statistics Seminar – Tuesday, February 02, 2016 – Dr. Alex Petersen

“Representation of Samples of Density Functions and Regression for Random Objects”

Dr. Alex Petersen, University of California, Davis


Date: Tuesday, February 02, 2016

Time: 3:30 PM – 4:30 PM

Location: Engineering Hall Room 106B1

Sponsor: Department of Statistics



In the first part of this talk, we will discuss challenges associated with the analysis of samples of one-dimensional density functions.  Due to their inherent constraints, densities do not live in a vector space and therefore commonly used Hilbert space based methods of functional data analysis are not appropriate.  To address this problem, we introduce a transformation approach, mapping probability densities to a Hilbert space of functions through a continuous and invertible map. Basic methods of functional data analysis, such as the construction of functional modes of variation, functional regression or classification, are then implemented by using representations of the densities in this linear space.  Transformations of interest include log quantile density and log hazard transformations, among others.  Rates of convergence are derived, taking into account the necessary preprocessing step of density estimation.  The proposed methods are illustrated through applications in brain imaging.
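
As a rough illustration of the transformation idea above, the sketch below maps a density on a grid to its log quantile density, psi(t) = -log f(Q(t)), where Q is the quantile function, and then maps it back. The function names, the grid-based numerics, and the assumptions of a strictly positive density with a known left endpoint are illustrative choices made here, not the speaker's implementation.

```python
import numpy as np


def lqd_transform(x, f, n_t=201):
    """Map a density f (values on grid x) to its log quantile density on [0, 1].

    Assumes f is strictly positive on the grid, so the CDF is strictly increasing.
    """
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    # Trapezoid-rule CDF, then normalise so the density integrates to one.
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * np.diff(x))))
    total = cdf[-1]
    cdf, f = cdf / total, f / total
    t = np.linspace(0.0, 1.0, n_t)
    quantile = np.interp(t, cdf, x)            # Q(t) = F^{-1}(t)
    psi = -np.log(np.interp(quantile, x, f))   # log q(t) = -log f(Q(t))
    return t, psi


def lqd_inverse(t, psi, left_endpoint=0.0):
    """Recover the quantile function and the density along it from psi.

    Assumes psi came from a density whose support starts at left_endpoint.
    """
    q = np.exp(psi)  # quantile density q(t) = Q'(t)
    increments = 0.5 * (q[1:] + q[:-1]) * np.diff(t)
    quantile = left_endpoint + np.concatenate(([0.0], np.cumsum(increments)))
    density_at_quantile = np.exp(-psi)  # f(Q(t))
    return quantile, density_at_quantile
```

Because psi is an unconstrained function on [0, 1], linear operations such as averaging or principal component analysis can be applied to it before mapping back to a density.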


The second part of the talk will address the more general problem of analyzing complex data that are non-Euclidean and specifically do not lie in a vector space.  To address the need for statistical methods for such data, we introduce the concept of Fréchet regression. This is a general approach to regression when responses are complex random objects in a metric space and predictors are in $\mathbb{R}^p$. We develop generalized versions of both global least squares regression and local weighted least squares smoothing.  We derive asymptotic rates of convergence for the corresponding sample-based fitted regressions to the population targets under suitable regularity conditions by applying empirical process methods.  Illustrative examples include responses that consist of probability distributions and correlation matrices, and we demonstrate the proposed Fréchet regression for demographic and brain imaging data.
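
One way to picture the global version is as a weighted Fréchet mean of the responses, with weights that depend on the predictor value. The sketch below assumes the weight form 1 + (X_i - mean)' S^{-1} (x - mean), where S is the sample covariance of the predictors, and checks that for ordinary real-valued responses the weighted mean coincides with the least squares prediction. The function name and the toy data are hypothetical and are not taken from the talk.

```python
import numpy as np


def global_frechet_weights(X, x0):
    """Weights s_i(x0) = 1 + (X_i - mean)' S^{-1} (x0 - mean) for each sample i.

    X is an (n,) or (n, p) array of predictors; x0 is the point of interest.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    mean = X.mean(axis=0)
    centered = X - mean
    cov = centered.T @ centered / X.shape[0]  # sample covariance with 1/n scaling
    return 1.0 + centered @ np.linalg.solve(cov, x0 - mean)


# Real-valued responses: the weighted mean reproduces the least squares fit.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = 2.0 + 3.0 * X
weights = global_frechet_weights(X, 2.5)
prediction = np.mean(weights * Y)
```

For responses in a general metric space, the same weights would enter a weighted sum of squared distances that is minimised over the space, which is the step that depends on the chosen metric.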

Statistics Seminar – Thursday, February 04, 2016 – Dr. Abhra Sarkar

“Novel Statistical Frameworks for Analysis of Structured Sequential Data”

Dr. Abhra Sarkar, Duke University


Date: Thursday, February 04, 2016

Time: 3:30 PM – 4:30 PM

Location: Engineering Hall Room 106B1

Sponsor: Department of Statistics



We are developing a broad array of novel statistical frameworks for analyzing complex sequential data sets. Our research is primarily motivated by a collaboration with neuroscientists trying to understand the neurological, genetic and evolutionary basis of human communication using bird and mouse models. The data sets comprise structured sequences of syllables or “songs” produced by animals from different genotypes under different experimental conditions. The primary goal is then to elucidate the roles of different genotypes and experimental conditions on animal vocalization behaviors and capabilities. We have developed novel statistical methods based on first order Markovian dynamics that help answer these important scientific queries. First order dynamics is, however, insufficiently flexible to learn complex serial dependency structures and systematic patterns in the vocalizations, an important secondary goal in these studies. To this end, we have developed a sophisticated nonparametric Bayesian approach to higher order Markov chains building on probabilistic tensor factorization techniques. Our proposed method is of very broad utility, with applications not limited to analysis of animal vocalizations, and provides new insights into the serial dependency structures of many previously analyzed sequential data sets arising from diverse application areas. Our method has appealing theoretical properties and practical advantages, and achieves substantial gains in performance compared to previously existing methods. Our research also paves the way to advanced automated methods for more sophisticated dynamical systems, including higher order hidden Markov models that can accommodate more general data types.
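
For readers unfamiliar with first order Markovian dynamics, the sketch below estimates a transition matrix from syllable sequences by counting consecutive pairs. It is a plain maximum likelihood baseline written for illustration, with a hypothetical function name and toy sequences; it is not the nonparametric Bayesian method described in the abstract.

```python
import numpy as np


def transition_matrix(sequences):
    """Estimate first order transition probabilities from symbol sequences.

    Returns the sorted list of states and a row-stochastic matrix P, where
    P[i, j] is the observed share of transitions from state i to state j.
    Rows for states that are never followed by anything are left at zero.
    """
    states = sorted({symbol for seq in sequences for symbol in seq})
    index = {symbol: i for i, symbol in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[index[current], index[following]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    probs = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    return states, probs


# Toy example with two syllables, "a" and "b".
states, P = transition_matrix(["abab", "abb"])
```

A higher order model would condition on several preceding syllables instead of one, which makes the number of parameters grow quickly with the order.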

Statistics Seminar – Tuesday, January 26, 2016

“Computationally efficient high-dimensional variable selection via Bayesian procedures”

Dr. Yun Yang, University of California at Berkeley


Date: Tuesday, January 26, 2016

Time: 3:30 PM – 4:30 PM

Location: Engineering Hall Room 106B1

Sponsor: Department of Statistics



Variable selection is fundamental in many high-dimensional statistical problems with sparsity structures. Much of the literature is based on optimization methods, where penalty terms are incorporated that yield both convex and non-convex optimization problems. In this talk, I will take a Bayesian point of view on high-dimensional regression, by placing a prior on the model space and performing the necessary integration so as to obtain a posterior distribution. In particular, I will show that a Bayesian approach can consistently select all relevant covariates under relatively mild conditions from a frequentist point of view.


Although Bayesian procedures for variable selection are provably effective and easy to implement, it has been suggested by many statisticians that Markov chain Monte Carlo (MCMC) algorithms for sampling from the posterior need a long time to converge, as sampling from an exponentially large number of sub-models is an intrinsically hard problem. Surprisingly, our work shows that this plausible “exponentially many model” argument is misleading. By introducing a truncated sparsity prior for variable selection, we provide a set of conditions that guarantee the rapid mixing of a particular Metropolis-Hastings algorithm. The number of iterations for this Markov chain to reach stationarity is linear in the number of covariates up to a logarithmic factor.
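
To make the idea of a Metropolis-Hastings chain over sub-models concrete, here is a minimal sketch that proposes flipping one covariate in or out at a time and rejects any model larger than a size cap, mimicking a truncated sparsity prior. The BIC-style score, the function names, and all tuning values are assumptions made for illustration; they stand in for the prior and algorithm analysed in the talk and should not be read as that method.

```python
import numpy as np


def log_score(X, y, gamma, kappa=1.0):
    """Hypothetical BIC-style log score for the model indexed by boolean gamma."""
    n, p = X.shape
    k = int(gamma.sum())
    if k == 0:
        rss = float(y @ y)
    else:
        Xg = X[:, gamma]
        coef, *_ = np.linalg.lstsq(Xg, y, rcond=None)
        resid = y - Xg @ coef
        rss = float(resid @ resid)
    # Fit term minus a penalty that grows with model size.
    return -0.5 * n * np.log(rss) - kappa * k * np.log(p)


def mh_variable_selection(X, y, s_max, n_iter=2000, seed=0):
    """Single-flip Metropolis-Hastings over models with at most s_max covariates."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    gamma = np.zeros(p, dtype=bool)  # start from the empty model
    current = log_score(X, y, gamma)
    for _ in range(n_iter):
        j = rng.integers(p)
        proposal = gamma.copy()
        proposal[j] = not proposal[j]
        if proposal.sum() > s_max:
            continue  # models above the size cap have zero mass, so stay put
        candidate = log_score(X, y, proposal)
        # The single-flip proposal is symmetric, so only the score ratio matters.
        if np.log(rng.random()) < candidate - current:
            gamma, current = proposal, candidate
    return gamma
```

Each iteration costs one least squares fit on at most s_max columns, so the size cap keeps the per-step work small even when the number of covariates is large.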

Statistics Seminar – Thursday, January 28, 2016

“Robust causal inference with continuous exposures”

Dr. Edward Kennedy, University of Pennsylvania


Date: Thursday, January 28, 2016

Time: 3:30 PM – 4:30 PM

Location: Engineering Hall Room 106B1

Sponsor: Department of Statistics



Continuous treatments (e.g., doses) arise often in practice, but standard causal effect estimators are limited: they either employ parametric models for the effect curve, or else do not allow for doubly robust covariate adjustment. Double robustness allows one of two nuisance estimators to be misspecified, and is important for protecting against model misspecification as well as reducing sensitivity to the curse of dimensionality. In this work we develop a novel approach for causal dose-response curve estimation that is doubly robust without requiring any parametric assumptions, and which naturally incorporates general off-the-shelf machine learning. We derive asymptotic properties for a kernel-based version of our approach and propose a method for data-driven bandwidth selection. The methods are illustrated via simulation and in a study of the effect of hospital nurse staffing on excess readmissions penalties.


Top 10 Tips for Getting Successful Statistical Internships


“If you are a graduate or advanced undergraduate student in statistical sciences and related fields, these 10 tips may increase your likelihood of getting an internship. Once you have secured and completed an internship, it will be a valuable lifelong experience….”

You are welcome to share with your students and to view internship opportunities listed via and from other sources.

Here is the full list:

Statistics Seminar

“High dimensional spatio-temporal modeling with matrix variate distributions”

Dr. Shuheng Zhou, University of Michigan, Ann Arbor


Date: Thursday, December 03, 2015

Time: 3:30 PM – 4:30 PM

Location: 269 Everitt

Sponsor: Department of Statistics



In the first part of this talk, I will discuss new methods for estimating the graphical structures and underlying parameters, namely, the row and column covariance and inverse covariance matrices from the matrix variate data. Under sparsity conditions, we show that one is able to recover the graphs and covariance matrices with a single random matrix from the matrix variate normal distribution. Our method extends, with suitable adaptation, to the general setting where replicates are available.


In the second part of the talk, I will discuss an errors-in-variables model where the covariates in the data matrix are contaminated with random noise. Under sparsity and restrictive eigenvalue type of conditions, we show that one is able to recover a sparse vector $\beta \in \mathbb{R}^m$ from the model given a single observation matrix X and the response vector y. This model is significantly different from those analyzed in the literature in the sense that we allow the measurement error for each covariate to be a dependent vector across its n observations. Such error structures appear in the science literature, for example, when modeling the trial-to-trial fluctuations in response strength shared across a set of neurons.


We provide a real-data example and simulation evidence showing that we can recover graphical structures as well as estimate the precision matrices and the regression coefficients for these two classes of problems.