Date  Speaker  Seminar Title 

Sept. 10, 2012 
Yi Jiang
Georgia State University 
Can Cell Morphology tell a story?
[abstract]
The question can be rephrased as: What can we learn from analyzing and modeling the morphology of cells? I will discuss our recent work on statistical analysis and mathematical modeling of cell morphology in the retinal pigment epithelium. The story begins as we age... Age-related macular degeneration (AMD) is the main cause of vision loss in the elderly and is a looming epidemic in our aging society. Presently there is no way to determine how a patient's eye will progress, and no effective treatment for AMD. To tackle this problem, our eyes rested on the retinal pigment epithelium, because it is a crucial site of AMD pathology and it undergoes morphological changes as the eye ages and AMD progresses. We collected retinal pigment epithelium morphological data from mouse eyes. Statistical analysis of the morphometric data established that we can discriminate the genotypes of the eyes despite aging as a confounding factor. This work is the first step toward establishing the relationship between cell morphology in the epithelium and the age and disease status of the eye. We also developed a mathematical model of two-dimensional epithelium morphological dynamics. Simulations suggested that clustered cell death could cause a normal retinal pigment epithelium to develop the morphologies seen in AMD patients. This work provides a foundation for a potential diagnostic and prognostic tool for AMD.

Sept. 17, 2012 
Nicolai Meinshausen
University of Oxford 
Regularization for large-scale regression
[abstract]
Many recent applications in the physical sciences generate large-scale datasets, and modern regression techniques are routinely applied to these data for a wide range of purposes. Many of these approaches require a careful choice of a tuning parameter. Often, though, simple qualitative and sign constraints can be imposed on the grounds of physical considerations. These constraints simplify the estimation problem, as the tuning parameter becomes superfluous. We show the perhaps unexpected effectiveness of this approach for examples in climate science and beyond. Predictive accuracy is not compromised in general, and we examine under which assumptions optimal convergence rates can be achieved.
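A minimal sketch of the sign-constraint idea: once coefficients are restricted to be nonnegative, least squares needs no regularization tuning parameter. The solver below uses projected gradient descent as a simple stand-in for a proper nonnegative least-squares routine, and the data dimensions and coefficients are invented for illustration.

```python
import random

def matvec(A, x):
    # multiply matrix A (list of rows) by vector x
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def sign_constrained_ls(A, b, steps=3000, lr=0.005):
    """Minimize ||Ax - b||^2 subject to x >= 0 via projected gradient
    descent -- no regularization tuning parameter is involved."""
    n, p = len(A), len(A[0])
    x = [0.0] * p
    for _ in range(steps):
        r = [ri - bi for ri, bi in zip(matvec(A, x), b)]           # residual Ax - b
        grad = [sum(A[i][j] * r[i] for i in range(n)) for j in range(p)]
        x = [max(0.0, xj - lr * g) for xj, g in zip(x, grad)]      # step, then project onto x >= 0
    return x

random.seed(0)
n, p = 50, 3
x_true = [2.0, 0.0, 1.0]                                           # true coefficients are nonnegative
A = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
b = [sum(a * t for a, t in zip(row, x_true)) + random.gauss(0, 0.1) for row in A]
x_hat = sign_constrained_ls(A, b)
print([round(v, 2) for v in x_hat])
```

The projection `max(0.0, ...)` is what enforces the sign constraint; everything else is ordinary least squares.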

Sept. 24, 2012 
Regina Liu
Rutgers University 
Combining nonparametric inferences using data depth, bootstrap and confidence distribution
[abstract]
We apply the concepts of confidence distribution and data depth together with the bootstrap to develop a new methodology for a combined inference from several nonparametric studies for a common hypothesis. A confidence distribution (CD) is a sample-dependent distribution function that can be used to estimate parameters of interest. It can be viewed as a "distribution estimator" of the parameter of interest. Examples of CDs include Efron's bootstrap distribution and Fraser's significance function (also referred to as the p-value function). Although the concept of CD has natural links to concepts of Bayesian inference and the fiducial arguments of R. A. Fisher, it is a purely frequentist concept and has attracted renewed interest in recent years. CDs have shown high potential to be effective tools in statistical inference. We discuss a new approach to combining the test results from several independent studies for a common multivariate nonparametric hypothesis. Specifically, in each study we apply data depth and the bootstrap to obtain a p-value function for the common hypothesis. The p-value functions are then combined under the framework of combining confidence distributions. This approach has several advantages. First, it allows us to resample directly from the empirical distribution, rather than from the estimated population distribution satisfying the null constraints. Second, it enables us to obtain test results directly without having to construct an explicit test statistic and then establish or approximate its sampling distribution. The proposed method provides a valid inference approach for a broad class of testing problems involving multiple studies where the parameters of interest can be either finite or infinite dimensional. The method will be illustrated using simulations and flight data from the Federal Aviation Administration (FAA).
This is joint work with Dungang Liu (School of Public Health, Yale University) and Minge Xie (Department of Statistics, Rutgers University). 
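The abstract does not specify the combiner, but one standard recipe in the CD-combining framework is the normal (Stouffer-type) combination of p-value functions. The toy sketch below combines three invented Gaussian study summaries for a common mean; all numbers are made up for illustration.

```python
import math
from statistics import NormalDist

N = NormalDist()

def combine_pvalue_functions(h_funcs):
    """Stouffer-type combination of k p-value functions:
    H_c(t) = Phi( sum_i Phi^{-1}(H_i(t)) / sqrt(k) )."""
    k = len(h_funcs)
    def h_c(t):
        # clip to avoid inv_cdf(0) or inv_cdf(1)
        z = sum(N.inv_cdf(min(max(h(t), 1e-12), 1.0 - 1e-12)) for h in h_funcs)
        return N.cdf(z / math.sqrt(k))
    return h_c

# three toy studies of a common mean; the p-value function of study i
# is H_i(t) = Phi((t - xbar_i) / se_i)  (invented summaries)
studies = [(0.9, 0.5), (1.2, 0.4), (1.0, 0.6)]
h_funcs = [lambda t, m=m, s=s: N.cdf((t - m) / s) for m, s in studies]
h_c = combine_pvalue_functions(h_funcs)
# the combined function is itself a CD; printing it at a few points
# shows it concentrating around the common mean
print(round(h_c(0.0), 3), round(h_c(1.0), 3), round(h_c(2.0), 3))
```

The combined function rises from near 0 to near 1 around the shared parameter value, so its median can serve as a combined point estimate and its quantiles as confidence limits.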
Oct. 1, 2012 
David Madigan
Columbia University 
Observational studies in healthcare: are they any good?
[abstract]
Observational healthcare data, such as administrative claims and electronic health records, play an increasingly prominent role in healthcare. Pharmacoepidemiologic studies in particular routinely estimate temporal associations between medical product exposure and subsequent health outcomes of interest, and such studies influence prescribing patterns and healthcare policy more generally. Some authors have questioned the reliability and accuracy of such studies, but few previous efforts have attempted to measure their performance. The Observational Medical Outcomes Partnership (OMOP,http:
omop.fnih.org) has conducted a series of experiments to empirically measure the performance of various observational study designs with regard to predictive accuracy for discriminating between true drug effects and negative controls. In this talk, I describe the past work of the Partnership, explore opportunities to expand the use of observational data to further our understanding of medical products, and highlight areas for future research and development. (on behalf of the OMOP investigators) 
Oct. 8, 2012 
Yixin Fang
New York University School of Medicine 
Stability Selection in Cluster Analysis
[abstract]
Recently, the concept of clustering stability has become popular for selecting the number of clusters in cluster analysis. We develop a method for estimating the clustering instability based on the bootstrap, and propose to choose the number of clusters as the one minimizing the clustering instability. The idea can also be applied to select tuning parameters in some regularized cluster analysis procedures.
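The instability estimate can be sketched in a few lines: cluster two independent bootstrap resamples, then measure how often the two fitted clusterings disagree about whether a pair of original points belongs together. The naive one-dimensional k-means and the toy data below are invented for illustration and are not the speaker's implementation.

```python
import random

def kmeans_1d(xs, k, iters=20):
    # naive one-dimensional k-means; returns an assignment function
    centers = sorted(random.sample(xs, k))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda c: abs(x - centers[c]))].append(x)
        centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
    return lambda x: min(range(k), key=lambda c: abs(x - centers[c]))

def instability(xs, k, B=10):
    """Average co-membership disagreement, over pairs of original points,
    between clusterings fitted to two independent bootstrap resamples."""
    n = len(xs)
    pairs = [(xs[i], xs[j]) for i in range(n) for j in range(i + 1, n)]
    total = 0.0
    for _ in range(B):
        f = kmeans_1d([random.choice(xs) for _ in range(n)], k)
        g = kmeans_1d([random.choice(xs) for _ in range(n)], k)
        total += sum((f(a) == f(b)) != (g(a) == g(b)) for a, b in pairs) / len(pairs)
    return total / B

random.seed(0)
# two well-separated groups: the true number of clusters is 2
data = [random.gauss(0, 0.3) for _ in range(30)] + [random.gauss(5, 0.3) for _ in range(30)]
for k in (2, 3, 4):
    print(k, round(instability(data, k), 3))
```

On data like this the instability is essentially zero at the true k = 2, because every bootstrap clustering recovers the same split, and it grows once spurious clusters must be carved out of a single group.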

Oct. 15, 2012 
Tiefeng Jiang
University of Minnesota 
Distributions of Angles in Random Packing on Spheres
[abstract]
We study the asymptotic behavior of the pairwise angles among n randomly and uniformly distributed unit vectors in p-dimensional space as the number of points n goes to infinity, while the dimension p is either fixed or growing with n. For both settings, we derive the limiting empirical distribution of the random angles and the limiting distributions of the extreme angles. The results reveal interesting differences between the two settings and provide a precise characterization of the folklore that "all high-dimensional random vectors are almost always nearly orthogonal to each other". Applications to statistics and connections with some open problems in physics and mathematics are also discussed. This is joint work with Tony Cai and Jianqing Fan.
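A quick simulation, not from the talk and with all parameters invented, illustrates the near-orthogonality folklore: as p grows, the pairwise angles of uniform random unit vectors concentrate around 90 degrees.

```python
import math
import random

def random_unit_vector(p):
    # uniform on the unit sphere in R^p: normalize a standard Gaussian vector
    v = [random.gauss(0, 1) for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def pairwise_angles(n, p):
    # all n(n-1)/2 pairwise angles, in degrees
    vs = [random_unit_vector(p) for _ in range(n)]
    angles = []
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(vs[i], vs[j]))
            angles.append(math.degrees(math.acos(max(-1.0, min(1.0, dot)))))
    return angles

random.seed(0)
for p in (3, 30, 300):
    angs = pairwise_angles(50, p)
    mean = sum(angs) / len(angs)
    worst = max(abs(a - 90.0) for a in angs)
    print(f"p={p:3d}  mean angle={mean:6.2f}  max deviation from 90={worst:6.2f}")
```

In low dimension some pairs are nearly parallel, while for p = 300 even the most extreme of the 1225 angles stays within a narrow band around 90 degrees, which is exactly the regime the talk makes precise.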

Oct. 22, 2012 
Hongzhe Li
University of Pennsylvania 
Robust Detection and Identification of Sparse Segments in Ultra-High Dimensional Data Analysis
[abstract]
Copy number variants (CNVs) are alterations of the DNA of a genome that result in a cell having fewer or more than the usual two copies of certain DNA segments. CNVs correspond to relatively large regions of the genome, ranging from about one kilobase to several megabases, that are deleted or duplicated. Motivated by CNV analysis based on next-generation sequencing data, we consider the problem of detecting and identifying sparse short segments hidden in a long linear sequence of data with an unspecified noise distribution. We propose a computationally efficient method that provides a robust and near-optimal solution for segment identification over a wide range of noise distributions. We theoretically quantify the conditions for detecting the segment signals and show that the method near-optimally estimates the signal segments whenever it is possible to detect their existence. Simulation studies are carried out to demonstrate the efficiency of the method under different noise distributions. We present results from a CNV analysis of a HapMap Yoruban sample to further illustrate the theory and the methods.

Oct. 29, 2012 
Dan Nettleton
Iowa State University 
Testing Union-of-Cones Hypotheses for the Identification of Traits that Exhibit Heterosis
[abstract]
Heterosis, also known as hybrid vigor, occurs when the mean trait value of offspring is more extreme than that of either parent. Well before heterosis was first scientifically described by Darwin in 1876, humans had been using heterosis for various practical purposes. Within the last century, heterosis has been used to improve many crop species for food, feed, and fuel industries. Despite intensive study and successful utilization of heterosis, the basic molecular genetic mechanisms responsible for heterosis remain unclear. In an effort to better understand the underlying mechanisms, researchers have begun to measure the expression levels of thousands of genes in parental maize lines and their hybrid offspring. The expression level of each gene can be viewed as a trait alongside more traditional traits like plant height, grain yield, or drought tolerance. This talk will describe statistical methods that can be used to identify traits that exhibit heterosis. The testing problem is nonstandard because the null hypothesis of no heterosis constrains a parameter vector to a union of two essentially disjoint closed convex cones that is neither a cone nor convex. We will present the likelihood ratio test for heterosis and discuss challenges that arise when attempting to apply it simultaneously to data from thousands of traits. We will also propose an alternative strategy that involves hierarchical modeling and empirical Bayesian inference for simultaneous estimation and identification of heterosis for multiple traits. This talk covers joint work with Tieming Ji, Peng Liu, and Heng Wang.

Nov. 5, 2012 
Maya Bar-Hillel
Hebrew University of Jerusalem 
The Bible Code: Riddle and Solution
[abstract]
In 1995, Statistical Science published a paper purporting to prove the existence of a code in the book of Genesis that predicts future events. Some of the world's leading statisticians and mathematicians had not managed to find the flaw in this work. In 1999, Statistical Science published a refutation of the so-called Bible Code proof, by a team that included the present speaker. This lecture will relate the story of the rise and fall of the Bible Code: a statistical riddle and its solution.

Nov. 12, 2012 
Wenbo Li
University of Delaware 
Gaussian inequalities and conjectures
[abstract]
Gaussian inequalities play a fundamental role in the study of high dimensional probability. We first provide an overview of various Gaussian inequalities and then present several recent results and conjectures for Gaussian measure/vectors, together with various applications.

Nov. 19, 2012 
No Speaker

Thanksgiving Break 
Nov. 26, 2012 
Joel Zinn
Texas A&M University 
Functional Depth
[abstract]
In the last several years there has been interest in extending the various notions of statistical depth and quantiles to the functional and infinite dimensional setting. We will present some of these notions and indicate both positive and negative aspects. We will also discuss one approach which bypasses depth and goes directly to quantile functions. This is joint work with J. Kuelbs.

Dec. 3, 2012 
Wei Biao Wu
University of Chicago 
Covariance and Precision Matrix Estimation for High-Dimensional Time Series
[abstract]
I will consider estimation of covariance matrices and their inverses (a.k.a. precision matrices) for high-dimensional stationary and locally stationary time series. In the latter case the covariance matrices evolve smoothly in time, thus forming a covariance matrix function. Using the functional dependence measure of Wu (2005), we obtain the rate of convergence for the thresholded estimate and illustrate how the dependence affects the rate of convergence. Asymptotic properties are also obtained for the precision matrix estimate, which is based on the graphical Lasso principle. Our theory substantially generalizes earlier results by allowing dependence, by allowing nonstationarity, and by relaxing the associated moment conditions.
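The thresholded estimator itself is simple to state: compute the sample covariance and zero out small off-diagonal entries. The toy sketch below, with an invented MA(1) data-generating process and an invented threshold level (in practice the threshold is calibrated from theory or data), shows it recovering a tridiagonal covariance structure.

```python
import random

def sample_cov(X):
    # sample covariance matrix of n observations of a p-vector
    n, p = len(X), len(X[0])
    mu = [sum(x[j] for x in X) / n for j in range(p)]
    return [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in X) / n
             for j in range(p)] for i in range(p)]

def threshold(S, lam):
    # hard-threshold the off-diagonal entries at level lam
    p = len(S)
    return [[S[i][j] if (i == j or abs(S[i][j]) >= lam) else 0.0
             for j in range(p)] for i in range(p)]

random.seed(0)
n, p, lam = 200, 20, 0.25
X = []
for _ in range(n):
    e = [random.gauss(0, 1) for _ in range(p + 1)]
    X.append([e[t] + 0.5 * e[t + 1] for t in range(p)])  # MA(1): true covariance is tridiagonal
S = sample_cov(X)
T = threshold(S, lam)
off_band = sum(1 for i in range(p) for j in range(p)
               if abs(i - j) > 1 and T[i][j] != 0.0)
print("nonzero entries outside the true band:", off_band)
```

Thresholding kills nearly all the noise outside the true band while keeping the diagonal and lag-one entries, which is the sparsity-exploiting behavior whose convergence rate the talk analyzes under dependence.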
