Yale Statistics Department Seminars: 2009-10

Date Speaker Seminar Title
Sept 8, 2009 Tuesday - special seminar Arup Bose
Indian Statistical Institute, Kolkata
Limiting spectral distribution of large dimensional random matrices: Another look at the moment method
Sept 14, 2009 Bodhisattva Sen
Columbia University, Department of Statistics
Bootstrap in some Non-standard Problems
Sept 21, 2009 David Pollard
Statistics Department, Yale University
Lurking convexity
The solutions to seemingly difficult statistical or probabilistic problems sometimes turn out to involve little more than disguised applications of convexity. This talk will present a few examples from my own work where convexity took me by surprise.

The talk will be accessible to all Statistics graduate students.
Sept 28, 2009 Ioannis Karatzas
Columbia University, Mathematics Department
Probabilistic Aspects of Arbitrage
Oct 5, 2009 Antony Unwin
Augsburg University
Graphics of Large Data Sets
Oct 12, 2009 Venu Veeravalli
University of Illinois, Department of Electrical and Computer Engineering
Quickest Change Detection in Sensor Networks
The problem of detecting an abrupt change in a system based on stochastic observations of the system has a variety of applications, including critical infrastructure monitoring, quality control engineering, and intrusion detection. The centralized version of this problem, where all the information about the change is available at a single location, is well-understood and has been solved under a variety of criteria since the seminal works of Page and Shiryaev. This talk will cover recent results on the quickest change detection problem in the context of sensor networks, where the information available for decision-making is obtained through measurements taken at a set of distributed sensors, and a central entity (fusion center) must detect the change as soon as possible based on information received from the sensors. A particular focus of the talk will be on tests that are designed to detect a change process across the sensors.

(This talk describes joint work with Alexander Tartakovsky and Vasanth Raghavan.)
Oct 19, 2009 Rick A. Vitale
University of Connecticut, Department of Statistics
Geometric Gaussiana
Notions from geometric convexity have been effectively applied to various aspects of Gaussian measure. I will discuss some instances including bounds, singularities, and an intriguing class of functionals.
Oct 26, 2009 Balaji Raman
Yale University, Department of Statistics
On Gaussian HJM Framework for Eurodollar Futures
Nov 2, 2009 Miroslav Dudik
Carnegie Mellon University, School of Computer Science
Maximum entropy density estimation and modeling species habitats
The maximum entropy (maxent) approach, equivalent to maximum likelihood, is a widely used density-estimation technique. However, when trained on small datasets, maxent frequently overfits. Therefore, many smoothing techniques have been proposed to mitigate overfitting. We propose a unified treatment for a large and general class of smoothing techniques including L1 and L2 regularization and total-variation regularization. As a result, we can easily prove non-asymptotic performance guarantees and derive novel regularizations based on the structure of the sample space. Our approach can also be naturally extended to the problem of multiple-density estimation. To obtain solutions for a large class of maxent problems, we propose new algorithms derived from boosting.

As an application of maxent, I discuss an important problem in ecology: modeling distributions of biological species. Regularized maxent fits this problem well and offers several advantages over previous techniques. In particular, it addresses the problem in a statistically sound manner and allows principled extensions to situations when the data-collection process is biased or when we have access to data on many related species. I demonstrate the utility of maxent on large real-world datasets.

Joint work with Robert Schapire, Steven Phillips, and David Blei.
Nov 9, 2009 Yuhong Yang
University of Minnesota, School of Statistics
Parametricness and Adaptation
Parametric and nonparametric models are convenient mathematical tools to describe characteristics of data with different degrees of simplification. When a model is to be selected from a number of candidates, not surprisingly, differences occur when the data generating process is assumed to be parametric or nonparametric. In this talk, in a regression context, we will consider the question of whether and how we can distinguish between parametric and nonparametric situations and discuss the feasibility of adaptive estimation to handle both parametric and nonparametric scenarios optimally.

Part of the presentation is based on joint work with Wei Liu.
Nov 16, 2009 Peter Grünwald
CWI Amsterdam
The Catch-Up Phenomenon in Model Selection and Model Averaging
Wednesday Nov 18, 2009 Mark Kelbert
Swansea University, School of Physical Sciences
Continuity of Mutual Entropy in the Large Signal-to-Noise Ratio Limit
The talk addresses the issue of the proof of the entropy power inequality, an important tool in the analysis of Gaussian channels of information transmission, proposed by Shannon. We analyze continuity properties of the mutual entropy of the input and output signals in an additive memoryless channel and show how this can be used for a correct proof of the entropy power inequality.
Nov 23, 2009 No Seminar - Fall Recess
Nov 30, 2009 Mamikon Ginovyan
Boston University, Department of Mathematics and Statistics
Efficient estimation of spectral functionals for stationary models
December 15, 2009 Paul Baines
Harvard University
Peering into the Black-Box: Statistical Inference in the Physical Sciences
Many modern statistical applications involve noisy observations of an underlying process that can best be described by a complex deterministic system. In fields such as astronomy, astrophysics and the environmental sciences, these systems often involve the solution of partial differential equations that represent the best available understanding of the physical processes. Statistical computation in this context is typically hampered by either look-up tables or expensive "black-box" function evaluations. We present an example from astrophysics with a look-up table likelihood: the analysis of stellar populations. Astrophysicists have developed sophisticated models describing how intrinsic physical properties of stars relate to observed photometric data. The mapping between the parameters and the data-space cannot be solved analytically and is represented as a series of look-up tables. We present a flexible hierarchical model for analyzing stellar populations. Our computational framework is applicable to many "black-box" settings, and robust to the structure of the black-box. The performance of various sampling schemes will be presented, together with the results for an Astronomical dataset.

This is joint work with Xiao-Li Meng, Andreas Zezas and Vinay Kashyap.

Paul Baines is a PhD student in the Department of Statistics, Harvard University. His research interests include Astrostatistics, Spatiotemporal Modeling, Bayesian Statistics and Statistical Computing. Paul's advisor is Professor Xiao-Li Meng, and he is a member of the California-Harvard Astrostatistics Collaboration (CHASC) and the nationwide Tiger Team collaboration for research in the environmental sciences. Paul previously studied at Durham University (UK) and the University of Cambridge (UK) before coming to Harvard.
January 10, 2010 Daniela Witten
Stanford University, California
A Penalized Matrix Decomposition, with Applications to Sparse Clustering and Sparse Linear Discriminant Analysis
We present a penalized matrix decomposition, a new framework for computing a low-rank approximation for a matrix. This low-rank approximation is a generalization of the singular value decomposition. While the singular value decomposition usually yields singular vectors that have no elements that are exactly equal to zero, our new decomposition results in sparse singular vectors. When this decomposition is applied to a data matrix, it can yield interpretable results. Moreover, when applied to a dissimilarity matrix, this leads to a method for sparse hierarchical clustering, which allows for the clustering of a set of observations using an adaptively-chosen subset of the features. One can apply it to a between-class covariance matrix to develop an interpretable version of linear discriminant analysis for the high-dimensional setting. These methods are demonstrated on the Netflix data and on a genomic data set.

This is joint work with Robert Tibshirani and Trevor Hastie.
January 18, 2010 Jing Zhang
Bayesian Inference of Interactions in Biological Problems
Recent development of bio-technologies such as microarrays and high-throughput sequencing has greatly accelerated the pace of genetics experimentation and discoveries. As a result, large amounts of high-dimensional genomic data are available in population genetics and medical genetics. With millions of biomarkers, it is a very challenging problem to search for the disease-associated or treatment-associated markers, and infer the complicated interaction (correlation) patterns among these markers.
January 25, 2010 Arseni Seregin
University of Washington, Department of Statistics
Estimation of convex-transformed densities
A convex-transformed density is a quasi-concave (or a quasi-convex) density which is a composition of a fixed monotone transformation and a convex function. Many parametric and non-parametric families of densities can be included in a suitable family of convex-transformed densities: normal, gamma, beta, Gumbel and other log-concave densities, multivariate Pareto, Burr, Student Snedecor etc.

We will discuss the role of convexity in shape-constrained density estimation, then introduce classes of convex-transformed densities and present our results related to estimation of a convex-transformed density: structure, existence and consistency of the maximum likelihood estimator, and asymptotic minimax lower bounds for estimation.
January 29, 2010 Paul Edlefsen
Harvard University
Profile HMMs for DNA sequence families: the Conditional Baum-Welch and Dynamic Model-Surgery algorithms
Profile hidden Markov Models (Profile HMMs) are widely used for protein sequence family modeling, but are rarely used for modeling DNA sequence families because the Baum-Welch EM algorithm used to parameterize Profile HMMs performs particularly poorly in the DNA context. I will report the results of a simulation study comparing the Baum-Welch algorithm to two new approaches, Conditional Baum-Welch and Dynamic Model Surgery, showing that these provide a great improvement over Baum-Welch in both the protein and DNA domains. I will also compare these methods in the context of the transposon (interspersed repeat) modeling problem that originally inspired the research.
February 1, 2010 Veronica Berrocal
Statistical and Applied Mathematical Sciences Institute (SAMSI), North Carolina
Downscaling outputs from numerical models
In many environmental disciplines, data often arise from two sources: Numerical models and monitoring networks. The first source provides predictions at the level of grid cells and is characterized by full spatial coverage of the region of interest, high temporal resolution, no missing data, but consequential calibration concerns. The second gives measurements at points, tends to be sparsely collected in space with coarse temporal resolution, often with missing data but, where recorded, provides, essentially, the true value. Integrating the two sources of data has been a widely investigated topic among several communities: from atmospheric scientists (a notable example is the data assimilation literature) to statisticians.

In this talk, I will first briefly review common approaches for integrating monitoring data and computer model output, then I will propose an attractive, fully model-based strategy to combine the two sources of data, focusing mostly on the change of support problem with the goal of downscaling the output from numerical models to point level.

I will present the downscaler model in both a univariate and bivariate setting, introducing the models first in a purely spatial setting, and then showing how they can be easily extended to accommodate the temporal dimension. Using an application on air quality, I will show how our downscaler model, which employs underlying correlated Gaussian processes, provides better predictive performance than traditional geostatistical techniques and Bayesian Melding (Fuentes and Raftery, 2005). I will conclude by discussing further avenues to extend the approach to incorporate Dirichlet Processes and Markov Random Fields as well as to develop a process-driven spatially-varying weighted downscaler.
February 8, 2010 Guillaume Obozinski
University of California at Berkeley, CA
From Joint Sparsity to Structured Sparsity
Over the last decade great progress has been made in the understanding of sparse methods from both a statistical and a computational point of view. The good properties of sparsity call for the investigation of more sophisticated forms of sparsity than those corresponding to simple variable selection. In this talk I will discuss sparse methods for simultaneous variable selection, introduce the notion of structured sparsity and illustrate it by an example of selection of variables in groups with overlap between the groups.
February 12, 2010 Zongming Ma
Stanford University, CA
Sparse Principal Component Analysis and Iterative Thresholding
Principal component analysis (PCA) is a widely used dimension reduction method, but difficulties can arise when it is applied to very high dimensional data. For example, in a natural model, classical PCA gives inconsistent estimators of the principal axes. In this talk, we suppose that there is a sparse representation of those principal axes. We find that a new iterative thresholding approach recovers the leading principal subspace consistently, even optimally, in high dimensional settings. We study the properties of this approach and demonstrate its performance on simulated and real examples.
February 15, 2010 No Seminar
February 22, 2010 Donald Lee
Yale University
Boosting functional data with application to hazard regression and queuing inference
Virtually all queuing systems employed in Operations Management/Research can be modeled as finite-state Markov processes. As such, the dynamics of the system are completely described by the transition intensities of the corresponding Markov process. Hazard regression then allows us to recover the system dynamics from data.

Motivated by the problem of modeling the kidney transplant waitlist as a stochastic network, I propose a version of Friedman's boosting algorithm to handle functional data especially those arising from survival data. Existing hazard models are unsuited for modeling the U.S. kidney transplant waitlist since data suggest that the transition intensities have arbitrary dependence on time-varying patient covariates as well as time-covariate interactions. On the other hand the proposed method can accommodate flexible specifications for the hazard form, which we hope to use to provide customized web-based information that transplant candidates can use to decide whether to remain on the waitlist or seek a live donation from family members.
March 1, 2010 Victoria Stodden
Yale Law School
Computational Research and the Scientific Method: A Third Branch?
As computation becomes more pervasive in scientific research, it seems to have become a mode of discovery in itself, a “third branch” of the scientific method. In addition the advent of greater computation facilitates transparency in research through the unprecedented ease of communication of the associated code and data, but typically code and data are not made available. Computational science broadly understood is missing a crucial opportunity to control for error, the central motivation of the scientific method, through reproducibility. In this talk I explore these two changes to the scientific method and present possible ways to bring reproducibility into today’s scientific endeavor, elevating computational science to a third branch of the scientific method. I present recent work on barriers to code and data sharing in the machine learning community and propose a licensing structure for all components of the research, called the “Reproducible Research Standard,” intended to align intellectual property law with longstanding scientific norms. Recent community efforts to encourage verifiability, such as the Data and Code Sharing Roundtable (see http://www.stanford.edu/~vcs/Conferences/RoundtableNov212009/), will be discussed.

Victoria Stodden is the Law and Innovation Fellow at the Information Society Project at Yale Law School, and a Fellow at Science Commons. She was previously a Fellow at Harvard’s Berkman Center and postdoctoral fellow with the Innovation and Entrepreneurship Group at the MIT Sloan School of Management. She obtained a PhD in Statistics from Stanford University, and an MLS from Stanford Law School.
March 22, 2010 John Emerson
Yale University, Statistics
Figure Skating: the Phantom Phantom Judge.
In 2006, my study of the newly-minted scoring system used in international figure skating competitions was based entirely on probabilities. As of January, 2010, such an analysis is no longer possible, opening the door for the (unexpected) use of statistics in studying the scoring system. In this admittedly unusual example, randomization is undesirable, the good-old "iid" assumption would pose a serious problem, and the choice of any particular statistic is of secondary interest. This talk will be broadly accessible and all are welcome.
March 29, 2010 No Seminar
April 2, 2010 Teemu Roos
Helsinki Institute for Information Technology HIIT
Looking for trees in data
Tree-structured (or hierarchical) graphical models offer an interesting compromise between simple independence models or linear models on one hand, and complex network models on the other hand. The benefits of trees include ease of learning, inference, and interpretability in many cases. Darwin's evolutionary theory, depicted as a "Tree of Life", is a classic example. In Darwin's spirit, I present some recent work in stemmatology, the study of historical manuscripts from the point of view of their "evolution". Many similarities between manuscript evolution and natural evolution are apparent, as are some important differences. Methods for finding manuscript phylogenies involve data compression techniques and a structural EM algorithm. The talk covers mostly work by other people, and some work in progress by myself and collaborators.

Biographical sketch: Teemu T. Roos obtained his MSc and PhD degrees, both in computer science, from the University of Helsinki in 2001 and 2007, respectively. Since then, he has been working at the Helsinki Institute for Information Technology HIIT as a postdoctoral researcher. In Spring 2010, he is also a visiting scientist at the Computer Science and Artificial Intelligence Laboratory (CSAIL), MIT. His research interests include statistical and information-theoretic methods in data analysis and machine learning, and their applications.
April 5, 2010 Steven Schwager
Cornell University, Statistical Science
Acoustic estimation of wildlife abundance: methodology for vocal mammals in forested habitats
Habitat loss and hunting pressure threaten mammal populations worldwide, generating critical time constraints on trend assessment. We introduce a new survey method that samples continuously and non-invasively over long time periods, obtaining estimates of abundance (population size) from vocalization rates. We present feasibility assessment methods for acoustic surveys and develop equations for estimating population size. We demonstrate the feasibility of acoustic surveys for African forest elephants (Loxodonta africana cyclotis). Visual surveys and vocalizations from a forest clearing in the Central African Republic were used to establish that low frequency elephant calling rate is a useful index of elephant numbers (linear regression P < 0.001, r²(adj) = 0.58). The effective sampling area was 3.22 km² per acoustic sensor, a dramatic increase in coverage over dung survey transects. These results support the use of acoustic surveys for estimating elephant abundance over large remote areas and in diverse habitats, using a distributed network of acoustic sensors. We describe a survey of forest elephants at Kakum Conservation Area, Ghana, the first use of acoustic methods to estimate elephant abundance. Acoustic survey confidence intervals were about half as wide as those from dung-based surveys. The methods presented here can be applied in surveys of any species for which an acoustic abundance index and detection function have been established. This provides an opportunity to improve management and conservation of many acoustically-active taxa whose populations are currently under-monitored.

This is joint work with Mya Thompson, Katherine Payne, and Andrea Turkalo.
April 12, 2010 No Seminar
April 19, 2010 Narayana P. Santhanam
University of Hawaii at Manoa
Probability estimation over discrete alphabets in the undersampled regime
With our advances in biology, computation and storage, we have invited the "curse of dimensionality" upon many problems that concern the modern engineer. The colorful phrase in quotes, coined by Bellman, highlights the inability of certain classical methods to handle problem instances wherein the number of parameters associated with each data point is comparable to the size of the data sample. In this talk, we focus on the problem of discrete probability estimation in the undersampled regime, and develop theory to tackle this problem using ideas from information theory, number theory, combinatorics, analysis as well as tools in statistical learning. The framework that emerges encompasses well known algorithms including the Laplace and Good-Turing estimators from World War II. Interestingly, this framework parallels and complements the extensive work (since 1978) in the statistical community involving exchangeable random partitions. On the other hand, handling complex dependencies in data is very much work in progress; for a flavor of how things may turn out, we review recent results developed on graphical models. The big picture is to see this work as source coding driven by data analysis, rather than the more familiar communication or storage paradigms.
April 26, 2010 Diane Lambert
Statistics at Google
In a very real sense, Google can be thought of as a big statistical analysis system. This talk will describe how Google uses statistics to turn huge amounts of heterogeneous data into information about search, ads, and advertisers, taking a close look at a few of the problems that statisticians at Google have worked on.
May 3, 2010 No Seminar
May 7, 2010 Mark Low
The Wharton School, Department of Statistics
Optimal estimation of a Nonsmooth Functional
In this talk I will discuss some recent joint work with Tony Cai on optimal estimation of nonsmooth functionals. These problems exhibit some interesting features that are significantly different from those that occur in estimating smooth functionals. This is a setting where standard techniques fail. I will discuss a new general lower bound technique and illustrate the ideas by focusing on optimal estimation of the l1 norm of a high dimensional normal mean vector. An asymptotically sharp minimax estimator will be presented using approximation theory and Hermite polynomials.
May 10, 2010 Alon Orlitsky
UC San Diego
On Estimating the Probability Multiset
Many statistical properties are determined not by the probabilities of the possible outcomes, but by just the multiset of these probabilities. Given a data sample, we estimate this multiset to be the one maximizing the probability of the sample's profile - the number of symbols appearing any given number of times. We establish some of the estimate's properties and demonstrate its efficacy on experimental data. The talk is self-contained and based on work with several current and past students.
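
For readers unfamiliar with the term, the profile of a sample as described in the abstract above can be illustrated with a small sketch. The function name `profile` and the example string are illustrative assumptions, not taken from the talk. The code computes only the profile of a sample and does not attempt the maximum-likelihood estimate of the probability multiset.

```python
from collections import Counter


def profile(sample):
    """Return the profile of a sample as a dict.

    Keys are multiplicities; values are the number of distinct symbols
    that appear exactly that many times in the sample.
    (Hypothetical helper, written for illustration only.)
    """
    # Count how often each symbol occurs: symbol -> multiplicity.
    multiplicities = Counter(sample)
    # Count how many symbols share each multiplicity.
    counts_of_counts = Counter(multiplicities.values())
    # Sort by multiplicity so the result is easy to read.
    return dict(sorted(counts_of_counts.items()))


# In "abracadabra": a occurs 5 times, b and r twice, c and d once.
print(profile("abracadabra"))
```

Note that the profile discards the symbol labels, so two samples that differ only by a relabeling of symbols (for example "aab" and "ccd") share the same profile.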

Revised: 3 April 2010