Date  Speaker  Seminar Title 

Sept 8, 2009 (Tuesday), special seminar 
Arup Bose
Indian Statistical Institute, Kolkata 
Limiting spectral distribution of large dimensional random matrices: Another look at the moment method 
Sept 14, 2009 
Bodhisattva Sen
Columbia University, Department of Statistics 
Bootstrap in some Nonstandard Problems 
Sept 21, 2009 
David Pollard
Statistics Department, Yale University 
Lurking convexity
[abstract] The solutions to seemingly difficult statistical or probabilistic problems sometimes turn out to involve little more than disguised applications of convexity. This talk will present a few examples from my own work where convexity took me by surprise. The talk will be accessible to all Statistics graduate students. 
Sept 28, 2009 
Ioannis Karatzas
Columbia University, Mathematics Department 
Probabilistic Aspects of Arbitrage 
Oct 5, 2009 
Antony Unwin
Augsburg University 
Graphics of Large Data Sets 
Oct 12, 2009 
Venu Veeravalli
University of Illinois, Department of Electrical and Computer Engineering 
Quickest Change Detection in Sensor Networks
[abstract] The problem of detecting an abrupt change in a system based on stochastic observations of the system has a variety of applications, including critical infrastructure monitoring, quality control engineering, and intrusion detection. The centralized version of this problem, where all the information about the change is available at a single location, is well understood and has been solved under a variety of criteria since the seminal works of Page and Shiryaev. This talk will cover recent results on the quickest change detection problem in the context of sensor networks, where the information available for decision-making is obtained through measurements taken at a set of distributed sensors, and a central entity (fusion center) must detect the change as soon as possible based on information received from the sensors. A particular focus of the talk will be on tests that are designed to detect a change process across the sensors. (This talk describes joint work with Alexander Tartakovsky and Vasanth Raghavan.) 
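For the centralized case mentioned in the abstract, the classical procedure is Page's CUSUM test. A minimal sketch for a known Gaussian mean shift follows; the means, variance, threshold, and toy data are illustrative choices, not from the talk:

```python
def cusum_change_point(xs, mu0, mu1, sigma, threshold):
    """Page's CUSUM for a shift in the mean of Gaussian observations:
    accumulate log-likelihood ratios, clip at zero, and raise an alarm
    when the running statistic exceeds `threshold`."""
    stat = 0.0
    for n, x in enumerate(xs):
        # log-likelihood ratio of x under N(mu1, sigma^2) vs N(mu0, sigma^2)
        llr = ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        stat = max(0.0, stat + llr)
        if stat > threshold:
            return n  # first alarm time
    return None

# Toy data: mean 0 for 50 steps, then mean 2
xs = [0.0] * 50 + [2.0] * 50
alarm = cusum_change_point(xs, mu0=0.0, mu1=2.0, sigma=1.0, threshold=8.0)  # → 54
```

The statistic drifts downward before the change (so the clipping keeps it at zero) and upward after it, so the alarm fires shortly after the change point.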
Oct 19, 2009 
Rick A. Vitale
University of Connecticut, Department of Statistics 
Geometric Gaussiana
[abstract] Notions from geometric convexity have been effectively applied to various aspects of Gaussian measure. I will discuss some instances including bounds, singularities, and an intriguing class of functionals.

Oct 26, 2009 
Balaji Raman
Yale University, Department of Statistics 
On Gaussian HJM Framework for Eurodollar Futures 
Nov 2, 2009 
Miroslav Dudik
Carnegie Mellon University, School of Computer Science 
Maximum entropy density estimation and modeling species habitats
[abstract] The maximum entropy (maxent) approach, equivalent to maximum likelihood, is a widely used density-estimation technique. However, when trained on small datasets, maxent frequently overfits. Therefore, many smoothing techniques have been proposed to mitigate overfitting. We propose a unified treatment for a large and general class of smoothing techniques, including L1, L2, and total-variation regularization. As a result, we can easily prove nonasymptotic performance guarantees and derive novel regularizations based on the structure of the sample space. Our approach can also be naturally extended to the problem of multiple-density estimation. To obtain solutions for a large class of maxent problems, we propose new algorithms derived from boosting. As an application of maxent, I discuss an important problem in ecology: modeling distributions of biological species. Regularized maxent fits this problem well and offers several advantages over previous techniques. In particular, it addresses the problem in a statistically sound manner and allows principled extensions to situations when the data-collection process is biased or when we have access to data on many related species. I demonstrate the utility of maxent on large real-world datasets. Joint work with Robert Schapire, Steven Phillips, and David Blei. 
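Since the abstract notes that maxent is equivalent to maximum likelihood, the core fitting step can be sketched in a few lines: for an exponential-family model on a finite alphabet, gradient ascent on the log-likelihood matches the model's feature expectation to the empirical one. This toy sketch uses a single hypothetical feature f(x) = x and omits the smoothing penalties the talk is about:

```python
import math

def fit_maxent(xs, alphabet, lr=0.2, n_iter=5000):
    """Unregularized maxent with the single feature f(x) = x: gradient
    ascent on the (concave) log-likelihood drives the model's feature
    mean toward the empirical mean."""
    target = sum(xs) / len(xs)                  # empirical feature mean
    lam = 0.0
    for _ in range(n_iter):
        weights = [math.exp(lam * a) for a in alphabet]
        z = sum(weights)
        model_mean = sum(a * w for a, w in zip(alphabet, weights)) / z
        lam += lr * (target - model_mean)       # gradient step
    weights = [math.exp(lam * a) for a in alphabet]
    z = sum(weights)
    return lam, [w / z for w in weights]

lam, probs = fit_maxent([0, 1, 1, 2, 3, 3], alphabet=[0, 1, 2, 3])
```

At the optimum the fitted distribution's mean equals the sample mean, which is exactly the maxent moment constraint.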
Nov 9, 2009 
Yuhong Yang
University of Minnesota, School of Statistics 
Parametricness and Adaptation
[abstract] Parametric and nonparametric models are convenient mathematical tools to describe characteristics of data with different degrees of simplification. When a model is to be selected from a number of candidates, not surprisingly, differences occur when the data generating process is assumed to be parametric or nonparametric. In this talk, in a regression context, we will consider whether and how we can distinguish between parametric and nonparametric situations, and discuss the feasibility of adaptive estimation to handle both parametric and nonparametric scenarios optimally. Part of the presentation is based on joint work with Wei Liu. 
Nov 16, 2009 
Peter Grünwald
CWI Amsterdam 
The Catch-Up Phenomenon in Model Selection and Model Averaging 
Wednesday Nov 18, 2009 
Mark Kelbert
Swansea University, School of Physical Sciences 
Continuity of Mutual Entropy in the Large Signal-to-Noise Ratio Limit
[abstract] The talk addresses the issue of the proof of the entropy power inequality, an important tool in the analysis of Gaussian channels of information transmission, proposed by Shannon. We analyze continuity properties of the mutual entropy of the input and output signals in an additive memoryless channel and show how this can be used for a correct proof of the entropy power inequality.

Nov 23, 2009 
No Seminar (Fall Recess)


Nov 30, 2009 
Mamikon Ginovyan
Boston University, Department of Mathematics and Statistics 
Efficient estimation of spectral functionals for stationary models 
December 15, 2009 
Paul Baines
Harvard University 
Peering into the Black-Box: Statistical Inference in the Physical Sciences
[abstract] Many modern statistical applications involve noisy observations of an underlying process that can best be described by a complex deterministic system. In fields such as astronomy, astrophysics and the environmental sciences, these systems often involve the solution of partial differential equations that represent the best available understanding of the physical processes. Statistical computation in this context is typically hampered by either lookup tables or expensive "black-box" function evaluations. We present an example from astrophysics with a lookup table likelihood: the analysis of stellar populations. Astrophysicists have developed sophisticated models describing how intrinsic physical properties of stars relate to observed photometric data. The mapping between the parameters and the data space cannot be solved analytically and is represented as a series of lookup tables. We present a flexible hierarchical model for analyzing stellar populations. Our computational framework is applicable to many "black-box" settings, and robust to the structure of the black-box. The performance of various sampling schemes will be presented, together with the results for an astronomical dataset. This is joint work with Xiao-Li Meng, Andreas Zezas and Vinay Kashyap. Paul Baines is a PhD student in the Department of Statistics, Harvard University. His research interests include Astrostatistics, Spatio-temporal Modeling, Bayesian Statistics and Statistical Computing. Paul's advisor is Professor Xiao-Li Meng, and he is a member of the California-Harvard Astrostatistics Collaboration (CHASC) and the nationwide Tiger Team collaboration for research in the environmental sciences. Paul previously studied at Durham University (UK) and the University of Cambridge (UK) before coming to Harvard. 
January 10, 2010 
Daniela Witten
Stanford University, California 
A Penalized Matrix Decomposition, with Applications to Sparse Clustering and Sparse Linear Discriminant Analysis
[abstract] We present a penalized matrix decomposition, a new framework for computing a low-rank approximation for a matrix. This low-rank approximation is a generalization of the singular value decomposition. While the singular value decomposition usually yields singular vectors with no elements exactly equal to zero, our new decomposition results in sparse singular vectors. When this decomposition is applied to a data matrix, it can yield interpretable results. Moreover, when applied to a dissimilarity matrix, it leads to a method for sparse hierarchical clustering, which allows for the clustering of a set of observations using an adaptively chosen subset of the features. One can apply it to a between-class covariance matrix to develop an interpretable version of linear discriminant analysis for the high-dimensional setting. These methods are demonstrated on the Netflix data and on a genomic data set. This is joint work with Robert Tibshirani and Trevor Hastie. 
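A rough sense of how an L1-penalized decomposition produces sparse singular vectors is given by alternating soft-thresholding for a single rank-1 factor. This is only a sketch in the spirit of the method, not the talk's algorithm (which constrains the L1 and L2 norms jointly); `lam_u`, `lam_v`, and the toy matrix are illustrative:

```python
import numpy as np

def soft_threshold(a, lam):
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_rank1(X, lam_u=0.0, lam_v=0.5, n_iter=100):
    """Alternating soft-thresholding for a sparse rank-1 approximation:
    L1 shrinkage on each singular vector in turn, then renormalization.
    Setting both penalties to 0 recovers the leading SVD factor."""
    rng = np.random.default_rng(0)
    v = rng.standard_normal(X.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = soft_threshold(X @ v, lam_u)
        u /= max(np.linalg.norm(u), 1e-12)
        v = soft_threshold(X.T @ u, lam_v)
        v /= max(np.linalg.norm(v), 1e-12)
    d = u @ X @ v   # "singular value" of the penalized factor
    return u, v, d

# Toy matrix with an exactly sparse right singular vector
u0 = np.ones(4) / 2.0
v0 = np.zeros(6); v0[0] = 1.0
X = 5.0 * np.outer(u0, v0)
u, v, d = sparse_rank1(X)
```

On this noiseless toy example the iteration recovers the sparse right factor exactly and `d` matches the true singular value 5.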
January 18, 2010 
Jing Zhang

Bayesian Inference of Interactions in Biological Problems
[abstract] Recent development of biotechnologies such as microarrays and high-throughput sequencing has greatly accelerated the pace of genetics experimentation and discoveries. As a result, large amounts of high-dimensional genomic data are available in population genetics and medical genetics. With millions of biomarkers, it is a very challenging problem to search for the disease-associated or treatment-associated markers, and to infer the complicated interaction (correlation) patterns among these markers.

January 25, 2010 
Arseni Seregin
University of Washington, Department of Statistics 
Estimation of convex-transformed densities
[abstract] A convex-transformed density is a quasi-concave (or quasi-convex) density which is a composition of a fixed monotone transformation and a convex function. Many parametric and nonparametric families of densities can be included in a suitable family of convex-transformed densities: normal, gamma, beta, Gumbel and other log-concave densities; multivariate Pareto, Burr, Student, Snedecor, etc. We will discuss the role of convexity in shape-constrained density estimation, then introduce classes of convex-transformed densities and present our results related to estimation of a convex-transformed density: structure, existence and consistency of the maximum likelihood estimator, and asymptotic minimax lower bounds for estimation. 
January 29, 2010 
Paul Edlefsen
Harvard University 
Profile HMMs for DNA sequence families: the Conditional Baum-Welch and Dynamic Model-Surgery algorithms
[abstract] Profile hidden Markov models (Profile HMMs) are widely used for protein sequence family modeling, but are rarely used for modeling DNA sequence families because the Baum-Welch EM algorithm used to parameterize Profile HMMs performs particularly poorly in the DNA context. I will report the results of a simulation study comparing the Baum-Welch algorithm to two new approaches, Conditional Baum-Welch and Dynamic Model Surgery, showing that these provide a great improvement over Baum-Welch in both the protein and DNA domains. I will also compare these methods in the context of the transposon (interspersed repeat) modeling problem that originally inspired the research.

February 1, 2010 
Veronica Berrocal
Statistical and Mathematical Sciences Institute (SAMSI) North Carolina 
Downscaling outputs from numerical models
[abstract] In many environmental disciplines, data often arise from two sources: numerical models and monitoring networks. The first source provides predictions at the level of grid cells and is characterized by full spatial coverage of the region of interest, high temporal resolution, no missing data, but consequential calibration concerns. The second gives measurements at points, tends to be sparsely collected in space with coarse temporal resolution, often with missing data, but, where recorded, provides essentially the true value. Integrating the two sources of data has been a widely investigated topic among several communities: from atmospheric scientists (a notable example is the data assimilation literature) to statisticians. In this talk, I will first briefly review common approaches for integrating monitoring data and computer model output, then I will propose an attractive, fully model-based strategy to combine the two sources of data, focusing mostly on the change of support problem with the goal of downscaling the output from numerical models to point level. I will present the downscaler model in both a univariate and bivariate setting, introducing the models first in a purely spatial setting, and then showing how they can be easily extended to accommodate the temporal dimension. Using an application on air quality, I will show how our downscaler model, which employs underlying correlated Gaussian processes, provides better predictive performance than traditional geostatistical techniques and Bayesian Melding (Fuentes and Raftery, 2005). I will conclude by discussing further avenues to extend the approach to incorporate Dirichlet Processes and Markov Random Fields, as well as to develop a process-driven, spatially varying weighted downscaler. 
February 8, 2010 
Guillaume Obozinski
University of California at Berkeley, CA 
From Joint Sparsity to Structured Sparsity
[abstract] Over the last decade much progress has been made in understanding sparse methods from both a statistical and a computational point of view. The good properties of sparsity call for the investigation of more sophisticated forms of sparsity than those corresponding to simple variable selection. In this talk I will discuss sparse methods for simultaneous variable selection, introduce the notion of structured sparsity, and illustrate it with an example of selecting variables in groups with overlap between the groups.

February 12, 2010 
Zongming Ma
Stanford University, CA 
Sparse Principal Component Analysis and Iterative Thresholding
[abstract] Principal component analysis (PCA) is a widely used dimension reduction method, but difficulties can arise when it is applied to very high-dimensional data. For example, in a natural model, classical PCA gives inconsistent estimators of the principal axes. In this talk, we suppose that there is a sparse representation of those principal axes. We find that a new iterative thresholding approach recovers the leading principal subspace consistently, even optimally, in high-dimensional settings. We study the properties of this approach and demonstrate its performance on simulated and real examples.
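The iterative thresholding idea can be caricatured as power iteration that zeroes small coordinates at each step. This toy version, with a hypothetical relative threshold `frac` and a deterministic starting vector, is not the talk's estimator, but it shows why thresholding can recover a sparse leading eigenvector:

```python
import numpy as np

def sparse_leading_pc(S, frac=0.1, n_iter=50):
    """Power iteration on a covariance matrix S with entrywise
    thresholding: coordinates whose magnitude falls below `frac` times
    the largest are zeroed at each step, then the vector is renormalized."""
    v = np.ones(S.shape[0]) / np.sqrt(S.shape[0])    # deterministic start
    for _ in range(n_iter):
        v = S @ v                                    # power step
        v[np.abs(v) < frac * np.abs(v).max()] = 0.0  # kill small coordinates
        v /= np.linalg.norm(v)
    return v

# Covariance whose leading eigenvector is supported on coordinates 0 and 1
p = 20
v0 = np.zeros(p); v0[:2] = 1 / np.sqrt(2)
S = 4.0 * np.outer(v0, v0) + 0.1 * np.eye(p)
v = sparse_leading_pc(S)
```

Plain power iteration would leave small nonzero weight on every coordinate; the thresholding step removes the noise coordinates, and the support of the leading eigenvector is recovered exactly in this toy case.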

February 15, 2010 
NO SEMINAR


February 22, 2010 
Donald Lee
Yale University 
Boosting functional data with application to hazard regression and queuing inference
[abstract] Virtually all queuing systems employed in Operations Management/Research can be modeled as finite-state Markov processes. As such, the dynamics of the system are completely described by the transition intensities of the corresponding Markov process. Hazard regression then allows us to recover the system dynamics from data. Motivated by the problem of modeling the kidney transplant waitlist as a stochastic network, I propose a version of Friedman's boosting algorithm to handle functional data, especially those arising from survival data. Existing hazard models are unsuited for modeling the U.S. kidney transplant waitlist, since data suggest that the transition intensities have arbitrary dependence on time-varying patient covariates as well as time-covariate interactions. On the other hand, the proposed method can accommodate flexible specifications for the hazard form, which we hope to use to provide customized web-based information that transplant candidates can use to decide whether to remain on the waitlist or seek a live donation from family members. 
March 1, 2010 
Victoria Stodden
Yale Law School 
Computational Research and the Scientific Method: A Third Branch?
[abstract] As computation becomes more pervasive in scientific research, it seems to have become a mode of discovery in itself, a third branch of the scientific method. In addition, the advent of greater computation facilitates transparency in research through the unprecedented ease of communication of the associated code and data, but typically code and data are not made available. Computational science broadly understood is missing a crucial opportunity to control for error, the central motivation of the scientific method, through reproducibility. In this talk I explore these two changes to the scientific method and present possible ways to bring reproducibility into today's scientific endeavor, elevating computational science to a third branch of the scientific method. I present recent work on barriers to code and data sharing in the machine learning community and propose a licensing structure for all components of the research, called the Reproducible Research Standard, intended to align intellectual property law with longstanding scientific norms. Recent community efforts to encourage verifiability, such as the Data and Code Sharing Roundtable (see http://www.stanford.edu/~vcs/Conferences/RoundtableNov212009/), will be discussed. Victoria Stodden is the Law and Innovation Fellow at the Information Society Project at Yale Law School, and a Fellow at Science Commons. She was previously a Fellow at Harvard's Berkman Center and a postdoctoral fellow with the Innovation and Entrepreneurship Group at the MIT Sloan School of Management. She obtained a PhD in Statistics from Stanford University, and an MLS from Stanford Law School. 
March 8, 2010 
NO SEMINAR (SPRING RECESS)


March 15, 2010 
NO SEMINAR (SPRING RECESS)


March 22, 2010 
John Emerson
Yale University, Statistics 
Figure Skating: The Phantom Phantom Judge
[abstract] In 2006, my study of the newly minted scoring system used in international figure skating competitions was based entirely on probabilities. As of January 2010, such an analysis is no longer possible, opening the door for the (unexpected) use of statistics in studying the scoring system. In this admittedly unusual example, randomization is undesirable, the good old "iid" assumption would pose a serious problem, and the choice of any particular statistic is of secondary interest. This talk will be broadly accessible and all are welcome.

March 29, 2010 
NO Seminar


April 2, 2010 
Teemu Roos
Helsinki Institute for Information Technology HIIT 
Looking for trees in data
[abstract] Tree-structured (or hierarchical) graphical models offer an interesting compromise between simple independence models or linear models on the one hand, and complex network models on the other. The benefits of trees include ease of learning, inference, and, in many cases, interpretability. Darwin's evolutionary theory, depicted as a "Tree of Life", is a classic example. In Darwin's spirit, I present some recent work in stemmatology, the study of historical manuscripts from the point of view of their "evolution". Many similarities between manuscript evolution and natural evolution are apparent, as are some important differences. Methods for finding manuscript phylogenies involve data compression techniques and a structural EM algorithm. The talk covers mostly work by other people, and some work in progress by myself and collaborators. Biographical sketch: Teemu T. Roos obtained his MSc and PhD degrees, both in computer science, from the University of Helsinki in 2001 and 2007, respectively. Since then, he has been working at the Helsinki Institute for Information Technology HIIT as a postdoctoral researcher. In Spring 2010, he is also a visiting scientist at the Computer Science and Artificial Intelligence Laboratory (CSAIL), MIT. His research interests include statistical and information-theoretic methods in data analysis and machine learning, and their applications.

April 5, 2010 
Steven Schwager
Cornell University, Statistical Science 
Acoustic estimation of wildlife abundance: methodology for vocal mammals in forested habitats
[abstract] Habitat loss and hunting pressure threaten mammal populations worldwide, generating critical time constraints on trend assessment. We introduce a new survey method that samples continuously and noninvasively over long time periods, obtaining estimates of abundance (population size) from vocalization rates. We present feasibility assessment methods for acoustic surveys and develop equations for estimating population size. We demonstrate the feasibility of acoustic surveys for African forest elephants (Loxodonta africana cyclotis). Visual surveys and vocalizations from a forest clearing in the Central African Republic were used to establish that low-frequency elephant calling rate is a useful index of elephant numbers (linear regression P < 0.001, r²(adj) = 0.58). The effective sampling area was 3.22 km² per acoustic sensor, a dramatic increase in coverage over dung survey transects. These results support the use of acoustic surveys for estimating elephant abundance over large remote areas and in diverse habitats, using a distributed network of acoustic sensors. We describe a survey of forest elephants at Kakum Conservation Area, Ghana, the first use of acoustic methods to estimate elephant abundance. Acoustic survey confidence intervals were about half as wide as those from dung-based surveys. The methods presented here can be applied in surveys of any species for which an acoustic abundance index and detection function have been established. This provides an opportunity to improve management and conservation of many acoustically active taxa whose populations are currently under-monitored. This is joint work with Mya Thompson, Katherine Payne, and Andrea Turkalo. 
April 12, 2010 
No Seminar


April 19, 2010 
Narayana P. Santhanam
University of Hawaii at Manoa 
Probability estimation over discrete alphabets in the undersampled regime
[abstract] With our advances in biology, computation and storage, we have invited the "curse of dimensionality" upon many problems that concern the modern engineer. The colorful phrase in quotes, coined by Bellman, highlights the inability of certain classical methods to handle problem instances wherein the number of parameters associated with each data point is comparable to the size of the data sample. In this talk, we focus on the problem of discrete probability estimation in the undersampled regime, and develop theory to tackle this problem using ideas from information theory, number theory, combinatorics, and analysis, as well as tools in statistical learning. The framework that emerges encompasses well-known algorithms, including the Laplace and Good-Turing estimators from World War II. Interestingly, this framework parallels and complements the extensive work (since 1978) in the statistical community involving exchangeable random partitions. On the other hand, handling complex dependencies in data is very much work in progress; for a flavor of how things may turn out, we review recent results developed on graphical models. The big picture is to see this work as source coding driven by data analysis, rather than the more familiar communication or storage paradigms.
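As one concrete member of the framework mentioned above, the classical Good-Turing estimate of the total probability of unseen symbols is simply N1/n, where N1 is the number of symbols observed exactly once and n is the sample size:

```python
from collections import Counter

def good_turing_missing_mass(sample):
    """Good-Turing estimate of the total probability mass of symbols
    not appearing in the sample: (# symbols seen exactly once) / n."""
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(sample)

# 'a' seen 3x; 'b' and 'c' seen once each -> estimate 2/5
est = good_turing_missing_mass("aaabc")  # → 0.4
```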

April 26, 2010 
Diane Lambert

Statistics At Google
[abstract] In a very real sense, Google can be thought of as a big statistical analysis system. This talk will describe how Google uses statistics to turn huge amounts of heterogeneous data into information about search, ads, and advertisers, taking a close look at a few of the problems that statisticians at Google have worked on.

May 3, 2010 
NO SEMINAR


May 7, 2010 
Mark Low
The Wharton School, Department of Statistics 
Optimal estimation of a Nonsmooth Functional
[abstract] In this talk I will discuss some recent joint work with Tony Cai on optimal estimation of nonsmooth functionals. These problems exhibit some interesting features that are significantly different from those that occur in estimating smooth functionals. This is a setting where standard techniques fail. I will discuss a new general lower bound technique and illustrate the ideas by focusing on optimal estimation of the l1 norm of a high dimensional normal mean vector. An asymptotically sharp minimax estimator will be presented using approximation theory and Hermite polynomials.

May 10, 2010 
Alon Orlitsky
UC San Diego 
On Estimating the Probability Multiset
[abstract] Many statistical properties are determined not by the probabilities of the possible outcomes, but by just the multiset of these probabilities. Given a data sample, we estimate this multiset to be the one maximizing the probability of the sample's profile: the number of symbols appearing any given number of times. We establish some of the estimate's properties and demonstrate its efficacy on experimental data. The talk is self-contained and based on work with several current and past students.
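The profile of a sample, as defined in the abstract, is straightforward to compute; a small sketch (finding the probability multiset that maximizes the profile's likelihood is the hard part and is not attempted here):

```python
from collections import Counter

def profile(sample):
    """Profile of a sample: maps each multiplicity k to the number of
    distinct symbols that appear exactly k times."""
    return dict(Counter(Counter(sample).values()))

# In "abracadabra": 'a' appears 5x; 'b' and 'r' 2x each; 'c' and 'd' once
prof = profile("abracadabra")  # → {5: 1, 2: 2, 1: 2}
```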
