Yale Statistics Department Seminars: 2007-08

Date Speaker Seminar Title
April 28, 2008 Edmund Yeh
Department of Electrical Engineering, Yale University
On the Critical Density for Percolation in Random Geometric Graphs
[abstract]Percolation theory has become a useful tool for the analysis of large-scale wireless networks. We investigate the fundamental problem of characterizing the critical density for d-dimensional Poisson random geometric graphs in continuum percolation theory. By using a probabilistic analysis which incorporates the clustering effect in random geometric graphs, we develop a new class of analytical lower bounds for the critical density. These analytical lower bounds are the tightest known to date, and reveal a deep underlying relationship between clustering effects and percolation phenomena.

Joint work with Zhenning Kong, Yale University.
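The percolation transition the abstract refers to can be explored numerically. Below is a minimal Monte Carlo sketch (my own illustration, not the speaker's analysis): sample a 2-D Poisson random geometric graph with unit connection radius in a finite box and measure the fraction of points in the largest connected component as the density varies. The box size, densities, and seed are illustrative assumptions.

```python
import numpy as np

def largest_component_fraction(density, box=20.0, radius=1.0, seed=0):
    """Fraction of Poisson points lying in the largest component of the RGG."""
    rng = np.random.default_rng(seed)
    n = rng.poisson(density * box * box)          # Poisson number of points
    pts = rng.uniform(0.0, box, size=(n, 2))      # uniform positions in the box
    parent = list(range(n))                       # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]         # path halving
            i = parent[i]
        return i

    # connect every pair within the connection radius
    for i in range(n):
        d2 = np.sum((pts[i + 1:] - pts[i]) ** 2, axis=1)
        for j in np.nonzero(d2 <= radius * radius)[0]:
            ri, rj = find(i), find(i + 1 + j)
            if ri != rj:
                parent[ri] = rj
    sizes = np.bincount([find(i) for i in range(n)], minlength=1)
    return sizes.max() / max(n, 1)

for lam in (0.5, 1.0, 1.5, 2.0):
    print(lam, round(largest_component_fraction(lam), 3))
```

Sweeping the density across the known 2-D threshold for unit radius (roughly 1.4), the largest-component fraction jumps from near zero to near one; the analytical lower bounds in the talk bound where this jump can occur.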
April 24, 2008 Michael Trosset
Department of Statistics, Indiana University
Locally Linear Embedding is a Laplacian Eigenmap
[abstract]An approach to nonlinear dimension reduction called manifold learning posits that high-dimensional data lie (approximately) on low-dimensional manifolds. In a seminal paper, Roweis and Saul (2000) proposed a widely used manifold learning technique that attempts to characterize local geometry by linear coefficients that reconstruct each point from its neighbors. We present a simple example that demonstrates that Locally Linear Embedding (LLE) can profoundly distort data. We relate LLE to another manifold learning technique, Laplacian eigenmaps, demonstrating that the former can be understood as a somewhat problematic special case of the latter.

This research is supported by the Office of Naval Research. It is joint work with Brent Castle.
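The local reconstruction step of LLE that the abstract describes can be written in a few lines. The sketch below (my illustration, not the speakers' code) solves the constrained least-squares problem for one point: find weights, summing to one, that best reconstruct the point from its k nearest neighbors; the regularization constant is a standard stabilizing assumption.

```python
import numpy as np

def lle_weights(X, i, k=3, reg=1e-3):
    """Weights w (summing to 1) that best reconstruct X[i] from its k neighbors."""
    d = np.sum((X - X[i]) ** 2, axis=1)
    nbrs = np.argsort(d)[1:k + 1]                 # skip the point itself
    Z = X[nbrs] - X[i]                            # center neighbors on X[i]
    G = Z @ Z.T                                   # local Gram matrix
    G = G + reg * np.trace(G) * np.eye(k)         # regularize (G is often singular)
    w = np.linalg.solve(G, np.ones(k))
    return nbrs, w / w.sum()                      # enforce the sum-to-one constraint

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))
nbrs, w = lle_weights(X, 0)
print(nbrs, w, w.sum())
```

The Gram matrix is singular whenever k exceeds the ambient dimension, which is one entry point for the distortions and the connection to Laplacian eigenmaps discussed in the talk.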
April 14, 2008 Alexei Onatski
Economics Department, Columbia University
Testing hypotheses about the number of factors in large factor models.
[abstract]In this paper we study high-dimensional time series that have the generalized dynamic factor structure. We develop a test of the null of k0 factors against the alternative that the number of factors is larger than k0 but no larger than k1 > k0. Our test statistic equals max_{k0 < k \le k1} (\gamma_k - \gamma_{k+1}) / (\gamma_{k+1} - \gamma_{k+2}), where \gamma_i is the i-th largest eigenvalue of the smoothed periodogram estimate of the spectral density matrix of data at a pre-specified frequency. We describe the asymptotic distribution of the statistic, as the dimensionality and the number of observations rise, as a function of the Tracy-Widom distribution and tabulate the critical values of the test. As an application, we test different hypotheses about the number of dynamic factors in macroeconomic time series and about the number of dynamic factors driving excess stock returns.

Keywords: Generalized dynamic factor model, approximate factor model, number of factors, hypothesis test, Tracy-Widom distribution.
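Given the ordered eigenvalues, the test statistic above is a one-liner. The sketch below (mine; the spectral-density estimate itself is not constructed) computes it on a toy spectrum with a visible gap after the second eigenvalue.

```python
import numpy as np

def onatski_statistic(gammas, k0, k1):
    """max over k0 < k <= k1 of (gamma_k - gamma_{k+1}) / (gamma_{k+1} - gamma_{k+2})."""
    g = np.asarray(gammas, dtype=float)           # g[0] is gamma_1, sorted descending
    ratios = [(g[k - 1] - g[k]) / (g[k] - g[k + 1])
              for k in range(k0 + 1, k1 + 1)]
    return max(ratios)

# toy spectrum: a clear gap after the 2nd eigenvalue makes k = 2 stand out
gammas = [10.0, 8.0, 2.0, 1.5, 1.2, 1.0, 0.9]
print(onatski_statistic(gammas, k0=1, k1=4))      # the k = 2 ratio dominates
```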
April 7, 2008 Ping Li
Cornell University
Compressed Counting and Stable Random Projections
[abstract]The method of (symmetric) random projections has become a popular tool for dimension reduction, in particular for efficiently computing pairwise distances in massive high-dimensional data (including streaming data), with many applications in data mining and machine learning such as clustering, nearest neighbors, and kernel methods. This talk will start with recent work on (symmetric) stable random projections; then I will focus on the most recent work, called Compressed Counting (CC), whose underlying technique is maximally-skewed stable random projections. Computation over massive dynamic data streams was recently identified as one of the "10 challenges" in data mining research. Compressed Counting is a tool for counting the pth frequency moment of data streams. While counting the first moment (p = 1, i.e., the sum) is trivial using a simple counter, the task is challenging for p other than 1 because the massive data are subject to frequent updating. Compressed Counting works almost like a simple counter when p is close to 1, and its complexity (number of counters) is a continuous function of how close p is to 1. More precisely, the number of required counters is O(1/ε) when p is close to 1, a significant improvement over O(1/ε²), the previous best result. Counting the pth moment has direct applications in parameter estimation using the method of moments. CC also provides a very efficient solution for counting the entropy moment, which has been extensively studied in theoretical computer science.

Bio: Ping Li is an assistant professor at Cornell University in the Department of Statistical Science and Faculty of Computing and Information Science. He graduated from Stanford in 2007, with a Ph.D. in Statistics, an MS in Computer Science, and an MS in Electrical Engineering. Ping Li's research interests include applied and computational statistics, ranking and web search, machine learning and data mining, randomized algorithms, etc.
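As a concrete illustration (my own simplification, not the talk's method), the easiest instance of symmetric stable random projections is p = 2, where the stable distribution is Gaussian: project a streamed frequency vector onto k Gaussian counters and estimate its second frequency moment F2 = sum_i f_i^2 from the mean squared counter. The stream, dimensions, and seed below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, k = 10_000, 200
R = rng.standard_normal((k, dim))                 # projection matrix, fixed up front

counters = np.zeros(k)
def update(i, delta):                             # stream update: f_i += delta
    counters[:] += delta * R[:, i]

# feed a toy stream of (index, increment) pairs, keeping f only to check accuracy
f = np.zeros(dim)
for _ in range(5000):
    i, delta = rng.integers(dim), rng.integers(1, 4)
    update(i, delta)
    f[i] += delta

estimate = np.mean(counters ** 2)                 # each counter is N(0, F2)
print(estimate, np.sum(f ** 2))
```

Each counter is a Gaussian with variance F2, so averaging their squares estimates F2 with relative error about sqrt(2/k); the skewed projections of Compressed Counting sharpen this dramatically near p = 1.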
March 31, 2008 Erick A. Matsen
The Geometry of Phylogenetics
[abstract]Phylogenetics is the inference of the evolutionary history of a set of species using present-day sequence information. This talk will describe two recent applications of geometry to phylogenetics: first, we describe in terms of convex geometry why phylogenetic inference using concatenation of several gene sequences can produce anomalous results. Second, we present a complete implicit characterization of the possible data sets corresponding to phylogenetic trees under several popular models of sequence evolution. This characterization shows that the relevant model spaces live on the boundary of a non-convex higher dimensional contractible space. (Although these sound like lots of big words, I'll be showing more diagrams than formulas, and the talk is certainly for non-experts!)
March 24, 2008 Vijay Nair
Department of Statistics, University of Michigan
Statistical Inverse Problems in Network Tomography
[abstract]The term network tomography characterizes two classes of large-scale inverse problems that arise in the modeling and analysis of computer and communications networks. In passive tomography, network traffic data are collected at the nodes, and the goal is to reconstruct origin-destination traffic patterns. The second is active network tomography where the goal is to recover link-level quality of service parameters, such as packet loss rates and delay distributions, from end-to-end path-level measurements. Internet service providers use this to characterize network performance and to monitor service quality. This talk will provide an overview of the network application and the statistical inverse problems with an emphasis on active tomography. Issues on design of probing experiments, inference, and applications to network monitoring will be discussed.

This is joint work with George Michailidis, Earl Lawrence, Bowei Xi, and Xiaodong Yang.
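The flavor of active tomography can be seen on the smallest possible tree. The sketch below is a textbook toy version (my own simplification, not the speakers' methodology): a root link feeds two leaf links, multicast probes yield only end-to-end outcomes, and the three per-link success rates are recovered by moment identities. The true rates are assumed values for the simulation.

```python
import numpy as np

rng = np.random.default_rng(2)
a0, a1, a2 = 0.95, 0.9, 0.8                       # true link success rates (assumed)
n = 200_000
shared = rng.random(n) < a0                       # probe survives the shared root link
r1 = shared & (rng.random(n) < a1)                # probe reaches leaf 1
r2 = shared & (rng.random(n) < a2)                # probe reaches leaf 2

# observable end-to-end probabilities: P1 = a0*a1, P2 = a0*a2, P12 = a0*a1*a2
P1, P2, P12 = r1.mean(), r2.mean(), (r1 & r2).mean()
a0_hat = P1 * P2 / P12                            # invert the moment identities
a1_hat = P12 / P2
a2_hat = P12 / P1
print(a0_hat, a1_hat, a2_hat)
```

The same idea, recursed over a general tree and extended to delay distributions, is the core of the inverse problems discussed in the talk.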
March 3, 2008 Victor de la Pena
Department of Statistics, Columbia University
Pseudo-Maximization and self-normalized processes
[abstract]In this talk we provide a survey of results for self-normalized processes in dependent variables. An important example of self-normalized processes is Student's t-statistic (Student (1908)). We will provide examples of the LIL for this type of processes as well as extensions of Student's type statistics to the case of martingales.
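The canonical self-normalized process mentioned above is easy to exhibit numerically: write the partial sum S_n divided by the self-normalizer V_n = (sum of squares)^{1/2}, and check the deterministic identity linking it to the classical t-statistic. The data below are an arbitrary illustrative sample.

```python
import math, random

random.seed(3)
x = [random.gauss(0, 1) for _ in range(1000)]
n = len(x)
S = sum(x)                                        # partial sum S_n
V = math.sqrt(sum(v * v for v in x))              # self-normalizer V_n
r = S / V                                         # the self-normalized sum S_n / V_n

# algebraic identity: t = sqrt(n-1) * r / sqrt(n - r^2) equals Student's t
t = math.sqrt(n - 1) * r / math.sqrt(n - r * r)
mean = S / n
sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
t_classic = mean / (sd / math.sqrt(n))
print(r, t, t_classic)
```

The identity holds for any sample, which is why limit theorems for S_n / V_n translate directly into limit theorems for Student's t.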
February 25, 2008 Xin Qi
Department of Biostatistics, Yale University
Functional CLT for spatial birth and death processes and application in spatial statistics
[abstract]We give a functional central limit theorem for spatial birth and death processes based on the representation of such processes as solutions of stochastic equations. For any bounded and integrable function in Euclidean space, we define a family of processes obtained by integrals of this function with respect to the centered and scaled spatial birth and death process with constant death rate. We prove that this family converges weakly to a Gaussian process. We do not require that the birth rate have a finite range of interaction; instead we require that its range of interaction decay polynomially. In order to show the convergence of the finite-dimensional distributions of the above processes, we extend Penrose's multivariate spatial central limit theorem. As an application, we apply this idea to prove the asymptotic normality of time-invariance estimators for the nearest neighbor interaction model in spatial point processes.
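A bare-bones simulation (illustrative assumptions throughout, and much simpler than the interacting processes of the talk) shows the constant-death-rate dynamics: on the unit square, births arrive at total rate b at uniform locations, and each existing point dies independently at rate 1, so the equilibrium is a Poisson process with mean b points.

```python
import random

random.seed(4)
b = 50.0                                          # total birth rate over the square
points = []
t = 0.0
while t < 200.0:
    rate = b + len(points)                        # births at rate b, deaths at rate 1 each
    t += random.expovariate(rate)                 # time to next event
    if random.random() < b / rate:
        points.append((random.random(), random.random()))   # uniform new point
    else:
        points.pop(random.randrange(len(points))) # uniformly chosen point dies
print(len(points))                                # fluctuates around b = 50
```

The centered and scaled integrals studied in the talk are built over exactly this kind of trajectory, but with location-dependent (interacting) birth rates.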
February 18, 2008 Jose Blanchet
Department of Statistics, Harvard University
Free Energy Estimation, Rare Events and Efficient Importance Sampling.
[abstract]Estimating the free energy of a Gibbs distribution can sometimes be related to estimating an associated large deviations probability. In this talk, I'll discuss some examples in which one can develop efficient importance sampling estimators for this task. Our algorithms can be rigorously proved to achieve asymptotic optimality and, in some cases related to counting combinatorial objects, lead to the fastest known randomized algorithms for free energy estimation. As an example, we shall discuss an asymptotically optimal algorithm for estimating the free energy of a Gibbs distribution, on the space of random walks, that induces a self-attracting path interaction; a problem that is related to estimating the probability that a random walker avoids a set of Poissonian obstacles.
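A generic example of the importance-sampling idea (not the talk's algorithm) is exponential tilting for a Gaussian large-deviations probability: to estimate P(mean of n standard normals > a), sample from the distribution tilted to have mean a and reweight by the likelihood ratio. Parameters below are illustrative.

```python
import math, random

random.seed(8)
n, a, trials = 50, 0.5, 20_000
est = 0.0
for _ in range(trials):
    x = [random.gauss(a, 1.0) for _ in range(n)]  # tilted law: mean shifted to a
    s = sum(x)
    if s / n > a:
        # likelihood ratio dP/dQ = exp(-a*S_n + n*a^2/2) for a mean shift of a
        est += math.exp(-a * s + n * a * a / 2.0)
est /= trials
exact = 0.5 * math.erfc(a * math.sqrt(n / 2.0))   # P(N(0, 1/n) > a), for comparison
print(est, exact)
```

Plain Monte Carlo would need millions of samples to see this event at all; the tilted estimator is asymptotically optimal in the sense the abstract describes (its relative error grows subexponentially in n).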
February 11, 2008 Sergey Bobkov
Department of Mathematics, University of Minnesota
Concentration of distributions of sums and randomized CLT's for dependent observations.
[abstract]We review some results on randomized forms of the central limit theorem for dependent observations and discuss concentration methods in the study of randomized sums.
February 4, 2008 Prasad Tetali
School of Mathematics & School of Computer Science, Georgia Institute of Technology
On Correlation Decay in Matchings and Multi-spin Systems
[abstract]In the first part of the talk, we will describe a deterministic fully polynomial time approximation scheme (FPTAS) for computing the total number of matchings in a bounded degree graph, which is a #P-complete problem. The approach is based on the so-called correlation decay technique originating in statistical physics. The second part of the talk deals with the construction of a computation tree for interacting systems modeled using graphs that preserve the marginal probability of any vertex of interest. In particular we construct, for any finite graph and a fixed vertex, a finite computation tree with appropriately defined boundary conditions so that the marginal probability on the tree at the vertex matches that on the graph. For several interacting systems, we show using our approach that if there is strong spatial mixing on an infinite regular tree, then one has strong spatial mixing for any given graph with maximum degree bounded by that of the regular tree.

The talk is based on joint works with M. Bayati (Microsoft), C. Nair (Hong Kong), D. Gamarnik (MIT) and D. Katz (MIT).
January 28, 2008 Nick Read
Department of Physics, Yale University
Geometry of random minimum spanning trees
[abstract]Given a connected finite graph with costs (real numbers) assigned to the edges, the minimum spanning tree is defined to be an acyclic subgraph (a tree) that is spanning (meets all the vertices) such that the sum of the costs of the edges occupied by the tree is minimized. We consider the random version in which the costs on the edges are independent random variables, uniformly distributed between 0 and 1. The model arises physically as the strong-disorder limit of a spin glass. Using Kruskal's greedy algorithm to solve a random instance leads to a relation with bond (Bernoulli) percolation on the same graph, which we exploit throughout. We address the geometry of the random trees when the graph is a portion of the hypercubic lattice in d dimensions. As a first step, we consider instead the case in which the graph is the "Bethe lattice" (infinite Cayley tree, of fixed degree at all vertices). This requires use of "wired" boundary conditions to obtain a non-trivial minimum spanning forest. We show that the connected component of the forest containing any given vertex has of order M**3 vertices within M steps of the given vertex. This is interpreted (heuristically) as implying that in high dimensions (d>6) on the lattice in Euclidean space the connected components contain R**6 vertices within Euclidean distance R. Second, we address the fractal dimension of the path connecting any two vertices on the MST in Euclidean space. (This has been addressed numerically in d=2 dimensions by several other authors.) We set up an expansion for the related problem in which the Kruskal process is halted at the percolation threshold, producing a minimum spanning forest. (It has been argued that the use of this model produces the same result as the MST itself.) Taking a heuristic continuum limit and using renormalization group methods, we calculate this dimension to first order in 6-d for d<6 (for d>6, we argue that it is 2, as for a Brownian random walk).

Work done in collaboration with Tom Jackson (Yale).
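The Kruskal/percolation link used in the talk is short to state in code. The sketch below (mine, not the speakers') runs Kruskal's algorithm with i.i.d. Uniform(0,1) edge costs on a small square lattice; the edges accepted with cost below any threshold p are exactly a spanning forest of the Bernoulli(p) percolation clusters. The lattice size and seed are illustrative.

```python
import random

random.seed(5)
L = 10                                            # L x L square lattice
nodes = [(i, j) for i in range(L) for j in range(L)]
edges = []
for i in range(L):
    for j in range(L):
        if i + 1 < L: edges.append((random.random(), (i, j), (i + 1, j)))
        if j + 1 < L: edges.append((random.random(), (i, j), (i, j + 1)))

parent = {v: v for v in nodes}
def find(v):
    while parent[v] != v:
        parent[v] = parent[parent[v]]             # path halving
        v = parent[v]
    return v

mst = []
for cost, u, v in sorted(edges):                  # Kruskal: cheapest edge first
    ru, rv = find(u), find(v)
    if ru != rv:                                  # accept iff it joins two clusters
        parent[ru] = rv
        mst.append((cost, u, v))
print(len(mst))                                   # a spanning tree has L*L - 1 edges
```

Halting the sorted sweep at the percolation threshold, instead of running it to completion, gives the minimum spanning forest studied in the second part of the talk.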
January 14, 2008 Robin Pemantle
Department of Mathematics, University of Pennsylvania
An upper bound on the time for quadratic sieving
[abstract]Central to many factoring algorithms in use today is the following random process: generate random numbers in the interval [1,N] until some subset has a product which is a square (and there is a computable witness to this). Naive probabilistic models for the distribution of prime factors suggest that this stopping time has a very sharp threshold. Based on more sophisticated probabilistic models, we find a rigorous upper bound that is within a factor of 4/3 of a proven lower bound, and conjecture that our upper bound is in fact asymptotically sharp.

This is joint work with Andrew Granville, Ernie Croot and Prasad Tetali.
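The random process in the abstract can be simulated directly for tiny N (the parameters here are mine, far below cryptographic scale): draw uniform numbers from [1, N] and stop as soon as some subset has a square product, which happens exactly when the exponent vectors mod 2 first become linearly dependent over GF(2). Dependence is detected with a bitmask Gaussian elimination.

```python
import random

def factor_mod2(n, bit_of):
    """Bitmask over primes: bit set iff that prime divides n to an odd power."""
    mask, d = 0, 2
    while d * d <= n:
        e = 0
        while n % d == 0:
            n //= d
            e += 1
        if e % 2:
            mask ^= 1 << bit_of.setdefault(d, len(bit_of))
        d += 1
    if n > 1:                                     # leftover prime factor
        mask ^= 1 << bit_of.setdefault(n, len(bit_of))
    return mask

def stopping_time(N, seed):
    """Number of uniform draws from [1, N] until some subset's product is a square."""
    random.seed(seed)
    bit_of, basis, draws = {}, {}, 0
    while True:
        draws += 1
        v = factor_mod2(random.randint(1, N), bit_of)
        while v:                                  # reduce against the GF(2) basis
            p = v.bit_length() - 1                # pivot: leading set bit
            if p not in basis:
                basis[p] = v
                break
            v ^= basis[p]
        if v == 0:                                # linear dependence: square subset exists
            return draws

print(stopping_time(10_000, seed=6))
```

The sharp-threshold result in the talk concerns the distribution of exactly this stopping time as N grows.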
December 3, 2007 Michael Nussbaum
Department of Mathematics, Cornell University
The Chernoff lower bound for symmetric quantum hypothesis testing
[abstract]Symmetric hypothesis testing in quantum statistics is considered, where the hypotheses are density operators on a finite dimensional complex Hilbert space, representing states of a finite quantum system. A lower bound on the asymptotic rate exponents of Bayesian error probabilities is proved. The bound represents a quantum extension of the Chernoff bound, which gives the best asymptotically achievable error exponent in classical discrimination between two probability measures on a finite set. In the present framework the classical result is reproduced if the two hypothetic density operators commute. Recently it has been shown that the lower bound is achievable also in the generic quantum (noncommutative) case. This implies that the result is one part of the definitive quantum Chernoff bound.
November 26, 2007 Ji Zhu
Department of Statistics, University of Michigan
Group Variable Selection via Hierarchical Lasso and Its Oracle Property
[abstract]In many engineering and scientific applications, predictor variables are grouped, for example, in biological applications where assayed genes or proteins can be grouped by biological role. Common statistical analysis methods such as ANOVA, factor analysis, and functional modeling with partially ordered basis sets also exhibit natural variable groupings. We develop a new variable selection method that respects the group constraint. Our new method enjoys benefits that existing successful methods do not have, while offering the potential for achieving asymptotic "oracle" properties.
November 12, 2007 Robert E. Kass
Department of Statistics and Center for the Neural Basis of Cognition, Carnegie Mellon University
Some Statistical Methods for Analyzing Neural Spike Trains
[abstract]One of the most important techniques in learning about the functioning of the brain has involved examining neuronal activity in laboratory animals under varying experimental conditions. Neural information is represented and communicated through series of action potentials, or spike trains, and the central scientific issue in many studies concerns the physiological significance that should be attached to a particular neuron firing pattern in a particular part of the brain. In addition, a major relatively new effort in neurophysiology involves the use of multielectrode recording, in which responses from dozens of neurons are recorded simultaneously. Among other things, this has made possible the construction of brain-controlled robotic devices, which could benefit people whose movement has been severely impaired.

My colleagues and I have formalized specific scientific questions in terms of point process intensity functions, and have used Bayesian methods to fit the point process models to neural spike-train data. In my talk I will very briefly outline some of the substantive problems we have examined, and the progress we've made. I will also describe our work on approximate methods for state-space models and its potential applicability to neural prostheses.
October 29, 2007 Steve Stigler
Department of Statistics, University of Chicago
Five Great Errors by Great Statisticians.
[abstract]What sort of role models should aspiring statistical researchers follow? A new approach is suggested by a study of great errors in statistical theory by Joseph-Louis Lagrange, Karl Pearson, Ronald Fisher, and Harold Hotelling. It is arguable that they succeeded because of their errors, not in spite of them. The lessons from this are obvious.
October 15, 2007 Junhui Wang
Department of Statistics, Columbia University
On Margin Based Semisupervised Learning
[abstract]In classification, semi-supervised learning occurs when a large amount of unlabeled data is available alongside only a small amount of labeled data. This poses a great challenge in that it is difficult to achieve good classification performance through labeled data alone. To leverage unlabeled data for enhancing classification, we introduce a margin-based semi-supervised learning method within the framework of regularization, built on an efficient margin loss for unlabeled data, which seeks efficient extraction of the information in unlabeled data for estimating the Bayes rule for classification. In particular, I will discuss three aspects: (1) the idea and methodology development; (2) computational tools; (3) a statistical learning theory. Numerical examples will be provided to demonstrate the advantage of the proposed methodology over existing competitors. An application to gene function prediction will be discussed.
October 8, 2007 Hannes Leeb
Department of Statistics, Yale University
Informal seminar about conditional predictive inference post model selection.
September 24, 2007 Linda H. Zhao
Statistics Department, University of Pennsylvania
Good Estimators for Gaussian Models Having a Block-wise Structure
[abstract]Many multivariate Gaussian models can conveniently be split into disjoint, block-wise problems. Common settings where this situation arises are balanced ANOVA models, balanced longitudinal models and certain block-wise shrinkage estimators in nonparametric regression estimation involving orthogonal bases such as Fourier or wavelet bases.

It is well known that the standard, least squares estimate in multidimensional Gaussian models can often be improved through the use of minimax shrinkage estimators or related Bayes estimators that lead to well-motivated shrinkage estimation. In a situation involving statistically independent blocks it is natural to apply this shrinkage separately within each block. From the Bayesian perspective this results from placing independent priors on the statistically independent blocks.

In the talk we show that standard estimators constructed via independent shrinkage are inadmissible in the original problem in terms of their squared-error risk; and we provide improved minimax estimators. An alternate class of block-wise shrinkage estimators is also considered, and fairly precise conditions are given that characterize when these estimators are admissible or quasi-admissible.

This is joint work with L. Brown.
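The "independent shrinkage within each block" baseline that the talk improves upon can be sketched with the standard positive-part James-Stein estimator (my illustration, not the improved estimators of the talk): apply the shrinkage separately within each statistically independent block of a Gaussian mean problem. Block sizes, means, and seed are illustrative.

```python
import numpy as np

def james_stein_plus(y):
    """Positive-part James-Stein: shrink y ~ N(theta, I_d), d >= 3, toward 0."""
    d = len(y)
    s2 = float(np.sum(y ** 2))
    factor = max(0.0, 1.0 - (d - 2) / s2) if d >= 3 and s2 > 0 else 1.0
    return factor * y

rng = np.random.default_rng(7)
blocks = [rng.normal(loc=2.0, size=8), rng.normal(loc=0.0, size=5)]
shrunk = [james_stein_plus(b) for b in blocks]    # independent shrinkage per block
for b, s in zip(blocks, shrunk):
    print(round(np.linalg.norm(b), 3), "->", round(np.linalg.norm(s), 3))
```

The talk's result is that even such sensible block-by-block estimators are inadmissible in the combined problem, and that further improvement is possible by borrowing strength across blocks.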

Revised: 15 July 2009