Date | Speaker | Seminar Title |
---|---|---|
Apr. 27, 2009 | Hongyu Zhao, Epidemiology/Public Health, Yale University | Incorporating Prior Biological Knowledge in Genome-Wide Association Studies. [abstract] The last four years have seen great successes in many Genome-Wide Association Studies (GWAS), which have identified numerous genetic variants underlying complex traits. The analysis and interpretation of data from GWAS present great statistical and computational challenges, especially after the initial discoveries of variants carrying relatively large effects. Although various statistical approaches have been or are being developed to better analyze GWAS data, it has become apparent that the incorporation of information from prior studies and other sources is indispensable. In this presentation, we discuss our recently developed statistical methods and bioinformatics tools that are designed to more effectively integrate diverse types of prior biological information in analyzing GWAS data. The usefulness of these methods will be illustrated through their applications to some recent large-scale GWAS data. |
Apr. 20, 2009 | | no seminar |
Apr. 13, 2009 | Ioannis (Yiannis) Kontoyiannis, Department of Informatics, Athens University of Economics and Business | Control Variates for Reversible MCMC Samplers. [abstract] Control variates are a well-known and effective tool for variance reduction in Monte Carlo sampling. We will present a general methodology for the construction and effective use of control variates for reversible Markov chain Monte Carlo (MCMC) samplers. We will show that the values of the coefficients of the optimal linear combination of the control variates can be computed explicitly, and we will derive adaptive, consistent MCMC estimators for these optimal coefficients. Numerous MCMC simulation examples from Bayesian inference applications will be presented, demonstrating that the resulting variance reduction can be quite dramatic, often by a factor of the order of hundreds or thousands. Joint work with Petros Dellaportas. |
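The classical control-variate construction behind this abstract can be sketched in a few lines. This is a hypothetical toy (a random-walk Metropolis chain and a single control variate g(X) = X with known mean 0), not the talk's explicit optimal-coefficient theory for reversible samplers:

```python
import math
import random

# Estimate E[max(X, 0)] = 1/sqrt(2*pi) ~ 0.3989 under a standard normal
# target, using g(X) = X (known mean 0 under the target) as control variate.
random.seed(0)

def metropolis_normal(n, step=1.0):
    """Random-walk Metropolis chain targeting N(0, 1)."""
    x, chain = 0.0, []
    for _ in range(n):
        y = x + random.uniform(-step, step)
        # Accept with probability min(1, pi(y)/pi(x)) for the N(0,1) density pi
        if random.random() < min(1.0, math.exp(0.5 * (x * x - y * y))):
            x = y
        chain.append(x)
    return chain

chain = metropolis_normal(50000)
f = [max(x, 0.0) for x in chain]     # function of interest, E[f] ~ 0.3989
g = chain                            # control variate, E[g] = 0
n = len(chain)
fbar, gbar = sum(f) / n, sum(g) / n
# Estimated optimal linear coefficient b* = Cov(f, g) / Var(g)
cov_fg = sum((fi - fbar) * (gi - gbar) for fi, gi in zip(f, g)) / n
var_g = sum((gi - gbar) ** 2 for gi in g) / n
b = cov_fg / var_g
adjusted = fbar - b * (gbar - 0.0)   # control-variate-adjusted estimator
```

The adjusted estimator subtracts b times the centered sample average of the control variate; when g is well correlated with f, its variance is smaller than that of the plain ergodic average.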
Apr. 6, 2009 | Ian McKeague, Department of Biostatistics, Columbia University | Logistic Regression with Brownian-like Predictors. [abstract] This talk introduces a new type of logistic regression model involving functional predictors of binary responses, along with an extension of the approach to generalized linear models. The predictors are trajectories that have certain sample-path properties in common with Brownian motion. Time points are treated as parameters of interest, and confidence intervals are developed under prospective and retrospective (case-control) sampling designs. In an application to fMRI data, signals from individual subjects are used to find the portion of the time course that is most predictive of the response. This allows the identification of sensitive time points, specific to a brain region and associated with a certain task, that can be used to distinguish between responses. A second application concerns gene expression data in a case-control study involving breast cancer, where the aim is to identify genetic loci along a chromosome that best discriminate between cases and controls. The talk is based on joint work with Martin Lindquist. |
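The idea of treating a time point of a functional predictor as a parameter can be illustrated with a toy simulation. Everything below is my own assumption (the grid, the link scale, and a simple marginal-correlation scan standing in for the talk's estimator):

```python
import math
import random

random.seed(1)
T = [i / 50 for i in range(51)]          # time grid on [0, 1]

def brownian_path():
    """One Brownian-like trajectory on the grid, started at 0."""
    x, path = 0.0, [0.0]
    for _ in range(50):
        x += random.gauss(0.0, math.sqrt(1 / 50))
        path.append(x)
    return path

# Binary responses depend on the path only through X(0.6) (grid index 30)
nsub = 400
X = [brownian_path() for _ in range(nsub)]
Y = []
for path in X:
    p = 1 / (1 + math.exp(-3.0 * path[30]))
    Y.append(1 if random.random() < p else 0)

def abs_corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    da = math.sqrt(sum((ai - ma) ** 2 for ai in a))
    db = math.sqrt(sum((bi - mb) ** 2 for bi in b))
    return abs(num / (da * db))

# Scan the grid (skipping t = 0, where every path equals 0) for the
# time point whose values are most predictive of the response
scores = [abs_corr([x[j] for x in X], Y) for j in range(1, 51)]
j_hat = 1 + scores.index(max(scores))
t_hat = T[j_hat]                          # estimated sensitive time point
```

Because Brownian paths are strongly autocorrelated, nearby time points also score well; the talk's contribution includes inference (confidence intervals) for this estimated time point, which the scan above does not provide.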
Mar. 30, 2009 | Mokshay Madiman, Department of Statistics, Yale University | A New Look at the Compound Poisson Distribution and Compound Poisson Approximation Using Entropy. [abstract] We develop an information-theoretic foundation for compound Poisson approximation and limit theorems (analogous to the corresponding developments for the central limit theorem and for simple Poisson approximation). First, sufficient conditions are given under which the compound Poisson distribution has maximal entropy within a natural class of probability measures on the nonnegative integers. In particular, it is shown that a maximum entropy property is valid if the measures under consideration are log-concave, but that it fails in general. Second, approximation bounds in the (strong) relative entropy sense are given for distributional approximation of sums of independent nonnegative integer-valued random variables by compound Poisson distributions. The proof techniques involve the use of a notion of local information quantities that generalize the classical Fisher information used for normal approximation, as well as the use of ingredients from Stein's method for compound Poisson approximation. This work is joint with Andrew Barbour (Zurich), Oliver Johnson (Bristol) and Ioannis Kontoyiannis (AUEB). |
Mar. 23, 2009 | Jack Silverstein, Department of Mathematics, North Carolina State University | Eigenvalues of Large Dimensional Random Matrices |
Mar. 9, 2009 | | Spring Break: no seminar until March 23 |
Mar. 2, 2009 | Andrew Barron, Department of Statistics, Yale University | Superposition Codes with Polynomial Size Dictionary are Reliable at Rates Up to Channel Capacity |
Feb. 23, 2009 | Charles R. Johnson, Department of Mathematics, The College of William & Mary | Determinantal Inequalities: Ancient History and Recent Advances |
Feb. 16, 2009 | Daniel Spielman, Computer Science Department, Yale University | Graph Approximation and Local Clustering, with Applications to the Solution of Diagonally-Dominant Systems of Linear Equations |
Feb. 9, 2009 | Gideon Weiss, University of Haifa | FCFS Infinite Bipartite Matching of Servers and Customers |
Feb. 2, 2009 | Hui Zou, School of Statistics, University of Minnesota | Local CQR Smoothing |
Jan. 26, 2009 | Joerg Stoye, Department of Economics, New York University | More on Confidence Intervals for Partially Identified Parameters |
Jan. 19, 2009 | | Martin Luther King Jr. Day: no seminar |
Jan. 12, 2009 | Patrick Wolfe, Statistics and Information Sciences Laboratory, Harvard University | Perspectives on Large-Scale Network Data: Blending Inference and Algorithms for Analysis |
Dec. 8, 2008 | | Winter Break: no seminar until January 12 |
Dec. 1, 2008 | Matt Harrison, Department of Statistics, Carnegie Mellon University | Conditional Inference for Assessing the Statistical Significance of Neural Spiking Patterns. [abstract] Conditional inference has proven useful for exploratory analysis of neurophysiological point process data. I will illustrate this approach and then focus on two sub-problems: (1) uniform generation of binary matrices with marginal constraints and (2) multiple hypothesis testing for random measures. (1) Sequential importance sampling (SIS) is an effective technique for approximate uniform sampling of binary matrices with specified marginals. I will describe how to simplify and improve existing SIS procedures using improved asymptotic enumeration and dynamic programming (DP). The DP approach is interesting because it facilitates generalizations. (2) For point process data or functional data collected in different experimental conditions, it is often important to determine if the data are distributed differently in different conditions and to further localize where (in time or space, for example) the differences occur. This can be framed as a multiple testing problem for random measures. When differences might exist at multiple and/or unknown (spatio-temporal) scales, I call this multi-scale multiple testing because for each location there are many hypothesis tests corresponding to many potential scales. I will describe a non-parametric permutation test approach to this problem. This is joint work with Stuart Geman and Asohan Amarasingham. |
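For sub-problem (1), the talk develops SIS with dynamic programming; as a simpler point of reference, the classical "checkerboard swap" chain below also preserves the margins of a binary matrix (a small hand-picked 0/1 matrix and plain in-place swaps; this is not the talk's method):

```python
import random

random.seed(2)

def swap_chain(M, steps):
    """Random 2x2 checkerboard swaps preserve all row and column sums."""
    n, m = len(M), len(M[0])
    for _ in range(steps):
        i, j = random.randrange(n), random.randrange(n)
        a, b = random.randrange(m), random.randrange(m)
        # A swap is legal iff the 2x2 submatrix at rows (i, j), columns (a, b)
        # is [[1, 0], [0, 1]] or [[0, 1], [1, 0]]; flipping it keeps margins.
        if M[i][a] == M[j][b] and M[i][b] == M[j][a] and M[i][a] != M[i][b]:
            M[i][a], M[j][b] = 1 - M[i][a], 1 - M[j][b]
            M[i][b], M[j][a] = 1 - M[i][b], 1 - M[j][a]
    return M

M = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
rows = [sum(r) for r in M]           # margins before running the chain
cols = [sum(c) for c in zip(*M)]
M = swap_chain(M, 1000)
```

Run long enough (with the usual laziness conventions), this chain samples approximately uniformly from the set of matrices with the given margins; SIS instead builds each matrix in one pass and reweights, which is where the enumeration and DP machinery enters.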
Nov. 24, 2008 | | Thanksgiving Break: no seminar |
Nov. 17, 2008 | Yufeng Liu, Department of Statistics and Operations Research, University of North Carolina | The Large Margin Unified Machine: A Bridge between Hard and Soft Classification. [abstract] Margin-based classifiers have been popular in both machine learning and statistics for classification problems. Among numerous classifiers, some are hard classifiers and some are soft ones. Soft classifiers explicitly estimate the class conditional probabilities and then perform classification based on the estimated probabilities. In contrast, hard classifiers directly target the classification decision boundary without producing probability estimates. These two types of classifiers are based on different philosophies, and each has its own merits. In this talk, instead of making a choice between hard and soft classification, we propose a novel family of large-margin classifiers, namely large-margin unified machines (LUMs), which cover a broad range of margin-based classifiers including both hard and soft ones. The LUM family has close connections with some well-known large margin classifiers such as the Support Vector Machine and Boosting. By offering a natural bridge from soft to hard classification, the LUM provides a unified algorithm to fit various classifiers and hence a convenient platform to compare hard and soft classification. |
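The hard/soft distinction can be made concrete with the two endpoint losses, viewed as functions of the functional margin u = y f(x). The LUM family interpolates between losses of this kind; its exact parameterization is not reproduced here:

```python
import math

def hinge(u):
    """'Hard' loss (SVM): no class-probability estimate is implied."""
    return max(0.0, 1.0 - u)

def logistic_deviance(u):
    """'Soft' loss: its minimizer recovers P(Y=1|x) via 1/(1+exp(-f))."""
    return math.log(1.0 + math.exp(-u))

# Both penalize small or negative margins; hinge is exactly zero past u = 1,
# while the logistic loss decays smoothly and never vanishes.
losses = [(k / 2, hinge(k / 2), logistic_deviance(k / 2)) for k in range(-4, 5)]
```

The flat region of the hinge loss is what makes the SVM "hard" (it cares only about the boundary), while the everywhere-positive slope of the deviance is what lets logistic-type methods estimate probabilities.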
Nov. 10, 2008 | George Michailidis, Department of Statistics and EECS, University of Michigan | Dual Modality Network Tomography. [abstract] In this talk, we discuss joint modeling mechanisms for packet volumes and byte volumes to perform computer network tomography, whose goal is to estimate characteristics of source-destination flows based on link measurements. Network tomography is a prototypical example of a linear inverse problem on graphs. We examine two generative models for the relation between packet and byte volumes, establish identifiability of their parameters, and discuss different estimating procedures. The proposed estimators of the flow characteristics are evaluated using both simulated and emulated data. Finally, the proposed models allow us to estimate parameters of the packet size distribution, thus providing additional insights into the composition of network traffic. |
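The linear inverse problem at the heart of network tomography can be written down in a few lines. The routing matrix and flow values below are hypothetical, and the talk's dual-modality (packet/byte) models are not reproduced:

```python
# Link loads y = A x: rows of the routing matrix A indicate which
# source-destination flows traverse each link.
A = [[1, 1, 0],   # link 1 carries flows 1 and 2
     [0, 1, 1]]   # link 2 carries flows 2 and 3
x = [5.0, 2.0, 7.0]   # unobserved flow volumes
y = [sum(a * xi for a, xi in zip(row, x)) for row in A]   # observed link loads

# With fewer links than flows, the system is underdetermined: a different
# flow vector can reproduce the same link measurements exactly.
x_alt = [6.0, 1.0, 8.0]
y_alt = [sum(a * xi for a, xi in zip(row, x_alt)) for row in A]
```

This non-identifiability of x from y alone is precisely why additional modeling assumptions (such as joint packet/byte generative models) are needed.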
Nov. 3, 2008 | Kjell Doksum, University of Wisconsin | On Nonparametric Variable Selection. [abstract] We consider regression experiments involving a response variable Y and a large number of predictor variables (X's), many of which may be of no value for the prediction of Y and thus need to be removed before predicting Y from the X's. This talk considers procedures that select variables by using importance scores that measure the strength of the relationship between predictor variables and a response. In the first of these procedures, scores are obtained by randomly drawing subregions (tubes) of the covariate space that constrain all but one predictor, and in each subregion computing a signal-to-noise ratio (efficacy) based on a nonparametric univariate regression of Y on the unconstrained variable. The regions are adapted to boost weak variables iteratively by searching (hunting) for the regions where the efficacy is maximized. The efficacy can be viewed as an approximation to a one-to-one function of the probability of identifying features. By using importance scores based on averages of maximized efficacies, we develop a variable selection algorithm called EARTH (Efficacy Adaptive Regression Tube Hunting). The second importance score method (RFVS) is based on using Random Forest importance values to select variables. Computer simulations show that EARTH and RFVS are successful variable selection methods when compared to other procedures in nonparametric situations with a large number of irrelevant predictor variables. Moreover, when each is combined with the model selection and prediction procedure MARS, the tree-based prediction procedure GUIDE, or the Random Forest prediction method, the combinations lead to improved prediction accuracy for certain models with many irrelevant variables. We give conditions under which a version of the EARTH algorithm selects the correct model with probability tending to one as the sample size tends to infinity, even if the number of predictors d tends to infinity with the sample size n. We end with the analysis of a real data set. (This is joint work with Shijie Tang and Kam Tsui.) |
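A stripped-down version of importance scoring for variable selection: plain marginal-correlation screening, standing in for the talk's EARTH and Random Forest scores (all names, dimensions, and parameters here are my own). Only the first of ten predictors drives the response, so its score should dominate:

```python
import math
import random

random.seed(3)
n, d = 500, 10
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
Y = [row[0] * 2.0 + random.gauss(0, 1) for row in X]   # only X1 matters

def abs_corr(a, b):
    """Absolute Pearson correlation, used here as an importance score."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = math.sqrt(sum((x - ma) ** 2 for x in a))
    db = math.sqrt(sum((y - mb) ** 2 for y in b))
    return abs(num / (da * db))

scores = [abs_corr([row[j] for row in X], Y) for j in range(d)]
best = scores.index(max(scores))   # the highest-scoring predictor
```

Marginal screening fails when relevant variables matter only jointly or only in subregions of the covariate space, which is the situation EARTH's adaptive tube hunting is built to handle.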
Oct. 27, 2008 | Jun Liu, Department of Statistics, Harvard University | Inference of Patterns and Associations Using Dictionary Models. [abstract] Pattern discovery is a ubiquitous problem in many disciplines. It is especially prominent in recent years due to our greatly improved data-generation capabilities in science and technology. The method I present here is motivated by the "motif-finding" and "module-finding" problems in biology, i.e., to find sequence patterns (i.e., "words") that seem to appear more frequently than usual in a given set of text sequences (i.e., sentences) and to find which of these "words" tend to co-occur in a sentence. A challenge in the motif-finding problem is that there are no spaces or punctuation between the words, and the dictionary of "words" is unknown to us. Existing methods are mostly "bottom-up" approaches, i.e., they build up the dictionary starting with single-letter words and then concatenate existing words that appear to occur next to each other in sentences more frequently than chance would predict. Our new approach is a top-down strategy, which uses a tree structure to represent the relationship among all possible existing words and uses the EM algorithm to estimate the usage frequency of each word. It automatically trims down most of the incorrect "words" by letting their usage frequencies converge to zero. The module-finding problem is closely related to the well-known "market basket" problem, in which one attempts to mine association rules among the items in a supermarket based on customers' transaction records. It is also related to the two-way clustering problem. In this problem, we assume that the words are given, and our goal is to find subsets of words that tend to co-occur in a sentence. We call the set of co-occurring words (not necessarily ordered) a "theme" or a "module". We can generalize the dictionary model to the "theme" model and use a similar EM strategy to infer these themes. I will demonstrate its applications in a few examples, including an analysis of Chinese medicine prescriptions and an analysis of a Chinese novel. This is based on joint work with Ke Deng and Zhi Geng. |
Oct. 20, 2008 | Alexander Barvinok, Department of Mathematics, University of Michigan | What does a random contingency table look like? |
Oct. 13, 2008 | Sid Resnick, Cornell University | Detection of the Conditional Extreme Value Model |
Oct. 8, 2008 | Hannes Leeb, Department of Statistics, Yale University | (informal seminar) |
Oct. 6, 2008 | Bing Li, Department of Statistics, Penn State University | Dimension Reduction for Non-Elliptically Distributed Predictors: Second-Order Methods |
Sept. 29, 2008 | Roger Cooke, Resources for the Future and Department of Mathematics, Delft University of Technology | The Vine-Copula and Bayesian Belief Net Representation of High-Dimensional Distributions. [abstract] Regular vines are a graphical tool for representing complex high-dimensional distributions as bivariate and conditional bivariate distributions. Assigning marginal distributions to each variable and (conditional) copulae to each edge of the vine uniquely specifies the joint distribution, and every joint density can be represented (non-uniquely) in this way. From a vine-copula representation an expression for the density and a sampling routine can be immediately derived. Moreover, the mutual information (which is the appropriate generalization of the determinant for non-linear dependencies) can be given an additive decomposition in terms of the conditional bivariate mutual informations at each edge of the vine. This means that minimal information completions of partially specified vine-copulae can be trivially constructed. The basic results on vines have recently been applied to derive similar representations for continuous, non-parametric Bayesian Belief Nets (BBNs). These are directed acyclic graphs in which influences (directed arcs) are interpreted in terms of conditional copulae. Interpreted in this way, BBNs inherit all the desirable properties of regular vines, and in addition have a more transparent graphical structure. New results concern 'optimal' vine-copula representations; that is, loosely, representations which capture the most dependence in the smallest number of edges. This development uses the mutual information decomposition theorem, the theory of majorization, and Schur convex functions. Keywords: correlation, graphs, positive definite matrix, Bayesian Belief Nets, majorization, determinant, mutual information, Schur convex functions, model inference. |
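A single building block of a vine can be sketched directly: one bivariate Gaussian pair-copula pushed to non-Gaussian (here exponential) marginals. The vine machinery of chaining conditional pair-copulae is not shown, and the correlation value is an arbitrary choice:

```python
import math
import random

random.seed(4)

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

rho = 0.8
pairs = []
for _ in range(20000):
    # Correlated standard normals...
    z1 = random.gauss(0, 1)
    z2 = rho * z1 + math.sqrt(1 - rho * rho) * random.gauss(0, 1)
    # ...pushed through the normal CDF give dependent Uniform(0,1) variables
    u1, u2 = norm_cdf(z1), norm_cdf(z2)
    # ...which inverse-CDF transforms map to any desired marginals (Exp(1) here)
    x1 = -math.log(1 - u1)
    x2 = -math.log(1 - u2)
    pairs.append((x1, x2))
```

The copula carries all the dependence while the marginals are chosen freely; a regular vine assembles a full joint density from many such bivariate and conditional-bivariate pieces.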
Sept. 22, 2008 | Paul Kabaila, Department of Mathematics and Statistics, La Trobe University | Confidence Intervals in Regression Utilizing Prior Information |
Sept. 15, 2008 | Muni S. Srivastava, Department of Statistics, University of Toronto | Analyzing High Dimensional Data with Fewer Observations |
Sept. 12, 2008 | Larry Shepp, Department of Statistics, Rutgers University | A Mathematical Approach to Managing Diabetes |