STAT 242b/542b THEORY OF STATISTICS: DRAFT OF COURSE SYLLABUS

Yale Main Campus section meets MWF 9:30-10:20, Becton 102.
Yale Pfizer section meets Tues./Thurs. 4:00-5:30 in Groton.

Text: Rice, ``Mathematical Statistics and Data Analysis''.

WEEK 1: January 14-18. PROBABILITY REVIEW
* Probability models and problem solving using conditioning, independence (Ch. 1, 2, 3).
* Expected values and variances of sample means. Use of the Central Limit Theorem. Normal approximation. Delta method. (Ch. 4, 5).

WEEK 2:
* January 21. Yale Holiday.
* January 22 or 23. SOME MORE PROBABILITY FOR STATISTICS
  Normal, Chi-square, t, and F distributions for statistics based on samples from a normal. (Ch. 6).
* January 24 or 25. PRELIMINARIES ON INFERENCE (Sec. 8.2).
  Example involving a simple discrete distribution (such as the Poisson or geometric). Observed and expected counts. Intuitive parameter estimates. Chi-square test.

WEEK 3: January 28-Feb. 1. ESTIMATION (Ch. 8)
* Methods of estimation (Sections 8.3, 8.4, 8.5). Emphasis on maximum likelihood and the method of moments. Brief discussion of Bayes methods.
* Sufficient statistics and likelihood factorization (Section 8.7).

WEEK 4: February 4-8. SAMPLING DISTRIBUTIONS AND LARGE SAMPLE APPROXIMATIONS (Sections 8.3, 8.4, 8.5).
* Comparison of the sampling distributions of the MLE and other estimators in an example (such as estimating the shape parameter of a Gamma distribution for rainfall).
* Notions of consistency, asymptotic normality, asymptotic variance, and standard error.
* Consistency of the MLE. [Shannon entropy and his information inequality. Ideas of Wald.]
* Asymptotic normality of the MLE. [Idea based on Taylor expansion, the CLT, and Fisher information.]

WEEK 5: February 11-15. RISK AND EFFICIENCY (Sec. 8.6).
* Mean squared error. Bias-variance tradeoff.
* Unbiased estimators. Cramer-Rao inequality. Fisher information.
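The Week 5 connection between the Cramer-Rao inequality and Fisher information can be seen in a small simulation. This is an illustrative sketch (not part of the course materials), using the Poisson model: the per-observation Fisher information is I(lam) = 1/lam, so the Cramer-Rao lower bound for unbiased estimators is lam/n, which the MLE (the sample mean) attains.

```python
# Illustrative sketch: the MLE for a Poisson mean attains the Cramer-Rao bound.
import numpy as np

rng = np.random.default_rng(0)
lam, n, reps = 4.0, 50, 20000

# Sampling distribution of the MLE (sample mean) over many simulated samples.
mles = rng.poisson(lam, size=(reps, n)).mean(axis=1)

crb = lam / n        # Cramer-Rao bound: 1 / (n * I(lam)) with I(lam) = 1/lam
mle_var = mles.var() # Monte Carlo estimate of Var(MLE)

print(f"Cramer-Rao bound:       {crb:.4f}")
print(f"Simulated MLE variance: {mle_var:.4f}")
```

The simulated variance should come out very close to the bound, which is the sense in which the MLE is efficient here.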
* Efficiency of estimators.
* Asymptotic efficiency of the MLE discussed. [Also a remark on the asymptotic efficiency of Bayes estimators.]
* Examples.

WEEK 6: February 18-22. TESTING STATISTICAL HYPOTHESES (Ch. 9 and Ch. 15, Sec. 15.2.3).
* Notions of simple and composite hypotheses concerning distributions and their parameters.
* Neyman-Pearson Lemma for optimal tests in simple-versus-simple cases.
* Generalized likelihood ratios for composite cases (not necessarily optimal). Simplifying the form of generalized likelihood ratios.
* Choosing the threshold of a test statistic according to the desired significance level.

WEEK 7: February 25-March 1. MORE ON TESTING HYPOTHESES AND REVIEW
* Tests for a specific value of a parameter.
  -- One-sided alternatives. Uniformly most powerful tests made easy.
  -- Two-sided alternatives. Relationship with confidence intervals.
  -- Examples.
* Tests for goodness of fit of an estimated model.
  -- Chi-square test statistic.
  -- Generalized likelihood ratio test statistic (both approximately Chi-square distributed under the null hypothesis).
  -- Accounting for degrees of freedom.
  -- Example.
* Review of inference.
  -- Selecting probability models.
  -- Estimating parameters in the model.
  -- Testing statistical hypotheses.

WEEK 8:
* March 4 or 5. MIDTERM EXAM covering Chapters 8 and 9 and any additional material presented.
* March 6-8. SUMMARIZING DATA (a sampling of Ch. 10).
  Plotting and interpreting empirical distributions, survival functions, quantile-quantile plots, histograms, densities, stem & leaf displays, and box-plots.

March 9-24. SPRING BREAK

WEEK 9: March 25-29. HYPOTHESIS TESTS FOR COMPARING LOCATION IN TWO SAMPLES (Ch. 11).
* Tests for location in independent samples.
  -- Various tests do about the same thing.
  -- Specifics for tests of location depend on assumptions about the variances.
* Tests for paired samples.
  -- Reduces to a one-sample test regarding differences.
  -- Easier and more accurate than with independent samples.
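The Week 9 point that a paired comparison reduces to a one-sample test on the differences can be made concrete in a few lines. The data below are invented for illustration only; this is a minimal sketch of the paired t-statistic, not the course's own example.

```python
# Minimal sketch of the paired t-test; the data here are made up.
import numpy as np

def one_sample_t(d):
    """t-statistic for H0: mean(d) = 0, with n - 1 degrees of freedom."""
    n = len(d)
    return d.mean() / (d.std(ddof=1) / np.sqrt(n))

before = np.array([12.1, 11.4, 13.2, 10.8, 12.9, 11.7])
after  = np.array([11.5, 11.0, 12.6, 10.9, 12.0, 11.1])

# The paired test is exactly a one-sample test applied to the differences:
t_paired = one_sample_t(before - after)
print(f"paired t = {t_paired:.3f}")
```

Because pairing removes subject-to-subject variability from the comparison, this statistic is typically larger (more powerful) than the two-independent-sample statistic on the same measurements.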
  [Optional: Sign test, Mann-Whitney test, signed rank test.]
* Issues in the design of experiments.

WEEK 10: April 1-5. HYPOTHESIS TESTS FOR COMPARING LOCATION IN MULTIPLE SAMPLES (Ch. 12).
* One-way layout.
  -- Plots of means or medians (with box-plots).
  -- Multiple comparisons.
  -- Analysis of variance.
* Two-way layout.
  -- Superimposed plots of mean response for each value of a second factor.
  -- The additive model and its graphical interpretation.
  -- Tests for additive and interaction models.
  -- Analysis of variance.

WEEK 11: April 8-12. LEAST SQUARES FITS OF LINES (AND CURVES) (Sec. 14.1, 14.2).
* Solving and interpreting the least squares problem in linear regression.
  -- Graphical interpretation (best-fitting line).
  -- Solution in terms of standardized variables (slope = correlation coefficient r).
  -- Solution in terms of the original variables.
  -- Pythagorean identity for the sum of squared errors and its geometric interpretation.
  -- Least squares projection yields a residual vector orthogonal to the vector of the explanatory variable.
  -- Residual plots.
* Statistical properties of the estimated parameters.
  -- Standard linear model with homoscedastic errors.
  -- Means, variances, and distributions of the estimated parameters.
  -- Estimated variances, standard errors, confidence intervals.
  -- Heteroscedastic errors and nonlinearities revealed through residual plots.
  -- Uses of transformations: transforming inputs to correct for nonlinearities; transforming outputs to correct for heteroscedasticity.
* Galton and the ``regression'' interpretation of least squares.

WEEK 12: April 15-19. LINEAR LEAST SQUARES WITH MULTIPLE EXPLANATORY VARIABLES.
* Iterative projection interpretation.
* Matrix interpretation.
* Statistical properties.
  -- Standard statistical model.
  -- Mean and covariance matrix of the least squares estimates.
  -- Estimation of sigma^2.
  -- Standard errors of coefficients.
  -- Confidence intervals for coefficients.
  -- Hypothesis tests concerning values of coefficients.
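The matrix interpretation in Week 12 gives the closed-form least squares solution beta_hat = (X'X)^{-1} X'y. The sketch below, on invented data, computes the estimates, the estimate of sigma^2, and the standard errors, and checks the Week 11 projection property that the residual vector is orthogonal to every column of the design matrix; it is an illustration, not the course's computing assignment.

```python
# Sketch of multiple linear least squares via the normal equations.
# Data are invented; in practice np.linalg.lstsq is preferable to an explicit inverse.
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.7 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])     # design matrix with intercept column
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # solves the normal equations (X'X) b = X'y

residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - X.shape[1])       # unbiased estimate of sigma^2
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))  # standard errors of coefficients

print("coefficients:   ", np.round(beta_hat, 2))
print("standard errors:", np.round(se, 3))
# Projection property: residuals are orthogonal to each column of X.
print("max |X'r| =", np.abs(X.T @ residuals).max())
```

Dividing each estimated coefficient by its standard error gives the t-value reported in standard regression output.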
  -- t-values and interpreting regression output (testing whether a coefficient is zero while the others remain in the model).
  -- Standard errors and confidence intervals for regression fits.
* Practice with multiple regression, tests, and confidence intervals.

WEEK 13: April 22-29. MODEL SELECTION.
* Stepwise selection by examination of the largest-magnitude t-value (merits and difficulties).
* Prediction error criterion (merits and difficulties).
* Practice with model selection by either method.
* Class evaluation of the course and teachers.

April 30-May 3. READING WEEK.
* At least one comprehensive review session.
* Practice exam?

May 9 (Thursday afternoon, 3.5 hours). FINAL EXAM.