Andrew R. Barron
Yale University
Professor of Statistics (100% Appointment)
Professor of Electrical Engineering (Courtesy Appointment)
Director of Graduate Studies and Director of Undergraduate Studies in Statistics
Senior Coordinator of Undergraduate Studies in Applied Mathematics

Address: Department of Statistics, P.O. Box 208290, New Haven, CT 06520
Phone: 203-432-0666, 203-432-0634
Fax: 203-432-0633
E-Mail: Andrew.Barron@yale.edu

Date of Birth: September 28, 1959
Family: Married, May 1981 (Spouse: Carol Barron); Six children (John, Peter, Timothy, Michelle, Julie, Gina)

Education

  • Ph.D., Electrical Engineering, Stanford University, 1985.
  • M.S., Electrical Engineering, Stanford University, 1982.
  • B.S. (Magna Cum Laude), E.E. and Math Science, Rice University, 1981.
  • W. T. Woodson H.S., Fairfax, Virginia, 1977.

Experience:

  • 1992 - present   Professor of Statistics, Yale University
  • 1990 - 1992       Associate Professor of Statistics and Electrical & Computer Engineering, University of Illinois
  • 1984 - 1998       Consultant, Barron Associates, Inc., Stanardsville, Virginia
  • 1992 Spring      Visiting Research Scholar, Barron Associates, Inc., Stanardsville, Virginia
  • 1991 Fall           Visiting Scholar, Mathematical Sciences Research Institute, Berkeley, California
  • 1985 - 1990       Assistant Professor of Statistics and Electrical & Computer Engineering, University of Illinois
  • 1982 - 1985       Research Assistant, Stanford University
  • 1981 - 1983       Consultant, Adaptronics, Inc., McLean, Virginia
  • 1977 - 1980       Engineer, Adaptronics, Inc., McLean, Virginia (Summers)

Associate Editorship:

  • 1993 - 1995 IEEE Transactions on Information Theory. A.E. for nonparametric estimation, classification, and neural nets.
  • 1995 - 1997 Annals of Statistics.
  • 1994 - 1997 Neural Networks.

Honors:

  • IMS Medallion Award Winner. Presented at 2005 Joint ASA-IMS Annual Meetings, Minneapolis, MN.
  • Best paper prize for all IEEE journals in 1990-1991, Browder J. Thompson Memorial Prize,
    for authors of age 30 or under at time of submission.
  • Best paper prize, National Aerospace Electronics Conference, 1990.
  • Finalist for the best paper prize, Information Theory Society, IEEE, 1987.
  • Nominated for the Marconi Young Scientist Award, 1990.
  • Keynote speaker at several conferences.

  • Twice Elected to Board of Governors, IEEE Information Theory Society, 1995-1997, 1998-2000.
  • Appointed Secretary, Board of Governors, IEEE Information Theory Society, 1989-1990.
  • Chairman, AMS Summer Research Conference, Adaptive Selection of Models and Procedures, 1996.
  • Program Committee, IMS-ASA Joint Statistical Meetings, 1991.
  • Program Committees, IEEE International Symposium on Information Theory, 1993, 1995.
  • Program Committees, IEEE Workshop on Information Theory, 1989, 2008.
  • Program Committee, World Congress on Neural Networks, 1995.
  • Program Committee, Neural Information Processing Systems: Natural and Synthetic, 1995.
  • Program Committee, ACM Workshop on Computational Learning Theory, 1991, 1997.
  • James Waters Creativity Award for best undergraduate research at Rice, 1981.
  • Houston Telephone Engineers scholarship, top student in communication theory, 1981.
  • Top Award for leadership, scholarship and service, Woodson High School, Fairfax, VA, 1977.

Research Interests:

  • Entropy Power Inequalities and Central Limit Theorems
  • Entropy Rates, Likelihood Stabilization, and Inference for Dependent Processes
  • Foundations of Minimum Description Length Principle of Inference and Universal Data Compression
  • Statistical Risk Analysis for Penalized Criteria for Model Selection
  • Statistical Risk Analysis for Bayes Procedures
  • Statistical Perspectives and Analysis of Artificial Neural Networks
  • Nonlinear Approximation and Estimation for High-dimensional Libraries of Functions
  • Greedy Algorithms for Subset Selection, Mixture Density Estimation, and L1 Penalty Optimization
  • Maximum Wealth Stock Indices and Growth Rate Optimal Portfolio Estimation

Ph.D. Dissertation:

  1. A. R. Barron (1985). Logically smooth density estimation. Stanford Univ., Stanford, CA.

Journal Publications:

  1. D. Cleveland, A. R. Barron, and A. N. Mucciardi (1980). Methods for determining the depth of near-surface defects. Journal of Nondestructive Evaluation, Vol.1, pp.21-36.

  2. A. R. Barron (1985). The strong ergodic theorem for densities: generalized Shannon-McMillan-Breiman theorem. Annals of Probability, Vol.13, pp.1292-1303. (Finalist for the best paper prize by the IEEE Information Theory Society.)

  3. A. R. Barron (1986). Entropy and the central limit theorem. Annals of Probability, Vol.14, pp.336-342. (Finalist for the best paper prize by the IEEE Information Theory Society.)

  4. A. R. Barron (1986). Discussion on Diaconis and Freedman: The consistency of Bayes estimates. Annals of Statistics, Vol.14, pp.26-30.

  5. A. R. Barron and T. M. Cover (1988). A bound on the financial value of information. IEEE Transactions on Information Theory, Vol.34, pp.1097-1100.

  6. A. R. Barron (1989). Uniformly powerful goodness of fit tests. Annals of Statistics, Vol.17, pp.107-124.

  7. B. Clarke and A. R. Barron (1990). Information theoretic asymptotics of Bayes methods. IEEE Transactions on Information Theory, Vol.IT-36, pp.453-471. (Winner 1992 Browder J. Thompson Memorial Prize award for the best paper in all IEEE journals for authors of age 30 or under at time of submission).

  8. A. R. Barron and X. Xiao (1991). Discussion on Friedman's multiple additive regression splines. Annals of Statistics, Vol.19, pp.67-82.

  9. A. R. Barron and C. Sheu (1991). Approximation of density functions by sequences of exponential families. Annals of Statistics, Vol.19, pp.1347-1369.

  10. A. R. Barron and T. M. Cover (1991). Minimum complexity density estimation. IEEE Transactions on Information Theory, Vol.IT-37, pp.1034-1054.

  11. A. R. Barron, L. Gyorfi, and E. C. van der Meulen (1992). Distribution estimation consistent in total variation and in two types of information divergence. IEEE Transactions on Information Theory, Vol.IT-38, pp.1437-1454.

  12. A. R. Barron (1993). Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, Vol.IT-39, pp.930-944.

  13. A. R. Barron (1994). Approximation and estimation bounds for artificial neural networks. Machine Learning, Vol.14, pp.113-143.

  14. A. R. Barron (1994). Comment on Cheng and Titterington: Neural networks, a review from a statistical perspective. Statistical Science, Vol.9, No. 1, pp.33-35.

  15. B. Clarke and A. R. Barron (1994). Jeffreys' prior is asymptotically least favorable under entropy risk. Journal of Statistical Planning and Inference, Vol.41, pp.37-60.

  16. Q. Xie and A. R. Barron (1997). Minimax redundancy for the class of memoryless sources. IEEE Transactions on Information Theory, Vol.43, pp.646-657.

  17. Y. Yang and A. R. Barron (1998). An asymptotic property of model selection criteria. IEEE Transactions on Information Theory, Vol.44, pp.117-133.

  18. A. R. Barron, J. Rissanen and B. Yu (1998). The minimum description length principle in coding and modeling. (Invited Paper. Special issue in honor of 50 years since Claude Shannon's seminal work.) IEEE Transactions on Information Theory, Vol.44, pp.2734-2760.

  19. A. R. Barron and N. Hengartner (1998). Information theory and superefficiency. Annals of Statistics, Vol.26, pp.1800-1825.

  20. A. R. Barron, L. Birge and P. Massart (1999). Risk bounds for model selection by penalization. Probability Theory and Related Fields, Vol.113, pp.301-413.

  21. A. R. Barron, M. Schervish, and L. Wasserman (1999). The consistency of posterior distributions in nonparametric problems. Annals of Statistics, Vol.27, pp.536-561.

  22. Y. Yang and A. R. Barron (1999). Information-theoretic determination of minimax rates of convergence. Annals of Statistics, Vol.27, pp.1564-1599.

  23. Q. Xie and A. R. Barron (2000). Asymptotic minimax regret for data compression, gambling, and prediction. IEEE Transactions on Information Theory, Vol.46, pp.431-445.

  24. G. Cheang and A. R. Barron (2000). A better approximation for balls. Journal of Approximation Theory, Vol.104, pp.183-203.

  25. J. E. Cross and A. R. Barron (2003). Efficient universal portfolios for past dependent target classes. Mathematical Finance, Vol.13, Issue 2, pp.245-276.

  26. Oliver Johnson and A. R. Barron (2004). Fisher Information inequalities and the central limit theorem. Probability Theory and Related Fields, Vol.129, pp.391-409.

  27. F. Liang and A. R. Barron (2004). Exact Minimax Strategies for predictive density estimation, data compression, and model selection. IEEE Transactions on Information Theory, Vol.50, pp.2708-2726.

  28. G. Leung and A. R. Barron (2006). Information theory and mixing least-squares regressions. IEEE Transactions on Information Theory, Vol.52, pp.3396-3410.

  29. M. Madiman and A. R. Barron (2007). Generalized entropy power inequalities and monotonicity properties of information. IEEE Transactions on Information Theory, Vol.53, pp.2317-2329.

  30. A. R. Barron, A. Cohen, W. Dahmen and R. DeVore (2008). Approximation and learning by greedy algorithms. Annals of Statistics, Vol.36, pp.64-94.

  31. J. Takeuchi, T. Kawabata and A. R. Barron. Properties of Jeffreys mixture for Markov sources. IEEE Transactions on Information Theory, Accepted for publication 2007.

  32. C. Huang, G.H.L. Cheang and A. R. Barron. Risk of penalized least squares, greedy selection and L1 penalization for flexible function libraries. Submitted to Annals of Statistics, 2008.

  33. A. M. Kagan, T. Yu, A. R. Barron and M. Madiman. Contributions to the theory of Pitman estimators. Submitted 2008.

Book Chapters and Articles:

  35. A. R. Barron (1984). Predicted squared error: a criterion for automatic model selection. Chapter 4 in Self-Organizing Methods in Modeling, S. J. Farlow (Editor), Marcel Dekker, New York, pp.87-103.

  36. R. L. Barron, A. N. Mucciardi, F. J. Cook, J. N. Craig, and A. R. Barron (1984). Adaptive learning networks. Chapter 2 in Self-Organizing Methods in Modeling, S. J. Farlow (Editor), Marcel Dekker, New York, pp.25-65.

  37. A. R. Barron (1987). Are Bayes rules consistent in information? In Open Problems in Communication and Computation, T. M. Cover and B. Gopinath (Editors), Springer-Verlag, New York, pp.85-91.

  38. A. R. Barron (1991). Complexity regularization with application to artificial neural networks. In Nonparametric Functional Estimation and Related Topics, G. Roussas (Editor), Kluwer Academic Publishers, Boston, MA and Dordrecht, The Netherlands, pp.561-576.

  39. A. R. Barron (1998). Information-theoretic characterization of Bayes performance and the choice of priors in parametric and nonparametric problems. In Bayesian Statistics 6, J.M. Bernardo, J.O. Berger, A.P. Dawid and A.F.M. Smith (Editors). Oxford University Press. pp.27-52.

  40. J.Q. Li and A.R. Barron (2000). Mixture density estimation. In Advances in Neural Information Processing Systems, Vol.12, S.A. Solla, T.K. Leen and K-R. Mueller (Editors). MIT Press, Cambridge, Massachusetts, pp. 279-285.

  41. F. Liang and A.R. Barron (2005). Exact minimax predictive density estimation and MDL. In Advances in Minimum Description Length: Theory and Applications, P. Grunwald, I.J. Myung and M. Pitt (Editors). MIT Press, Cambridge, Massachusetts.

  42. A.R. Barron, C. Huang, J. Q. Li and Xi Luo (2008). The MDL principle, penalized likelihood, and statistical risk. In Festschrift for Jorma Rissanen. Presented to Rissanen Nov. 2007. Tampere University Press, Tampere, Finland, April 2008.

Publications in Conference Proceedings: (5 to 25 pages)

  44. A. R. Barron, F. W. van Straten, and R. L. Barron (1977). Adaptive learning network approach to weather forecasting: a summary. Proceedings of the IEEE International Conference on Cybernetics and Society, Washington, DC, September 19-21. Published by IEEE, New York, pp.724-727.

  45. A. R. Barron and R. L. Barron (1988). Statistical learning networks: a unifying view. In Computing Science and Statistics: Proceedings of the 20th Symposium on the Interface, Reston, Virginia, April 20-23. E. Wegman, Ed., Published by the American Statistical Association, Alexandria, Virginia, pp.192-203. (Invited presentation).

  46. A. R. Barron (1989). Statistical properties of artificial neural networks. Proceedings of the IEEE International Conference on Decision and Control, Tampa, Florida, December 13-15. Published by IEEE, New York. (Invited presentation).

  47. R. L. Barron, R. L. Cellucci, P. R. Jordan, N. E. Beam, P. Hess, and A. R. Barron (1990). Applications of polynomial neural networks to fault detection, isolation, and estimation (FDIE) and reconfigurable flight control. Proceedings of the National Aerospace Electronics Conference, Dayton, Ohio, May 23-25. (Winner of the best paper prize, 1990 NAECON).

  48. A. R. Barron (1991). Approximation and estimation bounds for artificial neural networks. In Computational Learning Theory: Proceedings of the Fourth Annual ACM Workshop, Santa Cruz, CA, August 5-7. L. Valiant, Ed., Morgan Kaufmann Publishers, Inc., San Mateo, California, pp.243-249. (Honored as one of the four papers invited to appear in expanded form in a special issue of Machine Learning, representing the top presentations at the workshop.)

  49. A. R. Barron (1992). Neural net approximation. Proceedings of the 7th Yale Workshop on Adaptive and Learning Systems, May 20-22, K. S. Narendra, Ed., Center for Systems Science, Yale University, pp.69-72.

  50. D. Haussler and A. R. Barron (1993). How well do Bayes methods work for on-line prediction of + or -1 values? Computational Learning and Cognition: Proc. Third NEC Research Symposium, SIAM, Philadelphia, pp.74-100.

  51. J.-I. Takeuchi and A. R. Barron (1997). Asymptotically minimax regret for exponential families. 20th Symposium on Information Theory and Its Applications. Japan, pp.665-668. (Best paper award at SITA 97).

  52. J.-I. Takeuchi and A. R. Barron (1998). Robustly minimax codes for universal data compression. 21st Symposium on Information Theory and Its Applications. Gifu, Japan, December 2-5.

  53. J.-I. Takeuchi, T. Kawabata and A. R. Barron (2001). Properties of Jeffreys mixture for Markov sources. Proceedings of the Fourth Workshop on Information-Based Induction Sciences (IBIS 2001), pp. 327-332.

  54. G.H.L. Cheang and A. R. Barron (2001). Penalized least squares, model selection, convex hull classes, and neural nets. Proceedings of the 9th European Symposium on Artificial Neural Networks. M. Verleysen, Ed. pp.371-376.

  55. M. Madiman and A. R. Barron (2006). The monotonicity of information in the central limit theorem and entropy power inequalities. Proceedings of the 2006 IEEE International Symposium on Information Theory. Seattle, Washington, July 2006.

  56. A. R. Barron and Xi Luo (2007). Adaptive annealing. Proceedings 45th Annual Allerton Conference on Communication, Control, and Computing. Allerton House, UIUC, Illinois. September 26-28. pp.665-673.

  57. A. R. Barron, C. Huang, J. Q. Li, and Xi Luo (2008). MDL, Penalized likelihood and statistical risk. IEEE Information Theory Workshop. Porto, Portugal, May 4-9.

Technical Reports: (with details not in subsequent publications)

  • A. R. Barron (1984). Monotonic central limit theorem for densities. Department of Statistics Technical Report #50, Stanford University, Stanford, California.

  • A. R. Barron (1988). The exponential convergence of posterior probabilities with implications for Bayes estimators of density functions. Department of Statistics Technical Report #7, University of Illinois, Champaign, Illinois.

  • B. Clarke and A. R. Barron (1990). Entropy risk and the Bayesian central limit theorem. Department of Statistics Technical Report, Purdue University, West Lafayette, Indiana.

  • A. R. Barron (1991). Information theory and martingales. Presented at 1991 IEEE International Symposium on Information Theory (recent results session), Budapest, Hungary, June 23-29.

  • A. R. Barron (1997). Information theory in probability, statistics, learning, and neural nets. Department of Statistics. Yale University. Working paper distributed at plenary presentation of the Tenth Annual ACM Workshop on Computational Learning Theory.

  • J.-I. Takeuchi and A. R. Barron (1997). Asymptotically minimax regret for exponential and curved exponential families. Fourteen page summary for presentation at the 1998 International Symposium on Information Theory, Cambridge, Massachusetts.

  • A. R. Barron (1999). Limits of information, Markov chains, and projection. Eight page summary of presentation at the 2000 IEEE International Symposium on Information Theory, Sorrento, Italy.

  • J. Yu and A. R. Barron (2003). Maximal compounded wealth for portfolios of stocks and options. (Needs revision, resubmission).

  • W. Qiu and A. R. Barron (2007). A maximum wealth asset index and mixture strategies for universal portfolios on subsets of stocks. (Working paper, for submission).

Other Conference Presentations: (with proceedings containing not more than 1 page abstracts)

  1. A. R. Barron (1983). Convergence of logically simple estimates of unknown probability densities. IEEE International Symposium on Information Theory, Saint Jovite, Canada, September 26-30.

  2. A. R. Barron (1985). Entropy and the central limit theorem. IEEE International Symposium on Information Theory, Brighton, England, June 23-28.

  3. A. R. Barron (1985). Ergodic theorem for densities: generalized Shannon-McMillan-Breiman theorem. IEEE International Symposium on Information Theory, Brighton, England, June 23-28.

  4. A. R. Barron (1985). Logically smooth density estimation. Joint IMS, ASA Annual Meeting, Las Vegas, Nevada, August 5-8.

  5. A. R. Barron (1987). Applications of large deviations in statistics. Conference on Asymptotic Methods for Stochastic Systems: Large Deviations Theory and Practice, University of Maryland, October 25-27. (Invited Presentation).

  6. A. R. Barron (1988). The convergence in information of probability density estimators. IEEE International Symposium on Information Theory, Kobe, Japan, June 19-24.

  7. T. M. Cover and A. R. Barron (1988). A bound on the financial value of information. IEEE International Symposium on Information Theory, Kobe, Japan, June 19-24.

  8. A. R. Barron (1989). Minimum complexity density estimation. IMS Regional Meeting, Lexington, Kentucky, March 19-22. (Invited Presentation).

  9. A. R. Barron (1989). Portfolio selection based on nonparametric density estimates for the stock market. 21st Symposium on the Interface: Computing Science and Statistics, Orlando, Florida, April 9-12. (Invited Presentation).

  10. A. R. Barron (1989). Minimum complexity estimation. IEEE Workshop on Information Theory, Center for Applied Math, Cornell University, June 26-30. (Session Organizer).

  11. A. R. Barron (1989). Some statistical properties of polynomial networks and other artificial neural networks. Conference on Neural Information Processing Systems, Denver, Colorado, November 27-30. (Invited Plenary Presentation).

  12. A. R. Barron (1990). An index of resolvability of probability density estimators. IEEE International Symposium on Information Theory, San Diego, California, January 14-19.

  13. A. R. Barron (1990). Some statistical convergence properties of artificial neural networks. IEEE International Symposium on Information Theory, San Diego, California, January 14-19.

  14. A. R. Barron (1990). The index of resolvability: statistical convergence of minimum complexity estimation. AAAI Symposium on Minimal-Length Encoding, Stanford, California, March 27-29 (Invited Presentation).

  15. A. R. Barron (1990). Some approximation and estimation theorems for artificial neural networks. IEEE Information Theory Workshop, Veldhoven, The Netherlands, June 10-15 (Invited Presentation).

  16. A. R. Barron (1990). Statistical properties of artificial neural networks. SIAM Annual Meeting, Chicago, Illinois, July 20 (Invited Presentation).

  17. A. R. Barron (1991). Information theory and the stock market: the effect of side information. Workshop on Coordination of Distributed Information and Decisions, Cornell University, April 11-13. (Invited Presentation).

  18. A. R. Barron (1991). Approximation and estimation results for adaptively synthesized neural network architectures. Workshop on Theoretical Issues in Neural Nets, Center for Discrete Mathematics and Theoretical Computer Science, Rutgers University, May 20-23. (Invited Presentation).

  19. A. R. Barron and R. L. Barron (1991). Artificial neural networks in industry: some developments using statistical techniques. IMS Special Topics Meeting on Statistics in Industry, Philadelphia, Pennsylvania, June 9-12 (Invited Presentation).

  20. A. R. Barron (1991). Universal approximation bounds for superpositions of a sigmoidal function. IEEE International Symposium on Information Theory, Budapest, Hungary, June 23-29.

  21. A. R. Barron (1991). Approximation and estimation bounds for sigmoidal and polynomial networks. Joint Statistical Meetings, Atlanta, Georgia, August 19-22. (Session Organizer).

  22. A. R. Barron (1991). Approximation and estimation bounds for neural networks and projection pursuit. Workshop on Neural Information Processing Systems, Vail, Colorado, December 10-11. (Invited Presentation).

  23. A. R. Barron (1991). Risk estimation, risk optimality, regularization, and neural nets. Workshop on Neural Information Processing Systems, Vail, Colorado, December 10-11.

  24. A. R. Barron (1992). Approximation, estimation, and computation results for artificial neural networks. 24th Symposium on the Interface: Computing Science and Statistics, College Station, Texas, March 19-21. (Invited Presentation).

  25. A. R. Barron (1992). Artificial neural networks: stochastic analysis and engineering applications? Symposium on Stochastic Processes in Engineering Applications, Otaniemi, Finland, April 7-9. (Invited Plenary Presentation).

  26. A. R. Barron, D. Olive, and Y. Yang (1992). Asymptotically optimal complexity-based model selection. IMS Regional Meeting, Corvallis, Oregon, June 14-15. (Invited Presentation).

  27. A. R. Barron (1992). Statistical accuracy of neural nets. Conference on Neural Information Processing Systems: Natural and Synthetic. Denver, Colorado, November 30 - December 2. (Invited Tutorial).

  28. A. R. Barron (1993). Neural net approximation. Annual Meeting of the American Math Society, San Antonio, Texas, January 13-16. (Invited Presentation).

  29. A. R. Barron, L. Gyorfi, and E. C. van der Meulen (1993). Universal coding of non-discrete sources based on distribution estimation consistent in expected information divergence. IEEE International Symposium on Information Theory, San Antonio, Texas, January 17-22.

  30. A. R. Barron, B. Clarke, and D. Haussler (1993). Information bounds for the risk of Bayesian predictions and the redundancy of universal codes. IEEE International Symposium on Information Theory, San Antonio, Texas, January 17-22.

  31. A. R. Barron (1993). Statistical accuracy of neural nets. Workshop on Information and Geometry, Hakone, Japan. March. (Invited Presentation).

  32. A. R. Barron (1993). Optimal rate properties of minimum complexity estimation. Workshop on Descriptional Complexity, Schloss Dagstuhl, Wadern, Germany. May. (Invited Presentation).

  33. A. R. Barron (1993). Do neural nets avoid the curse of dimensionality? Congress on Statistics, Vannes, France, May. (Invited Presentation).

  34. A. R. Barron (1993). Performance bounds for neural net estimation and classification. Annual Meeting of the Classification Society of North America, Pittsburgh, June. (Invited Presentation).

  35. A. R. Barron (1993). Do neural nets avoid the curse of dimensionality? NATO ASI on Statistics and Neural Nets, Les Arcs, France, June. (Invited Presentation).

  36. A. R. Barron (1994). The accuracy of Bayes estimates of neural nets. 26th Symposium on the Interface: Computing Science and Statistics, Research Triangle Park, NC, June 15-18.

  37. A. R. Barron, Y. Yang and B. Yu (1994). Asymptotically optimal function estimation by minimum complexity criteria. IEEE International Symposium on Information Theory, Trondheim, Norway, June 27 - July 1. (One page proceeding and three page summary available).

  38. A. R. Barron (1994). Neural net approximation and estimation. IEEE Workshop on Information Theory, Moscow, Russia, July 3-6. (Invited Presentation).

  39. A. R. Barron (1994). Minimum complexity estimation. ML/COLT: Workshop on Applications of Descriptional Complexity to Induction, Statistical and Visual Inference, New Brunswick, New Jersey, July 8-9. (Invited Presentation.)

  40. A. R. Barron (1995). Statistics and neural nets. New England Statistics Conference, Storrs, Connecticut, April 22. (Invited Plenary Presentation).

  41. A. R. Barron, Y. Yang (1995). Information-theoretic development of minimax rates of convergence. IEEE International Symposium on Information Theory, Whistler, British Columbia, September 17-22. (Added to session at recommendation of program chair; not in proceedings).

  42. N. Hengartner and A. R. Barron (1995). Information theory and superefficiency. IMS Regional Meeting, Stanford, California. (Invited Presentation).

  43. B. S. Clarke and A. R. Barron (1995). Jeffreys' prior yields the asymptotic minimax redundancy. IEEE-IMS Workshop on Information Theory and Statistics, Alexandria, Virginia, October 27-29. (Invited Presentation).

  44. A. R. Barron (1995). Asymptotically optimal model selection and neural nets. IEEE-IMS Workshop on Information Theory and Statistics, Alexandria, Virginia, October 27-29. (Invited Presentation).

  45. Q. Xie and A. R. Barron (1996). Asymptotic minimax regret for data compression, gambling, and prediction. Workshop on sequence prediction, Santa Cruz, California, May 3-5. (Invited, jointly presented).

  46. A. R. Barron (1996). The fundamental role of Kullback Information in large sample statistics. Solomon Kullback Memorial Conference, Washington, DC, May 23-24. (Invited Presentation).

  47. A. R. Barron (1996). Adaptation and model selection. AMS-IMS-SIAM Summer Research Conference on adaptive selection of models and statistical procedures, Mount Holyoke, Massachusetts, June 22-28. (Conference Chairmen, A. Barron, P. Bickel, I. Johnstone, D. Donoho).

  48. A. R. Barron and Y. Yang (1996). Information theory in nonparametric estimation. Nonparametric Estimation: The Road Ahead, Canberra, Australia, July 2-4. (Invited Presentation).

  49. A. R. Barron (1996). Adaptive model selection and neural networks. Sydney International Statistics Congress, IMS regional meeting, Sydney, Australia, July 8-12. (Invited Presentation).

  50. A. R. Barron (1996). Asymptotics of Bayes estimators. International Society of Bayesian Analysis, Regional Meeting, Chicago, Illinois, August 2-3. (Invited Presentation).

  51. A. R. Barron and Y. Yang (1996). Adaptive model selection and the index of resolvability. Joint Statistical Meetings of the ASA and IMS, Chicago, Illinois, August 4-8. (Invited Presentation).

  52. A. R. Barron and Q. Xie (1997). Asymptotic minimax regret for data compression, gambling and prediction. IEEE International Symposium on Information Theory, Ulm, Germany, June 29 - July 4.

  53. A. R. Barron (1997). Information theory in probability, statistics, learning, and neural nets. Computational Learning Theory: Tenth Annual ACM Workshop, Nashville, Tennessee, July 6-9. (Invited Plenary Presentation).

  54. A. R. Barron (1997). Information theory in probability, statistics, learning, and neural nets. International Conference on Combinatorics, Information Theory, and Statistics, University of Southern Maine, Portland, Maine, July 18-20. (Invited Presentation).

  55. A. R. Barron and Y. Yang (1997). Information-theoretic determination of minimax rates of convergence. Symposium on Nonparametric Functional Estimation, Centre de Recherches Mathematiques, University of Montreal, October 16-18. (Invited Presentation).

  56. A. R. Barron (1998). Nonlinear approximation, greedy algorithms, and neural networks. Ninth International Conference on Approximation Theory, January 4-6. (Invited Presentation).

  57. A. R. Barron (1998). How information theory illuminates the behavior of risk functions of Bayes procedures. Purdue Workshop on the Interface between Paradigms of Statistics. June 17-19. (Invited Presentation).

  58. A. R. Barron and J.-I. Takeuchi (1998). Mixture models achieving optimal coding regret. IEEE Information Theory Workshop. Killarney, Ireland, June 22-26.

  59. J.-I. Takeuchi and A. R. Barron (1998). Asymptotically minimax regret by Bayes mixtures. IEEE International Symposium on Information Theory, Cambridge, Massachusetts, August 16-21.

  60. A. R. Barron (1998). Information theory in probability and statistics; Approximation and estimation bounds for Gaussian mixtures. CIRM Workshop on Information Theory, Statistics, and Image Analysis, Marseille, France, December 7-11. (Two Invited Presentations).

  61. A. R. Barron (1999). Information theory in probability and statistics. IEEE Information Theory Workshop on Detection, Estimation, Classification, and Imaging, Santa Fe, New Mexico, February 24-26. (Invited Presentation).

  62. A. R. Barron (1999). Decision theory of regret for universal coding, gambling, and prediction. DIMACS Workshop: Online Decision Making, Rutgers University, New Brunswick, NJ, July 12-15. (Invited Presentation).

  63. A. R. Barron (2000). Limits of information, Markov chains and projection. IEEE International Symposium on Information Theory, Sorrento, Italy, June 26-30.

  64. A. R. Barron, Laszlo Gyorfi, Michael Nussbaum (2000). Nonparametric Estimation, Neural Nets and Risk Asymptotics. Short Course. Mathematical Research Institute, Oberwolfach, Germany, June 10-17.

  65. A. R. Barron (2000). Information theory in probability; Information theory in statistics. J. Bolyai Society Conference on Information Theory in Mathematics, honoring 50th anniversary of formation of what is now known as the Renyi Institute of Mathematics of the Hungarian Academy of Sciences. Balatonlelle, Hungary, July 3-7. (Two Invited Presentations).

  66. A. R. Barron (2000). Prediction, data compression, gambling, and model selection: Do Bayes procedures nearly minimize the maximum of regret over all possible data sequences? AMS-IMS-SIAM Summer Research Conference on Bayes, Frequentist, and Likelihood Inference: a Synthesis. Mount Holyoke College, South Hadley, Massachusetts, July 9-13. (Invited Presentation.)

  67. A. R. Barron (2001). Information-theoretic bounds for mixture modeling, model selection, and data compression. Workshop on Information Theory and Statistics, DIMACS, Rutgers University, March 2001. (Invited Presentation).

  68. A. R. Barron (2001). Information theory in probability and statistics. 23rd European Meeting of Statistics, Funchal, Madeira, Portugal, August 13-18. (Invited Keynote Plenary Presentation).

  69. F. Liang and A. R. Barron (2001). Minimax optimal predictive density estimation, data compression, and model selection. Workshop on MDL at Conference on Neural Information Processing Systems, Whistler, British Columbia, December 10. (Invited Presentation).

  70. F. Liang and A. R. Barron (2002). Exact minimax strategies for predictive density estimation, data compression and model selection. IEEE International Symposium on Information Theory, Lausanne, Switzerland, July 1-5.

  71. A. R. Barron, Jiangfeng Yu, and Wei Qiu (2003). Maximum compounded wealth: portfolio estimation, option pricing, and stock selection. Workshop on Complexity and Inference, DIMACS, Rutgers University, June 2-5. (Invited Presentation).

  72. A. R. Barron (2003). The role of information in the central limit problem. Symposium on Information Theory and Some Friendly Neighbors -- Ein Wunschkonzert, ZIF, Center for Interdisciplinary Research, Bielefeld, August 11-13. (Invited Presentation).

  73. A. R. Barron (2003). Interplay of statistics and information theory in formulation and selection of models. Workshop on Model Building, Dortmund, Germany, November 13-14. (Invited Presentation).

  74. G. Leung and A. R. Barron (2004). Information theory, model selection and model mixing for regression, Conference on Information Sciences and Systems, Princeton, NJ, March 17-19. (Invited Presentation in session on Information Theory, Computer Science, and Statistics.)

  75. A. R. Barron and G. Leung (2004). Risk assessment for Bayes procedures and model mixing in regression. IVth Workshop on Bayesian Nonparametrics, Universita di Roma La Sapienza, Rome, Italy, June 12-16. (Invited Presentation).

  76. A. R. Barron and G. Leung (2004). Risk assessment for model mixing. Workshop on Mathematical Foundations of Learning Theory, Barcelona, Spain, June 18-23. (Invited Presentation).

  77. A. R. Barron (2004). Relative entropy in probability theory and mathematical statistics. Workshop on Entropy in the Mathematical, Physical, and Engineering Sciences, Padova, Italy, June 24-27. (Two Invited Presentations).

  78. A. R. Barron (2004). Fitting functions of many variables: neural networks and beyond. 16th Conference on Computational Statistics (COMPSTAT 2004), Prague, Czech Republic, August 23-28. (Invited Keynote Plenary Presentation).

  79. A. R. Barron (2005). Neural nets, mixture models, and adaptive kernel machines. Yale Workshop on Adaptive and Learning Systems, May 29-31, Center for Systems Science, Yale University. (Invited Presentation).

  80. A. R. Barron (2005). Challenges in high-dimensional function estimation and attempted solutions, Congress on Statistics, Pau, France, June 6-10. (Invited Presentation).

  81. A. R. Barron (2005). Information theory and risk analysis. Medallion Lecture. Joint Statistical Meetings of the IMS and ASA, Minneapolis, Minnesota, August 7-11. (Presented with IMS Medallion Award; one-hour special invited presentation on Aug. 7).

  82. A. R. Barron (2005). Information theory and statistics for machine learning, IEEE Workshop on Machine Learning for Signal Processing XV, Mystic, CT, October 28-30. (Invited Keynote Plenary Presentation).

  83. M. Madiman and A. R. Barron (2006). Monotonicity of information in the central limit theorem, Workshop on Information Theory and its Applications, University of California, San Diego, February 6-9. (Invited Presentation).

  84. A. R. Barron (2006). Simple risk bounds for mixing least squares regressions. Journees: Model Selection in Statistics: Different approaches, Universite de Nice, Sophia-Antipolis, Nice, France, March 14-19. (Two one-hour invited presentations).

  85. A. R. Barron (2006). Simple risk bounds for mixing least squares regressions. International Workshop on Applied Probability, University of Connecticut, Storrs, CT, May 18. (Invited Presentation).

  86. A. R. Barron and Wei Qiu (2007). Maximum wealth portfolios, Workshop on Information Theory and its Applications, University of California, San Diego, January 29 - February 2. (Invited Presentation).

  87. A. R. Barron, Cong Huang, and Xi Luo (2008). Penalized squared error and likelihood: risk bounds and fast algorithms, Workshop on Sparsity in High Dimensional Statistics and Learning Theory, Georgia Institute of Technology, Atlanta, Georgia, March 22-24. (Invited Three-Part Presentation).

  88. A. R. Barron (2008). Information theory principles in probability and statistics, Elements of Information Theory Workshop, on the Occasion of Tom Cover's 70th birthday, Stanford University, Stanford, CA, May 16. (Invited Presentation).

Invited Departmental Seminar Presentations: (no proceedings; abstracts not more than a paragraph)

  1. Purdue University, Joint Statistics Colloquium, October 3, 1985. Topic: Entropy and the central limit theorem.
  2. Michigan State University, Department of Statistics and Probability, January 28, 1986. Topic: Generalized Shannon-McMillan-Breiman theorem.
  3. University of Chicago, Department of Statistics, October 20, 1986. Topic: Uniformly powerful tests.
  4. University of Virginia, Department of Mathematics, March 5, 1987. Topic: Convergence of Bayes estimators of probability density functions.
  5. Stanford University, Department of Statistics, October 20, 1987. Topic: Convergence of Bayes estimators of probability density functions.
  6. University of Chicago, Department of Statistics, March 7, 1988. Topic: Convergence of Bayes estimators of probability density functions.
  7. McGill University, Joint Statistics Seminar for Montreal universities, March 31, 1988. Topic: Approximation of densities by sequences of exponential families.
  8. Dupont Research Center, Dover, Delaware, April 26, 1988. Topic: Statistical learning networks.
  9. IBM T. J. Watson Research Center, Yorktown Heights, New York, August 10, 1988. Topic: Statistical learning networks.
  10. Stanford University, Information Systems Laboratory, November 3, 1988. Topic: Minimum complexity density estimation.
  11. IBM Technical Education Center, Thornwood, New York, January 11-12, 1989. Statistical learning networks. In the short course on Knowledge Acquisition from Data.
  12. Cornell University, Department of Economics and Program of Statistics (Co-hosts), February 1, 1989. Topic: Convergence of Bayes estimators of probability density functions.
  13. Purdue University, Department of Statistics, September 7, 1989. Topic: Minimum complexity density estimation.
  14. University of Lowell, Massachusetts, Joint Seminar, Department of Mathematics and Department of Electrical Engineering, March 14, 1990. Topic: Statistical properties of polynomial networks and other artificial neural networks.
  15. Carnegie Mellon University, Department of Statistics, April 4, 1990. Topic: Statistical properties of polynomial networks and other artificial neural networks.
  16. University of Chicago, Department of Statistics, October 15, 1990. Topic: Statistical properties of artificial neural networks.
  17. University of California, San Diego, Department of Mathematics, January 7, 1991. Topic: Approximation bounds for artificial neural networks.
  18. University of California, San Diego, Department of Mathematics, January 8, 1991. Topic: Complexity regularization for nonlinear model selection.
  19. Siemens Corporation, Princeton, New Jersey, February 28, 1991. Topic: Universal approximation bounds for superpositions of a sigmoidal function.
  20. University of Wisconsin, Department of Statistics, April 3, 1991. Topic: Complexity regularization for nonlinear model selection.
  21. University of Wisconsin, Department of Mathematics, April 4, 1991. Topic: Approximation bounds for artificial neural networks.
  22. Technical University of Budapest, Department of Electrical Engineering, July 2, 1991. Topic: Universal approximation bounds for superpositions of a sigmoidal function.
  23. Mathematical Sciences Research Institute, Berkeley, California, September 25, 1991. Topic: Empirical process bounds for artificial neural networks.
  24. Stanford University, Department of Statistics, October 15, 1991. Topic: Approximation and estimation bounds for artificial neural networks.
  25. University of California, Santa Cruz, Department of Computer and Information Sciences, October 17, 1991. Topic: Computationally efficient approximation and estimation of functions using artificial neural networks.
  26. Yale University, Department of Statistics, January 13, 1992. Topic: Neural network estimation.
  27. University of Virginia, Department of Electrical Engineering, Eminent Speaker Series, February 21, 1992. Topic: Estimation of functions of several variables -- neural networks, Fourier decomposition, and Bayes methods.
  28. North Carolina State University, Department of Statistics, February 29, 1992. Topic: Estimation of functions of several variables -- neural networks, Fourier decomposition, and Bayes methods.
  29. Cornell University, Center for Applied Mathematics, March 6, 1992. Topic: Estimation of functions of several variables -- neural networks, Fourier decomposition, and Bayes methods.
  30. University of North Carolina, Department of Statistics, March 30, 1992. Topic: Estimation of functions of several variables -- neural networks, Fourier decomposition, and Bayes methods.
  31. University of Joensuu, Finland, Department of Statistics, April 9, 1992. Topic: Introduction to artificial neural networks.
  32. University of Paris VI, Department of Statistics, April 15, 1992, and University of Paris, Orsay, Department of Statistics, April 16, 1992. Topic: Estimation of functions of several variables -- neural networks, Fourier decomposition, and Bayes methods.
  33. University of Paris VI, Department of Statistics, April 22, 1992, and University of Paris, Orsay, Department of Statistics, April 23, 1992. Topic: Performance bounds for complexity-based model selection.
  34. Princeton University, Department of Electrical Engineering, May 14, 1992. Topic: Overview of approximation results for sigmoidal networks.
  35. University of Massachusetts at Lowell, Joint Seminar, Department of Mathematics and Department of Electrical Engineering, October 21, 1992. Topic: Statistical accuracy of neural nets.
  36. University of Tokyo, Japan, Department of Information, Physics and Engineering, March 1993. Topic: Information theory and model selection.
  37. University of Paris VI, Department of Statistics, May 1993. Topic: Optimal rate properties of minimum complexity estimation.
  38. University of Paris VI, Department of Statistics, May 1993. Topic: Information-theoretic proof of martingale convergence.
  39. University of Pennsylvania, Wharton School, October 21, 1993. Topic: Neural networks and statistics.
  40. Massachusetts Institute of Technology, Center for Biological and Computational Learning, October 27, 1993. Topic: Neural networks and statistics.
  41. Rutgers University, Department of Statistics, October 5, 1994. Topic: Statistical accuracy of neural nets.
  42. University of South Carolina, Department of Mathematics, Spring 1995. Topic: Neural net approximation.
  43. Carnegie Mellon University, Department of Statistics, Fall 1995. Topic: Information risk and superefficiency.
  44. Massachusetts Institute of Technology, Department of Applied Mathematics, March 1996. Topic: Consistent and uniformly consistent classification.
  45. Columbia University, Department of Statistics, Fall 1996. Topic: Consistency of posterior distributions in nonparametric problems.
  46. Northeastern University, Joint Mathematics Colloquium with MIT, Harvard, and Brandeis, February 27, 1997. Topic: Information theory in probability and statistics.
  47. Iowa State University, Department of Statistics, March 28, 1997. Topic: Information theory in probability and statistics: The fundamental role of Kullback divergence.
  48. Washington University, St. Louis, Department of Electrical Engineering, Center for Imaging Systems, April 16, 1997. Topic: Universal data compression, prediction, and gambling.
  49. Massachusetts Institute of Technology, LIDS Colloquium, May 5, 1998. Topic: Simple universal portfolio selection.
  50. University of California, Santa Cruz, Baskin Center for Computer Engineering, October 1998. Topic: Approximation bounds for Gaussian mixtures.
  51. Lucent, Bell Laboratories, Murray Hill, New Jersey, March 1999. Topic: Approximation and estimation bounds for mixture density estimation.
  52. Stanford University, Department of Statistics, Probability Seminar, May 24, 1999. Topic: Information, martingales, Markov chains, convex projections, and the CLT.
  53. Stanford University, Department of Statistics, Statistics Seminar, May 25, 1999. Topic: Mixture density estimation.
  54. Rice University, Departments of Statistics and Electrical Engineering, November 4, 2000. Topics: Information theory and statistics -- best invariant predictive density estimators.
  55. University of Chicago, Department of Statistics, November 21, 2000. Topics: Information theory and statistics -- best invariant predictive density estimators.
  56. Yale University, Department of Computer Science, Alan J. Perlis Seminar, April 26, 2001. Topic: Neural nets, Gaussian mixtures, and statistical information theory.
  57. Brown University, Department of Applied Mathematics, May 9, 2001. Topic: I do not recall.
  58. University of Massachusetts at Lowell, Department of Mathematics, September 19, 2001. Topic: Mixture density estimation.
  59. University of California at Los Angeles, Department of Statistics, May 21, 2002. Topic: Nonlinear approximation, estimation, and neural nets (I do not recall the specific title).
  60. Columbia University, Department of Statistics, October 28, 2002. Topic: Information inequalities in probability and statistics.
  61. University of Georgia, Department of Statistics, November 26, 2002. Topic: Information inequalities in probability and statistics.
  62. University of Pennsylvania, Wharton School, October 29, 2003. Topic: Portfolio estimation for compounding wealth.
  63. University of North Carolina (in conjunction with Duke University), Departments of Statistics, November 3, 2004. Topic: Risk assessment and advantages of model mixing for regression.
  64. University of South Carolina, Department of Mathematics, April 7, 2005. IMI Distinguished Lecture. Topic: Statistical theory for nonlinear function approximation: neural nets, mixture models, and adaptive kernel machines.
  65. Helsinki University and Helsinki Institute of Information Technology, Helsinki, Finland. August 22-25, 2005. Two talks: (1) Statistical foundations and analysis of the minimum description length principle. (2) Consequences of MDL for neural nets and Gaussian mixtures.
  66. Princeton University, Department of Operations Research and Financial Engineering, October 4, 2005. Topic: Statistical perspectives on growth rate optimal portfolio estimation.
  67. IBM Research Laboratories, Yorktown Heights, September 22, 2006. Topic: Generalized entropy power inequalities and the central limit theorem.
  68. Purdue University, Department of Computer Science, February 26, 2007. Prestige Lecture Series on the Science of Information. Topic: The interplay of information theory, probability, and statistics.
  69. University of Illinois, Joint Seminar, Department of Statistics and Department of Electrical and Computer Engineering, February 27, 2007. Prestige Conference Series. Topic: The interplay of information theory and probability.
  70. Boston University, Department of Statistics, March 1, 2007. Prestige Conference Series. Topic: Information inequalities and the central limit theorem.
  71. University of California at Berkeley, Department of Computer Science, October 4, 2007. Topic: Fast and accurate greedy algorithm for L1 penalized least squares. Primarily presented by Cong Huang.
  72. Rutgers University, Department of Statistics, December 12, 2007. Topic: Fast and accurate L1 penalized least squares. Co-presented with Cong Huang.

Ph.D. Dissertations Supervised:

  • Bertrand S. Clarke (1989). Asymptotic Cumulative Risk and Bayes Risk under Entropy Loss, with Applications. University of Illinois at Urbana-Champaign. [Professor, University of British Columbia]

  • Chyong-Hwa Sheu (1989). Density Estimation with Kullback-Leibler Loss. University of Illinois at Urbana-Champaign.

  • Yuhong Yang (1996). Minimax Optimal Density Estimation. Yale University. [Was Assistant Professor at Iowa State University; Now Full Professor at University of Minnesota.]

  • Trent Qun Xie (1997). Minimax Coding and Prediction. Yale University. [Was at GE Capital, Inc., Fairfield, CT. Now Assistant Professor at Tsinghua Univ.]

  • Gerald Cheang (1998). Approximation and Estimation Bounds for Two Hidden-Layer Sigmoidal Networks. Yale University. [Now at Singapore Technical and Education University.]

  • Jason Cross (1999). Universal Portfolios for Target Classes having a Continuous Form of Dependence on Side Information. Yale University. [Was at an investment start-up firm with Myron Scholes in New York. Now runs an investment firm in Minneapolis, Minnesota.]

  • Jonathan Li (1999). Estimation of Mixture Models. Yale University. [Was at KPMG Financial Services, New York. Then at Stanford Research Institute, Palo Alto. Now at Radar Networks, Inc., San Francisco.]

  • Feng Liang (2002). Exact Minimax Predictive Density Estimation. Yale University. [On leave from Duke University. Now at University of Illinois at Urbana-Champaign]

  • Gilbert Leung (2004). Information Theory and Mixing Least Squares Regression. [At Qualcomm, first in San Diego, now near San Jose.]

  • Wei Qiu (2007). Maximum Wealth Portfolios. Yale University. [Now at J.P. Morgan Chase, Columbus, Ohio.]

  • Cong Huang (2008). Risk of Penalized Least Squares, Greedy Selection and L1-Penalization for Flexible Function Libraries. Yale University.

Research Interests: discussion of results and open problems

  • Entropy Power Inequalities and Central Limit Theorems:
    [Publications 3,26,29,53; Conference Presentations 2,60,68,72,77,83,88; Seminars 1,47,52,60,61,67,70.]

    Information theory tools involving entropy and Fisher information provide a demonstration of a strengthened central limit theorem. One compares the distribution of the standardized sum of i.i.d. random variables to the standard normal distribution. Publication 3, building on earlier work by Larry Brown, shows that the Kullback divergence (relative entropy distance) converges to zero if and only if it is eventually finite. Publication 26, with Oliver Johnson, shows that it converges to zero at rate 1/n if the random variables have finite Poincare constant; this was also shown, in the same issue, by Artstein, Ball, Barthe, and Naor. Implications are given in publication 26 for risk efficiency of the best unbiased estimator of natural parameters in exponential families.
    Publications 29 and 53, with Mokshay Madiman, simplify proofs of monotonicity of information in the central limit theorem and extend the Shannon entropy power inequality to arbitrary collections of subset sums. Implications for distributed estimation of location parameters are being explored with Madiman, Abram Kagan, T. Yu in publication 33 and current work.

    Open Problem: Show the relative entropy distance converges to zero at rate 1/n assuming only finite initial Fisher information and finite third and/or fourth moments, rather than the finiteness of moments of all orders implied by a finite Poincare constant.
  • Entropy Rates, Likelihood Stabilization, and Inference for Dependent Processes:
    [Publications 2; Conference Presentations 3,77; Seminar 2.]

    Asymptotically there is agreement between sample information quantities and their expectations; likewise agreement between log likelihoods and their expectations. Building on the Shannon-McMillan-Breiman theory from the 1950s, which was for discrete ergodic processes, and the theory of Moy, Perez, and Kieffer, which established L1 convergence for (1/n) log-densities of absolutely continuous joint distributions of size n samples from stationary processes, publication 2, in 1985, established the long-conjectured convergence with probability 1; this was independently shown by Steven Orey in the same year. The conclusion also holds for log likelihood ratios, if the second measure is Markov with stationary transitions, while the governing first measure is not required to be Markov. This stability for log-likelihood ratios provides best error exponents for hypothesis tests. It also provides a step, in publication 0, in the demonstration of universal consistency properties of logical smoothing (a general Kolmogorov complexity based minimum description-length criterion) for all computable stationary ergodic processes.

    Open Problem: Provide an extension showing log likelihood ratio stabilization in the case that both measures are stationary and ergodic without Markov assumption. What weaker assumption on the second measure is appropriate? A word of caution: John Kieffer in the early 1970s gave a counterexample in which the (1/n) log-likelihood ratio is not convergent for a pair of stationary (but not ergodic) measures.
  • Foundations of Minimum Description Length Principle of Inference and Universal Data Compression: [Publications 0,7,10,15,16,18,23,27,40,41,55; Conference Presentations 1,4,8,10,12,14,26,32,39,43,45,52,58,59,62,66, 67,69,70,73,74; Seminars 10,13; Student Theses of Clarke, Xie, Li and Liang.]

    The minimum description length principle of statistical inference initiated by Jorma Rissanen, with related work by Wallace, has realizations via two-part codes, mixture codes, and predictive codes. Early statistical analysis is in publication 0. Two-part codes correspond to complexity penalized log-likelihood, for which risk analysis is in publications 0,10,41,55. Mixture codes (average case optimal and minimax optimal) correspond to universal data compression in information theory and to reference priors in statistics, for which analysis is in publications 0,7,15,16,18,23,27,31,38. Optimal predictive codes correspond to prequential inference and to posterior model averaging, for which analysis is in publications 7,18,19,22,23,27,36,38,41. These investigations show relationships in properties of penalized likelihoods, Bayes mixture procedures, and predictive procedures revealed through their minimum description length interpretations. Moreover, data compression performance and statistical risk properties are linked.
    For further literature on the minimum description length principle and where our work fits in this context, one may see the review by Barron, Jorma Rissanen, and Bin Yu (publication 18), and the book by Peter Grunwald (2007, MIT Press).

    Open Problems: It is the cumulative Kullback risk of predictive estimators, for sample sizes up to n, that is most directly related to redundancy of data compression, for which simple and clean bounds are available. (1) Show that the Kullback risk with sample size n (the last term in the sum) has corresponding performance or give a counterexample. (2) For two-stage codes (complexity penalized log likelihoods), the existing risk bounds in broad generality are for Hellinger divergence and related measures of distance. Show that such penalized procedures have corresponding performance for the stronger Kullback risk, comparable to the predictive Bayes procedures, or give a counterexample.
  • Statistical Risk Analysis for Penalized Criteria for Model Selection: [Publications 13,17,20,32,34,37,41,52,55; Conference Presentations 23,26,47,51,67,73,81,82,87; Seminars 18,20,36,37; Student Theses of Yang and Huang.]

    Desirable penalties for empirical criteria produce estimators with statistical risk shown to be related to the corresponding population tradeoff between accuracy and penalty. Conditions on the penalty for which such conclusions hold are in the cited publications, some of which are joint with Yuhong Yang, Lucien Birge, Pascal Massart, Gerald Cheang, or Cong Huang. The penalty conditions emphasize variable-complexity cover properties. Accordingly, one can adapt the structure of models (e.g. as in publications 43,44) as well as the number and size of parameters. For models built from linear combinations of candidate terms from a library, implications for subset size penalties are in publications 17,20,32,34, for L1 penalties on coefficients in publications 32,41,55, and for combinations of such penalties in publications 13,32.
  • Statistical Risk Analysis for Bayes Procedures:
    [Publications 0,4,6,7,15,21,27,28,31,33,36,38,40,48,51,55; Two Technical Reports; Conference Presentations 30,36,43, 50,57,58,74,75,76,84; Seminars 4,5,6,12,45; Student Theses of Clarke and Leung.]

    Building on the theory of Lorraine Schwartz from the early 1960s, necessary and sufficient conditions for consistency of posterior distributions are established in the 1988 technical report, formulated in terms of existence of uniformly consistent tests, parts of which are presented in publications 6, 38 and 21, the latter with Mark Schervish and Larry Wasserman.
    Posterior predictive densities can be accurate even when posterior distributions on parameters do not necessarily concentrate well on good neighborhoods of the generative parameter. Quantification of prior probability of Kullback neighborhoods is sufficient to provide rates of convergence of predictive densities as shown in 36, 38, using cumulative Kullback risk. Detailed total Kullback risk analysis of Bayes predictive procedures is a consequence of the analysis of Bayes mixtures cited above.
    In a fixed-design Gaussian-error regression setting, simple information-theoretic bounds on risk of posterior model averaging are given with Gilbert Leung in publication 28. These bounds are obtained there in the case of known error variance, with subsequent extension to the case of unknown error variance in work by Christophe Giraud and Yannick Baraud. A tool in these developments is the unbiased estimator of risk due to Stein. The cleanest risk bounds have weights for model averaging that are proportional to square roots of posterior weights. The risk bounds are in terms of optimal approximation error and subset size tradeoff, with a multiplicative constant of 1, which makes the conclusion for model mixing stronger in this setting than all known results for model selection.
    A fascinating general result following from publication 28 is that an unbiased estimator of the mean squared error of prediction of Bayes regression procedures is equal to RSS + 2V, the residual sum of squared errors of the fit plus twice the sum of posterior variances. This risk assessment is an attractive combination of accuracy of fit to training data plus posterior variability.
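
    As a concrete illustration of the RSS + 2V assessment, here is a minimal sketch for one special case only: a conjugate Gaussian linear model with known error variance. The prior b ~ N(0, tau2 I), the function name rss_plus_2v, and the reading of V as the sum over design points of the posterior variance of the fitted regression function are assumptions made for this example, not details taken from publication 28.

```python
import numpy as np

def rss_plus_2v(X, y, sigma2, tau2):
    """Return (RSS, V, RSS + 2V) for a conjugate Gaussian linear model.

    Assumed setup (hypothetical, for illustration): y = X b + noise,
    noise variance sigma2 known, prior b ~ N(0, tau2 * I).
    """
    n, p = X.shape
    # Posterior covariance of b: sigma2 * (X'X + (sigma2 / tau2) I)^{-1}
    post_cov = sigma2 * np.linalg.inv(X.T @ X + (sigma2 / tau2) * np.eye(p))
    # Posterior mean of b and the resulting fit at the design points
    post_mean = post_cov @ X.T @ y / sigma2
    fit = X @ post_mean
    rss = float(np.sum((y - fit) ** 2))
    # V: sum over design points of the posterior variance of x_i' b
    v = float(np.trace(X @ post_cov @ X.T))
    return rss, v, rss + 2.0 * v

# Tiny example: identity design, unit prior and noise variances.
rss, v, estimate = rss_plus_2v(np.eye(3), np.array([2.0, 0.0, -2.0]), 1.0, 1.0)
```

    In this conjugate case the fit is linear in y, and V equals sigma2 times the trace of the smoothing matrix, so RSS + 2V matches the familiar Stein-type estimate of prediction error at the design points.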

    Open Problem: Starting perhaps with the case of Gaussian distribution on the inputs to a regression, find an extension of these conclusions to the case of random design, with estimation of the mean squared error of prediction at new points independently drawn from the same distribution as the sample.
  • Optimal Rates of Function Estimation for Large Function Classes:
    [Publications 19, 22; Conference Presentations 41,42,48,55; Thesis of Yang.]

    For regression and density estimation, three quantities of interest for a function class are (1) the Wald minimax risk R_n of estimation from finite samples of size n with mean square error, Hellinger, or Kullback loss, (2) the Kolmogorov metric entropy rate H_n = H(epsilon_n)/n at a critical covering or packing radius epsilon_n, solving H(epsilon_n)/n = epsilon_n^2, and (3) the Shannon information capacity rate C_n of the channel that takes one from functions in the class to data of the indicated size (known in information theory to also equal the minimax redundancy rate of data compression). Beginning in the 1940s, an inequality of Fano was used to bound communication rates (log-cardinality of message sets divided by block length n) by information capacity, an essential step in establishing the capacity of communication channels. Versions of this inequality were borrowed into statistical investigations of lower bounds on minimax risk of function estimation by Pinsker, Hasminskii, and Birge in the late 70s and early 80s. The implication at that time was that if an optimal cover of a small diameter portion of the function space has an appropriately large order size, then that log-cardinality divided by n determines the minimax rate.
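
    To make the critical radius concrete, the following sketch solves H(epsilon)/n = epsilon^2 numerically by bisection. The function name critical_radius and the example entropy H(epsilon) = epsilon^(-1/s) with s = 1 are illustrative assumptions; the only property used is that the metric entropy decreases as epsilon grows.

```python
import math

def critical_radius(metric_entropy, n, lo=1e-9, hi=1e3, iters=200):
    """Solve metric_entropy(eps) / n = eps**2 for eps by geometric bisection.

    Assumes metric_entropy is decreasing in eps, so that
    metric_entropy(eps) / n - eps**2 changes sign once on [lo, hi].
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # midpoint on a log scale
        if metric_entropy(mid) / n > mid ** 2:
            lo = mid  # entropy rate still too large: radius must grow
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical entropy H(eps) = eps^(-1/s) with s = 1, sample size n = 1000.
eps_n = critical_radius(lambda eps: eps ** (-1.0), n=1000)
```

    For this assumed entropy the solution is available in closed form, epsilon_n^2 = n^(-2s/(2s+1)), which gives epsilon_n = 0.1 at n = 1000 and can be used to check the numerical answer.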
    In publication 22 with Yuhong Yang, we appealed to the original Fano inequality to show that, for any infinite-dimensional function class, the minimax risk is of the same order as the metric entropy rate at the critical epsilon_n. Thus to verify the minimax rate for a function class, or to upper and lower bound that rate one only needs to know upper and lower bounds on the metric entropy (one does not need to exhibit the more specialized cover property of small diameter subsets). Equally interesting was the approach of Haussler and Opper, also from the mid-90s, which established similar conclusions with new bounds on the information capacity.
    It may be of interest to discuss the implications of optimal rate theory in the model selection setting. For any complete sequence of basis functions and any decreasing sequence r_k tending to zero polynomially fast, a corresponding function class is the approximation class of functions for which the projection onto the first k basis functions has squared L2 error not more than r_k for all k. Lorentz in the 1950s determined the metric entropy for such classes. The implication from publication 22 is that the minimax risk of these classes is of order min {r_k + k/n}, where the minimum is over choices of k, in concert with what is achieved by Bayes procedures as in publication 22 or with model selection rules that adapt k, as in publication 20. These estimators are adaptive in that the minimax rate is achieved simultaneously for all such approximation classes.
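
    The tradeoff min {r_k + k/n} can be evaluated directly for any given sequence. The sketch below does so by exhaustive search; the helper name best_tradeoff and the sequence r_k = k^(-2s) with s = 1 are hypothetical choices made only to show the shape of the calculation.

```python
def best_tradeoff(r, n, k_max):
    """Return (k, value) minimizing r(k) + k / n over k = 1, ..., k_max."""
    best_k = min(range(1, k_max + 1), key=lambda k: r(k) + k / n)
    return best_k, r(best_k) + best_k / n

# Hypothetical approximation-error sequence r_k = k^(-2s) with s = 1.
k_star, rate = best_tradeoff(lambda k: k ** -2.0, n=1000, k_max=1000)
```

    With this assumed polynomial decay the minimizing k grows like n^(1/(2s+1)) and the minimum value is of order n^(-2s/(2s+1)), the same order as the squared critical radius for the corresponding class.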
    There are some implications of optimal rate theory for subset selection. A formulation of sparse approximation classes dictated by the accuracy of subset selection is also given in publication 22. The formulation there is idiosyncratic to facilitate appeal to the conclusion of Lorentz for a subsequence of allowed subsets, and does not completely match what would be desired for characterizing sparse approximation classes. Lower bounds on metric entropy for these or other formulations of sparse approximation classes can provide implications for optimal statistical rates for interpolation classes discussed in publication 32, section 6. Positive results are given there, in publication 32, providing estimators whose rates are achieved simultaneously for a range of interpolation classes for linear combinations of libraries of finite metric dimension. These rates are faster than previously obtained and it would be of interest to know whether they are minimax optimal.

    Open Problem: For a given library of candidate terms, generalize the formulation of sparse approximation classes to better match the setting of approximation by subsets of size k out of M and use it to bound the metric entropy of these classes and associated interpolation classes, and hence their minimax rates of estimation.
  • Greedy Algorithms for Subset Selection in Regression and Density Estimation, and L1 Penalty Optimization:
    [Publications 12,14,30,32,39,41,52,54; Conference Presentations 24,56,64,71,80,86,87; Seminars 50,53,56,62,64,66; Theses of Cheang, Li, and Huang.]

    Greedy algorithms are ubiquitous in a range of methods for function estimation and approximation including forward stepwise regression, projection pursuit regression, multiple additive splines, and basis pursuit for wavelets. In 1989 Lee Jones established a result (published in the Annals of Statistics in 1992) for a broad class of greedy procedures that include a relaxation factor in the update formulation, with a slight improvement in publication 12. This result shows that a target function representable as a linear combination of a library of bounded functions with L1 norm of coefficients not more than V is approximated by k steps of such greedy algorithms (that is by k terms of the library) with accuracy C/k where C is the square of V.
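
    A minimal sketch of one relaxed greedy scheme of this general type follows. It is not a transcription of the procedures in Jones's paper or in publication 12: the fixed relaxation weight 1/k, the empirical squared-error norm, the signed candidates plus or minus V times each library column, and the name relaxed_greedy are all assumptions chosen to keep the example short. For a target in the scaled convex hull, with library entries bounded by 1, a standard induction gives squared error at most V^2/k after k steps for this variant.

```python
import numpy as np

def relaxed_greedy(H, f, V, steps):
    """Relaxed greedy approximation of f by terms of the library H.

    H: (n, M) array of library functions evaluated at n points, entries
       bounded by 1 in absolute value (assumed).
    f: (n,) target values; V: bound on the L1 norm of coefficients.
    Returns the final fit and the empirical squared error after each step.
    """
    n, M = H.shape
    candidates = np.hstack([V * H, -V * H])  # signed, scaled library terms
    fit = np.zeros(n)
    errors = []
    for k in range(1, steps + 1):
        alpha = 1.0 / k  # relaxation factor: shrink old fit, add one term
        trial = (1.0 - alpha) * fit[:, None] + alpha * candidates
        sq_err = np.mean((f[:, None] - trial) ** 2, axis=0)
        j = int(np.argmin(sq_err))  # greedy choice of the next term
        fit = trial[:, j]
        errors.append(float(sq_err[j]))
    return fit, errors
```

    After k steps the fit is a combination of at most k library terms, so the recorded errors trace out the accuracy-versus-number-of-terms behavior described above.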
    To deal with noise and with targets that may be outside of specified convex hulls, extension of the greedy algorithm result was obtained in a lemma jointly with Wee-Sun Lee, Peter Bartlett, and Bob Williamson (in 1996 IEEE IT, see their acknowledgements), with further refinement in publication 30 with Albert Cohen, Wolfgang Dahmen, and Ron DeVore. These results show that in fitting a function f, the squared L2 error of greedy algorithms is bounded by the minimum over all possible linear combinations g of the sum of approximation error ||f-g||^2 and C(g)/k where C(g)=4 V^2(g), with V(g) the L1 norm of coefficients in g. Moreover, with data-based adaptive selection of the number of terms k, a statistical risk is obtained that corresponds to min ||f-g||^2 + lambda V(g), the best tradeoff between approximation error and L1 norm of coefficients times a factor lambda equal to the square root of (log M)/n where M is the library size and n is the sample size.
    Recently my student Cong Huang (Thesis, Publication 32, Seminars 71,72, and Conference Presentation 87) showed that a slight variant of the relaxed greedy algorithm (which we call L1 penalized greedy pursuit, or LPGP) actually solves the L1 penalized least squares problem, with explicit control of the computational accuracy after k steps (through the C(g)/k term in the bound). Corresponding results for log density expansions, with Li, Huang, and Luo, are in publication 41 and for mixture density estimation, with Jonathan Li, are in his Thesis and in publication 39.
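    As a minimal sketch of the relaxed greedy update described above (illustrative Python only, not code from the cited publications), the following fits a target on n sample points using a finite library. The function names, the cosine library, the exact line search for the relaxation factor, and the demo values are all assumptions made for illustration. The target is built to lie in V times the convex hull of the signed library, which is the setting of the C/k accuracy statement.

```python
import math
import random

def dot(u, v):
    """Empirical inner product: average of coordinate-wise products."""
    return sum(a * b for a, b in zip(u, v)) / len(u)

def relaxed_greedy(target, library, V, steps):
    """Relaxed greedy approximation of `target` by signed library terms.

    Each step forms (1 - a) * fit + a * V * s * h, choosing the term h,
    the sign s, and the relaxation factor a in [0, 1] that minimize the
    squared error. Returns the final fit and the error after each step.
    """
    fit = [0.0] * len(target)
    errors = []
    for _ in range(steps):
        resid = [t - f for t, f in zip(target, fit)]
        best_err, best_fit = dot(resid, resid), fit  # a = 0 keeps the fit
        for h in library:
            for s in (1.0, -1.0):
                # Direction from the current fit toward the candidate V*s*h.
                d = [V * s * hi - fi for hi, fi in zip(h, fit)]
                dd = dot(d, d)
                if dd == 0.0:
                    continue
                # Exact line search for the relaxation factor, kept in [0, 1].
                a = min(1.0, max(0.0, dot(resid, d) / dd))
                new_fit = [fi + a * di for fi, di in zip(fit, d)]
                r = [t - g for t, g in zip(target, new_fit)]
                err = dot(r, r)
                if err < best_err:
                    best_err, best_fit = err, new_fit
        fit = best_fit
        errors.append(best_err)
    return fit, errors

# Hypothetical demo: a cosine library bounded by 1 and a target in V * hull.
random.seed(0)
n, M, V = 50, 30, 2.0
library = [[math.cos((j + 1) * (i + 0.5) * math.pi / n) for i in range(n)]
           for j in range(M)]
raw = [random.random() for _ in range(M)]
coef = [w / sum(raw) * random.choice((1.0, -1.0)) for w in raw]
target = [V * sum(c * h[i] for c, h in zip(coef, library)) for i in range(n)]
fit, errors = relaxed_greedy(target, library, V, steps=10)
for k, e in enumerate(errors, start=1):
    print(k, e, V ** 2 / k)
```

    Each printed row shows the step k, the squared error after k terms, and the reference value V^2/k for comparison.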

    Open Problem: A variant of the Least Angle Regression (LARS) algorithm of Brad Efron, Trevor Hastie, Iain Johnstone, and Rob Tibshirani also builds up the solution one term at a time, for a range of values of lambda, with the final solution after a number of steps k said to be not more than the sample size. Relate LARS to the greedy algorithms, with inclusion of a relaxation factor if need be, to quantify its approximation accuracy with fewer steps.
  • Nonlinear Approximation and Estimation for High-dimensional Libraries of Functions including Neural Nets:
    [Publications 8,12,13,14,20,22,24,30,32,35,37,42,43,44,45,46,47,52,54; Conference Presentations 11,13,15,18,19,20,21,22,23,24,25,27,28,31,33,34,35,36,38,40,44,49,53,54,56,64,78; Seminars 8,9,11,14,15,16,17,19,21,22,23,24,26,27,29,30,31,32,34,35,39,40,41,42,56,64; Thesis of Cheang.]

    In dealing with adaptation and high-dimensionality, the early 1990s were a turning point in mathematical understanding of function estimation and approximation, as will be explained. Long before that time, statistical practice, both in academia and in industry, revealed a clear advantage to data-based adaptation of the structure as well as the choice of the number of parameters of statistical models. The choice of subsets of terms in a linear combination in regression is a simple example of such adaptation, and the choices of structure of classification and regression trees, multiple layer neural nets, and polynomial nets provide more elaborate examples (see publication 43 for a review).
    However, such statistical practice was very much at odds with the prevailing mathematical understanding of nonparametric function estimation and approximation theory. The tradition was to capture function regularity primarily by local smoothness, through presumption of Sobolev or Holder properties of derivatives. For these function classes, the statistical risk results showed no advantage for procedures other than those based on linear strategies (fixed kernel methods or linear projection methods onto fixed bases). The most relevant approximation theory for these classes was the theory of Kolmogorov k-widths, expressing which fixed k terms provide the linear space that minimizes the maximal error of L2 projection onto the linear span. In this linear theory, the choice of k terms is fixed, depending on the class, but not on the target function within the class. In contrast, for nonlinear approximation based on a given library of candidate terms, the choice of the best k terms from the library is allowed to depend on the target. Though such nonlinear k-width theory was being developed, when applied to the traditional Sobolev and Holder function classes, it did not reveal substantial advantage in performance.
    Belief that such local derivative regularity is what matters distanced the theory from statistical practice, discouraging the taking of general subsets of terms in regression, when the practitioner knows that such adaptation is advantageous. These disparities between statistical theory and practice were exacerbated by the results that for functions of d variables the minimax optimal rates for the traditional classes severely degrade with dimension, leading to a conclusion in that theory that one needs sample sizes exponentially large in the dimension.
    Two activities in the study of approximation and estimation of functions dramatically changed perspectives. One was the development of theory and practice of wavelets. Retaining the idea that regularity can be captured locally through behaviors of moduli of smoothness, it was recognized that such regularity need not be of the same type throughout the input space of the function. Certain Besov spaces capture such freedom and were shown to be equivalent to certain norm properties on coefficients in newly developed orthogonal wavelet expansions. Moreover, optimal approximation and estimation in these classes were shown to require target-based or data-based choices of subsets of terms to include, which is a type of nonlinear approximation, rather than fixed linear projection. See books and papers by Ron DeVore and Yuly Makovoz and others from approximation theory, and the writings of this time of Donoho and Johnstone and others from mathematical statistics. Taking advantage of the orthogonality, for the low-dimensional spaces of two- and three-dimensional images and time evolutions of such, for which one can have an exhaustive filling of data-points, practical algorithms became available which bridged theory and practice in these settings.
    The other activity concerns approximation and estimation of responses for high-dimensional input data. In this setting the input data is sparse, typically modeled as a random sample from a distribution from which subsequent data will likewise be drawn, and popular models entertain linear combinations of nonlinearly parameterized terms, such as sinusoids, sigmoids, or multivariate splines with variable knots. I found myself in a whirlwind of discussion, as evidenced by the many conference and seminar requests. I argued that for high-dimensional function estimation, since local derivative regularity fails to capture what can be inferred from manageable data sizes, we should abandon those local smoothness notions as the primary source of regularity in such settings. Instead, more structural notions of regularity matter, related to complexity or to the suitability of the presumed forms for accurate representation of the target.
    A theory I put forward, beginning in 1991, is for functions f which, when divided by a suitable V, are in the closure of the convex hull of a given library of candidate terms (the smallest such V being called the variation of f with respect to the library, in concert with the variation interpretation associated with the library of indicators of half-spaces, publication 47). Including terms of either sign in the library, V(f) is equivalent to an L1 norm on coefficients of representation. A Fourier norm is shown to be one device for controlling the variation for ridge type libraries such as sigmoids and sinusoids (publication 13). For functions of finite variation the squared norm of the error of best k-term approximation and the statistical risk are bounded, respectively, by V^2/k and by V/sqrt(n) to within logarithmic factors. The exponents of (1/k) and of (1/n) in these rates do not depend on the dimension.
    To achieve these rates under such structural assumptions, unlike the previously mentioned traditional function class settings, adaptation of the subset of k terms (depending on the target) is essential. So we have a better match of such theory to what we know to be important in practice. For functions that do not have finite variation, but reside in interpolation classes between this L1-coefficient norm class and all of L2, the results of publications 30 and 32 give bounds on the rate, which continue to be respectable even for libraries of large metric dimension. These function classes give settings in which, with suitable nonlinear approximation and estimation techniques, we are not subject to a curse of dimensionality.

    Open Problem: A tantalizing prospect of this theory, with infinite libraries of high but finite metric-dimensionality, is that accurate computation of the estimates is possible by use of the suitable greedy algorithms discussed above. After all, each step of an algorithm involves optimization of only one term from the library. Empirically, investigators have been pleased with local optimization strategies for each new term in various settings, including the Friedman and Stuetzle projection pursuit regression algorithm and various gradient-based neural net training algorithms. The hope of such local strategies is that the residual surface is cleaned by each term included so that ultimately the globally important terms are revealed, though that hope has not yet been confirmed on general mathematical grounds. Or one may study adaptive annealing methods for guided random search of suitable terms from the library (some steps in this direction are in reference 54). The question is to resolve for what forms of nonlinearly parameterized terms such adaptive annealing is rapidly convergent to a term of suitably high inner product with the residuals from the previous greedy algorithm step. Resolution of suitably formed computational questions is critical for both understanding and practice of high-dimensional function estimation.
  • Maximum Wealth Stock Indices and Growth Rate Optimal Portfolio Estimation:
    [Publications 5,23,25; Two technical reports (working papers for eventual publication); Conference Presentations 7,9,17,62,66,71,86; Seminars 48,49,62,66; Theses of Jason Cross and Wei Qiu.]

    Consider compounding over n periods of investment, rebalancing each period to a portfolio vector W which maintains fractions W1,W2,...,Wm of wealth in each of m assets (stocks, bonds, cash, etc). Let (X1i,X2i,...,Xmi) be the vector of returns (ratios of price plus dividend at the end of period i to the price at the start of period i). Then the multi-period factor by which a unit of wealth is multiplied, denoted Sn(W), is the product for i=1,2,...,n of the portfolio returns Zi = W1 X1i + W2 X2i + ... + Wm Xmi. This multi-period wealth function Sn(W) is a log-concave function of the portfolio vector W. That is, the wealth Sn(W) is the exponential of a smooth concave function Yn(W) which is the sum of the log portfolio returns. As such it exhibits a shape as a function of W governed by Laplace approximation, peaked at a choice of rebalancing portfolio W which would have made the most wealth over that time period. Accordingly, in the report with Wei Qiu (as well as in his thesis), in which we take the periods of investment to be months to avoid over-exposure to commissions, we maintain that an index of the wealth in a market over a suitable period of months (e.g. a decade), appropriate for historical consideration, is the maximum of Sn(W) over choices of W. Examples are given where the market is taken to be the Nasdaq 100, the S&P 500, or all stocks (about 3000 in number) actively traded over an eleven year period accessed in the U. Penn, Wharton School data set (Wharton Research Data Services).
    We provide a relaxed greedy algorithm for fast computation of this maximum. The algorithm starts with the historically single best stock, and then at each algorithm step introduces a new stock which, in convex combination with the previously computed portfolio, yields the greatest increase in compounded wealth. It is proven that in k steps of the algorithm (that is with k stocks included) the error in computation of the maximum of Sn(W) is bounded by V/k in the exponent, where V depends on the volatility of the historically given returns. Thus the algorithm is guaranteed to converge to the portfolio vector W which historically had the highest returns. Our studies showed that only a handful of stocks are included in the hindsight optimal portfolio vector and that the associated maximal portfolio wealth factor (typically several hundred) is much higher than the best single stock over the indicated time period.
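    The greedy steps just described can be sketched as follows (illustrative Python, not the implementation from the technical report). The function names, the bisection line search for the mixing weight, and the three-asset toy returns are assumptions made for illustration. Returns are assumed strictly positive so that the log wealth is finite.

```python
import math

def log_wealth(z):
    """Yn: sum of log portfolio returns over the periods."""
    return sum(math.log(v) for v in z)

def best_mix(z, x, iters=60):
    """Weight a in [0, 1] maximizing sum_i log((1 - a) * z_i + a * x_i).

    The objective is concave in a, so bisection on its derivative suffices.
    """
    def slope(a):
        return sum((xi - zi) / ((1 - a) * zi + a * xi) for zi, xi in zip(z, x))
    if slope(0.0) <= 0.0:
        return 0.0
    if slope(1.0) >= 0.0:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def greedy_max_wealth(returns, steps):
    """returns[j][i] is the return of asset j in period i.

    Starts from the single best asset, then repeatedly mixes in the asset
    whose convex combination with the current portfolio most increases
    the compounded wealth. Returns the weights and the log wealth.
    """
    m = len(returns)
    start = max(range(m), key=lambda j: log_wealth(returns[j]))
    weights = [0.0] * m
    weights[start] = 1.0
    z = list(returns[start])  # per-period returns of the current portfolio
    for _ in range(steps - 1):
        best_y, best_j, best_a = log_wealth(z), None, 0.0
        for j in range(m):
            a = best_mix(z, returns[j])
            y = log_wealth([(1 - a) * zi + a * xi
                            for zi, xi in zip(z, returns[j])])
            if y > best_y:
                best_y, best_j, best_a = y, j, a
        if best_j is None:
            break  # no asset improves the wealth
        weights = [(1 - best_a) * w for w in weights]
        weights[best_j] += best_a
        z = [(1 - best_a) * zi + best_a * xi
             for zi, xi in zip(z, returns[best_j])]
    return weights, log_wealth(z)

# Hypothetical toy market: two volatile assets that alternate, plus cash.
returns = [[2.0, 0.5] * 5, [0.5, 2.0] * 5, [1.0] * 10]
weights, log_w = greedy_max_wealth(returns, steps=3)
print(weights, math.exp(log_w))
```

    In this toy market no single asset gains anything over the ten periods, while an even split of the two volatile assets earns a factor of 1.25 every period, which is the kind of gap between the best single stock and the hindsight optimal portfolio noted above.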
    In the first issue of Mathematical Finance, Tom Cover in 1990 introduced the notion of universal portfolios. These are updating portfolio strategies in which it is guaranteed that the wealth achieved is within a specified factor of the maximum of Sn(W). With Erik Ordentlich in the mid 90s, he refined the universal portfolio theory to show that the optimal factor, valid uniformly over all possible return sequences, exhibits a drop from the maximum by not more than a factor of order (1/n)^d where the power d=(m-1)/2. For markets in which the maximal exponent Yn(W) grows linearly with n with a non-vanishing rate, this shows that the hindsight optimal rate can be achieved universally without advance knowledge of what portfolio W will be best. In publication 23 with Trent Xie, we identified the precise asymptotics of the form of the wealth drop factor (C/n)^d including identification of the constant C, and showed that this theory reduces to the same mathematics as the identification of asymptotics of minimax pointwise regret for data compression and prediction also developed there.
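    To make the universal portfolio idea concrete, here is a minimal two-asset sketch (illustrative Python). It assumes a uniform weighting over a finite grid of constant rebalanced portfolios, which is a simplification chosen for illustration and not the optimal weighting analyzed in the publications. The function name and toy returns are hypothetical. The strategy holds the wealth-weighted average of the grid portfolios each period, and its realized wealth then equals the average of Sn(W) over the grid.

```python
def universal_portfolio(returns, grid=101):
    """Two-asset universal portfolio over a grid of weights w in [0, 1].

    returns: list of (x1, x2) return pairs, one per period.
    Gives back (realized wealth of the updating strategy,
                average of Sn(W) over the grid,
                maximum of Sn(W) over the grid).
    """
    ws = [g / (grid - 1) for g in range(grid)]
    wealth = [1.0] * grid  # running wealth of each constant portfolio
    realized = 1.0
    for x1, x2 in returns:
        total = sum(wealth)
        # Next-period portfolio: average of the grid portfolios, each
        # weighted by the wealth it has accumulated so far.
        w_hat = sum(w * s for w, s in zip(ws, wealth)) / total
        realized *= w_hat * x1 + (1 - w_hat) * x2
        wealth = [s * (w * x1 + (1 - w) * x2) for w, s in zip(ws, wealth)]
    return realized, sum(wealth) / grid, max(wealth)

# Hypothetical toy sequence: two assets that alternate good and bad periods.
returns_seq = [(2.0, 0.5), (0.5, 2.0)] * 5
realized, mixture, best = universal_portfolio(returns_seq)
print(realized, mixture, best, realized / best)
```

    The last printed value is the factor by which the strategy falls short of the hindsight maximum on this sequence.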
    A continuous-time version of universal portfolios is developed in publication 25 as well as a version that allows target portfolios that use past returns in parameterized ways. An advantage of continuous-time is that for stock price processes of bounded empirical variation, the Sn(W) becomes an explicit Gaussian shaped function of W so that certain universal portfolios can be evaluated exactly by known Gaussian integral computations.
    An obstacle to realizing high gains by portfolio estimation strategies is that it is difficult to know in advance what subset of stocks is best. If one tries to be universal for hundreds of stocks, the worst case factor (1/n)^d will kill off typical gains from the maximum of Sn(W) over the time horizons in which the desirable portfolios remain stable. In the technical report with Wei Qiu, we develop a strategy in which we do not try to be universal for all of the stocks, but rather we mix across all possible subsets of three, four, or more of the stocks. By such mixture strategies we show growth rate optimality for portfolios of stocks of bounded volatility.
    Winner-take-all gambling markets (e.g. horse races) provide a setting in which the theory and practice of growth rate optimality is considerably simplified, as in the text by Cover and Thomas 2006. Briefly, the wealth exponent of a gambler is expressible as the difference of two Kullback divergences from the empirical distribution: the divergence to the distribution associated with the odds-makers minus the divergence to the distribution specified by the gambling proportions. The technical report with Jengfeng Yu shows that for single stocks, the presence of put or call options with strike prices at each possible level of the stock makes it possible for linear algebra transformations to convert the questions of portfolio choice (portfolios of cash, stock, and options on the stock) and option pricing to the simpler problems of gambling and odds on such gambles, where the events on which one gambles portions of one's wealth are the different possible outcomes of the stock price.
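    The divergence identity for the gambler's wealth exponent can be checked numerically with a short sketch (illustrative Python). It assumes fair odds, meaning the reciprocals of the odds sum to one so that they define the odds-makers' distribution. The race outcomes, bets, and odds are made-up values.

```python
import math
from collections import Counter

def kl(p, q):
    """Kullback divergence D(p || q) for distributions given as dicts."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def wealth_exponent(outcomes, bets, odds):
    """(1/n) log of the wealth factor from betting fractions `bets`
    at `odds`-for-1 payoffs over the observed winners."""
    return sum(math.log(bets[x] * odds[x]) for x in outcomes) / len(outcomes)

# Hypothetical race history, gambling proportions, and fair odds.
outcomes = ["a", "b", "a", "c", "a", "b", "a", "a"]
bets = {"a": 0.6, "b": 0.3, "c": 0.1}
odds = {"a": 2.0, "b": 4.0, "c": 4.0}          # 1/2 + 1/4 + 1/4 = 1
implied = {x: 1.0 / o for x, o in odds.items()}  # odds-makers' distribution

counts = Counter(outcomes)
empirical = {x: c / len(outcomes) for x, c in counts.items()}

exponent = wealth_exponent(outcomes, bets, odds)
difference = kl(empirical, implied) - kl(empirical, bets)
print(exponent, difference)
```

    The two printed numbers agree: the gambler's exponent is positive exactly when the gambling proportions are closer to the empirical distribution, in Kullback divergence, than the odds-makers' distribution is.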

    Open Problem: For multiple stocks, basket options (that is, options on portfolios of stocks) are sufficient to convert the option portfolio problem and the option pricing problem to the associated gambling and odds-making problems, even in a discrete time setting. Use approximation results for linear combinations of indicators of half-spaces to resolve how many basket options are required to come appropriately close to the optimal growth exponent of the pure gambling version of the problem.

Grants and Fellowships:

  • National Science Foundation, Grant DMS-9505168, 1995-1998
       Approximation, Estimation, and Computation Properties of Neural Nets and Related Parsimonious Models.
       Co-Principal Investigators: Yuly Makovoz and Lee Jones, University of Massachusetts, Lowell.
  • National Science Foundation, Grant ECS-9410760, 1994-1997
       Polynomial-Time Algorithms for Accurate Approximation and Estimation of Functions.
  • Office of Naval Research, Grant N00014-93-1-0085, 1992-1994
       Artificial Neural Nets: Approximation, Estimation, and Computation.
  • Office of Naval Research, Grant N00014-89-J-1811, 1989-1992
       Statistical Learning Networks.
  • Office of Naval Research, Grant N00014-86-K-0670, 1986-1989
       An Information Theoretic Development of Probability and Statistical Inference.
  • NSF Postdoctoral Research Fellowship in Mathematical Sciences, 1986-1990.

Referee:

      IEEE Transactions on Information Theory
      Annals of Probability
      Annals of Statistics
      Statistical Science
      Journal of the American Statistical Association
      Canadian Journal of Statistics
      Annals of Institute of Statistical Mathematics
      Neural Networks
      Computational Learning Theory
      Neural Information Processing Systems
      Addison-Wesley Publishing Company
      Springer Verlag
      Wiley-Interscience
      National Security Agency
      National Science Foundation
      NSF site review for a center proposal (January 21-23, 2004)
      NSF Panelist (April 26-28, 2005; January 10-12, 2007).

Yale Departmental and Divisional Responsibilities:

  • Department of Statistics
        Director of Graduate Studies (Fall 1993 through Spring 1999)
        Acting Chair (Spring 1998).
        Chair (Spring 2001 - Fall 2006).
  • Division of Social Sciences
        Senior Appointments Committee (Fall 1994 - Spring 1995, Fall 1999 - Spring 2000)
  • Program of Applied Mathematics
        Director of Undergraduate Studies (Fall 1999 - Spring 2001; Fall 2004 - Spring 2006)
        Senior Coordinator of Undergraduate Studies (Fall 2006 - Spring 2008)

Other Activities and Honors:
  • FAI free flight model glider designer and competitor, F1A class:
        National Champion in 1984, 1987, 1992, 2007.
        United States National Team Member at World Championships in 1995 and 2001
        (Top American finisher at World Championships in 2001).
        America's Cup Champion in 1998.

  • Trinity Evangelical Free Church, Woodbridge, CT. Board of Deacons 1998-2000 (Chairman 1999-2000).
    Missions Committee 1994-present. Adult Class Committee 1994-2001.