There is a well-known and deep identification between the problems of universal (lossless) data compression and statistical estimation. In particular, the problem of finding a good code for compressing data whose distribution is unknown is essentially equivalent to the problem of estimating the probability distribution from which the data was generated: by the Kraft inequality, prefix codes correspond to probability distributions, with codelengths playing the role of negative log-likelihoods. Csiszár and Shields [CS04] provide a nice survey of the implications, which include Rissanen's powerful Minimum Description Length (MDL) principle.
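To make the correspondence concrete, here is a minimal Python sketch (the data and models are invented for illustration): the ideal codelength of a sequence under a probability model is its negative log-likelihood in bits, so the model that compresses best is exactly the maximum-likelihood model.

```python
import math
from collections import Counter

def codelength_bits(data, p):
    """Ideal Shannon codelength of `data` under distribution `p`:
    -log2 of the probability the model assigns to the sequence.
    By the Kraft inequality, such codelengths are (essentially)
    achievable by a lossless prefix code, so minimizing codelength
    over models is the same as maximizing likelihood."""
    return -sum(math.log2(p[x]) for x in data)

data = list("aababaaabbaaab")

# Model 1: fair coin. Model 2: empirical (maximum-likelihood) frequencies.
uniform = {"a": 0.5, "b": 0.5}
counts = Counter(data)
mle = {x: c / len(data) for x, c in counts.items()}

print(codelength_bits(data, uniform))  # longer code
print(codelength_bits(data, mle))      # shorter code = higher likelihood
```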
One would like a theoretical framework that likewise treats lossy source coding (where one tolerates a limited amount of distortion of the data in exchange for better compression) as a statistical problem. Building on work of Kontoyiannis and Zhang [KZ02], we take this approach in joint work with Matthew Harrison and Ioannis Kontoyiannis [MHK04]. Doing so is highly non-trivial because the correspondence between codes and distributions is much more complicated than in the lossless case. We propose two methods for selecting efficient compression algorithms, based on lossy variants of the Maximum Likelihood and MDL principles. We analyze their theoretical performance and show, under appropriate assumptions, that while the Maximum Likelihood approach suffers from overfitting just as in the classical setting, the MDL approach to universal lossy coding identifies the optimal model class of lossy codes.
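The flavor of the overfitting phenomenon can be seen in a toy scalar-quantization analogue. This is only a hypothetical sketch, not the codebook construction of [MHK04]; the distortion budget, quantile codebooks, and 32-bit-per-codeword cost are invented conventions. Distortion alone keeps improving as the codebook grows (the analogue of maximum-likelihood overfitting, since a codebook as large as the data set achieves zero distortion), whereas a two-part MDL criterion that also charges for describing the codebook selects a codebook of moderate size.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)    # data to be lossily compressed
D = 0.05                     # distortion budget (mean squared error)
CODEWORD_BITS = 32           # invented cost of describing one codeword

def distortion(x, codebook):
    """Mean squared error when each sample is replaced by its
    nearest codeword."""
    return np.min((x[:, None] - codebook[None, :]) ** 2, axis=1).mean()

results = []
for k in range(1, 65):
    # Codebook of k empirical quantiles (a crude stand-in for an
    # optimized codebook).
    codebook = np.quantile(x, (np.arange(k) + 0.5) / k)
    if distortion(x, codebook) <= D:
        # Two-part description length: an index for each sample,
        # plus the cost of describing the codebook itself.
        total_bits = len(x) * np.log2(k) + k * CODEWORD_BITS
        results.append((total_bits, k))

# MDL choice: the smallest total description length meeting the
# distortion budget, rather than the largest (overfit) codebook.
print(min(results))
```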
In the nascent theory of distributed estimation using sensor networks, one wishes to characterize the fundamental limits of performing statistical tasks such as parameter estimation using a sensor network, and to apply such characterizations to problems of cost or resource allocation. Several toy models for distributed estimation of a location parameter are introduced and analyzed in [MBKY08, MBKY10]. By setting aside communication, computation, and other constraints, these models allow one to study the central question of fundamental statistical limits in isolation.
Think of a fixed set of sources; each sensor has access to some combination of the data produced by a subset of the sources (this combination may, for example, be a weighted sum). Any collection of sensors thus corresponds to a collection of subsets of the sources (a "configuration"). Our main results are inequalities that, under appropriate conditions, relate the minimax risk of the sensor corresponding to the set of all sources to the minimax risks of the sensors in an arbitrary configuration. As consequences, we obtain insights into sensor configuration design as well as resource allocation for our model.
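A small Monte Carlo sketch conveys the flavor of this setup. This is a hypothetical simulation, not the exact setup or the inequalities of [MBKY08, MBKY10]: here each sensor observes the unknown background theta corrupted by the plain (unweighted) sum of its subset of i.i.d. Gaussian noise sources, and the risk of the sample-mean estimator degrades as the subset grows.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 2.0          # unknown location parameter (the "background")
m, n = 3, 1000       # number of noise sources, samples per sensor

def sensor_mse(subset, trials=1000):
    """Monte Carlo MSE of the sample-mean estimator for a sensor that
    observes theta plus the sum of the noise sources in `subset`."""
    errs = []
    for _ in range(trials):
        z = rng.normal(size=(n, m))               # the noise sources
        obs = theta + z[:, subset].sum(axis=1)    # this sensor's data
        errs.append((obs.mean() - theta) ** 2)
    return np.mean(errs)

print(sensor_mse([0]))        # sensor seeing one source:   MSE ~ 1/n
print(sensor_mse([0, 1]))     # sensor seeing two sources:  MSE ~ 2/n
print(sensor_mse([0, 1, 2]))  # sensor seeing all sources:  MSE ~ 3/n
```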
Cores of cooperative games are ubiquitous in information theory, and arise most frequently in the characterization of fundamental limits in various scenarios involving multiple users. Examples include classical settings in network information theory such as Slepian-Wolf source coding and multiple access channels, classical settings in statistics such as robust hypothesis testing, and new settings at the intersection of networking and statistics such as distributed estimation problems for sensor networks. Cooperative game theory allows one to understand aspects of all of these problems from a fresh and unifying perspective that treats users as players in a game, sometimes leading to new insights. At the heart of these analyses are fundamental dualities that have long been studied in the context of cooperative games; for information theoretic purposes, these are dualities between information inequalities on the one hand and properties of rate, capacity, or other resource allocation regions on the other. See [Mad08] for a review of cooperative games and the fundamental limits of multiuser scenarios from this perspective.
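As a concrete instance, the Slepian-Wolf region for jointly distributed sources is cut out by the constraints R(S) ≥ H(X_S | X_{S^c}) over all nonempty subsets S, and the dominant face of this region (where the total rate equals the joint entropy) is exactly the core of the game with characteristic function v(S) = H(X_S | X_{S^c}). Here is a minimal Python check of these constraints, using an invented joint pmf for two correlated binary sources:

```python
import itertools
import math

def entropy(pmf):
    """Shannon entropy (bits) of a pmf given as {outcome: prob}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint, coords):
    """Marginal pmf of the coordinates in `coords` from a joint pmf
    whose outcomes are tuples."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out

def in_slepian_wolf_region(joint, rates):
    """Check R(S) >= H(X_S | X_{S^c}) for every nonempty S; these are
    the constraints defining the game v(S) = H(X_S | X_{S^c})."""
    m = len(rates)
    full = list(range(m))
    H_all = entropy(joint)
    for r in range(1, m + 1):
        for S in itertools.combinations(full, r):
            Sc = [i for i in full if i not in S]
            # H(X_S | X_{S^c}) = H(X_full) - H(X_{S^c})
            cond = H_all - entropy(marginal(joint, Sc))
            if sum(rates[i] for i in S) < cond - 1e-12:
                return False
    return True

# A doubly symmetric binary source: X uniform, Y = X with prob 1 - eps.
eps = 0.1
joint = {(0, 0): (1 - eps) / 2, (1, 1): (1 - eps) / 2,
         (0, 1): eps / 2, (1, 0): eps / 2}
h = entropy({0: eps, 1: 1 - eps})               # binary entropy of eps
print(in_slepian_wolf_region(joint, [1.0, h]))  # corner point: True
print(in_slepian_wolf_region(joint, [h, h]))    # sum rate too low: False
```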
[CS04] I. Csiszár and P. C. Shields. Information Theory and Statistics: A Tutorial. Foundations and Trends in Communications and Information Theory, 1(4):1–115, 2004.

[KZ02] I. Kontoyiannis and J. Zhang. Arbitrary Source Models and Bayesian Codebooks in Rate-Distortion Theory. IEEE Trans. Inform. Theory, 48:2276–2290, 2002.

[Mad08] M. Madiman. Cores of Cooperative Games in Information Theory. EURASIP J. Wireless Comm. and Networking, Special issue on "Theory and Applications in Multiuser/Multiterminal Communications", no. 318704, 2008.

[MBKY10] M. Madiman, A. R. Barron, A. M. Kagan, and T. Yu. Fundamental limits for distributed estimation of a location parameter. Preprint, 2010.

[MBKY08] M. Madiman, A. R. Barron, A. M. Kagan, and T. Yu. Minimax risks for distributed estimation of the background in a field of noise sources. Proc. 2nd Intl. Workshop on Info. Theory for Sensor Networks (WITS '08), Santorini Island, Greece, June 2008.

[MHK04] M. Madiman, M. Harrison, and I. Kontoyiannis. Minimum Description Length vs. Maximum Likelihood in Lossy Data Compression. Proc. 2004 IEEE Int. Symp. Inform. Theory, Chicago, Illinois, June–July 2004.
Read the relevant publications above for more details.