Yale University
Department of Statistics

Monday, April 22, 2002

Source allocation and estimation with incomplete data

Tommi Jaakkola

Many estimation tasks involve multiple heterogeneous or incomplete
information sources. Modern classification problems, for example, have
to be solved in the presence of predominantly unlabeled samples.
Standard estimation algorithms in this context, such as EM (or em),
reduce to solving a set of fixed-point equations (consistency
conditions). Such algorithms are not stable, however, in the sense
that they can lead to a dramatic loss of accuracy with the inclusion
of incomplete observations (changes in the source allocation). We
develop a more controlled solution to this problem through homotopy
continuation, essentially integrating differential equations that govern
the evolution of fixed points at intermediate allocations of the
sources. We explicitly identify critical points along the resulting
paths to either increase the stability of estimation or to ensure a
significant departure from the initial source. We illustrate these
ideas both in classification tasks with predominantly unlabeled data
(text) and in the context of competitive min-max problems (DNA
sequence motif discovery).

This is joint work with Adrian Corduneanu.

Seminar to be held in Room 107, 24 Hillhouse Avenue, at 4:15 pm.