Department of Statistics

Seminar

**Source allocation and estimation with incomplete data**

Tommi Jaakkola

MIT EECS/AI Lab

Many estimation tasks involve multiple heterogeneous or incomplete
information sources. Modern classification problems, for example, have
to be solved in the presence of predominantly unlabeled samples.
Standard estimation algorithms in this context, such as EM (or em),
reduce to solving a set of fixed point equations (consistency
conditions). Such algorithms are not stable, however, in the sense
that they can lead to a dramatic loss of accuracy with the inclusion
of incomplete observations (changes in the source allocation). We
develop a more controlled solution to this problem through homotopy
continuation, essentially evolving differential equations that govern
the evolution of fixed points at intermediate allocations of the
sources. We explicitly identify critical points along the resulting
paths, either to increase the stability of estimation or to ensure a
significant departure from the initial source. We illustrate these
ideas both in classification tasks with predominantly unlabeled data
(text) and in the context of competitive min-max problems (DNA
sequence motif discovery).

This is joint work with Adrian Corduneanu.

*Seminar to be held in Room 107, 24 Hillhouse Avenue at 4:15 pm*