Monday, February 10, 2003
Haiyan Huang
Department of Biostatistics
Harvard School of Public Health

Statistical Methods for Identifying Transcription Factor Binding Sites
The completion of the genomes of model organisms represents just the
beginning of a long march toward an in-depth understanding of biological
systems. One challenge in post-genomic research is the detection of
functional patterns from full-length genomic sequences. This talk focuses
on statistical methods for finding patterns of functional or structural
importance in biological sequences, in particular the identification of
transcription factor binding sites (TFBSs). Some of the underlying
mathematical theories will be discussed as well.

TFBSs are often short and degenerate in sequence. Therefore they are often
described by position-specific score matrices (PSSMs), which are used to
score candidate TFBSs for their similarities to known binding sites. The
similarity scores generated by PSSMs are essential to the computational
prediction of single TFBSs or regulatory modules. We develop the Local
Markov Method (LMM), which provides local p-values as a more reliable and
rigorous alternative. Applying LMM to large-scale known human binding site
sequences in situ, we show that compared to current popular methods, LMM
can reduce false positive errors by more than 50% without compromising
sensitivity.
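
As a rough illustration of the PSSM similarity score mentioned above, the sketch below builds a log-odds matrix from a few aligned sites and scans a sequence with it. It is not the Local Markov Method and does not compute p-values. The example sites, the uniform background, the pseudocount of 1, and the names build_pssm, score and scan are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def build_pssm(sites, background=None, pseudocount=1.0):
    """Return a log-odds PSSM: one dict per position, base -> log2(p / bg)."""
    if background is None:
        # Assumption: uniform background base frequencies.
        background = {b: 0.25 for b in BASES}
    width = len(sites[0])
    pssm = []
    for i in range(width):
        # Start from the pseudocount so unseen bases keep a finite score.
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / background[b])
                     for b in BASES})
    return pssm

def score(pssm, candidate):
    """Sum the per-position log-odds scores of a candidate site."""
    return sum(column[base] for column, base in zip(pssm, candidate))

def scan(pssm, sequence):
    """Score every window of PSSM width along the sequence."""
    width = len(pssm)
    return [(i, score(pssm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]

# Hypothetical aligned binding sites, invented for this example.
known_sites = ["TATAAA", "TATAAT", "TATATA", "TACAAA"]
pssm = build_pssm(known_sites)
hits = scan(pssm, "GGCTATAAAGGC")
best_pos, best_score = max(hits, key=lambda hit: hit[1])
```

Windows that resemble the known sites receive higher scores. Deciding how high a score must be before a window is called a binding site is the question that a p-value for the score is meant to answer.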
Seminar to be held in Room 107, 24 Hillhouse Avenue at 4:15 pm