Wednesday, February 19, 2003
10:00 am
Dragan Radulovic
Department of Statistics
Yale University
STATQUEST, a statistical algorithm for rapid Protein Identification
We have developed a systematic analytical approach, termed PRISM, which permits routine, large-scale protein expression profiling of mammalian cells and tissues. PRISM combines subcellular fractionation with multidimensional liquid chromatography-tandem mass spectrometry-based protein shotgun sequencing. The key contribution, from a statistical point of view, is a newly developed computer algorithm, STATQUEST, designed to provide an automatic and more rigorous estimate of accuracy than the commonly used software SEQUEST.
The output of SEQUEST is a series of putative protein matches and associated peptide scores, which include a cross-correlation score based on the observed spectral fit (XCorr), the normalized numerical difference between the top and second highest XCorr (DCn) and a preliminary ranking based on the number of matched ion peaks (RSp). A subjective combination of these scores as well as other factors, such as the charge of the precursor ion, the presence of tryptic termini (relevant in experiments where the peptides are generated by digestion with trypsin) and the number of peptides that map to a given protein, is typically used to evaluate the accuracy of each prediction.
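As a small illustration of the DCn score described above, the following Python sketch computes the normalized difference between the top and second-highest XCorr for one spectrum. It assumes the difference is normalized by the top XCorr, which is the usual convention but is not stated in this abstract; the function name `delta_cn` and the example scores are hypothetical and do not come from SEQUEST or STATQUEST.

```python
def delta_cn(xcorrs):
    """Normalized difference between the top and second-highest XCorr.

    Assumption: the difference is divided by the top XCorr.
    Returns 0.0 when fewer than two candidates exist or the top score
    is not positive, since the ratio is undefined in those cases.
    """
    ranked = sorted(xcorrs, reverse=True)
    if len(ranked) < 2 or ranked[0] <= 0:
        return 0.0
    return (ranked[0] - ranked[1]) / ranked[0]


# Hypothetical XCorr values for the candidate peptides of one spectrum.
candidate_xcorrs = [3.2, 2.4, 1.9]
top_dcn = delta_cn(candidate_xcorrs)
```

A large DCn means the best candidate stands well clear of the runner-up, which is one reason the score is usually read together with XCorr rather than on its own.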
The STATQUEST algorithm, on the other hand, uses a probabilistic method for determining the likelihood of every putative peptide match. The algorithm has been used as a means to rapidly identify thousands of expressed mammalian proteins. The application of PRISM to adult mouse lung and liver resulted in the high-confidence identification of over 2100 unique proteins, the largest proteome study carried out to date.