Improving motif finder via a two-tiered significance analysis

Abstract:

Regulatory proteins bind to certain sites of the DNA to regulate the transcription of a protein, so characterising these binding sites is of great interest to the bioinformatics community. The motif finding problem is the problem of finding these binding sites among a set of co-regulated sequences.

With over 9000 unique users recorded in the first half of 2013, MEME is one of the most popular motif finding tools available. Reliable estimates of the statistical significance of motifs can greatly increase the usefulness of any motif finder. Currently, MEME evaluates its EM-generated candidate motifs using an extension of BLAST's E-value to the motif finding context. While the drawbacks of MEME's current significance evaluation have been pointed out previously, there has been no practical substitute suited to its needs, especially since MEME also relies on the E-value internally to rank competing candidate motifs.

We offer a two-tiered significance analysis that can replace the E-value both in selecting the best candidate motif and in evaluating its overall statistical significance. We show that our new approach substantially improves the motif finder and also provides the user with a reliable significance analysis. In addition, for large input sets, our new approach is in fact faster than the currently implemented E-value analysis.

Speaker

Emi Tanaka

Research Area

Affiliation

University of Wollongong

Date

Fri, 11/04/2014 - 4:00pm to 5:00pm

Venue

OMB-145, Old Main Building, UNSW Kensington Campus
