Ab initio Motif Discovery and Visualization

AIMotifViz: Ab Initio Motif Search and Visualization

Given a set of DNA sequences that share a common function, ab initio cis-element search programs can identify the motifs that are best conserved across the set. The supported programs and their references are:

GLAM: Frith, M. C., Hansen, U., Spouge, J. L. & Weng, Z. (2004) Finding Functional Sequence Elements by Multiple Local Alignment. Nucleic Acids Res. 32(1):189-200.

Sequence Format

Sequences may be entered in FASTA, raw, or GenBank format. Any non-alphabetic characters in the sequence are ignored, and any alphabetic characters other than A, C, G and T (uppercase or lowercase) are converted to 'n' and excluded from matching motifs. If GenBank format is used, the program you select will read and display any 'CDS' (protein-coding region) annotations. Limits: at most 50 sequences, with a total length of up to 100 kb.
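The character handling described above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the server's actual code: the function names are hypothetical, the sketch works on a bare sequence string (no FASTA header or GenBank parsing), and it assumes that "100 kb" means 100,000 bases counted after cleaning.

```python
MAX_SEQUENCES = 50
MAX_TOTAL_LENGTH = 100_000  # assumption: "100 kb" is read as 100,000 bases


def clean_sequence(raw: str) -> str:
    """Drop non-letters, keep A/C/G/T in either case, turn other letters into 'n'."""
    cleaned = []
    for ch in raw:
        if not ch.isalpha():
            continue  # digits, whitespace and punctuation are ignored
        cleaned.append(ch if ch in "ACGTacgt" else "n")
    return "".join(cleaned)


def within_limits(sequences) -> bool:
    """Check the stated limits: at most 50 sequences, total length up to 100 kb."""
    total = sum(len(seq) for seq in sequences)
    return len(sequences) <= MAX_SEQUENCES and total <= MAX_TOTAL_LENGTH
```

For instance, clean_sequence("1 acgT-RY") keeps "acgT" and turns the ambiguity codes R and Y into "n".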

GenBank Identifiers

Sequences may also be specified by GenBank accession numbers (e.g. NC_001669), 'accession.version' numbers (e.g. NC_001669.1), or GI numbers (e.g. 9628421).
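A rough way to tell the three identifier kinds apart is sketched below. The patterns are loose assumptions inferred only from the examples above (letters, an optional underscore, then digits); they are not the official GenBank grammar, and the function name is hypothetical.

```python
import re


def classify_identifier(identifier: str) -> str:
    """Guess the kind of identifier from its shape alone (no database lookup)."""
    token = identifier.strip()
    if re.fullmatch(r"[0-9]+", token):
        return "gi"  # all digits, e.g. 9628421
    if re.fullmatch(r"[A-Za-z]+_?[0-9]+\.[0-9]+", token):
        return "accession.version"  # e.g. NC_001669.1
    if re.fullmatch(r"[A-Za-z]+_?[0-9]+", token):
        return "accession"  # e.g. NC_001669
    return "unknown"
```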

Motif Feature Format

Generic format ('|' means "or", '[]' means optional):

>[sequence_1_property]
[substring_1.1|from_1.1-to_1.1,motif_1.1_name[,substring_1.1_property]]
...
|
>[sequence_1_property]
<motif_1.1_name[,motif_1.1_property_for_sequence_1]
[substring_1.1.1|from_1.1.1-to_1.1.1[,substring_1.1.1_property]]
...
|
<motif_1_name[,motif_1_property]
>[motif_1_property_for_sequence_1]
[substring_1.1.1|from_1.1.1-to_1.1.1[,substring_1.1.1_property]]
...

Enjoy this mess for now, examples coming soon :)
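As a reading aid, here is a minimal Python sketch of a parser for the first of the three layouts only: a '>' line with an optional sequence property, followed by lines of the form "substring or from-to, motif name, optional property". It reflects one interpretation of the generic format; the function name and the sample input are hypothetical, and the sketch makes no claim about whether coordinates are 1-based or which property values the server accepts.

```python
import re


def parse_features(text: str):
    """Parse '>' header lines, each followed by 'site,motif_name[,property]' lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The sequence property after '>' is optional.
            records.append({"property": line[1:].strip() or None, "features": []})
            continue
        if not records:
            raise ValueError("feature line found before any '>' line")
        fields = [field.strip() for field in line.split(",", 2)]
        if len(fields) < 2:
            raise ValueError(f"expected 'site,motif_name[,property]': {line!r}")
        feature = {
            "motif": fields[1],
            "property": fields[2] if len(fields) > 2 else None,
        }
        span = re.fullmatch(r"(\d+)-(\d+)", fields[0])
        if span:
            feature["range"] = (int(span.group(1)), int(span.group(2)))
        else:
            feature["substring"] = fields[0]
        records[-1]["features"].append(feature)
    return records


# Hypothetical input: one sequence with a substring site and a from-to site.
example = ">seq one\nTATAAA,TATA_box,note\n35-42,GC_box\n"
records = parse_features(example)
```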

Quick match site sequences

IUPAC_symbol_sequence_1[ IUPAC_symbol_sequence_2 ...]

IUPAC_symbol=A/C/G/T/R/Y/S/W/K/M/H/B/V/D/N/X
R=A/G Y=C/T S=C/G W=A/T K=G/T M=A/C B=C/G/T D=A/G/T H=A/C/T V=A/C/G N/X=A/C/G/T
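The symbol table above maps directly onto regular-expression character classes. The Python sketch below shows one way to search a sequence for space-separated IUPAC sites. It is an illustration, not the server's matching code: the function names are hypothetical, and it scans the forward strand only, which is an assumption.

```python
import re

# Symbol table copied from the list above; N and X both mean "any base".
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT", "X": "ACGT",
}


def iupac_to_pattern(site: str) -> str:
    """Translate an IUPAC site into a regex; raises KeyError on unknown symbols."""
    parts = []
    for symbol in site.upper():
        bases = IUPAC[symbol]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def find_sites(sequence: str, sites: str):
    """Return (site, 0-based start, matched text) for every hit, overlaps included."""
    hits = []
    for site in sites.split():
        # The lookahead lets overlapping occurrences be reported as well.
        pattern = re.compile(f"(?=({iupac_to_pattern(site)}))", re.IGNORECASE)
        for match in pattern.finditer(sequence):
            hits.append((site, match.start(), match.group(1)))
    return hits
```

Because masked positions ('n') belong to no character class, they never match a site, which agrees with the note that such characters are excluded from matching motifs.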

GLAM options

-a	minimum alignment width (1)
-b	maximum alignment width (10000)
-c	cooling factor (1)
-d	frequency of width-adjusting moves (1)
-l	("ell") filter lowercase letters
-m	use modified Lam schedule (default = geometric schedule)
-n	end each run after this many iterations without improvement (10000)
-p	pseudocount weight (1.5)
-q	pretend residue abundances = 1/4
-r	number of alignment runs (10)
-s	seed for random number generator (1)
-t	initial temperature (0.9)
-u	use uniform pseudocounts: each pseudocount = p/4
-v	verbose: print suboptimal alignments
-z	turn off ZOOPS (force every sequence to participate in the alignment)
-1	("one") just examine forward strand (default = both strands)
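When assembling a command-line options string, it can help to separate the options that take a value from the plain switches. The Python sketch below does this for the letters listed above. It assumes that the options shown with a parenthesized number take a value, that the number is the default, and that the value is passed as a separate argument (e.g. "-a 6"); none of this is confirmed here, so check the GLAM documentation before relying on it. The function name is hypothetical.

```python
# Options listed above with a parenthesized number are treated as value options.
VALUE_OPTIONS = set("abcdnprst")
# The remaining options are treated as on/off switches.
FLAG_OPTIONS = set("lmquvz1")


def build_glam_args(options: dict) -> list:
    """Turn {'a': 6, 'z': True} into ['-a', '6', '-z'], rejecting unknown letters."""
    args = []
    for name, value in options.items():
        if name in VALUE_OPTIONS:
            if value is None or isinstance(value, bool):
                raise ValueError(f"option -{name} needs a value")
            args += [f"-{name}", str(value)]
        elif name in FLAG_OPTIONS:
            if value:  # switches are emitted only when turned on
                args.append(f"-{name}")
        else:
            raise ValueError(f"unknown GLAM option: -{name}")
    return args
```

For example, " ".join(build_glam_args({"a": 6, "b": 20, "z": True})) yields a string with a minimum width of 6, a maximum width of 20, and ZOOPS turned off.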

Visualization only

To save computing time, you can upload a file containing previously saved text output for visualization only, after entering the corresponding sequence information. Please make sure the text file is complete and unmodified. You can create it either by redirecting the command-line program's output to a text file, or by saving the text output shown at the end of the MotifViz results page.

Please visit the following pages for directions on downloading currently-supported programs:

GLAM -- http://zlab.bu.edu/glam/

Internet Explorer 5.2+ is recommended for Mac users.

Form entries | Gene Regulation Hub | Suggestions to: Yutao Fu 07/27/06 14:00 EDT
