Given a set of DNA sequences that share a common function, ab initio cis-element search programs can be used to identify the motifs that are best conserved in the set. The specialties of these programs, and references for them, are:
Sequences may be entered in Fasta, raw, or GenBank format. Any non-alphabetic characters in the sequence will be ignored, and any alphabetic characters except A, C, G and T (uppercase or lowercase) will be converted to 'n' and excluded from matching motifs. If GenBank format is used, your program of choice will read and display any 'CDS' (protein-coding region) annotations. Limits: at most 50 sequences, of total length up to 100 kb.
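The cleaning rules and limits described above can be sketched in Python. This is only an illustration of the stated behaviour, not the code the server or any of the programs actually runs; the function names are made up, and reading "100 kb" as 100,000 letters is an assumption.

```python
MAX_SEQUENCES = 50          # limit stated above
MAX_TOTAL_LENGTH = 100_000  # assumption: "100 kb" read as 100,000 letters


def clean_sequence(raw):
    """Apply the stated rules to the residue text of one sequence
    (no Fasta header line, no GenBank annotation)."""
    cleaned = []
    for ch in raw:
        if not ch.isalpha():
            continue                # digits, spaces, punctuation are ignored
        if ch in "ACGTacgt":
            cleaned.append(ch)      # valid bases are kept in either case
        else:
            cleaned.append("n")     # any other letter becomes 'n'
    return "".join(cleaned)


def check_limits(sequences):
    """Raise ValueError if the cleaned set exceeds the stated limits."""
    if len(sequences) > MAX_SEQUENCES:
        raise ValueError("more than 50 sequences")
    if sum(len(s) for s in sequences) > MAX_TOTAL_LENGTH:
        raise ValueError("total length over 100 kb")


seqs = [clean_sequence("1 acgtRYacgt 10"), clean_sequence("ACGU-NACG")]
check_limits(seqs)
print(seqs)
```

Running a check like this on your own sequences before submitting shows which positions will be masked as 'n' and therefore excluded from motif matches.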
Identifiers can be, for example, GenBank accession numbers (e.g. NC_001669), 'accession.version' numbers (e.g. NC_001669.1), or GI numbers (e.g. 9628421).
Generic format ('|' means "or", '[]' means optional; a '|' on a line by itself separates alternative layouts):
>[sequence_1_property]
[substring_1.1|from_1.1-to_1.1,motif_1.1_name[,substring_1.1_property]]
...
|
>[sequence_1_property]
<motif_1.1_name[,motif_1.1_property_for_sequence_1]
[substring_1.1.1|from_1.1.1-to_1.1.1[,substring_1.1.1_property]]
...
|
<motif_1_name[,motif_1_property]
>[motif_1_property_for_sequence_1]
[substring_1.1.1|from_1.1.1-to_1.1.1[,substring_1.1.1_property]]
...
Enjoy this mess for now, examples coming soon :)
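As a stopgap, here is one possible reading of the first layout, written as a small Python formatter. Everything in it is an assumption rather than a confirmed specification: each site line is taken to be either a substring or a from-to range, followed by the motif name and an optional property, and the sequence label, motif names, coordinates and property text are invented for illustration.

```python
def format_site(motif, substring=None, start=None, end=None, prop=None):
    """Return one site line: '<substring or from-to>,<motif>[,<property>]'."""
    if substring is not None:
        location = substring
    elif start is not None and end is not None:
        location = f"{start}-{end}"
    else:
        raise ValueError("give either a substring or a from/to pair")
    fields = [location, motif]
    if prop is not None:
        fields.append(prop)
    return ",".join(fields)


def format_sequence_block(sequence_property, site_lines):
    """Return a '>' line followed by the site lines for that sequence."""
    return "\n".join([">" + sequence_property] + list(site_lines))


block = format_sequence_block(
    "example_sequence",                             # hypothetical label
    [
        format_site("motifA", substring="TATAAA"),  # site given as text
        format_site("motifB", start=120, end=131, prop="example_property"),
    ],
)
print(block)
```

Under these assumptions the block printed is a '>' line followed by `TATAAA,motifA` and `120-131,motifB,example_property`. The other two layouts group the site lines by motif instead of by sequence.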
| Option | Meaning (default value in parentheses) |
| --- | --- |
| -a | minimum alignment width (1) |
| -b | maximum alignment width (10000) |
| -c | cooling factor (1) |
| -d | frequency of width-adjusting moves (1) |
| -l | ("ell") filter lowercase letters |
| -m | use modified Lam schedule (default = geometric schedule) |
| -n | end each run after this many iterations without improvement (10000) |
| -p | pseudocount weight (1.5) |
| -q | pretend residue abundances = 1/4 |
| -r | number of alignment runs (10) |
| -s | seed for random number generator (1) |
| -t | initial temperature (0.9) |
| -u | use uniform pseudocounts: each pseudocount = p/4 |
| -v | verbose: print suboptimal alignments |
| -z | turn off ZOOPS (force every sequence to participate in the alignment) |
| -1 | ("one") just examine forward strand (default = both strands) |
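If you run the program from the command line, the options in the table can be assembled programmatically. The sketch below is a hypothetical helper, not part of any of the programs: it assumes that options listed with a default take their value as the next argument, and that the remaining options are bare switches. Check the program's own documentation for the exact syntax.

```python
# Options that show a default in the table take a value; the rest are switches.
VALUE_FLAGS = set("abcdnprst")
SWITCH_FLAGS = set("lmquvz1")


def build_option_args(settings):
    """Turn {'a': 6, 'z': True} into ['-a', '6', '-z']."""
    args = []
    for flag, value in settings.items():
        if flag in VALUE_FLAGS:
            # assumption: the value follows the flag as a separate argument
            args += ["-" + flag, str(value)]
        elif flag in SWITCH_FLAGS:
            if value:                 # switches appear only when enabled
                args.append("-" + flag)
        else:
            raise ValueError(f"unknown option: -{flag}")
    return args


args = build_option_args(
    {"a": 6, "b": 20, "r": 5, "s": 42, "1": True, "v": False}
)
print(args)
```

Here the alignment width is restricted to 6-20, five runs are requested with random seed 42, and only the forward strand is examined; the disabled `-v` switch is left out.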
To save computing time, you can upload a file containing previously saved text output for visualization purposes, after you input the proper sequence information. Please ensure the integrity of your input text file. You should either redirect command-line program output to a text file, or save the text output at the end of the MotifViz result page into a text file.
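Redirecting command-line output to a text file can also be done from a script. The following is a minimal sketch using Python's standard `subprocess` module; the helper name and file name are hypothetical, and the demo command is a stand-in that only prints one line, so replace it with the actual program invocation.

```python
import os
import subprocess
import sys
import tempfile


def save_text_output(command, output_path):
    """Run `command` and write its standard output to `output_path`."""
    with open(output_path, "w") as out:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed run is not mistaken for usable output.
        subprocess.run(command, stdout=out, check=True)
    if os.path.getsize(output_path) == 0:
        raise ValueError("the program produced no output")


# Stand-in command: replace with the real program and its arguments.
demo_command = [sys.executable, "-c", "print('stand-in for program output')"]
path = os.path.join(tempfile.gettempdir(), "motif_output.txt")
save_text_output(demo_command, path)
print("saved to", path)
```

Checking the exit status and the file size is only a rough safeguard against uploading a truncated or empty file.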
Please visit the following pages for directions on downloading currently-supported programs:
Internet Explorer 5.2+ is recommended for Mac users.
Suggestions to: Yutao Fu (07/27/06 14:00 EDT)