Comet Download

Download Comet binary for Linux (Redhat 7.1)
Download Comet binary for Alpha (Compaq Tru64 UNIX V5.0A)
Download Comet binary for Sun (Solaris 8)
Download Comet binary for SGI / IRIX
Download Comet binary for Mac OS X (thanks to Eric Frangulian)

After downloading, don't forget to make the file executable with chmod +x.


Instructions for Using Comet from the Command Line

Example usage:

comet -i myseqs.fa -m mymatrices -a 20 -o outfile
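
For scripted runs, the example command above can be assembled programmatically. The following is a minimal Python sketch, not part of Comet itself: the function names build_comet_command and run_comet are hypothetical, and it assumes the downloaded binary is named comet and is on your PATH. Only the -i, -m, -a and -o options documented on this page are used.

```python
import subprocess


def build_comet_command(seq_file, matrix_file, avg_distance=None,
                        out_file=None, exe="comet"):
    """Build an argument list for comet (hypothetical helper).

    Assumes the executable is called "comet"; pass exe= to override.
    """
    cmd = [exe, "-i", seq_file, "-m", matrix_file]  # both options are required
    if avg_distance is not None:
        cmd += ["-a", str(avg_distance)]            # expected distance between motifs
    if out_file is not None:
        cmd += ["-o", out_file]                     # otherwise output goes to the screen
    return cmd


def run_comet(cmd):
    """Run the command and return whatever it printed to standard output."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout


# Example (not executed here, since it needs the comet binary):
#   cmd = build_comet_command("myseqs.fa", "mymatrices",
#                             avg_distance=20, out_file="outfile")
#   run_comet(cmd)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.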

Options

-i [required]
Follow this option with the name of a file containing the sequences to be analyzed. This file should be in FASTA format, e.g.:
>first_sequence
AGGTCGAG...
GTGGAAC...
>second_sequence
...
-m [required]

Use this option to supply the program with a file containing a list of nucleotide count matrices. Each matrix defines the DNA sequence motif of a cis-element. The file has the following format:

>first_motif
1 1
5 2 38 5
29 1 15 5
3 7 5 35
>second_motif
1 1
4 2 2 12
...
The first line of each matrix definition begins with the symbol > followed by a name for the motif. The second line, which is optional, specifies two weights for the motif: one for the + strand and the other for the - strand. These weights let you specify how often you expect each cis-element to occur on each strand in regulatory clusters. The weights are relative, so multiplying all the weights for all the motifs by the same constant makes no difference. If in doubt, leave this line out. Each remaining line corresponds to one position in the cis-element and contains the counts of adenine, cytosine, guanine and thymine observed at that position in a sample of cis-elements of this type.

Palindromes: for matrices that are exact complementary palindromes, there is no distinction between the + and - strand. Comet automatically detects exact complementary palindromes and assigns the motif an overall weight equal to the sum of the two numbers on the second line of the matrix description.

-a [optional]
Specifies the average distance expected between motifs in a cluster. The default is 35.
-o [optional]
Specifies the name of a file to write the output to. The default is to write output to the screen.
-e [optional]
Specifies an E-value threshold: clusters with E-values greater than this threshold are not reported. The default is 10.
-w [optional]
Specifies the value of w, where local abundances of A, C, G and T are counted in windows of size 2w+1. The default is 75.
-p [optional]
Number of pseudocounts to add to all entries in cis-element matrices. The default is 1.
-s [optional]
Specifies a file to write statistical information used as an intermediate step in calculating the E-values. This option is mainly for development purposes.
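
To check a matrix file before passing it to -m, the format described under that option can be read with a few lines of Python. This is only a sketch under stated assumptions, not Comet's own parser: the function names are hypothetical, a line with two numbers directly after the > line is taken to be the optional weights line, and the four columns are assumed to be in the order A, C, G, T. The second function illustrates what an exact complementary palindrome means for a count matrix: the matrix is unchanged when the positions are reversed and each base is swapped with its complement.

```python
def parse_matrices(text):
    """Parse matrix-file text into {name: {"weights": ..., "counts": ...}}.

    Hypothetical helper. Assumes a two-number line right after the
    header is the optional strand-weights line, and every other line
    holds four counts in the order A, C, G, T.
    """
    motifs = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            motifs[name] = {"weights": None, "counts": []}
            continue
        if name is None:
            raise ValueError("found numbers before the first '>' line")
        numbers = [float(x) for x in line.split()]
        entry = motifs[name]
        if len(numbers) == 2 and entry["weights"] is None and not entry["counts"]:
            entry["weights"] = tuple(numbers)   # (+ strand, - strand)
        elif len(numbers) == 4:
            entry["counts"].append(numbers)     # one position: A, C, G, T
        else:
            raise ValueError("unexpected line in motif %s: %r" % (name, line))
    return motifs


def is_complementary_palindrome(counts):
    """True if the matrix equals its own reverse complement.

    Reversing the row order reverses the positions; reversing each row
    swaps A with T and C with G, given the assumed A, C, G, T order.
    """
    return counts == [row[::-1] for row in reversed(counts)]
```

For example, a two-position matrix with rows "10 0 0 0" and "0 0 0 10" (consensus AT) passes this check, while first_motif from the example file above does not.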

Known Limitations

The E-values will not be accurate when the collection of cis-element matrices includes any of the following: very similar matrices, a matrix that is almost a complementary palindrome, or a matrix with a high propensity for self-overlap (e.g. one with consensus sequence AAAAAA). It is recommended that very similar matrices be combined into a single matrix, and that near-palindromic matrices be made exactly palindromic.
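
One possible way to make a near-palindromic matrix exactly palindromic is to average it with its own reverse complement. The Python sketch below illustrates that idea only; it is not a procedure prescribed by Comet, the function names are hypothetical, and it assumes each row holds counts in the order A, C, G, T.

```python
def reverse_complement(counts):
    """Reverse the positions and swap A<->T, C<->G (assumed A, C, G, T order)."""
    return [row[::-1] for row in reversed(counts)]


def make_palindromic(counts):
    """Average a count matrix with its reverse complement (hypothetical helper).

    The result equals its own reverse complement, so it is an exact
    complementary palindrome. Averaging rather than summing keeps the
    total count per position on the same scale as the input.
    """
    rc = reverse_complement(counts)
    return [[(a + b) / 2 for a, b in zip(row, rc_row)]
            for row, rc_row in zip(counts, rc)]
```

Whether averaging is appropriate depends on the data; if the sample of cis-elements is strongly biased towards one strand, the matrix may not represent a palindromic element at all.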