Relative OVER-abundance of
ROVER is a tool for determining if one or more of a group of
transcription factors is likely to regulate a group of genes. It was
designed for use with promoters from groups of genes that are
suspected of being co-regulated, such as those from a microarray study. ROVER
compares two groups of promoters (a suspected co-regulated group and a
non-regulated group) by determining the relative
over-abundance of likely binding sites for a particular Transcription
Factor (TF) in one group versus the other. ROVER calculates the
significance of any over-abundance
of binding sites for each TF and reports a probability of its chance
occurrence. This can be
interpreted as the probability that a given TF regulates the group of
genes in question. Likely binding sites are found by looking for
high-scoring matches to a Position Specific Scoring Matrix (PSSM),
which represents known binding sites for a transcription factor. In
addition to determining the significance of each TF, ROVER also
provides the subset of sequences likely to be regulated by each TF and
the specific significant binding sites. ROVER is
available as a command-line C++ program for Linux/UNIX (download below).
We hope to make a web interface available in the near future.
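To make the idea of "high-scoring matches to a PSSM" concrete, here is a minimal Python sketch. It is not ROVER's own code: the pseudocount, the uniform background, the forward-strand-only scan, and the fixed score threshold are all assumptions made for illustration (ROVER itself works with P-value cutoffs), and the function names are hypothetical.

```python
import math
from typing import List, Tuple

BASES = "ACGT"  # column order of the count matrix: A, C, G, T


def log_odds(counts: List[List[float]], background: float = 0.25,
             pseudocount: float = 1.0) -> List[List[float]]:
    """Convert per-position base counts into log2-odds scores.

    Assumes a uniform background and a simple additive pseudocount.
    """
    scores = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        scores.append([math.log2(((c + pseudocount) / total) / background)
                       for c in row])
    return scores


def scan_sequence(sequence: str, counts: List[List[float]],
                  threshold: float) -> List[Tuple[int, str, float]]:
    """Return (start, site, score) for every window scoring >= threshold.

    Only the forward strand is scanned; windows containing characters
    other than A, C, G, or T are skipped.
    """
    scores = log_odds(counts)
    width = len(scores)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue
        score = sum(scores[i][BASES.index(base)]
                    for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    # Made-up 3-position matrix that strongly prefers the site "ACG".
    demo_counts = [[10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0]]
    for start, site, score in scan_sequence("TTACGTT", demo_counts, 3.0):
        print(start, site, round(score, 3))
```

A real analysis would also scan the reverse complement and would choose the threshold from a score distribution rather than by hand.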
ROVER expects three files as input:
- Promoter sequence file
- Background promoter sequence file
- PSSM file
We recommend obtaining promoter sequences from Promoser. PSSMs can be
obtained from JASPAR.
JASPAR is an open-access database, so we can provide a complete
version of JASPAR (downloaded 12-15-03) formatted for ROVER:
Sample or Complete.
JASPAR is described in the following paper:
JASPAR: an open access database for eukaryotic transcription factor
Nucleic Acids Res. 2004 Jan; 32(1) Database Issue
Albin Sandelin, Wynand Alkema, Pär Engström, Wyeth Wasserman
and Boris Lenhard
You may need to format your promoter sequences
and/or PSSMs to fit ROVER's requirements:
Each file should be in "FASTA" format, where the first line of each
sequence or matrix starts with a ">" and includes an accession and
name. The following lines should contain the sequence or binding site
matrix. It is important that the accession for the gene or matrix be
separated from the name by a tab character. Here is a sequence file example.
Matrices have four columns and n rows, giving the numbers of A, C, G, and T,
respectively, at each of the n binding site positions.
Sequences can span multiple lines.
Take care to avoid blank lines in all input files.
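The format rules above can be checked before running ROVER. The following Python sketch is one possible reading of those rules, not an official validator: it assumes matrix values are whitespace-separated numbers, and the function names and the example accessions are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_records(text: str) -> List[Tuple[str, str, List[str]]]:
    """Split FASTA-style text into (accession, name, body_lines) records.

    Raises ValueError on blank lines, on a header without a tab between
    accession and name, and on data that appears before any header.
    """
    records: List[Tuple[str, str, List[str]]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            raise ValueError(f"blank line at line {line_no}")
        if line.startswith(">"):
            header = line[1:]
            if "\t" not in header:
                raise ValueError(f"no tab in header at line {line_no}")
            accession, name = header.split("\t", 1)
            records.append((accession.strip(), name.strip(), []))
        else:
            if not records:
                raise ValueError(f"data before first header at line {line_no}")
            records[-1][2].append(line.strip())
    return records


def parse_sequences(text: str) -> Dict[str, str]:
    """Map accession -> sequence, joining sequences that span lines."""
    return {acc: "".join(body) for acc, _name, body in parse_records(text)}


def parse_matrices(text: str) -> Dict[str, List[List[float]]]:
    """Map accession -> n rows of four numbers (A, C, G, T counts)."""
    matrices = {}
    for acc, _name, body in parse_records(text):
        rows = [[float(value) for value in row.split()] for row in body]
        if any(len(row) != 4 for row in rows):
            raise ValueError(f"matrix {acc} has a row without 4 columns")
        matrices[acc] = rows
    return matrices


if __name__ == "__main__":
    # Made-up records; real accessions and names will differ.
    print(parse_sequences(">G1\tgeneA\nACGT\nTTAA\n>G2\tgeneB\nGGCC\n"))
    print(parse_matrices(">M1\tdemo\n10 0 0 0\n0 5 5 0\n"))
```

Raising an error on the first problem keeps the sketch short; a friendlier checker might collect all problems and report them together.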
We have had good success using 10-50 promoters in each promoter
file. ROVER is quite quick, so larger promoter sets
are possible, but may not be biologically relevant. Both promoter files
should contain an equal
number of promoters of approximately the same length.
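As a quick sanity check on this recommendation, the two promoter sets can be compared before a run. This is a minimal Python sketch; the function name is hypothetical and the 10% length tolerance is an arbitrary assumption standing in for "approximately the same length".

```python
from typing import Dict, List


def check_promoter_sets(foreground: Dict[str, str],
                        background: Dict[str, str],
                        length_tolerance: float = 0.10) -> List[str]:
    """Return warnings if the two promoter sets look poorly matched.

    Both arguments map an accession to its promoter sequence.
    length_tolerance is an assumed threshold: the longest promoter may
    exceed the shortest by at most this fraction.
    """
    warnings = []
    if len(foreground) != len(background):
        warnings.append(
            f"unequal set sizes: {len(foreground)} vs {len(background)}")
    lengths = [len(seq) for seq in foreground.values()]
    lengths += [len(seq) for seq in background.values()]
    if lengths:
        shortest, longest = min(lengths), max(lengths)
        if longest > shortest * (1 + length_tolerance):
            warnings.append(
                f"promoter lengths range from {shortest} to {longest}")
    return warnings


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    fg = {"G1": "A" * 1000, "G2": "C" * 1000}
    bg = {"B1": "G" * 1000}
    for message in check_promoter_sets(fg, bg):
        print("warning:", message)
```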
Usage: rover [-f] [-X | -V | -C] [-P pvalue] [-p pvalue] -m matrix_file
-s promoter_file -b background_promoter_file
-f Ignore lower case letters in sequence representing filtered out,
-P Supply P-value cutoff for significant sequences (0.01 default)
-p Supply P-value cutoff for individual cis-elements (0.001 default)
-X XML output in CisML style (Default)
-V Verbose output. Print all significant sequences and hits for all
matrices as plain text
-C 'Clover' output: just pvalues for each matrix. No element
The default sequence significance P-value cutoff is 0.01. This option
only affects the output. It determines the cutoff for the overall
significance of a sequence (multiple hits or single high-scoring
The default individual cis-element significance cutoff is 0.001. This
works well for promoters that are each of length 1000. We recommend
adjusting this cutoff to approximately 1 / promoter length.
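The 1 / promoter length rule of thumb is easy to compute. The Python sketch below derives a value for -p and assembles a command line using only the flags listed in the usage above. Using the mean promoter length is an assumption, the file names are placeholders, and the command is built but not executed.

```python
from statistics import mean
from typing import List, Sequence


def suggested_element_cutoff(promoter_lengths: Sequence[int]) -> float:
    """Approximate the -p cutoff as 1 / (mean promoter length)."""
    return 1.0 / mean(promoter_lengths)


def build_rover_args(matrix_file: str, promoter_file: str,
                     background_file: str, element_cutoff: float,
                     sequence_cutoff: float = 0.01) -> List[str]:
    """Assemble an argument list from the documented ROVER options."""
    return [
        "rover",
        "-P", f"{sequence_cutoff:g}",   # cutoff for significant sequences
        "-p", f"{element_cutoff:g}",    # cutoff for individual cis-elements
        "-m", matrix_file,
        "-s", promoter_file,
        "-b", background_file,
    ]


if __name__ == "__main__":
    # Placeholder lengths and file names, for illustration only.
    cutoff = suggested_element_cutoff([500, 500, 500])
    print(" ".join(build_rover_args("matrices.txt", "promoters.txt",
                                    "background.txt", cutoff)))
```

With promoters of length 1000 this reproduces the default of 0.001; shorter promoters call for a larger cutoff.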
The default output is an XML format we have described called CisML. CisML files contain the
findings of ROVER as well as all information necessary to replicate a
ROVER run. Our CisML website provides simple methods and explanations
for generating various reports from CisML.
Other simple text output options are available as well.
Current Version: 1.0
RedHat Linux 7-9 Compiled 2-29-04
ROVER was introduced as part of the CARRIE transcriptional regulatory
network inference tool. Please use the following reference when citing
ROVER:
Haverty, P.M., Hansen, U., Weng, Z. (2004) Computational Inference of
Transcriptional Regulatory Networks from Expression Profiling and
Transcription Factor Binding Site Identification. Nucleic Acids
Research, Vol. 32, 179-188.
Questions, and Suggestions