ROVER

Relative OVER-abundance of cis-elements

ROVER is a tool for determining whether one or more of a group of transcription factors is likely to regulate a group of genes. It was designed for use with promoters from groups of genes that are suspected of being co-regulated, such as those identified in a microarray study. ROVER compares two groups of promoters (a suspected co-regulated group and a non-regulated background group) by determining the relative over-abundance of likely binding sites for a particular transcription factor (TF) in one group versus the other.

ROVER calculates the significance of any over-abundance of binding sites for each TF and reports the probability that it occurred by chance. A low probability suggests that the TF regulates the group of genes in question. Likely binding sites are found by looking for high-scoring matches to a Position-Specific Scoring Matrix (PSSM), which represents the known binding sites for a transcription factor. In addition to the significance of each TF, ROVER reports the subset of sequences likely to be regulated by each TF and the specific significant binding sites.

ROVER is available as a command-line C++ program for Linux/UNIX (download below). We hope to make a web interface available in the near future.
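
To make the idea of a "high-scoring match" to a PSSM concrete, here is a minimal Python sketch that scores every window of a sequence against a count matrix. It is an illustration only and not ROVER's actual scoring code: the log-odds conversion, the pseudocount, the uniform background of 0.25 per base, the forward-strand-only scan, and all function names are assumptions made for this example.

```python
import math

BASES = "ACGT"

def log_odds(counts, pseudocount=1.0, background=0.25):
    """Convert per-position counts (rows of [A, C, G, T]) to log-odds scores.

    The pseudocount and uniform background are assumptions for illustration.
    """
    matrix = []
    for row in counts:
        total = sum(row) + 4 * pseudocount
        matrix.append(
            [math.log2(((c + pseudocount) / total) / background) for c in row]
        )
    return matrix

def scan(sequence, matrix):
    """Yield (offset, score) for each window of the sequence (forward strand only)."""
    seq = sequence.upper()  # case is ignored here; repeat masking is not modeled
    width = len(matrix)
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        yield i, sum(matrix[j][BASES.index(base)] for j, base in enumerate(window))

# Hypothetical 3-position matrix built from ten sites that all read "TGA".
counts = [[0, 0, 0, 10], [0, 0, 10, 0], [10, 0, 0, 0]]
pssm = log_odds(counts)
best_offset, best_score = max(scan("ccTGAcc", pssm), key=lambda hit: hit[1])
```

In this toy case the best-scoring window is the one that starts at the embedded "TGA". A real analysis would also need a way to turn scores into P-values, which this sketch does not attempt.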

Input

ROVER expects three files as input:
  1. Promoter sequence file
  2. Background promoter sequence file
  3. PSSM file
We recommend obtaining promoter sequences from Promoser. PSSMs can be obtained from JASPAR or TRANSFAC.

JASPAR is an open-access database, so we can provide a complete version of JASPAR (downloaded 12-15-03) formatted for ROVER: Sample or Complete.

JASPAR is described in the following paper:

JASPAR: an open access database for eukaryotic transcription factor binding profiles
Nucleic Acids Res. 2004 Jan; 32(1) Database Issue
Albin Sandelin, Wynand Alkema, Pär Engström, Wyeth Wasserman and Boris Lenhard

You may need to format your promoter sequences and/or PSSMs to fit ROVER's requirements:

Each file should be in "FASTA" format, where the first line of each sequence or matrix starts with a ">" and includes an accession and name. The following lines should contain the sequence or binding site matrix. It is important that the accession for the gene or matrix be separated from the name by a tab character. Here is a sequence file example.

Matrices have four columns and n rows: each row gives the counts of A, C, G, and T, respectively, at one of the n binding site positions.
Sequences can span multiple lines.
Take care to avoid blank lines in all input files.
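
The format rules above can be checked before a run. The following Python sketch reads ">"-headed records, insists on the tab between accession and name, and rejects blank lines. The function names are hypothetical, the accession "M0001" and name "ExampleFactor" are made up, and the checks reflect only the rules stated here, not ROVER's own parser.

```python
def read_records(lines):
    """Split ">"-headed text into (accession, name, body_lines) records."""
    records = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            raise ValueError(f"line {number}: blank lines are not allowed")
        if line.startswith(">"):
            accession, tab, name = line[1:].partition("\t")
            if not tab:
                raise ValueError(
                    f"line {number}: accession and name must be tab-separated"
                )
            records.append((accession, name, []))
        elif not records:
            raise ValueError(f"line {number}: data found before the first '>' header")
        else:
            records[-1][2].append(line)
    return records

def parse_matrix(body_lines):
    """Parse n rows of four counts (A, C, G, T), one row per binding site position."""
    rows = [[float(value) for value in line.split()] for line in body_lines]
    if any(len(row) != 4 for row in rows):
        raise ValueError("each matrix row needs four columns: A, C, G, T")
    return rows

def join_sequence(body_lines):
    """Sequences may span multiple lines, so concatenate them."""
    return "".join(part.strip() for part in body_lines)

# Made-up matrix record: three binding site positions.
example = [">M0001\tExampleFactor", "0 0 0 10", "0 0 10 0", "10 0 0 0"]
accession, name, body = read_records(example)[0]
matrix = parse_matrix(body)
```

The same `read_records` function works for promoter files; pass each record's body to `join_sequence` instead of `parse_matrix`.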

We have had good success using 10-50 promoters in each promoter file. ROVER is quite quick, so larger promoter sets are possible, but may not be biologically relevant. Both promoter files should contain an equal number of promoters of approximately the same length.
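
As a quick sanity check on the recommendation that both files hold the same number of promoters of similar length, a sketch like the following can be run on the two sets of sequences. The function name and the 10% length tolerance are assumptions; the text above only says "approximately the same length".

```python
from statistics import mean

def compare_promoter_sets(target, background, tolerance=0.10):
    """Return a list of warnings about two lists of promoter sequences.

    `tolerance` is an assumed threshold: the allowed relative deviation
    of any promoter length from the average length across both sets.
    """
    if not target or not background:
        return ["both promoter sets must be non-empty"]
    warnings = []
    if len(target) != len(background):
        warnings.append(f"set sizes differ: {len(target)} vs {len(background)}")
    lengths = [len(sequence) for sequence in target + background]
    average = mean(lengths)
    if any(abs(n - average) > tolerance * average for n in lengths):
        warnings.append("promoter lengths vary by more than the tolerance")
    return warnings
```

An empty list means neither check raised a concern.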

Options

Usage: rover [-f] [-X | -V | -C] [-P pvalue] [-p pvalue] -m matrix_file -s promoter_file -b background_promoter_file

-f Ignore lowercase letters in sequences, which represent filtered-out common repeats
-P Supply P-value cutoff for significant sequences (0.01 default)
-p Supply P-value cutoff for individual cis-elements (0.001 default)
-X XML output in CisML style (default)
-V Verbose output: print all significant sequences and hits for all matrices as plain text
-C 'Clover' output: just P-values for each matrix, no element locations

The sequence significance cutoff (-P) defaults to 0.01 and only affects the output: it sets the threshold for the overall significance of a sequence (from multiple hits or a single high-scoring hit). The individual cis-element cutoff (-p) defaults to 0.001, which works well for promoters of length 1000. We recommend adjusting this cutoff to approximately 1 / promoter length.
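
The 1 / promoter length rule of thumb is easy to apply when assembling a command line. The Python sketch below computes the cis-element cutoff and builds an argument list using only the flags shown in the usage line. The helper names and the file names are placeholders, and it assumes the executable is called "rover", as in the usage line.

```python
def element_cutoff(promoter_length):
    """Rule of thumb from the text: roughly 1 / promoter length."""
    if promoter_length <= 0:
        raise ValueError("promoter length must be positive")
    return 1.0 / promoter_length

def rover_command(matrix_file, promoter_file, background_file,
                  promoter_length, sequence_cutoff=0.01):
    """Assemble an argument list from the flags in the usage line."""
    return [
        "rover",
        "-P", str(sequence_cutoff),
        "-p", f"{element_cutoff(promoter_length):g}",
        "-m", matrix_file,
        "-s", promoter_file,
        "-b", background_file,
    ]

# File names below are placeholders, not files shipped with ROVER.
command = rover_command("matrices.txt", "promoters.fa", "background.fa",
                        promoter_length=500)
```

For 500-base promoters this yields a -p value of 0.002. The list can be passed to `subprocess.run` if the executable is on the PATH.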

Output

The default output is CisML, an XML format we have defined. CisML files contain the complete findings of ROVER as well as all the information necessary to replicate a ROVER run. Our CisML website provides simple methods and explanations for generating various reports from CisML.

Other simple text output options are available as well.

Download ROVER Executable

Current Version: 1.0

RedHat Linux 7-9 Compiled 2-29-04

Citing ROVER

ROVER was introduced as part of the CARRIE transcriptional regulatory network inference tool. Please use the following reference when citing ROVER:

Haverty, PM., Hansen, U., Weng, Z. (2004) Computational Inference of Transcriptional Regulatory Networks from Expression Profiling and Transcription Factor Binding Site Identification. Nucleic Acids Research, Vol. 32, 179-188.
Abstract PDF

Contact Us

Comments, Questions, and Suggestions
