ROVER

Relative OVER-abundance of cis-elements

ROVER is a tool for determining whether one or more transcription factors from a given set are likely to regulate a group of genes. It was designed for promoters from groups of genes that are suspected of being co-regulated, such as those identified in a microarray study. ROVER compares two groups of promoters (a suspected co-regulated group and a non-regulated background group) by measuring the relative over-abundance of likely binding sites for each Transcription Factor (TF) in one group versus the other. For each TF, ROVER calculates the significance of any over-abundance of binding sites and reports the probability that it occurred by chance. A low probability suggests that the TF may regulate the group of genes in question. Likely binding sites are found by searching for high-scoring matches to a Position Specific Scoring Matrix (PSSM), which summarizes the known binding sites of a transcription factor. In addition to the significance of each TF, ROVER reports the subset of sequences likely to be regulated by each TF and the specific significant binding sites. ROVER is available as a command-line Java program (download below). A web version of ROVER is also available as part of the MotifViz web site. There is also a C++ version, which is no longer maintained.
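The idea of relative over-abundance can be illustrated with a simple stand-in test. The sketch below is not ROVER's published statistic: it assumes a one-sided binomial test in which, under the null hypothesis, each predicted binding site falls in the co-regulated set with probability equal to that set's share of the total promoter length. The class and method names are hypothetical.

```java
public final class OverAbundanceSketch {

    /**
     * Hypothetical one-sided binomial test (not ROVER's own statistic):
     * returns P(X >= fgHits) for X ~ Binomial(fgHits + bgHits, fgFraction),
     * where fgFraction is the co-regulated set's share of total promoter length.
     */
    public static double upperTailPValue(int fgHits, int bgHits, double fgFraction) {
        if (fgHits < 0 || bgHits < 0 || fgFraction <= 0.0 || fgFraction >= 1.0) {
            throw new IllegalArgumentException("Need non-negative counts and 0 < fgFraction < 1");
        }
        int n = fgHits + bgHits;
        double p = 0.0;
        for (int k = fgHits; k <= n; k++) {
            // C(n, k) * f^k * (1 - f)^(n - k), computed in log space for stability
            p += Math.exp(logChoose(n, k) + k * Math.log(fgFraction)
                    + (n - k) * Math.log(1.0 - fgFraction));
        }
        return Math.min(1.0, p);
    }

    /** Natural log of the binomial coefficient C(n, k). */
    private static double logChoose(int n, int k) {
        double s = 0.0;
        for (int i = 1; i <= k; i++) {
            s += Math.log(n - k + i) - Math.log(i);
        }
        return s;
    }

    public static void main(String[] args) {
        // Illustrative counts only: 12 predicted sites in the co-regulated set,
        // 3 in the background set, both sets of equal total length.
        System.out.printf("P = %.4g%n", upperTailPValue(12, 3, 0.5));
    }
}
```

A real analysis across many TFs would also need to account for multiple testing; this sketch covers a single TF.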

Input

ROVER expects three files as input:

Promoter sequence file
Background promoter sequence file
PSSM file

We recommend obtaining promoter sequences from Promoser. PSSMs can be obtained from JASPAR or TRANSFAC.

Because JASPAR is an open access database, we can provide a complete copy of JASPAR (downloaded 12-15-03) formatted for ROVER, available as Sample or Complete files.

JASPAR is described in the following paper:

JASPAR: an open access database for eukaryotic transcription factor binding profiles
Nucleic Acids Res. 2004 Jan; 32(1) Database Issue
Albin Sandelin, Wynand Alkema, Pär Engström, Wyeth Wasserman and Boris Lenhard

You may need to format your promoter sequences and/or PSSMs to fit ROVER's requirements:

The first line of each sequence or matrix starts with a ">" and includes an accession and name. The following lines should contain the sequence or binding site matrix. It is important that the accession for the gene or matrix be separated from the name by a tab character.
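Because the tab separator is easy to lose when files are edited or copied, it can help to check headers before a run. The following is a minimal, hypothetical helper (not part of ROVER) that splits a header line on the first tab and rejects headers that lack one.

```java
public final class HeaderSketch {

    /** Splits a header such as ">YBL002W<TAB>HTB2" into {accession, name}. */
    public static String[] parseHeader(String line) {
        if (!line.startsWith(">")) {
            throw new IllegalArgumentException("Header must start with '>': " + line);
        }
        int tab = line.indexOf('\t');
        if (tab < 0) {
            // A space-separated header would be rejected here.
            throw new IllegalArgumentException("Accession and name must be separated by a tab: " + line);
        }
        String accession = line.substring(1, tab).trim();
        String name = line.substring(tab + 1).trim();
        return new String[] {accession, name};
    }

    public static void main(String[] args) {
        String[] h = parseHeader(">YBL002W\tHTB2");
        System.out.println(h[0] + " / " + h[1]);
    }
}
```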

Here is a sequence file example:

>YBL002W        HTB2
TACCCAATAGCTTGTTCAATTCATCATCATTTCTGATGGCCAATTGTAAATGTCTTGGAATAATTCTGGTTTTTTTGTTATCTCTAGCAGCATTACCAGCCAATTCTAAAATTTCAGCAGCCAAATATTCTAAGACAGCAGTTAGATAGACTGGAGCACCAGAACCAATTCTCTGGGCGTAGTTACCTCTTCTTAGCAATCTGTGCACTCTACCAACTGGGAATGTTAAACCAGCTTTAGCAGATCTAGATTGAGAAGCTTTAGCAGCTGAACCAGCTTTACCACCTTTACCACCGGACATTATATATTAAATTTGCTCTTGTTCTGTACTTTCCTAATTCTTATGTAAAAAGACAAGAATTTATGATACTATTTAATAACAAAAAACTACCTAAGAAAAGCATCATGCAGTCGAAATTGAAATCGAAAAGTAAAACTTTAACGGAACATGTTTGAAATTCTAAGAAAGCATACATCTTCATCCCTTATATATAGAGTTATGTTTGATATTAGTAGTCATGTTGTAATCTCTGGCCTAAGTATACGTAACGAAAATGGTAGCACGTCGCGTTTATGGCCCCCAGGTTAATGTGTTCTCTGAAATTCGCATCACTTTGAGAAATAATGGGAACACCTTACGCGTGAGCTGTGCCCACCGCTTCGCCTAATAAAGCGGTGTTCTCAAAATTTCTCCCCGTTTTCAGGATCACGAGCGCCATCTAGTTCTGGTAAAATCGCGCTTACAAGAACAAAGAAAAGAAACATCGCGTAATGCAACAGTGAGACACTTGCCGTCATATATAAGGTTTTGGATCAGTAACCGTTATTTGAGCATAACACAGGTTTTTAAATATATTATTATATATCATGGTATATGTGTAAAATTTTTTTGCTGACTGGTTTTGTTTATTTATTTAGCTTTTTAAAAATTTTACTTTCTTCTTGTTAATTTTTTCTGATTGCTCTATACTCAAACCAACAACAACTTACTCTACAACTA
>YDR311W        TFB1
TCTTTTATATGAAGCGGATTTGAACCAAAACCAGAGCCAACTTGTCGTTTTATATCAGAATCATCACTGACTGGTATGTCTGTGATGGATGGCAAAGCTTTAGCGTTCGCATCTGTATCTAGCTTCCTCAAACTATTAGCTTGATTTTGAGCACTGGTAAGTGCTAACGTATCTACGTCATCTTTGGGTCCAGACGGAAGTCTCTGTTCATTGGTTATGTTATCAGAAGGGGCTGTGGTGTTCTCAGACATCCCCGCAACAAACGAATTTTGTTAATTATGTATGAAACTTTTCGTTTGATCTCAATAATACCACTAGCGACTAAATTTTTATGATACTTAGCTACTTTAAACAAGTCCCTTGTGCTCTGTTTGCTGACACTTTTGATAAAATATGCCTGTGTATAATTCTTTTAGCAGTTTATTTCAAACACAAATGGTATTAAAAGGATAGATGAAAAAAAAAAAAAAAATTAAAGCCACTAGTAATGATACAATCGTGGTATCACAAGCGCTGAATGAAACAAGTGTGGCTATCTATAGCGGATGCAAGTGGAGAACTTGTGAATCCAAACTGAAATATTTTGCCATCATTTGTTGTCCTTTCCCTTTTCCATTCAGGAAAAAAAAAAAAAATTTGACGTCGCCGTCGCGTCGCAGTCATATAATTACAGCAATTTATCTTGTTGAACGACGCAAATTAATGGAAATTGTGACTTACATAGTAAGTATTAGTAAACGTAGTTAAGGCCACGTGGGAAAGATATGAAAGGAGTGTAAGTAATGGATATCGGTCTAACGAAAATGGAAACCAATCTTTAAAAATGATAGTATGATTCGACAGTAAACTAGAAAAGCCACAACCCGTGGGACATGATAAGGCTGCTCGTTTTTGACGCAATTTTTAGACAATACTGAAATTTAGCATAATAAGCTTTCCCAGTGAAAGTAATAATATTTAACCTAGGGTAGGGGTAGGGAAAAAATAAAAGTAAACCATA

and a matrix file example (from JASPAR):

>M00713 TBP
0 8 1 4 0 23 3 20 2
8 6 7 0 2 0 0 0 1
3 2 0 1 0 0 0 0 4
12 7 15 18 21 0 20 3 16
>M00728 ROX1
0 0 1 16 0 0 0 0 0
8 9 7 0 0 0 0 0 1
2 5 1 0 0 0 17 0 1
7 3 8 1 17 17 0 17 15

Matrices have four rows and n columns, giving the counts of A, C, G, and T, respectively, at each of the n binding-site positions.
Sequences can span multiple lines.
Take care to avoid blank lines in all input files.
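To show how a count matrix like the examples above can be turned into a scoring matrix, here is a sketch of the common log-odds approach. It rests on assumptions that this page does not state: a pseudo-count of 0.375 is added to each cell (matching the -C default listed under Options), frequencies are compared against flat background frequencies of 0.25 (as with -F), and natural logarithms are used. ROVER's actual scoring may differ, and all names here are hypothetical.

```java
public final class PssmSketch {

    private static final String BASES = "ACGT";

    /** Parses four whitespace-separated rows (A, C, G, T) into a count matrix. */
    public static double[][] parseCounts(String[] rows) {
        if (rows.length != 4) {
            throw new IllegalArgumentException("Expected 4 rows (A, C, G, T)");
        }
        double[][] counts = new double[4][];
        for (int b = 0; b < 4; b++) {
            String[] fields = rows[b].trim().split("\\s+");
            counts[b] = new double[fields.length];
            for (int j = 0; j < fields.length; j++) {
                counts[b][j] = Double.parseDouble(fields[j]);
            }
            if (counts[b].length != counts[0].length) {
                throw new IllegalArgumentException("Rows differ in length");
            }
        }
        return counts;
    }

    /** Converts counts to log-odds weights using a pseudo-count and a flat 0.25 background. */
    public static double[][] logOdds(double[][] counts, double pseudo) {
        int width = counts[0].length;
        double[][] w = new double[4][width];
        for (int j = 0; j < width; j++) {
            double total = 0.0;
            for (int b = 0; b < 4; b++) {
                total += counts[b][j];
            }
            for (int b = 0; b < 4; b++) {
                double freq = (counts[b][j] + pseudo) / (total + 4 * pseudo);
                w[b][j] = Math.log(freq / 0.25);
            }
        }
        return w;
    }

    /** Scores the window starting at pos; returns NaN if it contains a non-ACGT character. */
    public static double score(double[][] w, String seq, int pos) {
        double s = 0.0;
        for (int j = 0; j < w[0].length; j++) {
            // Upper-cases bases here; ROVER's -f option instead filters lower-case (masked) bases.
            int b = BASES.indexOf(Character.toUpperCase(seq.charAt(pos + j)));
            if (b < 0) {
                return Double.NaN;
            }
            s += w[b][j];
        }
        return s;
    }

    public static void main(String[] args) {
        double[][] tbp = parseCounts(new String[] {
            "0 8 1 4 0 23 3 20 2",
            "8 6 7 0 2 0 0 0 1",
            "3 2 0 1 0 0 0 0 4",
            "12 7 15 18 21 0 20 3 16"});
        double[][] w = logOdds(tbp, 0.375);
        String seq = "GCTATAAATAGC";
        for (int i = 0; i + w[0].length <= seq.length(); i++) {
            System.out.printf("%d %.2f%n", i, score(w, seq, i));
        }
    }
}
```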

We have had good success using 10-50 promoters in each promoter file. ROVER is fast, so larger promoter sets are feasible, although they may be less biologically meaningful. Both promoter files should contain an equal number of promoters of approximately the same length.
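Before a run, it can help to confirm that the two promoter files meet these expectations. The sketch below is a hypothetical checker, not part of ROVER: it reads FASTA-formatted text whose sequences may span several lines, rejects blank lines, and compares the number of promoters and their mean lengths. The 10% tolerance in the example is an arbitrary illustration, not a ROVER requirement.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

public final class PromoterSetCheck {

    /** Reads ">accession<TAB>name" records; a sequence may span several lines. */
    public static Map<String, String> readFasta(String text) {
        Map<String, String> seqs = new LinkedHashMap<>();
        StringBuilder current = new StringBuilder();
        String accession = null;
        int lineNo = 0;
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = in.readLine()) != null) {
                lineNo++;
                if (line.trim().isEmpty()) {
                    throw new IllegalArgumentException("Blank line at line " + lineNo);
                }
                if (line.startsWith(">")) {
                    if (accession != null) {
                        seqs.put(accession, current.toString());
                    }
                    accession = line.substring(1).split("\t", 2)[0].trim();
                    current.setLength(0);
                } else if (accession == null) {
                    throw new IllegalArgumentException("Sequence before first header at line " + lineNo);
                } else {
                    current.append(line.trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (accession != null) {
            seqs.put(accession, current.toString());
        }
        return seqs;
    }

    /** Mean sequence length, or 0 for an empty set. */
    public static double meanLength(Map<String, String> seqs) {
        long total = 0;
        for (String s : seqs.values()) {
            total += s.length();
        }
        return seqs.isEmpty() ? 0.0 : (double) total / seqs.size();
    }

    /** True if both sets have the same size and mean lengths within a relative tolerance. */
    public static boolean comparable(Map<String, String> fg, Map<String, String> bg, double tolerance) {
        if (fg.size() != bg.size()) {
            return false;
        }
        double a = meanLength(fg);
        double b = meanLength(bg);
        return Math.abs(a - b) <= tolerance * Math.max(a, b);
    }

    public static void main(String[] args) {
        Map<String, String> fg = readFasta(">g1\tA\nACGTACGT\nACGT\n>g2\tB\nTTTTGGGGCCCC\n");
        Map<String, String> bg = readFasta(">b1\tC\nACGTACGTAC\n>b2\tD\nGGGGCCCCAATT\n");
        System.out.println("Comparable: " + comparable(fg, bg, 0.10));
    }
}
```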

Options

Usage: java -Xmx250m -jar rover.jar [-C pseudocounts] [-F] [-f] [-h] [-P pvalue] [-p pvalue] -S seqfile -B bgfile -M pssmfile

 -C                           Pseudo-counts to add to each PSSM cell.
                              Default is 0.375.
 -p                           Cutoff for single site P-value. Default is
                              0.001.
 -P                           Cutoff for whole sequence P-value. Default
                              is 0.01.
 -B                           File containing Fasta formatted background
                              sequences.
 -F,--flat_base_frequencies   Assume equal (flat) background frequencies
                              for A, C, G, and T.
 -M                           File containing PSSMs.
 -S                           File containing Fasta formatted sequences.
 -f,--filter                  Filter out lower case characters (masked
                              repeats).
 -h,--help                    Print help message.

The argument -Xmx250m lets the Java virtual machine use up to 250 MB of memory for ROVER. You can change 250m to another value to suit your system and data set.

The -P option sets the cutoff for whole-sequence significance (default 0.01). This option affects only the output: it determines which sequences are reported as significant overall, whether through multiple hits or a single high-scoring hit.

The -p option sets the significance cutoff for individual cis-elements (default 0.001). This default works well for promoters that are each about 1000 bp long. We recommend setting this cutoff to approximately 1 / promoter length.
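The rule of thumb above amounts to a one-line calculation. This hypothetical helper, not part of ROVER, uses the mean promoter length when promoters differ somewhat in length; that choice is an assumption.

```java
public final class CutoffSketch {

    /** Suggests a -p value of roughly 1 / mean promoter length (in bp). */
    public static double suggestedSiteCutoff(int[] promoterLengths) {
        if (promoterLengths.length == 0) {
            throw new IllegalArgumentException("No promoters");
        }
        long total = 0;
        for (int len : promoterLengths) {
            total += len;
        }
        double mean = (double) total / promoterLengths.length;
        if (mean <= 0) {
            throw new IllegalArgumentException("Mean promoter length must be positive");
        }
        return 1.0 / mean;
    }

    public static void main(String[] args) {
        int[] lengths = {1000, 1000, 1000};
        System.out.println(suggestedSiteCutoff(lengths));
    }
}
```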

Output

The output is in an XML format we have described called CisML. CisML files contain the complete findings of ROVER as well as all information necessary to replicate a ROVER run. Our CisML website provides simple methods and explanations for generating various reports from CisML.

Download ROVER Executable

Java JAR (tested with Java 1.4.2), last updated 7-11-2005

Citing ROVER

ROVER was introduced as part of the CARRIE transcriptional regulatory network inference tool. Please use the following reference when citing ROVER:

Haverty, P.M., Hansen, U., Weng, Z. (2004) Computational Inference of Transcriptional Regulatory Networks from Expression Profiling and Transcription Factor Binding Site Identification. Nucleic Acids Research, Vol. 32, 179-188.

Contact Us

Comments, Questions, and Suggestions