Bo Li (bli at cs dot wisc dot edu)
RSEM-EVAL is built on top of RSEM. It is a reference-free evaluator for de novo transcriptome assemblies. This document covers only the features specific to RSEM-EVAL. For features shared with RSEM, please refer to 'README_RSEM.md'.
To compile RSEM-EVAL, simply run
make
To install, simply add the rsem directory to your environment's PATH variable.
RSEM-EVAL requires C++, Perl and R to be installed.
To take advantage of RSEM-EVAL's built-in support for the Bowtie alignment program, you must have Bowtie installed.
Please note that the RNA-Seq data set used to build the assembly should be exactly the same as the RNA-Seq data set for evaluating this assembly. In addition, currently RSEM-EVAL only supports single-end, fixed-length RNA-Seq reads.
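Because only fixed-length reads are supported, it can help to confirm that every read in the input has the same length before running RSEM-EVAL. The following is a minimal sketch, not part of RSEM-EVAL: the helper names are hypothetical, and it assumes an uncompressed FASTQ file with exactly four lines per record.

```python
import io

def read_lengths(handle):
    """Yield the sequence length of each record in a 4-line-per-record FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record holds the sequence
            yield len(line.rstrip("\r\n"))

def is_fixed_length(handle):
    """Return (True, length) if all reads share one length, else (False, None)."""
    lengths = set(read_lengths(handle))
    if len(lengths) == 1:
        return True, lengths.pop()
    return False, None

# Tiny in-memory example; for a real file, pass an open file handle instead.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
print(is_fixed_length(example))
```

An empty input or a mix of read lengths both yield (False, None), which signals that the data set does not meet the fixed-length requirement.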
RSEM-EVAL provides a script 'estimateParams' to estimate the transcript length distribution from a set of transcript sequences. The transcripts can come from a species closely related to the organism whose transcriptome is sequenced. Its usage is:
estimateParams input.fasta [output_lengths_as_list.txt]
input.fasta contains the transcript sequences.
output_lengths_as_list.txt is optional. If set, the lengths of the sequences in 'input.fasta' will be written to this file, one number per line.
This script writes a three-by-two matrix to standard output. The first column gives the parameter names and the second column gives the parameter values. The three rows give the values of r, p and \mu, respectively. r and p are the parameters required for calculating the RSEM-EVAL score. \mu is the mean of the estimated distribution. The parameters are estimated using the method of moments.
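To illustrate what a method-of-moments estimate of r, p and \mu can look like, here is a minimal sketch. It is not the 'estimateParams' implementation. It assumes the length distribution is a negative binomial parameterized so that the mean is r(1-p)/p and the variance is r(1-p)/p^2, and it uses the population variance; the actual script may differ in any of these choices. The function names are hypothetical.

```python
import io
from statistics import mean, pvariance

def fasta_lengths(handle):
    """Return the length of every sequence in a FASTA stream."""
    lengths, current = [], None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0  # start counting a new record
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths

def negbin_moments(lengths):
    """Method-of-moments estimates (r, p, mu).

    Assumed parameterization: mean = r(1-p)/p, variance = r(1-p)/p^2.
    Matching moments gives p = mean/variance and r = mean^2/(variance - mean).
    """
    mu = mean(lengths)
    var = pvariance(lengths)  # assumption: population variance
    if var <= mu:
        raise ValueError("variance must exceed the mean for these estimates")
    p = mu / var
    r = mu * mu / (var - mu)
    return r, p, mu

# Tiny in-memory example with made-up sequences.
example = io.StringIO(">t1\nACGTACGT\nACGT\n>t2\nAC\n>t3\nACGTACGTACGTACGTACGTACGTACGT\n")
r, p, mu = negbin_moments(fasta_lengths(example))
print("r", r)
print("p", p)
print("mu", mu)
```

Under this parameterization the estimates reproduce the sample mean exactly, so r(1-p)/p equals \mu.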
To prepare the reference sequences for an assembly, you should run the 'rsem-prepare-reference' program. Run
rsem-prepare-reference --help
to get usage information. We suggest using the '--no-polyA' option for assemblies in this step.
To calculate the RSEM-EVAL score, you should run the 'rsem-calculate-expression' program with the '--calc-evaluation-score' option. Run
rsem-calculate-expression --help
to get usage information. In particular, please read the descriptions related to '--calc-evaluation-score' carefully: '--calc-evaluation-score' in the OPTIONS section, the last paragraph of the DESCRIPTION section, 'sample_name.score', 'sample_name.score.isoforms.results' and 'sample_name.score.genes.results' in the OUTPUT section, and the last example in the EXAMPLES section.
Suppose we have an organism A whose transcript sequences are unknown. We built an assembly 'assembly1' with FASTA file '/data/assembly1.fa' from the single-end, 75bp RNA-Seq reads '/data/reads.fq'. We obtained the transcript sequences, '/data/related.fa', from a species related to A. We can calculate assembly1's RSEM-EVAL score as follows:
We first estimate the transcript length distribution parameters:
estimateParams /data/related.fa
Suppose the estimated parameters are r = 1.14944289859475 and p = 0.000485880136894647. We then run the following commands:
rsem-prepare-reference --bowtie-path /sw/bowtie --no-polyA /data/assembly1.fa /ref/assembly1
rsem-calculate-expression -p 8 --calc-evaluation-score 1.14944289859475 0.000485880136894647 75 0 --no-bam-output /data/reads.fq /ref/assembly1 assembly1_rsem_eval
The RSEM-EVAL score can be found in 'assembly1_rsem_eval.score' and the contig impact scores can be found in 'assembly1_rsem_eval.score.isoforms.results'.
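As a sanity check on the r and p values used in this example, one can compute the mean transcript length they imply. This sketch assumes a negative binomial parameterization with mean r(1-p)/p, which is an assumption and is not stated in this document; under it, the values above correspond to a mean of roughly 2,365 bases, a plausible transcript length.

```python
# Values of r and p taken from the example above.
r = 1.14944289859475
p = 0.000485880136894647

def negbin_mean(r, p):
    """Mean of a negative binomial, assuming the parameterization mean = r(1-p)/p."""
    return r * (1.0 - p) / p

print(negbin_mean(r, p))
```

If the implied mean is far from a realistic transcript length for the organism, the parameter order or the input sequences passed to 'estimateParams' are worth rechecking.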
RSEM-EVAL is developed by Bo Li, with substantial technical input from Colin Dewey.
Please refer to the acknowledgements section in 'README_RSEM.md'.
RSEM-EVAL is licensed under the GNU General Public License v3.