Man page - rsem-generate-ngvector(1)
RSEM-GENERATE-NGVECTOR
NAME
rsem-generate-ngvector - Create Ng vector for EBSeq based only on transcript sequences.
SYNOPSIS
rsem-generate-ngvector [options] input_fasta_file output_name
ARGUMENTS
input_fasta_file
The FASTA file containing all reference transcripts. The transcripts must be in the same order as those in the expression value files. Thus, "reference_name.transcripts.fa" generated by "rsem-prepare-reference" should be used.
output_name
The prefix shared by all output files. The Ng vector will be stored as "output_name.ngvec".
OPTIONS
-k <int>
The k-mer length. See the DESCRIPTION section. (Default: 25)
-h/--help
Show help information.
DESCRIPTION
This program generates the Ng vector required by EBSeq for isoform-level differential expression analysis, based on reference sequences only. EBSeq can take the variance due to read mapping ambiguity into account by grouping isoforms according to the number of isoforms of their parent gene. However, for a de novo assembled transcriptome, it is hard to obtain an accurate gene-isoform relationship. Instead, this program groups isoforms directly by a measure of read mapping ambiguity. First, it calculates the "unmappability" of each transcript. The "unmappability" of a transcript is the ratio between the number of its k-mers that have at least one perfect match to other transcripts and the total number of its k-mers, where k is a parameter (set with -k). Then, the Ng vector is generated by applying the k-means algorithm to the "unmappability" values, with the number of clusters set to 3. "rsem-generate-ngvector" makes sure that the mean "unmappability" scores of the clusters are in ascending order. All transcripts whose lengths are less than k are assigned to cluster 3.
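The two steps above (unmappability, then 3-cluster k-means with ascending relabeling) can be sketched in R, the language EBSeq runs in. This is a minimal illustration under stated assumptions, not RSEM's implementation: the helper names kmers, unmappability and ng_vector are hypothetical, k-mers are counted by position on the forward strand only, a "perfect match" is taken to mean an exact substring occurrence in another transcript, stats::kmeans stands in for whatever k-means routine RSEM uses, and transcripts shorter than k get an NA score and are left out of the clustering.

```r
# Sketch of the unmappability + k-means procedure (assumptions, not RSEM code):
# forward strand only, k-mers counted by position, stats::kmeans for clustering.

# Hypothetical helper: every length-k substring of s, in order.
kmers <- function(s, k) {
  n <- nchar(s)
  if (n < k) return(character(0))
  substring(s, 1:(n - k + 1), k:n)
}

# Hypothetical helper: fraction of each transcript's k-mers that occur
# exactly in at least one other transcript; NA if shorter than k.
unmappability <- function(seqs, k = 25) {
  vapply(seq_along(seqs), function(i) {
    km <- kmers(seqs[i], k)
    if (length(km) == 0) return(NA_real_)
    others <- seqs[-i]
    hit <- vapply(km, function(x) any(grepl(x, others, fixed = TRUE)),
                  logical(1))
    mean(hit)
  }, numeric(1))
}

# Hypothetical helper: cluster the scores into n_clusters groups, relabel the
# clusters so their mean unmappability is ascending (1 < 2 < 3), and assign
# transcripts shorter than k (NA scores) to the last cluster.
# Note: kmeans() needs more distinct scores than clusters.
ng_vector <- function(ump, n_clusters = 3) {
  ng <- rep(n_clusters, length(ump))
  ok <- !is.na(ump)
  km <- kmeans(ump[ok], centers = n_clusters, nstart = 25)
  ng[ok] <- match(km$cluster, order(km$centers[, 1]))
  ng
}
```

For example, `unmappability(c("ACGTAC", "ACGTTT", "GGGGGG", "AC"), k = 3)` gives 0.5, 0.5, 0 and NA: the first two transcripts share the 3-mers ACG and CGT, the third shares nothing, and the last is shorter than k. Relabeling by `order(km$centers[, 1])` is one simple way to obtain the ascending cluster order described above.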
If your reference is a de novo assembled transcript set, you should run "rsem-generate-ngvector" first. Then load the resulting "output_name.ngvec" into R. For example, you can use:
NgVec <- scan(file="output_name.ngvec", what=0, sep="\n")
After that, replace "IsoNgTrun" with "NgVec" in the second line of section 3.2.5 (Page 10) of EBSeq's vignette:
IsoEBres=EBTest(Data=IsoMat, NgVector=NgVec, ...)
This program only needs to run once per RSEM reference.
OUTPUT
output_name.ump
The "unmappability" score of each transcript. This file contains two columns: the first column is the transcript name and the second column is the "unmappability" score.
output_name.ngvec
Ng vector generated by this program.
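To inspect the two output files together in R, a sketch along these lines may help. It rests on assumptions not stated above: the .ump file is whitespace-separated with no header and no whitespace inside transcript names, the .ngvec file holds one number per line (as the scan() call in the DESCRIPTION implies), and both files list transcripts in the same order. read_ngvector_outputs is a hypothetical helper name.

```r
# Hypothetical helper: combine <output_name>.ump and <output_name>.ngvec into
# one data frame. Assumes both files list the same transcripts in the same order.
read_ngvector_outputs <- function(output_name) {
  ump <- read.table(paste0(output_name, ".ump"), header = FALSE,
                    col.names = c("transcript", "unmappability"),
                    quote = "", comment.char = "", stringsAsFactors = FALSE)
  ng <- scan(paste0(output_name, ".ngvec"), what = 0, sep = "\n", quiet = TRUE)
  stopifnot(length(ng) == nrow(ump))  # expect one Ng value per transcript
  ump$ng <- ng
  ump
}
```

With the files from the EXAMPLES section, `res <- read_ngvector_outputs("mouse_125")` followed by `table(res$ng)` would count how many transcripts fall into each cluster.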
EXAMPLES
Suppose the reference sequences file is "/ref/mouse_125/mouse_125.transcripts.fa" and we set output_name to "mouse_125":
rsem-generate-ngvector /ref/mouse_125/mouse_125.transcripts.fa mouse_125