Man page - spaced(1)
NAME
spaced - alignment-free sequence comparison
SYNOPSIS
spaced [ -h ] [ -r ] [ -k INT ] [ -l INT ] [ -n INT ] [ -t INT ] [ -d TYPE ] [ -f FILE ] [ -o FILE ] FILES ...
DESCRIPTION
Spaced Words is a new approach to alignment-free sequence comparison. While most alignment-free algorithms compare the word composition of sequences, Spaced Words uses a pattern of match ("care") and don’t care positions. The occurrence of a spaced word in a sequence is then defined by the characters at the match positions only, while the characters at the don’t care positions are ignored (this was originally inspired by the PatternHunter algorithm for homology search in databases).
Instead of comparing the frequencies of contiguous words in the input sequences, our new approach compares the frequencies of the spaced words according to the pre-defined pattern. An information-theoretic distance measure is then used to define pairwise distances on the set of input sequences based on their spaced-word frequencies. The original version of our spaced-words approach was published in Boden et al. (2013).
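The idea of a spaced word can be illustrated with a small Python sketch. It is not the implementation used by spaced; the function name spaced_word_counts and the pattern notation ('1' for a match position, '0' for a don’t care position) are assumptions made for this example only.

```python
from collections import Counter


def spaced_word_counts(sequence, pattern):
    """Count the spaced words of `sequence` with respect to `pattern`.

    `pattern` is a string over '1' (match position) and '0' (don't care
    position); this notation is an assumption of the sketch.
    """
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    counts = Counter()
    # Slide the pattern over every window of the sequence.
    for start in range(len(sequence) - len(pattern) + 1):
        # Only the characters at the match positions define the word.
        word = "".join(sequence[start + i] for i in match_positions)
        counts[word] += 1
    return counts


# The windows "ACGT" and "ACTT" differ only at the don't care position,
# so both contribute to the same spaced word "ACT".
counts = spaced_word_counts("ACGTACTT", "1101")
print(counts["ACT"])  # 2
```

Normalising such counts per sequence gives the spaced-word frequencies that the distance measures operate on.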
OUTPUT
The output is a symmetric distance matrix in a format similar to PHYLIP. Each entry is a non-negative real number representing the divergence between two sequences. A distance of zero means that the two sequences are identical; with the default EV measure, other values are estimates of the nucleotide substitution rate (Jukes-Cantor corrected).
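For downstream processing, such a matrix can be read with a few lines of Python. The sketch below assumes a layout in which the first line holds the number of sequences and every following line holds a name (without whitespace) followed by its distances; the exact layout written by spaced may differ. The sequence names and values in the example are invented for illustration.

```python
def parse_distance_matrix(text):
    """Parse a PHYLIP-like square distance matrix.

    Assumed layout: first line is the number of sequences, then one line
    per sequence with a name followed by whitespace-separated distances.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])
    names, rows = [], []
    for ln in lines[1:n + 1]:
        fields = ln.split()
        names.append(fields[0])
        rows.append([float(x) for x in fields[1:]])
    return names, rows


# Made-up example data, not real program output.
example = """3
seqA  0.0000  0.1200  0.3400
seqB  0.1200  0.0000  0.2900
seqC  0.3400  0.2900  0.0000
"""
names, matrix = parse_distance_matrix(example)

# A distance matrix should be symmetric with zeros on the diagonal.
is_symmetric = all(
    matrix[i][j] == matrix[j][i]
    for i in range(len(names))
    for j in range(len(names))
)
```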
OPTIONS
-o <file>
Print the distance matrix to the given file. Default is DMat.
-k <int>
Set the weight of the patterns, i.e. the number of match positions. Default: 14.
-l <int>
Set the number of don’t care positions in the patterns. Default: 15.
-n <int>
Set the number of patterns. Default: 5.
-f <file>
Instead of generating new patterns, read them from the given file.
-t <int>
Set the number of threads to be used. Default: 25. Multithreading is only available if spaced was compiled with OpenMP support.
-r
Skip comparison with the reverse complement.
-d <type>
The distances can be computed with different measures. Available options are Euclidean (EU), Jensen-Shannon (JS), and evolutionary distance (EV). Default: EV.
-h
Print the synopsis and an explanation of the available options.
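To make the roles of -k, -l, -n and -d more concrete, the following Python sketch generates random patterns from a weight and a number of don’t care positions, and computes the Euclidean distance and the Jensen-Shannon divergence between two frequency tables. It is an illustration under stated assumptions, not the code used by spaced: all function names are hypothetical, the rule that a pattern starts and ends with a match position is assumed, and the normalisation and logarithm base used by spaced for JS may differ. The EV measure is not covered here.

```python
import math
import random


def random_pattern(weight, dont_care, rng):
    """Random pattern with `weight` '1's and `dont_care` '0's.

    Assumption of this sketch: first and last positions are match positions.
    """
    if weight < 2:
        raise ValueError("weight must be at least 2")
    inner = ["1"] * (weight - 2) + ["0"] * dont_care
    rng.shuffle(inner)
    return "1" + "".join(inner) + "1"


def euclidean(p, q):
    """Euclidean distance between two word -> relative frequency dicts."""
    words = set(p) | set(q)
    return math.sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in words))


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    total = 0.0
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        m = (pw + qw) / 2
        # Terms with zero probability contribute nothing.
        if pw > 0:
            total += 0.5 * pw * math.log2(pw / m)
        if qw > 0:
            total += 0.5 * qw * math.log2(qw / m)
    return total


# Mirror the documented defaults: -k 14, -l 15, -n 5.
rng = random.Random(0)
patterns = [random_pattern(14, 15, rng) for _ in range(5)]

# Two toy frequency tables (made up for illustration).
p = {"ACT": 0.5, "CGA": 0.5}
q = {"ACT": 0.5, "GTC": 0.5}
d_eu = euclidean(p, q)
d_js = jensen_shannon(p, q)
```

Each generated pattern has length k + l, which is 29 with the default settings.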
COPYRIGHT
Copyright © 2016 Chris Leimeister <chris.leimeister@stud.uni-goettingen.de>
License GPLv3+: GNU GPL version 3 or later.
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. The full license text is available at <http://gnu.org/licenses/gpl.html>.
REFERENCES
1) C.-A. Leimeister, M. Boden, S. Horwege, S. Lindner, B. Morgenstern (2014). Fast alignment-free sequence comparison using spaced-word frequencies, Bioinformatics <http://bioinformatics.oxfordjournals.org/content/early/2014/04/03/bioinformatics.btu177>
2) S. Horwege, S. Lindner, M. Boden, K. Hatje, M. Kollmar, C.-A. Leimeister, B. Morgenstern (2014). Spaced words and kmacs: fast alignment-free sequence comparison based on inexact word matches, Nucleic Acids Research 42, W7-W11 <http://nar.oxfordjournals.org/content/42/W1/W7.abstract>
3) B. Morgenstern, B. Zhu, S. Horwege, C.-A. Leimeister (2015). Estimating evolutionary distances between genomic sequences from spaced-word matches, Algorithms for Molecular Biology 10, 5
BUGS
Reporting Bugs
Please report bugs to <kloetzl@evolbio.mpg.de> or <chris.leimeister@stud.uni-goettingen.de>.