Man page - hfst-tokenise(1)
apt-get install hfst
Manual
HFST-TOKENIZE
NAME
hfst-tokenize - perform matching/lookup on text streams
SYNOPSIS
hfst-tokenize [ --segment | --xerox | --cg | --giella-cg ] [ OPTIONS ...] RULESET
DESCRIPTION
perform matching/lookup on text streams
Common options:
-h , --help
Print help message
-V , --version
Print version info
-v , --verbose
Print verbosely while processing
-q , --quiet
Only print fatal errors and requested output
-s , --silent
Alias of --quiet
-n , --newline
Newline as input separator (default is blank line)
-a , --print-all
Print nonmatching text
-w , --print-weight
Print weights (overrides earlier -W option)
-W , --no-weights
Don't print weights (default; overrides an earlier -w option, or the -w implied by -g )
-m , --tokenize-multichar
Tokenize multicharacter symbols (by default only one UTF-8 character is tokenized at a time, regardless of what is present in the alphabet)
-b , --beam = B
Output only analyses whose weight is within B from best result
-tS , --time-cutoff = S
Limit search after having used S seconds per input
-lN , --weight-classes = N
Output no more than N best weight classes (where analyses with equal weight constitute a class)
-u , --unique
Remove duplicate analyses
-z , --segment
Segmenting / tokenization mode (default)
-i , --space-separated
Tokenization with one sentence per line, space-separated tokens
-x , --xerox
Xerox output
-c , --cg
Constraint Grammar output
-S , --superblanks
Ignore contents of unescaped [] (cf. apertium-destxt); flush on NUL
-g , --giella-cg
CG format used in Giella infrastructure (implies -w and -l2 , treats @PMATCH_INPUT_MARK@ as subreading separator, expects tags to be Multichar_symbols, flush on NUL)
-C , --conllu
CoNLL-U format
-f , --finnpos
FinnPos output
-L , --visl
VISL input and output (implies -W , handles <s> as blocks and <STYLE> inline)
Use standard streams for input and output (for now).
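Because the tool reads from standard input and writes to standard output, a typical invocation is a pipeline. The following is a minimal sketch, not an authoritative recipe: the ruleset file name tokeniser.pmhfst, the RULESET variable and the tokenise_cg helper are hypothetical names chosen for illustration; only the hfst-tokenize command and its --cg option come from this manual.

```shell
#!/bin/sh
# Hypothetical ruleset path (an assumption); point this at your own compiled ruleset.
RULESET="${RULESET:-tokeniser.pmhfst}"

# Illustrative helper: text on stdin, Constraint Grammar output on stdout.
tokenise_cg() {
    hfst-tokenize --cg "$RULESET"
}

# Run only when both the tool and the ruleset are actually present.
if command -v hfst-tokenize >/dev/null 2>&1 && [ -f "$RULESET" ]; then
    printf 'This is a test.\n' | tokenise_cg
else
    echo "hfst-tokenize or $RULESET not available; skipping" >&2
fi
```

Replacing --cg with --xerox, --segment or --giella-cg selects one of the other output modes listed in the synopsis.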
REPORTING BUGS
Report bugs to <hfst-bugs@helsinki.fi> or directly to our bug tracker at: <https://github.com/hfst/hfst/issues>
hfst-tokenize home page: <https://github.com/hfst/hfst/wiki/HfstTokenize>
General help using HFST software: <https://github.com/hfst/hfst/wiki>
COPYRIGHT
Copyright © 2017 University of Helsinki, License GPLv3: GNU GPL version 3 <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and
redistribute it. There is NO WARRANTY, to the extent
permitted by law.