Man page - hfst-tokenize(1)

HFST-TOKENIZE

NAME

hfst-tokenize - perform matching/lookup on text streams

SYNOPSIS

hfst-tokenize [ --segment | --xerox | --cg | --giella-cg ] [ OPTIONS ...] RULESET

DESCRIPTION

perform matching/lookup on text streams

Common options:

-h , --help

Print help message

-V , --version

Print version info

-v , --verbose

Print verbosely while processing

-q , --quiet

Only print fatal errors and requested output

-s , --silent

Alias of --quiet

-n , --newline

Newline as input separator (default is blank line)

-a , --print-all

Print nonmatching text

-w , --print-weight

Print weights (overrides earlier -W option)

-W , --no-weights

Don’t print weights (default; overrides an earlier -w option, or the -w implied by -g)

-m , --tokenize-multichar

Tokenize multicharacter symbols (by default only one UTF-8 character is tokenized at a time, regardless of what is present in the alphabet)

-b , --beam = B

Output only analyses whose weight is within B from best result

-tS , --time-cutoff = S

Limit search after having used S seconds per input

-lN , --weight-classes = N

Output no more than N best weight classes (where analyses with equal weight constitute a class)

-u , --unique

Remove duplicate analyses

-z , --segment

Segmenting / tokenization mode (default)

-i , --space-separated

Tokenization with one sentence per line, space-separated tokens

-x , --xerox

Xerox output

-c , --cg

Constraint Grammar output

-S , --superblanks

Ignore contents of unescaped [] (cf. apertium-destxt); flush on NUL

-g , --giella-cg

CG format used in Giella infrastructure (implies -w and -l2 , treats @PMATCH_INPUT_MARK@ as subreading separator, expects tags to be Multichar_symbols, flushes on NUL)

-C , --conllu

CoNLL-U format

-f , --finnpos

FinnPos output

-L , --visl

VISL input and output (implies -W , handles <s> as blocks and <STYLE> inline)

Use standard streams for input and output (for now).
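
The weight-related options -b , -l and -u can be easier to follow with a small model of the filtering they describe. The Python sketch below is only an illustration of the option descriptions above, not the hfst-tokenize implementation. The function name filter_analyses, the (analysis, weight) tuple layout, the order in which the filters are applied, and the assumption that a lower weight is a better result are all choices made for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (analysis string, weight).
# Assumption: a lower weight means a better analysis.
Analysis = Tuple[str, float]


def filter_analyses(
    analyses: List[Analysis],
    beam: Optional[float] = None,          # models -b B
    weight_classes: Optional[int] = None,  # models -l N
    unique: bool = False,                  # models -u
) -> List[Analysis]:
    """Filter analyses the way the -b, -l and -u descriptions read."""
    if not analyses:
        return []

    # Best (lowest-weight) analyses first; sorted() is stable, so ties
    # keep their input order.
    ranked = sorted(analyses, key=lambda a: a[1])

    if unique:
        # Assumption: a duplicate is an identical (string, weight) pair.
        seen = set()
        deduped = []
        for analysis in ranked:
            if analysis not in seen:
                seen.add(analysis)
                deduped.append(analysis)
        ranked = deduped

    if beam is not None:
        # Keep analyses whose weight is within `beam` of the best weight.
        best = ranked[0][1]
        ranked = [a for a in ranked if a[1] - best <= beam]

    if weight_classes is not None:
        # Analyses with equal weight form one class; keep the N best classes.
        allowed = set(sorted({w for _, w in ranked})[:weight_classes])
        ranked = [a for a in ranked if a[1] in allowed]

    return ranked


if __name__ == "__main__":
    sample = [("dog+N+Sg", 0.0), ("dog+V+Inf", 1.5), ("dog+N+Sg", 0.0), ("d+og", 7.0)]
    for analysis, weight in filter_analyses(sample, beam=2.0, unique=True):
        print(analysis, weight)
```

In this model, weight_classes=2 corresponds to the -l2 that -g implies: every analysis sharing one of the two best distinct weights is kept, so the number of analyses printed can exceed two.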

REPORTING BUGS

Report bugs to <hfst-bugs@helsinki.fi> or directly to our bug tracker at: <https://github.com/hfst/hfst/issues>

hfst-tokenize home page: <https://github.com/hfst/hfst/wiki/HfstTokenize>
General help using HFST software: <https://github.com/hfst/hfst/wiki>

COPYRIGHT

Copyright © 2017 University of Helsinki, License GPLv3: GNU GPL version 3 <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.