APERTIUM-TAGGER (1) General Commands Manual APERTIUM-TAGGER (1)

NAME

apertium-tagger — part-of-speech tagger and trainer for Apertium

SYNOPSIS

apertium-tagger [ options ] -g serialized_tagger [ input [ output ]]
apertium-tagger [ options ] -r iterations corpus serialized_tagger
apertium-tagger [ options ] -s iterations dictionary corpus tagger_spec serialized_tagger tagged_corpus untagged_corpus
apertium-tagger [ options ] -s 0 dictionary tagger_spec serialized_tagger tagged_corpus untagged_corpus
apertium-tagger [ options ] -s 0 -u model serialized_tagger tagged_corpus
apertium-tagger [ options ] -t iterations dictionary corpus tagger_spec serialized_tagger

DESCRIPTION

apertium-tagger either trains the Apertium part-of-speech tagger or tags text with it, depending on the options it is called with. It reads from standard input only when the --tagger ( -g ) option is used.
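A minimal sketch of tagging from standard input. The file names are hypothetical, and the input is assumed to be text that has already been morphologically analysed, for example by lt-proc(1).

```shell
# Hypothetical file names; substitute your own.
model="en.prob"        # serialized_tagger from an earlier training run
input="analysed.txt"   # analysed text, e.g. the output of lt-proc(1)
output="tagged.txt"

# -g with no input/output arguments: read stdin, write stdout.
tag_cmd="apertium-tagger -g $model"
echo "$tag_cmd < $input > $output"

# Run only when the tool and both files exist; otherwise just show the command.
# $tag_cmd is left unquoted on purpose so the shell splits it into words.
if command -v apertium-tagger >/dev/null 2>&1 && [ -f "$model" ] && [ -f "$input" ]; then
    $tag_cmd < "$input" > "$output"
fi
```

The same result can be obtained by passing input and output as the optional trailing arguments shown in the SYNOPSIS.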

MODES
-g , --tagger

Tags input text using the Viterbi algorithm.

-r n , --retrain n

Retrains the model with n additional Baum-Welch iterations (unsupervised). This option is incompatible with -u ( --unigram ).

-s n , --supervised n

Initializes parameters from a hand-tagged text (supervised) using maximum likelihood estimation, then performs n iterations of the Baum-Welch training algorithm (unsupervised). The corpus argument can be omitted only when n = 0.

-t n , --train n

Initializes parameters through Kupiec’s method (unsupervised), then performs n iterations of the Baum-Welch training algorithm (unsupervised).
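A sketch of unsupervised training followed by retraining, with the arguments in the order given in the SYNOPSIS. The file names and the iteration counts (8 and 4) are arbitrary placeholders, not recommended values.

```shell
# Hypothetical file names; substitute your own.
dict="en.dic"      # dictionary: full expanded dictionary
corpus="en.crp"    # corpus: training text
spec="en.tsx"      # tagger_spec: XML tagger specification
model="en.prob"    # serialized_tagger: written by training

# -t: Kupiec initialization, then 8 Baum-Welch iterations (count is arbitrary).
train_cmd="apertium-tagger -t 8 $dict $corpus $spec $model"
# -r: 4 further Baum-Welch iterations on the existing model (count is arbitrary).
retrain_cmd="apertium-tagger -r 4 $corpus $model"

echo "$train_cmd"
echo "$retrain_cmd"

# Run only when the tool and the input files exist; otherwise just show the commands.
# The command variables are left unquoted on purpose so the shell splits them into words.
if command -v apertium-tagger >/dev/null 2>&1 \
    && [ -f "$dict" ] && [ -f "$corpus" ] && [ -f "$spec" ]; then
    $train_cmd && $retrain_cmd
fi
```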

MODELS
-u , --unigram=MODEL

Use unigram algorithm MODEL, as described in <https://coltekin.net/cagri/papers/trmorph-tools.pdf>.

-w , --sliding-window

Use the Light Sliding Window algorithm.

-x , --perceptron

Use the averaged perceptron algorithm.
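A sketch of training a unigram model, following the -s 0 -u form in the SYNOPSIS. The file names are hypothetical, and the MODEL value 1 is an assumption: this page does not list the accepted values.

```shell
# Hypothetical file names; substitute your own.
unigram_model="1"      # ASSUMED value for MODEL; not documented on this page
model="en.unigram"     # serialized_tagger: written by training
tagged="en.tagged"     # tagged_corpus: hand-tagged text

# Supervised initialization only (n = 0), using the unigram algorithm.
unigram_cmd="apertium-tagger -s 0 -u $unigram_model $model $tagged"
echo "$unigram_cmd"

# Run only when the tool and the hand-tagged corpus exist.
# $unigram_cmd is left unquoted on purpose so the shell splits it into words.
if command -v apertium-tagger >/dev/null 2>&1 && [ -f "$tagged" ]; then
    $unigram_cmd
fi
```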

OPTIONS
-d , --debug

Print error (if any) or debug messages while operating.

-e, --skip-on-error

Used with -xs to ignore certain types of errors in the training corpus.

-f , --first

Used in conjunction with -g ( --tagger ); makes the tagger output all lexical forms of each word, with the chosen one in first place (after the lemma).

-m , --mark

Mark disambiguated words.

-p , --show-superficial

Prints the superficial form of the word alongside the lexical form in the output stream.

-z , --null-flush

Used in conjunction with -g ( --tagger ) to flush the output after getting each null character.

--help

Display a help message.
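A sketch of combining options with tagging mode. The model name is hypothetical; the flags are the ones described above.

```shell
model="en.prob"    # hypothetical serialized_tagger

# -f: output every lexical form, chosen one first; -p: also print the superficial form.
inspect_cmd="apertium-tagger -g -f -p $model"
# -z: flush output after each null character, e.g. when another process
# keeps the tagger running and sends it one null-terminated chunk at a time.
stream_cmd="apertium-tagger -g -z $model"

echo "$inspect_cmd"
echo "$stream_cmd"

# Run the inspection variant only when the tool, the model, and an input file exist.
# "analysed.txt" is a hypothetical file of morphologically analysed text.
if command -v apertium-tagger >/dev/null 2>&1 && [ -f "$model" ] && [ -f analysed.txt ]; then
    $inspect_cmd < analysed.txt
fi
```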

FILES

These are the kinds of files used with each option:

dictionary

Full expanded dictionary file

corpus

Training text corpus file

tagger_spec

Tagger specification file, in XML format

serialized_tagger

Tagger data file, built in the training and used while tagging

tagged_corpus

Hand-tagged text corpus

untagged_corpus

Untagged text corpus: the morphological analysis of the hand-tagged corpus, used jointly with it by the -s option

input

Input file, stdin by default

output

Output file, stdout by default
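A sketch of how the files above map onto a supervised training call. The helper supervised_cmd and all file names are hypothetical; it only assembles the command line, dropping the corpus argument when the iteration count is 0, as the SYNOPSIS allows.

```shell
# Hypothetical file names; substitute your own.
dict="en.dic"            # dictionary
corpus="en.crp"          # corpus
spec="en.tsx"            # tagger_spec
model="en.prob"          # serialized_tagger
tagged="en.tagged"       # tagged_corpus
untagged="en.untagged"   # untagged_corpus

# Print the -s command line for a given number of Baum-Welch iterations.
# With 0 iterations the corpus argument is omitted.
supervised_cmd() {
    n="$1"
    if [ "$n" -eq 0 ]; then
        echo "apertium-tagger -s 0 $dict $spec $model $tagged $untagged"
    else
        echo "apertium-tagger -s $n $dict $corpus $spec $model $tagged $untagged"
    fi
}

supervised_cmd 0
supervised_cmd 8   # 8 is an arbitrary iteration count
```

Printing the command rather than running it keeps the sketch usable as a dry run; the printed line can be executed once the files exist.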

SEE ALSO

apertium (1), lt-comp (1), lt-expand (1), lt-proc (1)

COPYRIGHT

Copyright © 2005, 2006 Universitat d’Alacant / Universidad de Alicante. This is free software. You may redistribute copies of it under the terms of the GNU General Public License: https://www.gnu.org/licenses/gpl.html.

BUGS

Many... lurking in the dark and waiting for you!

Apertium February 22, 2021 APERTIUM-TAGGER (1)