Man page - mlpack_decision_tree(1)

NAME

mlpack_decision_tree - decision tree

SYNOPSIS

mlpack_decision_tree [ -m unknown ] [ -l unknown ] [ -D int ] [ -g double ] [ -n int ] [ -a bool ] [ -T string ] [ -L unknown ] [ -t string ] [ -V bool ] [ -w unknown ] [ -M unknown ] [ -p unknown ] [ -P unknown ] [ -h -v ]

DESCRIPTION

Train and evaluate using a decision tree. Given a dataset containing numeric or categorical features, and associated labels for each point in the dataset, this program can train a decision tree on that data.

The training set and associated labels are specified with the ’ --training_file ( -t )’ and ’ --labels_file ( -l )’ parameters, respectively. The labels should be in the range ’[0, num_classes - 1]’. Optionally, if ’ --labels_file ( -l )’ is not specified, the labels are assumed to be the last dimension of the training dataset.
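The label conventions above can be sketched in a few lines of Python. This is only an illustration of the data layout, not part of mlpack: the helper names 'split_last_column' and 'normalize_labels' are hypothetical, and the sketch assumes one point per row with the label stored as the last value of each row.

```python
def split_last_column(rows):
    """Split each row into features and a label, treating the last value as the label."""
    features = [row[:-1] for row in rows]
    labels = [row[-1] for row in rows]
    return features, labels


def normalize_labels(labels):
    """Map arbitrary label values onto 0..num_classes-1, following sorted order."""
    mapping = {value: index for index, value in enumerate(sorted(set(labels)))}
    return [mapping[value] for value in labels], mapping


# Hypothetical toy dataset: two numeric features followed by a class name.
rows = [
    [5.1, 3.5, "setosa"],
    [6.2, 2.9, "versicolor"],
    [5.0, 3.4, "setosa"],
]

features, raw_labels = split_last_column(rows)
labels, mapping = normalize_labels(raw_labels)
```

After this step every label lies in the range [0, num_classes - 1], which is the form the program expects.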

When a model is trained, the ’ --output_model_file ( -M )’ output parameter may be used to save the trained model. A model may be loaded for predictions with the ’ --input_model_file ( -m )’ parameter. The ’ --input_model_file ( -m )’ parameter may not be specified when the ’ --training_file ( -t )’ parameter is specified.

The ’ --minimum_leaf_size ( -n )’ parameter specifies the minimum number of training points that must fall into each leaf for it to be split. The ’ --minimum_gain_split ( -g )’ parameter specifies the minimum gain that is needed for the node to split. The ’ --maximum_depth ( -D )’ parameter specifies the maximum depth of the tree. If ’ --print_training_accuracy ( -a )’ is specified, the training accuracy will be printed.
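As a rough illustration of how the three tree parameters could interact, the following Python sketch checks whether a candidate split is acceptable. It is not mlpack's implementation. It assumes Gini impurity as the gain measure (this page does not say which measure the program uses), assumes the leaf-size check applies to each child, and uses hypothetical function names throughout. The default values mirror the ones documented under OPTIONAL INPUT OPTIONS.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum over classes of p^2."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def split_gain(parent, left, right):
    """Impurity reduction obtained by splitting parent into left and right."""
    total = len(parent)
    weighted = (
        (len(left) / total) * gini_impurity(left)
        + (len(right) / total) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted


def split_allowed(parent, left, right, child_depth,
                  minimum_leaf_size=20, minimum_gain_split=1e-7,
                  maximum_depth=0):
    """Return True if the candidate split passes all three checks.

    Assumptions: maximum_depth == 0 means no limit, and child_depth is the
    depth the two children would have if the split were made.
    """
    if maximum_depth != 0 and child_depth > maximum_depth:
        return False
    if min(len(left), len(right)) < minimum_leaf_size:
        return False
    return split_gain(parent, left, right) > minimum_gain_split


# Hypothetical node with 30 points of each class, split perfectly in two.
parent = [0] * 30 + [1] * 30
left, right = [0] * 30, [1] * 30
gain = split_gain(parent, left, right)
allowed = split_allowed(parent, left, right, child_depth=2)
```

Raising the minimum leaf size or the minimum gain makes splits harder to accept and so tends to produce smaller trees; a nonzero maximum depth caps the tree regardless of the other two settings.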

Test data may be specified with the ’ --test_file ( -T )’ parameter, and if performance numbers are desired for that test set, labels may be specified with the ’ --test_labels_file ( -L )’ parameter. Predictions for each test point may be saved via the ’ --predictions_file ( -p )’ output parameter. Class probabilities for each prediction may be saved with the ’ --probabilities_file ( -P )’ output parameter.
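The relationship between class probabilities and class predictions can be sketched as follows. This Python snippet assumes that each test point has one probability per class and that the predicted class is the one with the highest probability; the page does not state this, nor the layout of the saved probabilities file, so both are assumptions. The function name is hypothetical.

```python
def predict_from_probabilities(probabilities):
    """Return the index of the largest probability for each point.

    Ties are resolved in favor of the lowest class index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


# Hypothetical probabilities for three test points and three classes.
probabilities = [
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.5],
    [0.4, 0.4, 0.2],
]
predictions = predict_from_probabilities(probabilities)
```

Saving the probabilities in addition to the predictions is useful when a different decision threshold, or a ranking of classes, is needed later.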

For example, to train a decision tree with a minimum leaf size of 20 and a minimum gain of 0.001 on the dataset contained in ’data.arff’ with labels ’labels.csv’, saving the output model to ’tree.bin’ and printing the training accuracy, one could call

$ mlpack_decision_tree --training_file data.arff --labels_file labels.csv --output_model_file tree.bin --minimum_leaf_size 20 --minimum_gain_split 0.001 --print_training_accuracy

Then, to use that model to classify points in ’test_set.arff’ and report performance against the labels in ’test_labels.csv’, while saving the predictions for each point to ’predictions.csv’, one could call

$ mlpack_decision_tree --input_model_file tree.bin --test_file test_set.arff --test_labels_file test_labels.csv --predictions_file predictions.csv
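Once predictions have been saved, they can be compared against the known labels outside the program. The Python sketch below computes accuracy from the text of two label files. The helper names are hypothetical, and the parser assumes integer labels separated by commas and/or newlines, since this page does not specify the exact layout of the saved files.

```python
def parse_labels(text):
    """Parse integer labels separated by commas and/or newlines."""
    tokens = text.replace(",", "\n").split()
    # Going through float() also accepts values written like "1.0" or "1e+00".
    return [int(float(token)) for token in tokens]


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels differ in length")
    if not labels:
        raise ValueError("no labels given")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical file contents standing in for predictions.csv and test_labels.csv.
predictions_text = "0\n1\n1\n0\n"
labels_text = "0\n1\n0\n0\n"

test_accuracy = accuracy(parse_labels(predictions_text), parse_labels(labels_text))
test_error = 1.0 - test_accuracy
```

With real files, the two strings would instead come from reading ’predictions.csv’ and ’test_labels.csv’, for example with open("predictions.csv").read().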

OPTIONAL INPUT OPTIONS

--help (-h) [ bool ]

Default help info.

--info [ string ]

Print help on a specific option. Default value ’’.

--input_model_file (-m) [ unknown ]

Pre-trained decision tree, to be used with test points.

--labels_file (-l) [ unknown ]

Training labels.

--maximum_depth (-D) [ int ]

Maximum depth of the tree (0 means no limit). Default value 0.

--minimum_gain_split (-g) [ double ]

Minimum gain for node splitting. Default value 1e-07.

--minimum_leaf_size (-n) [ int ]

Minimum number of points in a leaf. Default value 20.

--print_training_accuracy (-a) [ bool ]

Print the training accuracy.

--test_file (-T) [ string ]

Testing dataset (may be categorical).

--test_labels_file (-L) [ unknown ]

Test point labels, if accuracy calculation is desired.

--training_file (-t) [ string ]

Training dataset (may be categorical).

--verbose (-v) [ bool ]

Display informational messages and the full list of parameters and timers at the end of execution.

--version (-V) [ bool ]

Display the version of mlpack.

--weights_file (-w) [ unknown ]

Weights for the labels.

OPTIONAL OUTPUT OPTIONS

--output_model_file (-M) [ unknown ]

Output for trained decision tree.

--predictions_file (-p) [ unknown ]

Class predictions for each test point.

--probabilities_file (-P) [ unknown ]

Class probabilities for each test point.

ADDITIONAL INFORMATION

For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.