Man page - mlpack_preprocess_split(1)
apt-get install mlpack-bin
Manual
mlpack_preprocess_split
NAME
SYNOPSIS
DESCRIPTION
REQUIRED INPUT OPTIONS
OPTIONAL INPUT OPTIONS
OPTIONAL OUTPUT OPTIONS
ADDITIONAL INFORMATION
NAME
mlpack_preprocess_split - split data
SYNOPSIS
mlpack_preprocess_split -i unknown [ -I unknown ] [ -S bool ] [ -s int ] [ -z bool ] [ -r double ] [ -V bool ] [ -T unknown ] [ -L unknown ] [ -t unknown ] [ -l unknown ] [ -h -v ]
DESCRIPTION
This utility takes a dataset (and, optionally, labels) and splits it into a training set and a test set. By default, the points in the dataset are randomly reordered before the split. The fraction of the dataset to be used as the test set can be specified with the '--test_ratio (-r)' parameter; the default is 0.2 (20%).
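The shuffle-then-split step can be sketched in Python. This is a minimal illustration, not mlpack's implementation: the function name shuffle_split is hypothetical, Python's random module stands in for mlpack's random number generator, and the test-set size is assumed to be the truncated product of the number of points and the test ratio (the tool's exact rounding may differ).

```python
import random


def shuffle_split(points, test_ratio=0.2, seed=None):
    """Randomly reorder `points`, then split them into (train, test).

    Assumption: the test set gets int(len(points) * test_ratio) points
    (truncated), and all remaining points go to the training set.
    """
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)  # random permutation of the point indices
    test_size = int(len(points) * test_ratio)
    train_size = len(points) - test_size
    train = [points[i] for i in order[:train_size]]
    test = [points[i] for i in order[train_size:]]
    return train, test


if __name__ == "__main__":
    data = [[float(i), float(i * i)] for i in range(10)]
    train, test = shuffle_split(data, test_ratio=0.2, seed=42)
    print(len(train), len(test))  # 8 2
```

With 10 points and the default ratio of 0.2, this sketch puts 2 points in the test set and 8 in the training set.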
The output training and test matrices may be saved with the '--training_file (-t)' and '--test_file (-T)' output parameters, respectively.
Optionally, labels can also be split along with the data by specifying the '--input_labels_file (-I)' parameter. Labels are split the same way as the data, so each label stays with its point. The output training and test labels may be saved with the '--training_labels_file (-l)' and '--test_labels_file (-L)' output parameters, respectively.
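A minimal Python sketch of how labels can be split together with the data: one shared random permutation is applied to both the points and the labels, so every label stays attached to its point. The helpers split_with_labels and _take and their return order are hypothetical, and the test-set size rule (truncating n × ratio) is an assumption rather than mlpack's documented behavior.

```python
import random


def _take(seq, indices):
    """Return the elements of `seq` at the given indices, in that order."""
    return [seq[i] for i in indices]


def split_with_labels(points, labels, test_ratio=0.2, seed=None):
    """Split points and labels using one shared random permutation.

    Returns (train_points, train_labels, test_points, test_labels).
    Assumption: the test set gets int(len(points) * test_ratio) points.
    """
    if len(points) != len(labels):
        raise ValueError("need exactly one label per point")
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)  # the same order is used for points and labels
    train_size = len(points) - int(len(points) * test_ratio)
    train_idx, test_idx = order[:train_size], order[train_size:]
    return (_take(points, train_idx), _take(labels, train_idx),
            _take(points, test_idx), _take(labels, test_idx))


if __name__ == "__main__":
    X = [[i, 2 * i] for i in range(10)]
    y = [i % 3 for i in range(10)]  # label derived from the point itself
    X_tr, y_tr, X_te, y_te = split_with_labels(X, y, test_ratio=0.3, seed=0)
    # Every label still matches its point after the split.
    print(all(p[0] % 3 == lab for p, lab in zip(X_tr + X_te, y_tr + y_te)))  # True
```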
For example, to split the dataset 'X.csv' into 'X_train.csv' and 'X_test.csv', with 60% of the data in the training set and 40% in the test set, we could run:
$ mlpack_preprocess_split --input_file X.csv --training_file X_train.csv --test_file X_test.csv --test_ratio 0.4
By default, the dataset is shuffled before it is split; to keep the original order of the points, provide the '--no_shuffle (-S)' option:
$ mlpack_preprocess_split --input_file X.csv --training_file X_train.csv --test_file X_test.csv --test_ratio 0.4 --no_shuffle
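With shuffling disabled, a natural reading is that the points keep their file order: the leading points go to the training set and the trailing points go to the test set. The Python sketch below assumes exactly that behavior and a truncating test-size rule; ordered_split is a hypothetical name, so check the real tool's output before relying on which rows end up where.

```python
def ordered_split(points, test_ratio=0.2):
    """Split without shuffling, keeping the original point order.

    Assumption: the first n - int(n * test_ratio) points form the training
    set and the remaining trailing points form the test set.
    """
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    test_size = int(len(points) * test_ratio)
    train_size = len(points) - test_size
    return points[:train_size], points[train_size:]


if __name__ == "__main__":
    train, test = ordered_split(list(range(10)), test_ratio=0.4)
    print(train)  # [0, 1, 2, 3, 4, 5]
    print(test)   # [6, 7, 8, 9]
```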
If we had a dataset 'X.csv' and associated labels 'y.csv', and we wanted to split these into 'X_train.csv', 'y_train.csv', 'X_test.csv', and 'y_test.csv', with 30% of the data in the test set, we could run:
$ mlpack_preprocess_split --input_file X.csv --input_labels_file y.csv --test_ratio 0.3 --training_file X_train.csv --training_labels_file y_train.csv --test_file X_test.csv --test_labels_file y_test.csv
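To maintain the ratio of each class in the training and test sets, the '--stratify_data (-z)' option can be used. Because stratification is done according to the labels, the labels must also be given with '--input_labels_file (-I)':
$ mlpack_preprocess_split --input_file X.csv --input_labels_file y.csv --training_file X_train.csv --training_labels_file y_train.csv --test_file X_test.csv --test_labels_file y_test.csv --test_ratio 0.4 --stratify_data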
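Stratification can be sketched as splitting each class on its own, so the class proportions in the training and test sets stay close to those of the full dataset. This Python sketch is an illustration under assumptions: stratified_split is a hypothetical helper, each class contributes the truncated product of its size and the test ratio to the test set, and mlpack may round per-class sizes differently.

```python
import random
from collections import defaultdict


def stratified_split(points, labels, test_ratio=0.2, seed=None):
    """Split each class separately so class proportions are roughly kept.

    Returns (train_points, train_labels, test_points, test_labels).
    Assumption: each class sends int(class_size * test_ratio) of its points
    to the test set; the real tool may round per-class sizes differently.
    """
    if len(points) != len(labels):
        raise ValueError("need exactly one label per point")
    rng = random.Random(seed)
    by_class = defaultdict(list)  # label -> indices of points with that label
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for label in sorted(by_class):  # fixed class order for reproducibility
        idx = by_class[label]
        rng.shuffle(idx)  # shuffle within the class only
        n_test = int(len(idx) * test_ratio)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return ([points[i] for i in train_idx], [labels[i] for i in train_idx],
            [points[i] for i in test_idx], [labels[i] for i in test_idx])


if __name__ == "__main__":
    X = [[float(i)] for i in range(15)]
    y = [0] * 10 + [1] * 5  # two classes in a 2:1 ratio
    X_tr, y_tr, X_te, y_te = stratified_split(X, y, test_ratio=0.4, seed=3)
    print(y_te.count(0), y_te.count(1))  # 4 2
    print(y_tr.count(0), y_tr.count(1))  # 6 3
```

In this sketch both the training and the test set keep the 2:1 class ratio of the input.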
REQUIRED INPUT OPTIONS
--input_file (-i) [ unknown ]
Matrix containing data.
OPTIONAL INPUT OPTIONS
--help (-h) [ bool ]
Default help info.
--info [string]
Print help on a specific option. Default value ''.
--input_labels_file (-I) [ unknown ]
Matrix containing labels.
--no_shuffle (-S) [ bool ]
Avoid shuffling the data before splitting.
--seed (-s) [ int ]
Random seed; 0 means the seed is taken from std::time(NULL). Default value 0.
--stratify_data (-z) [ bool ]
Stratify the data according to the labels.
--test_ratio (-r) [ double ]
Ratio of the dataset to use as the test set. Default value 0.2.
--verbose (-v) [ bool ]
Display informational messages and the full list of parameters and timers at the end of execution.
--version (-V) [ bool ]
Display the version of mlpack.
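The seed rule for '--seed (-s)' (0 means seed from std::time(NULL); any other value fixes the shuffle) can be illustrated in Python. This is only an analogy: make_rng and permutation are hypothetical helpers, and Python's generator produces different permutations from mlpack's, so the same seed will not reproduce the tool's actual split.

```python
import random
import time


def make_rng(seed=0):
    """Mirror the documented --seed rule: 0 means seed from the clock.

    Python's random.Random is only a stand-in for mlpack's generator.
    """
    if seed == 0:
        seed = int(time.time())  # analogous to std::time(NULL)
    return random.Random(seed)


def permutation(n, seed=0):
    """Return a random ordering of range(n) driven by `seed`."""
    order = list(range(n))
    make_rng(seed).shuffle(order)
    return order


if __name__ == "__main__":
    # A fixed nonzero seed yields the same order on every run.
    print(permutation(10, seed=7) == permutation(10, seed=7))  # True
```

The design point is that a fixed nonzero seed makes a split repeatable across runs, while the default of 0 gives a different split each time.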
OPTIONAL OUTPUT OPTIONS
--test_file (-T) [ unknown ]
Matrix to save test data to.
--test_labels_file (-L) [ unknown ]
Matrix to save test labels to.
--training_file (-t) [ unknown ]
Matrix to save training data to.
--training_labels_file (-l) [ unknown ]
Matrix to save training labels to.
ADDITIONAL INFORMATION
For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.