Man page - datalad-foreach-dataset(1)
apt-get install datalad
NAME
datalad foreach-dataset - run a command or Python code on the dataset and/or each of its sub-datasets.
SYNOPSIS
datalad foreach-dataset [-h] [--cmd-type {auto|external|exec|eval}] [-d DATASET] [--state {present|absent|any}] [-r] [-R LEVELS] [--contains PATH] [--bottomup] [-s] [--output-streams {capture|pass-through|relpath}] [--chpwd {ds|pwd}] [--safe-to-consume {auto|all-subds-done|superds-done|always}] [-J NJOBS] [--version] ...
DESCRIPTION
This command is a convenience for cases where no dedicated DataLad command is provided to operate across a hierarchy of datasets. It is very similar to the 'git submodule foreach' command, with the following major differences:
- by default (unless --subdatasets-only is given) it also operates on the original dataset,
- subdatasets can be traversed in bottom-up order,
- it can execute commands in parallel (see the --jobs option) while respecting the order, e.g. in bottom-up order the command is executed in a superdataset only after it has been executed in all of its subdatasets.
Additional notes:
- for execution of "external" commands we use the environment used to execute external git and git-annex commands.
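The traversal orders described above can be illustrated with a small, self-contained Python sketch. It does not use DataLad; the toy hierarchy, the dataset names, and the traverse() helper are all made up for illustration, and the real command may order sibling subdatasets differently.

```python
# Toy hierarchy: each dataset path maps to its direct subdatasets.
# All names here are hypothetical and only serve the illustration.
HIERARCHY = {
    "super": ["super/a", "super/b"],
    "super/a": ["super/a/x"],
    "super/a/x": [],
    "super/b": [],
}


def traverse(root, bottomup=False, subdatasets_only=False):
    """Return dataset paths in top-down (default) or bottom-up order."""
    order = []

    def visit(ds):
        if not bottomup:
            order.append(ds)  # top-down: a dataset comes before its subdatasets
        for sub in HIERARCHY[ds]:
            visit(sub)
        if bottomup:
            order.append(ds)  # bottom-up: a dataset comes after its subdatasets

    visit(root)
    if subdatasets_only:
        order.remove(root)  # mimic excluding the top-level dataset
    return order


print(traverse("super"))
print(traverse("super", bottomup=True))
print(traverse("super", bottomup=True, subdatasets_only=True))
```

In the bottom-up listing "super" comes last, which matches the rule that a superdataset is handled only after all of its subdatasets.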
Command format
--cmd-type external: A few placeholders are supported in the command via Python format specification:
- "{pwd}" will be replaced with the full path of the current working directory.
- "{ds}" and "{refds}" will provide instances of the dataset currently operated on and of the reference "context" dataset that was provided via the --dataset argument.
- "{tmpdir}" will be replaced with the full path of a temporary directory.
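As a rough illustration of how such placeholders behave under Python format specification, the following sketch expands them with str.format(). It does not call DataLad: FakeDataset is a hypothetical stand-in that models only a path attribute, expand() is an illustrative helper, and expanding each command word separately is an assumption.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class FakeDataset:
    """Hypothetical stand-in for a dataset instance; only `path` is modelled."""
    path: str


def expand(command, ds, refds, tmpdir):
    """Fill the placeholders in each command word using str.format()."""
    values = {"pwd": os.getcwd(), "ds": ds, "refds": refds, "tmpdir": tmpdir}
    return [word.format(**values) for word in command]


refds = FakeDataset("/data/super")    # made-up reference dataset
ds = FakeDataset("/data/super/sub")   # made-up dataset being operated on

with tempfile.TemporaryDirectory() as tmpdir:
    # Because "{ds}" is an object, format syntax can reach its attributes.
    print(expand(["cp", "{ds.path}/README", "{tmpdir}/README"], ds, refds, tmpdir))
```

Since str.format() treats braces as placeholder delimiters, a literal brace in a command would have to be doubled ("{{" or "}}") under this scheme.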
Examples
Aggressively git clean all datasets, running 5 parallel jobs:
% datalad foreach-dataset -r -J 5 git clean -dfx
OPTIONS
COMMAND
command for execution. A leading '--' can be used to disambiguate this command from the preceding options to DataLad. For --cmd-type exec or eval only a single command argument (Python code) is supported.
-h , --help , --help-np
show this help message. --help-np forcefully disables the use of a pager for displaying the help message
--cmd-type {auto|external|exec|eval}
type of the command. 'external': to be run in a child process using the dataset's runner; 'exec': Python source code to execute using 'exec()', no value returned; 'eval': Python source code to evaluate using 'eval()', return value is placed into the 'result' field. 'auto': if used via the Python API and 'cmd' is a Python function, it will use 'eval', and otherwise will assume 'external'. Constraints: value must be one of ('auto', 'external', 'exec', 'eval') [Default: 'auto']
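The difference between 'exec' and 'eval' follows the Python built-ins of the same names. The sketch below is not DataLad's implementation: run_python() and the contents of the namespace are assumptions made for illustration, and only the presence or absence of a 'result' field mirrors the description above.

```python
def run_python(code, cmd_type, namespace):
    """Run `code` with exec() or eval() and return a small result record."""
    record = {"cmd_type": cmd_type}
    if cmd_type == "eval":
        # eval() handles a single expression and yields its value.
        record["result"] = eval(code, {}, namespace)
    elif cmd_type == "exec":
        # exec() handles statements; nothing is returned.
        exec(code, {}, namespace)
    else:
        raise ValueError(f"unsupported cmd_type: {cmd_type!r}")
    return record


# Hypothetical namespace; here `ds` is just a path string.
namespace = {"ds": "/data/super/sub"}

print(run_python("ds.upper()", "eval", namespace))
print(run_python("seen = ds", "exec", namespace))
print(namespace["seen"])  # the exec'd assignment landed in the namespace
```

Use 'eval' when the value of an expression is needed in the record, and 'exec' when the code consists of statements run for their side effects.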
-d DATASET , --dataset DATASET
specify the dataset to operate on. If no dataset is given, an attempt is made to identify the dataset based on the input and/or the current working directory. Constraints: Value must be a Dataset or a valid identifier of a Dataset (e.g. a path) or value must be NONE
--state {present|absent|any}
indicate which (sub)datasets to consider: either only locally present, absent, or any of those two kinds. Constraints: value must be one of ('present', 'absent', 'any') [Default: 'present']
-r , --recursive
if set, recurse into potential subdatasets.
-R LEVELS, --recursion-limit LEVELS
limit recursion into subdatasets to the given number of levels. Constraints: value must be convertible to type 'int' or value must be NONE
--contains PATH
limit to the subdatasets containing the given path. If a root path of a subdataset is given, the last considered dataset will be the subdataset itself. This option can be given multiple times, in which case datasets that contain any of the given paths will be considered. Constraints: value must be a string or value must be NONE
--bottomup
whether to report subdatasets in bottom-up order along each branch in the dataset tree, and not top-down.
-s , --subdatasets-only
whether to exclude top level dataset. It is implied if a non-empty CONTAINS is used.
--output-streams {capture|pass-through|relpath}, --o-s {capture|pass-through|relpath}
ways to handle outputs. 'capture': capture and return outputs from 'cmd' in the record ('stdout', 'stderr'); 'pass-through': pass outputs through to the screen (they are thus absent from the returned record); 'relpath': prefix captured output with the relative path (similar to what grep does) and write it to stdout and stderr. With 'relpath', the path is relative to the top of the dataset if DATASET is specified, and otherwise relative to the current directory. Constraints: value must be one of ('capture', 'pass-through', 'relpath') [Default: 'pass-through']
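The 'relpath' mode can be pictured as grep-style prefixing of each output line. The sketch below is only an approximation that does not use DataLad: prefix_with_relpath() is a hypothetical helper, and the ": " separator is an assumption, since the exact prefix format is not specified here.

```python
import posixpath


def prefix_with_relpath(output, ds_path, base_path):
    """Prefix every captured output line with the dataset's relative path."""
    rel = posixpath.relpath(ds_path, base_path)
    # The ": " separator is assumed for illustration only.
    return [f"{rel}: {line}" for line in output.splitlines()]


captured = "README.md\ncode/run.sh\n"  # made-up captured stdout
for line in prefix_with_relpath(captured, "/data/super/sub", "/data/super"):
    print(line)
```

Such a prefix makes it possible to tell which dataset produced which line when outputs from several datasets are interleaved.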
--chpwd {ds|pwd}
--safe-to-consume {auto|all-subds-done|superds-done|always}
Important only in the case of parallel (jobs greater than 1) execution. 'all-subds-done' instructs not to consider a superdataset until the command has finished execution in all of its subdatasets (it is the value used for 'auto' if traversal is bottom-up). 'superds-done' instructs not to process subdatasets until the command has finished in the superdataset (it is the value used for 'auto' if traversal is not bottom-up, which is the default). With 'always' there is no constraint on whether to execute in a sub- or superdataset first. Constraints: value must be one of ('auto', 'all-subds-done', 'superds-done', 'always') [Default: 'auto']
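The rules above amount to a readiness check that a parallel scheduler could apply before starting work on a dataset. The following sketch is a simplified model, not DataLad's scheduler: resolve_mode() and is_safe() are hypothetical helpers, datasets are plain path strings, and nesting is inferred from path prefixes.

```python
def resolve_mode(mode, bottomup):
    """Map 'auto' to the concrete mode implied by the traversal direction."""
    if mode == "auto":
        return "all-subds-done" if bottomup else "superds-done"
    return mode


def is_safe(ds, all_datasets, done, mode="auto", bottomup=False):
    """Tell whether `ds` may be processed, given the finished datasets in `done`."""
    mode = resolve_mode(mode, bottomup)
    if mode == "always":
        return True
    if mode == "all-subds-done":
        # Every dataset nested below `ds` must already be finished.
        return all(d in done for d in all_datasets if d.startswith(ds + "/"))
    if mode == "superds-done":
        # Every dataset that contains `ds` must already be finished.
        return all(d in done for d in all_datasets if ds.startswith(d + "/"))
    raise ValueError(f"unknown mode: {mode!r}")


datasets = ["super", "super/a", "super/a/x", "super/b"]  # made-up hierarchy

print(is_safe("super/a", datasets, done=set()))                # False
print(is_safe("super/a", datasets, done={"super"}))            # True
print(is_safe("super", datasets, done=set(), bottomup=True))   # False
print(is_safe("super/b", datasets, done=set(), mode="always")) # True
```

With 'always', independent work can start immediately in every dataset, at the cost of giving up any ordering guarantee between super- and subdatasets.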
-J NJOBS, --jobs NJOBS
how many parallel jobs (where possible) to use. "auto" corresponds to the number defined by the 'datalad.runtime.max-annex-jobs' configuration item. NOTE: This option can only parallelize input retrieval (get) and output recording (save). DataLad does NOT parallelize your scripts for you. Constraints: value must be convertible to type 'int' or value must be NONE or value must be one of ('auto',)
--version
show the module and its version which provides the command
AUTHORS
datalad is developed by The DataLad Team and Contributors <team@datalad.org>.