Man page - toil(1)
TOIL
NAME
QUICKSTART EXAMPLES
Running a basic CWL workflow
Running a basic WDL workflow
Running a basic Python workflow
A (more) real-world example
Running the example
Describing the source code
Logging
Error Handling and Resuming Pipelines
Collecting Statistics
Launching a Toil Workflow in AWS
Running a CWL Workflow on AWS
Running a Workflow with Autoscaling - Cactus
CWL IN TOIL
RUNNING CWL WORKFLOWS
Running CWL Locally
Note for macOS + Docker + Toil
Detailed Usage Instructions
Extra Toil CWL Options
Running CWL in the Cloud
Running CWL workflows with InplaceUpdateRequirement
Toil & CWL Tips
WDL IN TOIL
RUNNING WDL WITH TOIL
Toil WDL Runner Options
Managing Workflow Logs
DEVELOPING A WDL WORKFLOW
Using the UCSC Genomics Institute Tutorial
Using the Official WDL tutorials
Using the Learn WDL Video Tutorials
WDL Specifications
WDL CONFORMANCE TESTING
INTRODUCTION
Job Store
File Job Store
Cloud Job Stores
Batch System
Provisioner
COMMANDLINE OPTIONS
The Config File
The Job Store
Commandline Options
Restart Option
Running Workflows with Services
Setting Options directly in a Python Workflow
TOIL UTILITIES
Stats Command
Running an Example
Displaying Stats
Overall Summary
Worker Summary
Job Breakdown
Example Cleanup
Status Command
Clean Command
Debug Job Command
Kill Command
TOIL DEBUGGING
Reading the Log
Finding Failed Jobs in the Jobstore
Running a Job Locally
Fetching Job Inputs
Interactively Investigating Running Jobs
Introspecting the Job Store
Stats and Status
Using a Python debugger
RUNNING IN THE CLOUD
Managing a Cluster of Virtual Machines (Provisioning)
Toil Cluster Utilities
Launch-Cluster Command
Ssh-Cluster Command
Rsync-Cluster Command
Destroy-Cluster Command
Storage (Toil jobStore)
CLOUD PLATFORMS
Running on Kubernetes
Preparing your Kubernetes environment
AWS Job Store for Kubernetes
Configuring Toil for your Kubernetes environment
Running workflows
Option 1: Running the Leader Inside Kubernetes
Monitoring and Debugging Kubernetes Jobs and Pods
When Things Go Wrong
Option 2: Running the Leader Outside Kubernetes
Running CWL Workflows
AppArmor and Singularity
Running in AWS
Preparing your AWS environment
AWS Job Store
Toil Provisioner
Details about Launching a Cluster in AWS
Static Provisioning
Uploading Workflows
Running a Workflow with Autoscaling
Preemptibility
Using MinIO and S3-Compatible object stores
In-Workflow Autoscaling with Mesos
Dashboard
Running in Google Compute Engine (GCE)
Preparing your Google environment
Google Job Store
Running a Workflow with Autoscaling
HPC ENVIRONMENTS
Running on Slurm
Slurm Tips
Standard Output/Error from Batch System Jobs
WORKFLOW EXECUTION SERVICE (WES)
Preparing your WES environment
Starting a WES server
Running the Server with docker-compose
Running on a Toil cluster
WES API Endpoints
Submitting a Workflow
Upload multiple files
Specify Toil options
Monitoring a Workflow
Checking the state
Getting the full logs
Canceling a run
DEVELOPING A PYTHON WORKFLOW
Scripting Quick Start
Job Basics
Invoking a Workflow
Specifying Commandline Arguments
Resuming a Workflow
Functions and Job Functions
Workflows with Multiple Jobs
Dynamic Job Creation
Promises
Promised Requirements
FileID
Managing files within a workflow
Staging of Files into the Job Store
Using Docker Containers in Toil
Services
Checkpoints
Encapsulation
Depending on Toil
Best Practices for Dockerizing Toil Workflows
TOIL CLASS API
JOB STORE API
TOIL JOB API
FunctionWrappingJob
JobFunctionWrappingJob
EncapsulatedJob
Promise
JOB METHODS API
JobDescription
JOB.RUNNER API
JOB.FILESTORE API
BATCH SYSTEM API
Batch System Environment Variables
Batch System API
JOB.SERVICE API
EXCEPTIONS API
RUNNING TESTS
Running Tests with pytest
Running Integration Tests
Test Environment Variables
Using Docker with Quay
Running Mesos Tests
DEVELOPING WITH DOCKER
Making Your Own Toil Docker Image
Running a Cluster Locally
MAINTAINER’S GUIDELINES
Naming Conventions
Pull Requests
Publishing a Release
Using Git Hooks
Adding Retries to a Function
PULL REQUEST CHECKLISTS
Reviewing Pull Requests
Merging Pull Requests
TOIL ARCHITECTURE
Jobs and JobDescriptions
Statistics and Logging
Optimizations
Read-only leader
Job chaining
Preemptable node support
Caching
Toil support for Common Workflow Language
MINIMUM AWS IAM PERMISSIONS
AUTO-DEPLOYMENT
Auto Deployment with Sibling Python Files
Auto-Deploying a Package Hierarchy
Relying on Shared Filesystems
Toil Appliance
ENVIRONMENT VARIABLES
API REFERENCE
toil
Submodules
toil.batchSystems
Submodules
toil.batchSystems.abstractBatchSystem
Attributes
Exceptions
Classes
Module Contents
toil.batchSystems.abstractGridEngineBatchSystem
Attributes
Exceptions
Classes
Module Contents
toil.batchSystems.awsBatch
Attributes
Classes
Module Contents
toil.batchSystems.cleanup_support
Attributes
Classes
Module Contents
toil.batchSystems.contained_executor
Attributes
Functions
Module Contents
toil.batchSystems.gridengine
Attributes
Classes
Module Contents
toil.batchSystems.htcondor
Attributes
Classes
Module Contents
toil.batchSystems.kubernetes
Attributes
Classes
Functions
Module Contents
toil.batchSystems.local_support
Attributes
Classes
Module Contents
toil.batchSystems.lsf
Attributes
Classes
Module Contents
toil.batchSystems.lsfHelper
Attributes
Functions
Module Contents
toil.batchSystems.mesos
Submodules
toil.batchSystems.mesos.batchSystem
Attributes
Classes
Module Contents
toil.batchSystems.mesos.conftest
Attributes
Module Contents
toil.batchSystems.mesos.executor
Attributes
Classes
Functions
Module Contents
toil.batchSystems.mesos.test
Attributes
Classes
Package Contents
Attributes
Classes
Package Contents
toil.batchSystems.options
Attributes
Classes
Functions
Module Contents
toil.batchSystems.registry
Attributes
Functions
Module Contents
toil.batchSystems.singleMachine
Attributes
Classes
Module Contents
toil.batchSystems.slurm
Attributes
Classes
Functions
Module Contents
toil.batchSystems.torque
Attributes
Classes
Module Contents
Exceptions
Package Contents
toil.bus
Attributes
Classes
Functions
Module Contents
toil.common
Attributes
Exceptions
Classes
Functions
Module Contents
toil.cwl
Submodules
toil.cwl.conftest
Attributes
Module Contents
toil.cwl.cwltoil
Attributes
Exceptions
Classes
Functions
Module Contents
toil.cwl.utils
Attributes
Exceptions
Functions
Module Contents
Functions
Package Contents
toil.deferred
Attributes
Classes
Module Contents
toil.exceptions
Attributes
Exceptions
Module Contents
toil.fileStores
Submodules
toil.fileStores.abstractFileStore
Attributes
Classes
Module Contents
toil.fileStores.cachingFileStore
Attributes
Exceptions
Classes
Module Contents
toil.fileStores.nonCachingFileStore
Attributes
Classes
Module Contents
Classes
Package Contents
toil.job
Attributes
Exceptions
Classes
Functions
Module Contents
toil.jobStores
Submodules
toil.jobStores.abstractJobStore
Attributes
Exceptions
Classes
Module Contents
toil.jobStores.aws
Submodules
toil.jobStores.aws.jobStore
Attributes
Exceptions
Classes
Module Contents
toil.jobStores.aws.utils
Attributes
Exceptions
Classes
Functions
Module Contents
toil.jobStores.conftest
Attributes
Module Contents
toil.jobStores.fileJobStore
Attributes
Classes
Module Contents
toil.jobStores.googleJobStore
Attributes
Classes
Functions
Module Contents
toil.jobStores.utils
Attributes
Exceptions
Classes
Functions
Module Contents
toil.leader
Attributes
Classes
Module Contents
toil.lib
Submodules
toil.lib.accelerators
Functions
Module Contents
toil.lib.aws
Submodules
toil.lib.aws.ami
Attributes
Exceptions
Functions
Module Contents
toil.lib.aws.iam
Attributes
Functions
Module Contents
toil.lib.aws.s3
Attributes
Functions
Module Contents
toil.lib.aws.session
Attributes
Classes
Functions
Module Contents
toil.lib.aws.utils
Attributes
Exceptions
Functions
Module Contents
Attributes
Functions
Package Contents
toil.lib.bioio
Functions
Module Contents
toil.lib.compatibility
Functions
Module Contents
toil.lib.conversions
Attributes
Functions
Module Contents
toil.lib.docker
Attributes
Functions
Module Contents
toil.lib.ec2
Attributes
Exceptions
Functions
Module Contents
toil.lib.ec2nodes
Attributes
Classes
Functions
Module Contents
toil.lib.encryption
Submodules
toil.lib.encryption.conftest
Attributes
Module Contents
toil.lib.exceptions
Exceptions
Classes
Functions
Module Contents
toil.lib.expando
Classes
Module Contents
toil.lib.ftp_utils
Attributes
Classes
Module Contents
toil.lib.generatedEC2Lists
Attributes
Module Contents
toil.lib.humanize
Attributes
Functions
Module Contents
toil.lib.integration
Attributes
Functions
Module Contents
toil.lib.io
Attributes
Classes
Functions
Module Contents
toil.lib.iterables
Attributes
Classes
Functions
Module Contents
toil.lib.memoize
Attributes
Functions
Module Contents
toil.lib.misc
Attributes
Exceptions
Functions
Module Contents
toil.lib.objects
Classes
Module Contents
toil.lib.resources
Classes
Functions
Module Contents
toil.lib.retry
Attributes
Classes
Functions
Module Contents
toil.lib.threading
Attributes
Classes
Functions
Module Contents
toil.lib.throttle
Classes
Module Contents
toil.options
Submodules
toil.options.common
Attributes
Functions
Module Contents
toil.options.cwl
Functions
Module Contents
toil.options.runner
Functions
Module Contents
toil.options.wdl
Functions
Module Contents
toil.provisioners
Submodules
toil.provisioners.abstractProvisioner
Attributes
Exceptions
Classes
Module Contents
toil.provisioners.aws
Submodules
toil.provisioners.aws.awsProvisioner
Attributes
Exceptions
Classes
Functions
Module Contents
Attributes
Functions
Package Contents
toil.provisioners.clusterScaler
Attributes
Exceptions
Classes
Functions
Module Contents
toil.provisioners.gceProvisioner
Attributes
Classes
Module Contents
toil.provisioners.node
Attributes
Classes
Module Contents
Attributes
Exceptions
Functions
Package Contents
toil.realtimeLogger
Attributes
Classes
Module Contents
toil.resource
Attributes
Exceptions
Classes
Module Contents
toil.server
Submodules
toil.server.api_spec
toil.server.app
Attributes
Functions
Module Contents
toil.server.celery_app
Attributes
Functions
Module Contents
toil.server.cli
Submodules
toil.server.cli.wes_cwl_runner
Attributes
Classes
Functions
Module Contents
toil.server.utils
Attributes
Classes
Functions
Module Contents
toil.server.wes
Submodules
toil.server.wes.abstract_backend
Attributes
Exceptions
Classes
Functions
Module Contents
toil.server.wes.amazon_wes_utils
Attributes
Classes
Functions
Module Contents
toil.server.wes.tasks
Attributes
Classes
Functions
Module Contents
toil.server.wes.toil_backend
Attributes
Classes
Module Contents
toil.server.wsgi_app
Classes
Functions
Module Contents
toil.serviceManager
Attributes
Classes
Module Contents
toil.statsAndLogging
Attributes
Classes
Functions
Module Contents
toil.test
Submodules
toil.test.batchSystems
Submodules
toil.test.batchSystems.batchSystemTest
Attributes
Classes
Functions
Module Contents
toil.test.batchSystems.batch_system_plugin_test
Attributes
Classes
Module Contents
toil.test.batchSystems.test_gridengine
Classes
Functions
Module Contents
toil.test.batchSystems.test_lsf_helper
Classes
Module Contents
toil.test.batchSystems.test_slurm
Classes
Functions
Module Contents
toil.test.cactus
Submodules
toil.test.cactus.test_cactus_integration
Classes
Module Contents
toil.test.cwl
Submodules
toil.test.cwl.conftest
Attributes
Module Contents
toil.test.cwl.cwlTest
Attributes
Classes
Functions
Module Contents
toil.test.docs
Submodules
toil.test.docs.scriptsTest
Attributes
Classes
Module Contents
toil.test.jobStores
Submodules
toil.test.jobStores.jobStoreTest
Attributes
Classes
Functions
Module Contents
toil.test.lib
Submodules
toil.test.lib.aws
Submodules
toil.test.lib.aws.test_iam
Attributes
Classes
Module Contents
toil.test.lib.aws.test_s3
Attributes
Classes
Module Contents
toil.test.lib.aws.test_utils
Attributes
Classes
Module Contents
toil.test.lib.dockerTest
Attributes
Classes
Module Contents
toil.test.lib.test_conversions
Attributes
Classes
Module Contents
toil.test.lib.test_ec2
Attributes
Classes
Module Contents
toil.test.lib.test_integration
Attributes
Classes
Module Contents
toil.test.lib.test_misc
Attributes
Classes
Module Contents
toil.test.mesos
Submodules
toil.test.mesos.MesosDataStructuresTest
Classes
Module Contents
toil.test.mesos.helloWorld
Attributes
Functions
Module Contents
toil.test.mesos.stress
Classes
Functions
Module Contents
toil.test.options
Submodules
toil.test.options.options
Classes
Module Contents
toil.test.provisioners
Submodules
toil.test.provisioners.aws
Submodules
toil.test.provisioners.aws.awsProvisionerTest
Attributes
Classes
Module Contents
toil.test.provisioners.clusterScalerTest
Attributes
Classes
Module Contents
toil.test.provisioners.clusterTest
Attributes
Classes
Module Contents
toil.test.provisioners.gceProvisionerTest
Attributes
Classes
Module Contents
toil.test.provisioners.provisionerTest
Attributes
Classes
Module Contents
toil.test.provisioners.restartScript
Attributes
Functions
Module Contents
toil.test.server
Submodules
toil.test.server.serverTest
Attributes
Classes
Module Contents
toil.test.sort
Submodules
toil.test.sort.restart_sort
Attributes
Functions
Module Contents
toil.test.sort.sort
Attributes
Functions
Module Contents
toil.test.sort.sortTest
Attributes
Classes
Functions
Module Contents
toil.test.src
Submodules
toil.test.src.autoDeploymentTest
Attributes
Classes
Module Contents
toil.test.src.busTest
Attributes
Classes
Functions
Module Contents
toil.test.src.checkpointTest
Classes
Module Contents
toil.test.src.deferredFunctionTest
Attributes
Classes
Module Contents
toil.test.src.dockerCheckTest
Classes
Module Contents
toil.test.src.environmentTest
Attributes
Classes
Functions
Module Contents
toil.test.src.fileStoreTest
Attributes
Classes
Module Contents
toil.test.src.helloWorldTest
Classes
Functions
Module Contents
toil.test.src.importExportFileTest
Classes
Module Contents
toil.test.src.jobDescriptionTest
Classes
Module Contents
toil.test.src.jobEncapsulationTest
Classes
Functions
Module Contents
toil.test.src.jobFileStoreTest
Attributes
Classes
Functions
Module Contents
toil.test.src.jobServiceTest
Attributes
Classes
Functions
Module Contents
toil.test.src.jobTest
Attributes
Classes
Functions
Module Contents
toil.test.src.miscTests
Attributes
Classes
Module Contents
toil.test.src.promisedRequirementTest
Attributes
Classes
Functions
Module Contents
toil.test.src.promisesTest
Classes
Functions
Module Contents
toil.test.src.realtimeLoggerTest
Classes
Module Contents
toil.test.src.regularLogTest
Attributes
Classes
Module Contents
toil.test.src.resourceTest
Classes
Functions
Module Contents
toil.test.src.restartDAGTest
Attributes
Classes
Functions
Module Contents
toil.test.src.resumabilityTest
Classes
Functions
Module Contents
toil.test.src.retainTempDirTest
Classes
Functions
Module Contents
toil.test.src.systemTest
Classes
Module Contents
toil.test.src.threadingTest
Attributes
Classes
Module Contents
toil.test.src.toilContextManagerTest
Classes
Functions
Module Contents
toil.test.src.userDefinedJobArgTypeTest
Classes
Functions
Module Contents
toil.test.src.workerTest
Classes
Module Contents
toil.test.utils
Submodules
toil.test.utils.toilDebugTest
Attributes
Classes
Functions
Module Contents
toil.test.utils.toilKillTest
Attributes
Classes
Module Contents
toil.test.utils.utilsTest
Attributes
Classes
Functions
Module Contents
toil.test.wdl
Submodules
toil.test.wdl.wdltoil_test
Attributes
Classes
Module Contents
toil.test.wdl.wdltoil_test_kubernetes
Classes
Module Contents
Attributes
Classes
Functions
Package Contents
toil.toilState
Attributes
Classes
Module Contents
toil.utils
Submodules
toil.utils.toilClean
Attributes
Functions
Module Contents
toil.utils.toilConfig
Attributes
Functions
Module Contents
toil.utils.toilDebugFile
Attributes
Functions
Module Contents
toil.utils.toilDebugJob
Attributes
Functions
Module Contents
toil.utils.toilDestroyCluster
Attributes
Functions
Module Contents
toil.utils.toilKill
Attributes
Functions
Module Contents
toil.utils.toilLaunchCluster
Attributes
Functions
Module Contents
toil.utils.toilMain
Functions
Module Contents
toil.utils.toilRsyncCluster
Attributes
Functions
Module Contents
toil.utils.toilServer
Attributes
Functions
Module Contents
toil.utils.toilSshCluster
Attributes
Functions
Module Contents
toil.utils.toilStats
Attributes
Classes
Functions
Module Contents
toil.utils.toilStatus
Attributes
Classes
Functions
Module Contents
toil.utils.toilUpdateEC2Instances
Attributes
Functions
Module Contents
toil.version
Attributes
Module Contents
toil.wdl
Submodules
toil.wdl.utils
Functions
Module Contents
toil.wdl.wdltoil
Attributes
Exceptions
Classes
Functions
Module Contents
toil.worker
Attributes
Classes
Functions
Module Contents
Attributes
Exceptions
Functions
Package Contents
AUTHOR
COPYRIGHT
NAME
toil - Toil Documentation
Toil is an open-source pure-Python workflow engine that lets people write better pipelines.
Check out our website for a comprehensive list of Toil's features and read our paper to learn what Toil can do in the real world. Please subscribe to our low-volume announce mailing list and feel free to also join us on GitHub and Gitter.
If using Toil for your research, please cite
Vivian, J., Rao, A. A., Nothaft, F. A., Ketchum, C., Armstrong, J., Novak, A., … Paten, B. (2017). Toil enables reproducible, open source, big biomedical data analyses. Nature Biotechnology, 35(4), 314–316. http://doi.org/10.1038/nbt.3772
QUICKSTART EXAMPLES
Running a basic CWL workflow
The Common Workflow Language (CWL) is an emerging standard for writing workflows that are portable across multiple workflow engines and platforms. Running CWL workflows using Toil is easy.
1. Copy and paste the following code block into example.cwl:

    cwlVersion: v1.0
    class: CommandLineTool
    baseCommand: echo
    stdout: output.txt
    inputs:
      message:
        type: string
        inputBinding:
          position: 1
    outputs:
      output:
        type: stdout

and this code into example-job.yaml:

    message: Hello world!

2. To run the workflow, simply enter:

    $ toil-cwl-runner example.cwl example-job.yaml

Your output will be in output.txt:

    $ cat output.txt
    Hello world!

Congratulations! You've run your first Toil workflow using the default Batch System, single_machine, and the default file job store (which was placed in a temporary directory for you by toil-cwl-runner).
Toil uses batch systems to manage the jobs it creates.
The single_machine batch system is primarily used to prepare and debug workflows on a local machine. Once validated, try running them on a full-fledged batch system (see Batch System API). Toil supports many different batch systems, such as Kubernetes and Grid Engine, so the same workflow can run in many different environments.
Toil's CWL runner is highly configurable. Run toil-cwl-runner --help to see a complete list of available options.
To learn more about CWL, see the CWL User Guide (from where this example was shamelessly borrowed). For information on using CWL with Toil, see the section CWL in Toil. For an example of CWL on an AWS cluster, have a look at Running a CWL Workflow on AWS.
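The two steps above can also be scripted. The following is a minimal Python sketch, assuming toil-cwl-runner is installed and on your PATH; the helper names write_example, cwl_command, and run_example are hypothetical and are not part of Toil.

```python
import shutil
import subprocess
from pathlib import Path

# The same two files shown in step 1.
EXAMPLE_CWL = """\
cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
stdout: output.txt
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs:
  output:
    type: stdout
"""

EXAMPLE_JOB = "message: Hello world!\n"


def write_example(directory):
    """Write example.cwl and example-job.yaml into directory (hypothetical helper)."""
    directory = Path(directory)
    workflow = directory / "example.cwl"
    job_file = directory / "example-job.yaml"
    workflow.write_text(EXAMPLE_CWL)
    job_file.write_text(EXAMPLE_JOB)
    return workflow, job_file


def cwl_command(workflow, job_file):
    """Build the command line shown in step 2."""
    return ["toil-cwl-runner", str(workflow), str(job_file)]


def run_example(directory):
    """Run the workflow in directory and return the contents of output.txt."""
    workflow, job_file = write_example(directory)
    command = cwl_command(workflow, job_file)
    if shutil.which(command[0]) is None:
        raise RuntimeError("toil-cwl-runner was not found on PATH")
    subprocess.run(command, cwd=directory, check=True)
    return (Path(directory) / "output.txt").read_text()
```

Calling run_example(".") from a Python prompt should then behave like typing the two commands by hand.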
Running a basic WDL workflow
The Workflow Description Language (WDL) is another emerging language for writing workflows that are portable across multiple workflow engines and platforms. WDL support in Toil is still in alpha and should be considered experimental; basic workflow syntax is supported (see WDL in Toil for more details and examples). Here we walk through running a basic WDL hello-world workflow.
1. Copy and paste the following code block into wdl-helloworld.wdl:

    workflow write_simple_file {
      call write_file
    }
    task write_file {
      String message
      command { echo ${message} > wdl-helloworld-output.txt }
      output { File test = "wdl-helloworld-output.txt" }
    }

and this code into wdl-helloworld.json:

    {
      "write_simple_file.write_file.message": "Hello world!"
    }

2. To run the workflow, simply enter:

    $ toil-wdl-runner wdl-helloworld.wdl wdl-helloworld.json

Your output will be in wdl-helloworld-output.txt:

    $ cat wdl-helloworld-output.txt
    Hello world!

This will, like the CWL example above, use the single_machine batch system and an automatically-located file job store by default. You can customize Toil's execution of the workflow with command-line options; run toil-wdl-runner --help to learn about them.
To learn more about WDL in general, see the Terra WDL documentation. For more on using WDL in Toil, see WDL in Toil.
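In the inputs file of this example, the key is the workflow name, the task name, and the input name joined by dots. As a small illustration, the inputs file could be generated from Python like this; wdl_input_key is a hypothetical helper, not a Toil function.

```python
import json


def wdl_input_key(workflow, task, name):
    """Join the names the way the example inputs file does (hypothetical helper)."""
    return f"{workflow}.{task}.{name}"


inputs = {
    wdl_input_key("write_simple_file", "write_file", "message"): "Hello world!",
}

# This text is what would be saved as wdl-helloworld.json.
inputs_json = json.dumps(inputs, indent=2)
print(inputs_json)
```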
Running a basic Python workflow
In addition to workflow languages like CWL and WDL, Toil supports running workflows written against its Python API.
An example Toil Python workflow can be run with just three steps:
1. Install Toil (see Installation).
2. Copy and paste the following code block into a new file called helloWorld.py:

    from toil.common import Toil
    from toil.job import Job


    def helloWorld(message, memory="1G", cores=1, disk="1G"):
        return f"Hello, world!, here's a message: {message}"


    if __name__ == "__main__":
        parser = Job.Runner.getDefaultArgumentParser()
        options = parser.parse_args()
        options.clean = "always"
        with Toil(options) as toil:
            output = toil.start(Job.wrapFn(helloWorld, "You did it!"))
        print(output)

3. Specify the name of the job store and run the workflow:

    $ python3 helloWorld.py file:my-job-store

For something beyond a "Hello, world!" example, refer to A (more) real-world example.
Toil's customization options are available in Python workflows. Run python3 helloWorld.py --help to see a complete list of available options.
A (more) real-world example
For a more detailed example and explanation, we've developed a sample pipeline that merge-sorts a temporary file. This is not supposed to be an efficient sorting program, rather a more fully worked example of what Toil is capable of.
Running the example
1. Download the example code.
2. Run it with the default settings:

    $ python3 sort.py file:jobStore

The workflow created a file called sortedFile.txt in your current directory. Have a look at it and notice that it contains a whole lot of sorted lines!
This workflow does a smart merge sort on a file it generates, fileToSort.txt. The sort is smart because each step of the process (splitting the file into separate chunks, sorting these chunks, and merging them back together) is compartmentalized into a job. Each job can specify its own resource requirements and will only be run after the jobs it depends upon have run. Jobs without dependencies will be run in parallel.
NOTE:
Delete fileToSort.txt before moving on to step 3. That step introduces options that specify dimensions for fileToSort.txt, which only take effect if the file does not already exist. If it exists, the workflow will use the existing file and the results will be the same as in step 2.
3. Run with custom options:

    $ python3 sort.py file:jobStore \
        --numLines=5000 \
        --lineLength=10 \
        --overwriteOutput=True \
        --workDir=/tmp/

Here we see that we can add our own options to a Toil Python workflow. As noted above, the first two options, --numLines and --lineLength, determine the number of lines and how many characters are in each line. --overwriteOutput causes the current contents of sortedFile.txt to be overwritten, if it already exists. The last option, --workDir, is an option built into Toil to specify where temporary files unique to a job are kept.
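To make the effect of --numLines and --lineLength concrete, here is a rough sketch of a generator for the input file. It is a hypothetical stand-in for the example's makeFileToSort() helper, whose real implementation is not shown here and may differ (for instance in the alphabet used or in whether the newline counts toward the line length).

```python
import os
import random
import string
import tempfile


def make_file_to_sort(file_name, lines, line_len, seed=None):
    """Write `lines` random lines of `line_len` characters each (newline excluded)."""
    rng = random.Random(seed)
    with open(file_name, "w") as handle:
        for _ in range(lines):
            text = "".join(rng.choice(string.ascii_letters) for _ in range(line_len))
            handle.write(text + "\n")


# Mirror the values from step 3: 5000 lines of 10 characters.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fileToSort.txt")
    make_file_to_sort(path, lines=5000, line_len=10, seed=0)
    with open(path) as handle:
        generated = handle.read().splitlines()

print(len(generated), len(generated[0]))
```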
Describing the source code
To understand the details of what's going on inside, let's start with the main() function. It looks like a lot of code, but don't worry: we'll break it down piece by piece.

    def main(options=None):
        if not options:
            # deal with command line arguments
            parser = ArgumentParser()
            Job.Runner.addToilOptions(parser)
            parser.add_argument(
                "--numLines",
                default=defaultLines,
                help="Number of lines in file to sort.",
                type=int,
            )
            parser.add_argument(
                "--lineLength",
                default=defaultLineLen,
                help="Length of lines in file to sort.",
                type=int,
            )
            parser.add_argument("--fileToSort", help="The file you wish to sort")
            parser.add_argument("--outputFile", help="Where the sorted output will go")
            parser.add_argument(
                "--overwriteOutput",
                help="Write over the output file if it already exists.",
                default=True,
            )
            parser.add_argument(
                "--N",
                dest="N",
                help="The threshold below which a serial sort function is used to sort file. "
                "All lines must be of length less than or equal to N or program will fail",
                default=10000,
            )
            parser.add_argument(
                "--downCheckpoints",
                action="store_true",
                help="If this option is set, the workflow will make checkpoints on its way through "
                'the recursive "down" part of the sort',
            )
            parser.add_argument(
                "--sortMemory",
                dest="sortMemory",
                help="Memory for jobs that sort chunks of the file.",
                default=None,
            )
            parser.add_argument(
                "--mergeMemory",
                dest="mergeMemory",
                help="Memory for jobs that collate results.",
                default=None,
            )
            options = parser.parse_args()
        if not hasattr(options, "sortMemory") or not options.sortMemory:
            options.sortMemory = sortMemory
        if not hasattr(options, "mergeMemory") or not options.mergeMemory:
            options.mergeMemory = sortMemory

        # do some input verification
        sortedFileName = options.outputFile or "sortedFile.txt"
        if not options.overwriteOutput and os.path.exists(sortedFileName):
            print(
                f"Output file {sortedFileName} already exists. "
                f"Delete it to run the sort example again or use --overwriteOutput=True"
            )
            exit()

        fileName = options.fileToSort
        if options.fileToSort is None:
            # make the file ourselves
            fileName = "fileToSort.txt"
            if os.path.exists(fileName):
                print(f"Sorting existing file: {fileName}")
            else:
                print(
                    f"No sort file specified. Generating one automatically called: {fileName}."
                )
                makeFileToSort(
                    fileName=fileName, lines=options.numLines, lineLen=options.lineLength
                )
        else:
            if not os.path.exists(options.fileToSort):
                raise RuntimeError("File to sort does not exist: %s" % options.fileToSort)

        if int(options.N) <= 0:
            raise RuntimeError("Invalid value of N: %s" % options.N)

        # Now we are ready to run
        with Toil(options) as workflow:
            sortedFileURL = "file://" + os.path.abspath(sortedFileName)
            if not workflow.options.restart:
                sortFileURL = "file://" + os.path.abspath(fileName)
                sortFileID = workflow.importFile(sortFileURL)
                sortedFileID = workflow.start(
                    Job.wrapJobFn(
                        setup,
                        sortFileID,
                        int(options.N),
                        options.downCheckpoints,
                        options=options,
                        memory=sortMemory,
                    )
                )
            else:
                sortedFileID = workflow.restart()
            workflow.exportFile(sortedFileID, sortedFileURL)

First we make a parser to process command line arguments using the argparse module. It's important that we add the call to Job.Runner.addToilOptions() to initialize our parser with all of Toil's default options. Then we add the command line arguments unique to this workflow, and parse the input. The help message listed with each argument should give you a pretty good idea of what it does.
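One argparse detail is worth keeping in mind when adding your own options. An option declared with a default but no type, as --overwriteOutput is in main(), arrives as a string when given on the command line, and any non-empty string (including "False") is truthy. The sketch below shows this with plain argparse; the Job.Runner.addToilOptions() call is left out so that it runs without Toil, the default of 1000 is a placeholder, and str_to_bool is a hypothetical converter shown as one possible remedy, not something sort.py does.

```python
from argparse import ArgumentParser, ArgumentTypeError


def str_to_bool(value):
    """Convert common spellings of true/false to a bool (hypothetical helper)."""
    lowered = value.lower()
    if lowered in ("true", "yes", "1"):
        return True
    if lowered in ("false", "no", "0"):
        return False
    raise ArgumentTypeError(f"expected a boolean, got {value!r}")


# Declared the same way as in main(): a default, but no type.
as_written = ArgumentParser()
as_written.add_argument("--numLines", default=1000, type=int)
as_written.add_argument("--overwriteOutput", default=True)

# Same option with an explicit converter.
converted = ArgumentParser()
converted.add_argument("--overwriteOutput", default=True, type=str_to_bool)

raw = as_written.parse_args(["--numLines=5000", "--overwriteOutput=False"])
fixed = converted.parse_args(["--overwriteOutput=False"])

print(repr(raw.overwriteOutput), bool(raw.overwriteOutput))
print(repr(fixed.overwriteOutput))
```

With the untyped option, the check "if not options.overwriteOutput" never fires for a value passed on the command line, which is why a converter (or a store_true/store_false flag) can be a safer choice in your own workflows.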
Next we do a little bit of verification of the input arguments. The option --fileToSort allows you to specify a file that needs to be sorted. If this option isn't given, it's here that we make our own file with the call to makeFileToSort().
Finally we come to the context manager that initializes the workflow. We build a URL for the input file by prepending 'file://' to its absolute path, as per the documentation for toil.common.Toil() when staging a file that is stored locally. Notice that we have to check whether or not the workflow is restarting so that we don't import the file more than once. We then kick off the workflow by calling toil.common.Toil.start() on the job setup. When the workflow ends we capture its output (the sorted file's fileID) and use that in toil.common.Toil.exportFile() to move the sorted file from the job store back into "userland".
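The URL construction used in main() can be tried on its own. The sketch below repeats the string concatenation from the example and, for comparison, shows the standard library's Path.as_uri(), which additionally percent-encodes special characters and, through resolve(), follows symlinks; local_file_url is a hypothetical helper name.

```python
import os
from pathlib import Path
from urllib.parse import urlparse


def local_file_url(file_name):
    """Build a file:// URL the same way main() does (hypothetical helper)."""
    return "file://" + os.path.abspath(file_name)


url = local_file_url("sortedFile.txt")
alternative = Path("sortedFile.txt").resolve().as_uri()

parsed = urlparse(url)
print(url)
print(alternative)
print(parsed.scheme)
```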
Next let's look at the job that begins the actual workflow, setup.

    def setup(job, inputFile, N, downCheckpoints, options):
        """
        Sets up the sort.
        Returns the FileID of the sorted file
        """
        RealtimeLogger.info("Starting the merge sort")
        return job.addChildJobFn(
            down,
            inputFile,
            N,
            "root",
            downCheckpoints,
            options=options,
            preemptible=True,
            memory=sortMemory,
        ).rv()

setup really only does two things. First it writes to the logs using RealtimeLogger.info(), and then it calls addChildJobFn(). Child jobs run directly after the current job. This function turns the 'job function' down into an actual job and passes in the inputs, including an optional resource requirement, memory. Calling Job.rv() does not run the job; it returns a promise for the job's return value. Once the job down finishes, its output is substituted for that promise.
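The idea of returning a promise for a child's result can be illustrated without Toil. The following toy model is only a sketch: ToyJob, ToyPromise, resolve, and run are invented for this illustration and do not reflect how Toil implements promises internally, and children run serially here.

```python
class ToyPromise:
    """Placeholder for a value that a job will produce later."""

    def __init__(self, job):
        self.job = job


class ToyJob:
    """A function plus arguments, with children that run after it."""

    def __init__(self, fn, *args):
        self.fn = fn
        self.args = args
        self.children = []
        self.result = None
        self.done = False

    def add_child_fn(self, fn, *args):
        child = ToyJob(fn, *args)
        self.children.append(child)
        return child

    def rv(self):
        # Nothing runs here; we only hand back a placeholder.
        return ToyPromise(self)


def resolve(value):
    """Follow promises until a concrete value is reached."""
    while isinstance(value, ToyPromise):
        assert value.job.done, "promise read before its job ran"
        value = value.job.result
    return value


def run(job):
    """Run a job, then its children, and return its resolved result."""
    args = [resolve(arg) for arg in job.args]
    job.result = job.fn(job, *args)
    job.done = True
    for child in job.children:
        run(child)
    return resolve(job.result)


def toy_down(job, text):
    return text.upper()


def toy_setup(job, text):
    # Same shape as setup(): schedule a child and return its promise.
    return job.add_child_fn(toy_down, text).rv()


root = ToyJob(toy_setup, "sorted")
final = run(root)
print(final)
```

Here toy_setup returns before toy_down has run, yet the caller of run() still receives the child's value, because the promise is resolved once the child has finished.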
Now we can look at what down does.

    def down(job, inputFileStoreID, N, path, downCheckpoints, options, memory=sortMemory):
        """
        Input is a file, a subdivision size N, and a path in the hierarchy of jobs.
        If the range is larger than a threshold N the range is divided recursively and
        a follow on job is then created which merges back the results else
        the file is sorted and placed in the output.
        """
        RealtimeLogger.info("Down job starting: %s" % path)
        # Read the file
        inputFile = job.fileStore.readGlobalFile(inputFileStoreID, cache=False)
        length = os.path.getsize(inputFile)
        if length > N:
            # We will subdivide the file
            RealtimeLogger.critical(
                "Splitting file: %s of size: %s" % (inputFileStoreID, length)
            )
            # Split the file into two copies
            midPoint = getMidPoint(inputFile, 0, length)
            t1 = job.fileStore.getLocalTempFile()
            with open(t1, "w") as fH:
                fH.write(copySubRangeOfFile(inputFile, 0, midPoint + 1))
            t2 = job.fileStore.getLocalTempFile()
            with open(t2, "w") as fH:
                fH.write(copySubRangeOfFile(inputFile, midPoint + 1, length))
            # Call down recursively. By giving the rv() of the two jobs as inputs to the follow-on job, up,
            # we communicate the dependency without hindering concurrency.
            result = job.addFollowOnJobFn(
                up,
                job.addChildJobFn(
                    down,
                    job.fileStore.writeGlobalFile(t1),
                    N,
                    path + "/0",
                    downCheckpoints,
                    checkpoint=downCheckpoints,
                    options=options,
                    preemptible=True,
                    memory=options.sortMemory,
                ).rv(),
                job.addChildJobFn(
                    down,
                    job.fileStore.writeGlobalFile(t2),
                    N,
                    path + "/1",
                    downCheckpoints,
                    checkpoint=downCheckpoints,
                    options=options,
                    preemptible=True,
                    memory=options.mergeMemory,
                ).rv(),
                path + "/up",
                preemptible=True,
                options=options,
                memory=options.sortMemory,
            ).rv()
        else:
            # We can sort this bit of the file
            RealtimeLogger.critical(
                "Sorting file: %s of size: %s" % (inputFileStoreID, length)
            )
            # Sort the copy and write back to the fileStore
            shutil.copyfile(inputFile, inputFile + ".sort")
            sort(inputFile + ".sort")
            result = job.fileStore.writeGlobalFile(inputFile + ".sort")
        RealtimeLogger.info("Down job finished: %s" % path)
        return result

down is the recursive part of the workflow. First we read the file into the local filestore by calling job.fileStore.readGlobalFile(). This puts a copy of the file in the temp directory for this particular job. This storage will disappear once this job ends. For a detailed explanation of the filestore, job store, and their interfaces, have a look at Managing files within a workflow.
Next down checks the base case of the recursion: is the length of the input file no greater than N (remember N was an option we added to the workflow in main)? In the base case, we just sort the file and return the file ID of this new sorted file.
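Stripped of jobs and file stores, the recursion in down has the following shape. This in-memory sketch works on a string rather than a file, assumes the text ends with a newline, and uses a hypothetical function name, sort_lines; the split point is the first line boundary at or after the middle, which is one plausible reading of what getMidPoint does.

```python
import heapq


def sort_lines(text, n):
    """Sort the lines of text, splitting recursively while it is longer than n."""
    lines = text.splitlines(keepends=True)
    if len(text) <= n:
        # Base case: small enough to sort directly.
        return "".join(sorted(lines))
    # Find a line boundary at or after the middle of the text.
    mid = text.find("\n", len(text) // 2)
    if mid == -1 or mid + 1 >= len(text):
        # No usable boundary, so fall back to a direct sort.
        return "".join(sorted(lines))
    left = sort_lines(text[: mid + 1], n)
    right = sort_lines(text[mid + 1 :], n)
    # Merge the two sorted halves, which is the role up plays in the workflow.
    merged = heapq.merge(
        left.splitlines(keepends=True), right.splitlines(keepends=True)
    )
    return "".join(merged)


result = sort_lines("d\nb\na\nc\n", 4)
print(repr(result))
```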
If the base case fails, then the file is split into two new tempFiles using job.fileStore.getLocalTempFile() and the helper function copySubRangeOfFile. Finally we add a follow-on job up with job.addFollowOnJobFn(). We've already seen child jobs. A follow-on job is a job that runs after the current job and all of its children (and their children and follow-ons) have completed. Using a follow-on makes sense because up is responsible for merging the files together and we don't want to merge the files together until we know they are sorted. Again, the return value of the follow-on job is requested using Job.rv().
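The ordering rule for children and follow-ons can be pictured with a small simulation. The sketch below is not Toil code: jobs are plain dictionaries, run_in_order is a hypothetical function, and children are visited one after another, whereas Toil may run independent children in parallel.

```python
def run_in_order(job, log):
    """Record the job, then all of its children, then its follow-ons."""
    log.append(job["name"])
    for child in job.get("children", []):
        run_in_order(child, log)
    # Follow-ons start only after every child (and its descendants) is done.
    for follow_on in job.get("follow_ons", []):
        run_in_order(follow_on, log)
    return log


# One level of the sort: a down job with two child down jobs and an up follow-on.
tree = {
    "name": "down root",
    "children": [{"name": "down root/0"}, {"name": "down root/1"}],
    "follow_ons": [{"name": "up root/up"}],
}

order = run_in_order(tree, [])
print(order)
```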
Looking at up

    def up(job, inputFileID1, inputFileID2, path, options, memory=sortMemory):
        """
        Merges the two files and places them in the output.
        """
        RealtimeLogger.info("Up job starting: %s" % path)
        with job.fileStore.writeGlobalFileStream() as (fileHandle, outputFileStoreID):
            fileHandle = codecs.getwriter("utf-8")(fileHandle)
            with job.fileStore.readGlobalFileStream(inputFileID1) as inputFileHandle1:
                inputFileHandle1 = codecs.getreader("utf-8")(inputFileHandle1)
                with job.fileStore.readGlobalFileStream(inputFileID2) as inputFileHandle2:
                    inputFileHandle2 = codecs.getreader("utf-8")(inputFileHandle2)
                    RealtimeLogger.info(
                        "Merging %s and %s to %s"
                        % (inputFileID1, inputFileID2, outputFileStoreID)
                    )
                    merge(inputFileHandle1, inputFileHandle2, fileHandle)
            # Clean up the input files - these deletes will occur after the completion is successful.
            job.fileStore.deleteGlobalFile(inputFileID1)
            job.fileStore.deleteGlobalFile(inputFileID2)
            RealtimeLogger.info("Up job finished: %s" % path)
            return outputFileStoreID

we see that the two input files are merged together and the output is written to a new file using job.fileStore.writeGlobalFileStream() . After a little cleanup, the output file is returned.
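The merge helper itself is not shown in this walkthrough. As a rough idea of what merging two sorted streams involves, here is a sketch that uses the standard library's heapq.merge on in-memory text streams. The name merge_streams is hypothetical and this is not necessarily how the example's own merge function is written.

```python
import heapq
import io


def merge_streams(in1, in2, out):
    """Write the lines of two sorted text streams to `out` in sorted order."""
    # heapq.merge reads lazily, so neither input has to fit in memory.
    for line in heapq.merge(in1, in2):
        out.write(line)


first = io.StringIO("apple\ncherry\n")
second = io.StringIO("banana\ndate\n")
merged = io.StringIO()
merge_streams(first, second, merged)
```

Because the inputs are consumed line by line, the same approach works when the handles come from a streaming reader instead of StringIO objects.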
Once the final up finishes and all of the rv() promises are fulfilled, main receives the sorted file's ID which it uses in exportFile to send it to the user.
There are other things in this example that we didn't go over such as Checkpoints and the details of much of the Toil Class API .
At the end of the script the lines
if __name__ == '__main__':
    main()
are included to ensure that the main function is only run once in the '__main__' process invoked by you, the user. In Toil terms, by invoking the script you created the leader process in which the main() function is run. A worker process is a separate process whose sole purpose is to host the execution of one or more jobs defined in that script. In any Toil workflow there is always one leader process, and potentially many worker processes.
When using the single-machine batch system (the default), the worker processes will be running on the same machine as the leader process. With full-fledged batch systems like Kubernetes the worker processes will typically be started on separate machines. The boilerplate ensures that the pipeline is only started once, on the leader, and not when its job functions are imported and executed on the individual workers.
Typing python3 sort.py --help will show the complete list of arguments for the workflow, which includes both Toil's arguments and those defined inside sort.py. A complete explanation of Toil's arguments can be found in Commandline Options.
Logging
By default, Toil logs a lot of information related to the current environment in addition to messages from the batch system and jobs. This can be configured with the --logLevel flag. For example, to only log CRITICAL level messages to the screen:
$ python3 sort.py file:jobStore \
      --logLevel=critical \
      --overwriteOutput=True
This hides most of the information we get from the Toil run. For more detail, we can run the pipeline with --logLevel=debug to see a comprehensive output. For more information, see Commandline Options .
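The level names accepted by --logLevel mirror those of Python's standard logging module. The sketch below uses only that module, with a hypothetical logger name, to show how a level threshold filters messages. It illustrates the filtering idea and is not a description of how Toil configures its own loggers.

```python
import io
import logging

# Capture output in memory so the effect of each level is easy to inspect.
stream = io.StringIO()
logger = logging.getLogger("loglevel_sketch")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False

logger.setLevel(logging.CRITICAL)
logger.info("environment details")   # below the threshold: suppressed
logger.critical("workflow failed")   # at the threshold: emitted
critical_only = stream.getvalue()

logger.setLevel(logging.DEBUG)
logger.debug("verbose detail")       # now emitted as well
everything = stream.getvalue()
```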
Error Handling and Resuming Pipelines
With Toil, you can recover gracefully from a bug in your pipeline without losing any progress from successfully completed jobs. To demonstrate this, let's add a bug to our example code to see how Toil handles a failure and how we can resume a pipeline after that happens. Add a bad assertion as the first line of down():
def down(job, inputFileStoreID, N, downCheckpoints, memory=sortMemory):
    ...
    assert 1 == 2, "Test error!"
When we run the pipeline, Toil will show a detailed failure log with a traceback:
$ python3 sort.py file:jobStore
...
---TOIL WORKER OUTPUT LOG---
...
m/j/jobonrSMP    Traceback (most recent call last):
m/j/jobonrSMP      File "toil/src/toil/worker.py", line 340, in main
m/j/jobonrSMP        job._runner(jobGraph=jobGraph, jobStore=jobStore, fileStore=fileStore)
m/j/jobonrSMP      File "toil/src/toil/job.py", line 1270, in _runner
m/j/jobonrSMP        returnValues = self._run(jobGraph, fileStore)
m/j/jobonrSMP      File "toil/src/toil/job.py", line 1217, in _run
m/j/jobonrSMP        return self.run(fileStore)
m/j/jobonrSMP      File "toil/src/toil/job.py", line 1383, in run
m/j/jobonrSMP        rValue = userFunction(*((self,) + tuple(self._args)), **self._kwargs)
m/j/jobonrSMP      File "toil/example.py", line 30, in down
m/j/jobonrSMP        assert 1 == 2, "Test error!"
m/j/jobonrSMP    AssertionError: Test error!
If we try and run the pipeline again, Toil will give us an error message saying that a job store of the same name already exists. By default, in the event of a failure, the job store is preserved so that the workflow can be restarted, starting from the previously failed jobs. We can restart the pipeline by running
$ python3 sort.py file:jobStore \
      --restart \
      --overwriteOutput=True
We can also change the number of times Toil will attempt to retry a failed job:
$ python3 sort.py file:jobStore \
      --retryCount 2 \
      --restart \
      --overwriteOutput=True
You'll now see Toil attempt to rerun the failed job until it runs out of tries. --retryCount is useful for non-systemic errors, like downloading a file that may experience a sporadic interruption, or some other non-deterministic failure.
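To make the idea of a retry budget concrete, here is a small stand-alone sketch. It assumes the count means the number of extra attempts allowed after the first failure; run_with_retries and flaky_download are hypothetical names, and Toil's own bookkeeping may differ in detail.

```python
def run_with_retries(job, retry_count):
    """Run `job`, retrying up to `retry_count` times after a failure.

    Returns the job's result together with the number of attempts used.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return job(), attempts
        except Exception:
            if attempts > retry_count:
                # Out of tries: let the last error propagate.
                raise


calls = {"n": 0}


def flaky_download():
    """Fail twice, then succeed, like a sporadically interrupted download."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("sporadic interruption")
    return "ok"


outcome, attempts_used = run_with_retries(flaky_download, retry_count=2)
```

A deterministic failure, such as the bad assertion above, uses up every attempt and still fails, which is why retries only help with non-systemic errors.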
To successfully restart our pipeline, we can edit our script to comment out the bad assertion, or remove it, and then run
$ python3 sort.py file:jobStore \
      --restart \
      --overwriteOutput=True
The pipeline will run successfully, and the job store will be removed on the pipeline's completion.
Collecting Statistics
Please see the Status Command section for more on gathering runtime and resource info on jobs.
Launching a Toil Workflow in AWS
After installing the aws extra for Toil during Installation and setting up AWS (see Preparing your AWS environment), the user can run the basic helloWorld.py script (Running a basic Python workflow) on a VM in AWS just by modifying the run command.
Note that when running in AWS, users can either run the workflow on a single instance or run it on a cluster (which is running across multiple containers on multiple AWS instances). For more information on running Toil workflows on a cluster, see Running in AWS .
Also! Remember to use the Destroy-Cluster Command when finished to destroy the cluster! Otherwise things may not be cleaned up properly.
1. Launch a cluster in AWS using the Launch-Cluster Command:
$ toil launch-cluster <cluster-name> \
--clusterType kubernetes \
--keyPairName <AWS-key-pair-name> \
--leaderNodeType t2.medium \
--nodeTypes t2.medium -w 1 \
--zone us-west-2a
The arguments keyPairName, leaderNodeType, and zone are required to launch a cluster.
2. Copy helloWorld.py to the /tmp directory on the leader node using the Rsync-Cluster Command:
$ toil rsync-cluster --zone us-west-2a <cluster-name> helloWorld.py :/tmp
Note that the command requires defining the file to copy as well as the target location on the cluster leader node.
3. Log in to the cluster leader node using the Ssh-Cluster Command:
$ toil ssh-cluster --zone us-west-2a <cluster-name>
Note that this command will log you in as the root user.
4. Run the workflow on the cluster:
$ python3 /tmp/helloWorld.py aws:us-west-2:my-S3-bucket
In this particular case, we create an S3 bucket called my-S3-bucket in the us-west-2 region to store intermediate job results.
Along with some other INFO log messages, you should get the following output in your terminal window: Hello, world!, here's a message: You did it! .
5. Exit from the SSH connection:
$ exit
6. Use the Destroy-Cluster Command to destroy the cluster:
$ toil destroy-cluster --zone us-west-2a <cluster-name>
Note that this command will destroy the cluster leader node and any resources created to run the job, including the S3 bucket.
Running a CWL Workflow on AWS
After installing the aws and cwl extras for Toil during Installation and setting up AWS (see Preparing your AWS environment), the user can run a CWL workflow with Toil on AWS.
Also! Remember to use the Destroy-Cluster Command when finished to destroy the cluster! Otherwise things may not be cleaned up properly.
1. First launch a node in AWS using the Launch-Cluster Command:
$ toil launch-cluster <cluster-name> \
--clusterType kubernetes \
--keyPairName <AWS-key-pair-name> \
--leaderNodeType t2.medium \
--nodeTypes t2.medium -w 1 \
--zone us-west-2a
2. Copy example.cwl and example-job.yaml from the CWL example to the node using the Rsync-Cluster Command:
$ toil rsync-cluster --zone us-west-2a <cluster-name> example.cwl :/tmp
$ toil rsync-cluster --zone us-west-2a <cluster-name> example-job.yaml :/tmp
3. SSH into the cluster's leader node using the Ssh-Cluster Command utility:
$ toil ssh-cluster --zone us-west-2a <cluster-name>
4. Once on the leader node, command line tools such as kubectl will be available to you. It's also a good idea to update and install the following:
$ sudo apt-get update
$ sudo apt-get -y upgrade
$ sudo apt-get -y dist-upgrade
$ sudo apt-get -y install git
5. Now create a new virtualenv with the --system-site-packages option and activate it:
$ virtualenv --system-site-packages venv
$ source venv/bin/activate
6. Now run the CWL workflow with the Kubernetes batch system:
(venv) $ toil-cwl-runner \
--provisioner aws \
--batchSystem kubernetes \
--jobStore aws:us-west-2:any-name \
/tmp/example.cwl /tmp/example-job.yaml
TIP:
When running a CWL workflow on AWS, input files can be provided either on the local file system or in S3 buckets using s3:// URI references. Final output files will be copied to the local file system of the leader node.
7. Finally, log out of the leader node and, from your local computer, destroy the cluster:
$ toil destroy-cluster --zone us-west-2a <cluster-name>
Running a Workflow with Autoscaling - Cactus
Cactus is a reference-free, whole-genome multiple alignment program that can be run on any of the cloud platforms Toil supports.
NOTE:
Cloud Independence :
This example provides a "cloud agnostic" view of running Cactus with Toil. Most options will not change between cloud providers. However, each provisioner has unique inputs for --leaderNodeType , --nodeType and --zone . We recommend the following:
When executing toil launch-cluster with gce specified for --provisioner , the option --boto must be specified and given a path to your .boto file. See Running in Google Compute Engine (GCE) for more information about the --boto option.
Also! Remember to use the Destroy-Cluster Command when finished to destroy the cluster! Otherwise things may not be cleaned up properly.
1. Download pestis.tar.gz
2. Launch a cluster using the Launch-Cluster Command:
$ toil launch-cluster <cluster-name> \
--provisioner <aws, gce> \
--keyPairName <key-pair-name> \
--leaderNodeType <type> \
--nodeType <type> \
-w 1-2 \
--zone <zone>
NOTE:
A Helpful Tip
When using AWS, setting the TOIL_AWS_ZONE environment variable eliminates having to specify the --zone option for each command. This will be supported for GCE in the future.
$ export TOIL_AWS_ZONE=us-west-2c
3. Create an appropriate directory for uploading files:
$ toil ssh-cluster --provisioner <aws, gce> <cluster-name>
$ mkdir /root/cact_ex
$ exit
4. Copy the required files, i.e., seqFile.txt (a text file containing the locations of the input sequences as well as their phylogenetic tree, see here), organisms' genome sequence files in FASTA format, and configuration files (e.g. blockTrim1.xml, if desired), up to the leader node:
$ toil rsync-cluster --provisioner <aws, gce> <cluster-name> pestis-short-aws-seqFile.txt :/root/cact_ex
$ toil rsync-cluster --provisioner <aws, gce> <cluster-name> GCF_000169655.1_ASM16965v1_genomic.fna :/root/cact_ex
$ toil rsync-cluster --provisioner <aws, gce> <cluster-name> GCF_000006645.1_ASM664v1_genomic.fna :/root/cact_ex
$ toil rsync-cluster --provisioner <aws, gce> <cluster-name> GCF_000182485.1_ASM18248v1_genomic.fna :/root/cact_ex
$ toil rsync-cluster --provisioner <aws, gce> <cluster-name> GCF_000013805.1_ASM1380v1_genomic.fna :/root/cact_ex
$ toil rsync-cluster --provisioner <aws, gce> <cluster-name> setup_leaderNode.sh :/root/cact_ex
$ toil rsync-cluster --provisioner <aws, gce> <cluster-name> blockTrim1.xml :/root/cact_ex
$ toil rsync-cluster --provisioner <aws, gce> <cluster-name> blockTrim3.xml :/root/cact_ex
5. Log in to the leader node:
$ toil ssh-cluster --provisioner <aws, gce> <cluster-name>
6. Set up the environment of the leader node to run Cactus:
$ bash /root/cact_ex/setup_leaderNode.sh
$ source cact_venv/bin/activate
(cact_venv) $ cd cactus
(cact_venv) $ pip install --upgrade .
7. Run Cactus as an autoscaling workflow:
(cact_venv) $ cactus \
      --retry 10 \
      --batchSystem kubernetes \
      --logDebug \
      --logFile /logFile_pestis3 \
      --configFile \
      /root/cact_ex/blockTrim3.xml <aws, google>:<zone>:cactus-pestis \
      /root/cact_ex/pestis-short-aws-seqFile.txt \
      /root/cact_ex/pestis_output3.hal
NOTE:
Pieces of the Puzzle :
--logDebug : Equivalent to --logLevel DEBUG.
--logFile /logFile_pestis3 : Writes logs to a file named logFile_pestis3 in the / folder.
--configFile : Optional; only needed if a specific configuration file is to be used for the alignment.
<aws, google>:<zone>:cactus-pestis : Creates a bucket, named cactus-pestis, with the specified cloud provider to store intermediate job files and metadata. NOTE: If you want to use a GCE-based jobstore, specify google here, not gce.
The result file, named pestis_output3.hal, is stored in the /root/cact_ex folder of the leader node.
Use cactus --help to see all the Cactus and Toil flags available.
8. Log out of the leader node:
(cact_venv) $ exit
9. Download the resulting output to your local machine:
(venv) $ toil rsync-cluster \
      --provisioner <aws, gce> <cluster-name> \
      :/root/cact_ex/pestis_output3.hal \
      <path-of-folder-on-local-machine>
10. Destroy the cluster:
(venv) $ toil destroy-cluster --provisioner <aws, gce> <cluster-name>
CWL IN TOIL
The Common Workflow Language (CWL) is an emerging standard for writing workflows that are portable across multiple workflow engines and platforms. Toil has full support for the CWL v1.0, v1.1, and v1.2 standards.
You can use Toil to run CWL workflows or develop and test new ones.
RUNNING CWL WORKFLOWS
The toil-cwl-runner command provides CWL parsing functionality using cwltool, and leverages the job-scheduling and batch system support of Toil. You can use it to run CWL workflows locally or in the cloud.
Running CWL Locally
To run in local batch mode, provide the CWL file and the input object file:
$ toil-cwl-runner example.cwl example-job.yml
For a simple example of CWL with Toil see Running a basic CWL workflow .
Note for macOS + Docker + Toil
When invoking CWL documents that make use of Docker containers, if you see errors that look like
docker: Error response from daemon: Mounts denied:
The paths /var/...tmp are not shared from OS X and are not known to Docker.
you may need to add
export TMPDIR=/tmp/docker_tmp
either in your startup file ( .bashrc ) or add it manually in your shell before invoking toil.
Detailed Usage Instructions
Help information can be found by using this toil command:
$ toil-cwl-runner -h
A more detailed example shows how we can specify both Toil and cwltool arguments for our workflow:
$ toil-cwl-runner \
--singularity \
--jobStore my_jobStore \
--batchSystem lsf \
--workDir `pwd` \
--outdir `pwd` \
--logFile cwltoil.log \
--writeLogs `pwd` \
--logLevel DEBUG \
--retryCount 2 \
--maxLogFileSize 20000000000 \
--stats \
standard_bam_processing.cwl \
inputs.yaml
In this example, we set the following options, which are all passed to Toil:
--singularity : Specifies that all jobs with Docker format containers specified should be run using the Singularity container engine instead of the Docker container engine.
--jobStore : Path to a folder which doesn't exist yet, which will contain the Toil jobstore and all related job-tracking information.
--batchSystem : Use the specified HPC or Cloud-based cluster platform.
--workDir : The directory where all temporary files will be created for the workflow. A subdirectory of this will be set as the $TMPDIR environment variable and this subdirectory can be referenced using the CWL parameter reference $(runtime.tmpdir) in CWL tools and workflows.
--outdir : Directory where final File and Directory outputs will be written. References to these and other output types will be in the JSON object printed to the stdout stream after workflow execution.
--logFile : Path to the main logfile.
--writeLogs : Directory where job logs will be stored. At DEBUG log level, this will contain logs for each Toil job run, as well as stdout / stderr logs for each CWL CommandLineTool that didn't use the stdout / stderr directives to redirect output.
--retryCount : How many times to retry each Toil job.
--maxLogFileSize : Logs that get larger than this value will be truncated.
--stats : Save resource usage in JSON files that can be collected with the toil stats command after the workflow is done.
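When such a command is launched from a script rather than typed by hand, it can help to assemble it as an argument list. The following sketch uses a hypothetical helper, build_cwl_command, and only a subset of the flags listed above, all taken from this section.

```python
import shlex


def build_cwl_command(workflow, inputs, job_store, batch_system, retry_count):
    """Assemble a toil-cwl-runner argument list from a few of the options above."""
    return [
        "toil-cwl-runner",
        "--jobStore", job_store,
        "--batchSystem", batch_system,
        "--retryCount", str(retry_count),
        "--stats",
        workflow,
        inputs,
    ]


cmd = build_cwl_command(
    "standard_bam_processing.cwl", "inputs.yaml",
    job_store="my_jobStore", batch_system="lsf", retry_count=2,
)
# A shell-quoted form, handy for logging exactly what will be run.
printable = shlex.join(cmd)
```

The list form can be handed to subprocess.run(cmd, check=True) without any shell quoting concerns.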
Extra Toil CWL Options
Besides the normal Toil options and the options supported by cwltool, toil-cwl-runner adds some of its own options:
--bypass-file-store
Do not use Toil's file store system and assume all paths are accessible in place from all nodes. This can avoid possibly-redundant file copies into Toil's job store storage, and is required for CWL's InplaceUpdateRequirement. However, it allows a failed job execution to leave behind a partially-modified state, which means that a restarted workflow might not work correctly.
--reference-inputs
Do not copy remote inputs into Toil's file store and assume they are accessible in place from all nodes.
--disable-streaming
Do not allow streaming of job input files. By default, files marked with streamable True are streamed from remote job stores.
--cwl-default-ram
Apply CWL specification default ramMin.
--no-cwl-default-ram
Do not apply CWL specification default ramMin, so that Toil --defaultMemory applies.
Running CWL in the Cloud
To run in cloud and HPC configurations, you may need to provide additional command line parameters to select and configure the batch system to use.
To run a CWL workflow in AWS with toil see Running a CWL Workflow on AWS .
Running CWL workflows with InplaceUpdateRequirement
Some CWL workflows use the InplaceUpdateRequirement feature, which requires that operations on files have visible side effects that Toil's file store cannot support. If you need to run a workflow like this, you can make sure that all of your worker nodes have a shared filesystem, and use the --bypass-file-store option to toil-cwl-runner . This will make it leave all CWL intermediate files on disk and share them between jobs using file paths, instead of storing them in the file store and downloading them when jobs need them.
Toil & CWL Tips
See logs for just one job by using the full log file
This requires knowing the job's toil-generated ID, which can be found in the log files.
cat cwltoil.log | grep jobVM1fIs
Grep for full tool commands from toil logs
This gives you a more concise view of the commands being run (note that this information is only available from Toil when running with --logDebug ).
pcregrep -M "\[job .*\.cwl.*$\n(.* .*$\n)*" cwltoil.log
#         ^allows for multiline matching
Find Bams that have been generated for specific step while pipeline is running:
find . | grep -P '^./out_tmpdir.*_MD\.bam$'
See what jobs have been run
cat log/cwltoil.log | grep -oP "\[job .*.cwl\]" | sort | uniq
or:
cat log/cwltoil.log | grep -i "issued job"
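The same "unique job tags" query can be done in Python when grep -P is unavailable. This is a sketch under assumptions: the sample log lines are invented for illustration, and the pattern is a slightly stricter variant of the grep expression above (it stops at the first closing bracket).

```python
import re

# Matches tags such as "[job something.cwl]".
JOB_TAG = re.compile(r"\[job [^\]]*\.cwl\]")


def jobs_seen(log_text):
    """Return the sorted, de-duplicated job tags found in a log."""
    return sorted(set(JOB_TAG.findall(log_text)))


# Hypothetical log excerpt; real Toil log lines carry more fields.
sample_log = (
    "INFO [job revsort.cwl.rev.revtool.cwl] starting\n"
    "INFO [job revsort.cwl.sorted.sorttool.cwl] starting\n"
    "INFO [job revsort.cwl.rev.revtool.cwl] completed\n"
)
unique_jobs = jobs_seen(sample_log)
```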
Get status of a workflow
$ toil status /home/johnsoni/TEST_RUNS_3/TEST_run/tmp/jobstore-09ae0acc-c800-11e8-9d09-70106fb1697e
<hostname> 2018-10-04 15:01:44,184 MainThread INFO toil.lib.bioio: Root logger is at level 'INFO', 'toil' logger at level 'INFO'.
<hostname> 2018-10-04 15:01:44,185 MainThread INFO toil.utils.toilStatus: Parsed arguments
<hostname> 2018-10-04 15:01:47,081 MainThread INFO toil.utils.toilStatus: Traversing the job graph gathering jobs. This may take a couple of minutes.
Of the 286 jobs considered, there are 179 jobs with children, 107 jobs ready to run, 0 zombie jobs, 0 jobs with services, 0 services, and 0 jobs with log files currently in file:/home/user/jobstore-09ae0acc-c800-11e8-9d09-70106fb1697e.
Toil Stats
You can get run statistics broken down by CWL file. This only works once the workflow is finished:
$ toil stats /path/to/jobstore
This will report resource usage information for all the CWL jobs executed by the workflow.
See Stats Command for an explanation of what the different fields mean.
Understanding toil log files
There is a worker_log.txt file for each Toil job. This file is written to while the job is running, and uploaded at the end if the job finishes or if running at debug log level. If uploaded, the contents are printed to the main log file and transferred to a log file in the --logDir folder.
The new log file will be named something like:
CWLJob_<name of the CWL job>_<attempt number>.log
Standard output/error files will be named like:
<name of the CWL job>.stdout_<attempt number>.log
If you have a workflow revsort.cwl which has a step rev which calls the tool revtool.cwl , the CWL job name ends up being all those parts strung together with . : revsort.cwl.rev.revtool.cwl .
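The naming scheme can be written out as a few string operations. The helper names below are hypothetical, and the attempt number is passed through as a string because its exact formatting (for example, any zero padding) is not stated here.

```python
def cwl_job_name(*parts):
    """Join workflow, step, and tool names with dots."""
    return ".".join(parts)


def job_log_name(job_name, attempt):
    """Name of the per-job Toil log file for one attempt."""
    return f"CWLJob_{job_name}_{attempt}.log"


def stdout_log_name(job_name, attempt):
    """Name of the captured standard output file for one attempt."""
    return f"{job_name}.stdout_{attempt}.log"


name = cwl_job_name("revsort.cwl", "rev", "revtool.cwl")
log_file = job_log_name(name, "<attempt number>")
```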
WDL IN TOIL
The Workflow Description Language (WDL) is a programming language designed for writing workflows that execute a set of tasks in a pipeline distributed across multiple computers. Workflows enable scientific analyses to be reproducible, by wrapping up a whole sequence of commands, whose outputs feed into other commands, into a workflow that can be executed the same way every time.
Toil can be used to run and to develop WDL workflows. The Toil team also maintains a set of WDL conformance tests for evaluating Toil and other WDL runners.
RUNNING WDL WITH TOIL
Toil has beta support for running WDL workflows, using the toil-wdl-runner command. This command comes with the [wdl] extra; see Installing Toil with Extra Features for how to install it if you do not have it.
You can run WDL workflows with toil-wdl-runner . Currently, toil-wdl-runner works by using MiniWDL to parse and interpret the WDL workflow, and has support for workflows in WDL 1.0 or later (which are required to declare a version , and which use inputs and outputs sections).
TIP:
The last release of Toil that supported unversioned, draft-2 WDL workflows was 5.12.0 .
Toil is, for compatible workflows, a drop-in replacement for the Cromwell WDL runner. Instead of running a workflow with Cromwell:
java -jar Cromwell.jar run myWorkflow.wdl --inputs myWorkflow_inputs.json
You can run the workflow with toil-wdl-runner :
toil-wdl-runner myWorkflow.wdl --input myWorkflow_inputs.json
(We're here running Toil with --input , but it can also accept the Cromwell-style --inputs .)
This will default to executing on the current machine, with a job store in an automatically determined temporary location, but you can add a few Toil options to use other Toil-supported batch systems, such as Kubernetes:
toil-wdl-runner --jobStore aws:us-west-2:wdl-job-store --batchSystem kubernetes myWorkflow.wdl --input myWorkflow_inputs.json
For Toil, the --input is optional, and inputs can be passed as a positional argument:
toil-wdl-runner myWorkflow.wdl myWorkflow_inputs.json
You can also run workflows from URLs. For example, to run the MiniWDL self test workflow, you can do:
toil-wdl-runner https://raw.githubusercontent.com/DataBiosphere/toil/36b54c45e8554ded5093bcdd03edb2f6b0d93887/src/toil/test/wdl/miniwdl_self_test/self_test.wdl https://raw.githubusercontent.com/DataBiosphere/toil/36b54c45e8554ded5093bcdd03edb2f6b0d93887/src/toil/test/wdl/miniwdl_self_test/inputs.json
Toil WDL Runner Options
--jobStore : Specifies where to keep the Toil state information while running the workflow. Must be accessible from all machines.
-o or --outputDirectory : Specifies the output folder or URI prefix to save workflow output files in. Defaults to a new directory in the current directory.
-m or --outputFile : Specifies a JSON file name or URI to save workflow output values at. Defaults to standard output.
-i , --input , or --inputs : Alternative to the positional argument for the input JSON file, for compatibility with other WDL runners.
--outputDialect : Specifies an output format dialect. Can be cromwell to just return the workflow's output values as JSON, or miniwdl to nest that under an outputs key and include a dir key.
--referenceInputs : Specifies whether input files to Toil should be passed around by URL reference instead of being imported into Toil's storage. Defaults to off. Can be True or False or other similar words.
--container : Specifies the container engine to use to run tasks. By default this is auto , which tries Singularity if it is installed and Docker if it isn't. Can also be set to docker or singularity explicitly.
--allCallOutputs : Specifies whether outputs from all calls in a workflow should be included alongside the outputs from the output section, when an output section is defined. For strict WDL spec compliance, should be set to False . Usually defaults to False . If the workflow includes metadata for the Cromwell Output Organizer (croo) , will default to True .
Any number of other Toil options may also be specified. For defined Toil options, see Commandline Options .
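As a rough picture of the two --outputDialect shapes described above, consider the sketch below. The function name format_outputs, the sample output key, and the value stored under dir are assumptions for illustration; only the outputs and dir key names come from the option description.

```python
def format_outputs(values, dialect, run_dir=None):
    """Arrange workflow output values in one of the two described dialects."""
    if dialect == "cromwell":
        # Just the output values themselves.
        return dict(values)
    if dialect == "miniwdl":
        # The same values nested under "outputs", plus a "dir" key.
        return {"outputs": dict(values), "dir": run_dir}
    raise ValueError(f"unknown dialect: {dialect}")


values = {"myWorkflow.result": "hello"}  # hypothetical output name
cromwell_style = format_outputs(values, "cromwell")
miniwdl_style = format_outputs(values, "miniwdl", run_dir="<output directory>")
```

A script that consumes the JSON can therefore read values directly in the cromwell dialect, or look under the outputs key in the miniwdl dialect.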
Managing Workflow Logs
At the default settings, if a WDL task succeeds, the standard output and standard error will be printed in the toil-wdl-runner output, unless they are captured by the workflow (with the stdout() and stderr() WDL built-in functions). If a WDL task fails, they will be printed whether they were meant to be captured or not. Complete logs from Toil for failed jobs will also be printed.
If you would like to save the logs organized by WDL task, you can use the --writeLogs or --writeLogsGzip options to specify a directory where the log files should be saved. Log files will be named after the same dotted, hierarchical workflow and task names used to set values from the input JSON, except that scatters will add an additional numerical component. In addition to the logs for WDL tasks, Toil job logs for failed jobs will also appear here when running at the default log level.
For example, if you run:
toil-wdl-runner --writeLogs logs https://raw.githubusercontent.com/DataBiosphere/toil/36b54c45e8554ded5093bcdd03edb2f6b0d93887/src/toil/test/wdl/miniwdl_self_test/self_test.wdl https://raw.githubusercontent.com/DataBiosphere/toil/36b54c45e8554ded5093bcdd03edb2f6b0d93887/src/toil/test/wdl/miniwdl_self_test/inputs.json
You will end up with a logs/ directory containing:
hello_caller.0.hello.stderr_000.log
hello_caller.1.hello.stderr_000.log
hello_caller.2.hello.stderr_000.log
The final number is a sequential counter: if a step has to be retried, or if you run the workflow multiple times without clearing out the logs directory, it will increment.
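If you need to group these files by task afterwards, the names can be split back into their parts. This sketch assumes the pattern visible in the listing above (dotted task path, stream name, numeric counter) and assumes that stdout files follow the same shape as the stderr files shown; parse_log_name is a hypothetical helper.

```python
import re

LOG_NAME = re.compile(
    r"(?P<task>.+)\.(?P<stream>stdout|stderr)_(?P<counter>\d+)\.log"
)


def parse_log_name(filename):
    """Split a log file name into (task path, stream, counter), or None."""
    match = LOG_NAME.fullmatch(filename)
    if match is None:
        return None
    return match["task"], match["stream"], int(match["counter"])


parsed = parse_log_name("hello_caller.1.hello.stderr_000.log")
```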
DEVELOPING A WDL WORKFLOW
Toil can be used as a development tool for writing and locally testing WDL workflows. These workflows can then be run on Toil against a cloud or cluster backend, or used with other WDL implementations such as Terra, Cromwell, or MiniWDL.
The easiest way to get started with writing WDL workflows is by following a tutorial.
Using the UCSC Genomics Institute Tutorial
The UCSC Genomics Institute (home of the Toil project) has a tutorial on writing WDL workflows with Toil . You can follow this tutorial to be walked through writing your own WDL workflow with Toil. They also have tips on debugging WDL workflows with Toil .
These tutorials and tips are aimed at users looking to run WDL workflows with Toil in a Slurm environment, but they can also apply in other situations.
Using the Official WDL tutorials
You can also learn to write WDL workflows for Toil by following the official WDL tutorials.
When you reach the point of executing your workflow , instead of running with Cromwell:
java -jar Cromwell.jar run myWorkflow.wdl --inputs myWorkflow_inputs.json
you can instead run with toil-wdl-runner :
toil-wdl-runner myWorkflow.wdl --input myWorkflow_inputs.json
Using the Learn WDL Video Tutorials
For people who prefer video tutorials, Lynn Langit has a Learn WDL video course that will teach you how to write and run WDL workflows. The course is taught using Cromwell, but Toil should also be compatible with the course's workflows.
WDL Specifications
WDL language specifications can be found here: https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md
Toil is not yet fully conformant with the WDL specification, but it inherits most of the functionality of MiniWDL .
WDL CONFORMANCE TESTING
The Toil team maintains a set of WDL Conformance Tests. Much like the CWL Conformance Tests for CWL, the WDL Conformance Tests are useful for determining if a WDL implementation actually follows the WDL specification.
The WDL Conformance Tests include a runner harness that is able to test toil-wdl-runner , as well as Cromwell and MiniWDL, and supports testing conformance with the 1.1, 1.0, and draft-2 versions of WDL.
If you would like to evaluate Toil's WDL conformance for yourself, first make sure that you have toil-wdl-runner installed. It comes with the [wdl] extra; see Installing Toil with Extra Features .
Then, you can check out the test repository:
$ git clone https://github.com/DataBiosphere/wdl-conformance-tests
$ cd wdl-conformance-tests
Most tests will need a Docker daemon available, so make sure yours is working properly:
$ docker info
$ docker run --rm docker/whalesay cowsay "Docker is working"
Then, you can test toil-wdl-runner against a particular WDL spec version, say 1.1:
$ python3 run.py --runner toil-wdl-runner --versions 1.1
For any failed tests, the test number and the log of the failing test will be reported.
After the tests run, you can clean up intermediate files with:
$ make clean
For more options, see:
$ python3 run.py --help
Or, consult the conformance test documentation .
INTRODUCTION
Toil runs in various environments, including locally and in the cloud (Amazon Web Services and Google Compute Engine). Toil also supports workflows written in two DSLs: CWL and WDL , as well as workflows written in Python (see Developing a Python Workflow ).
Toil is built in a modular way so that it can be used on lots of different systems, and with different configurations. The three configurable pieces are the
• Job Store : A filepath or url that can host and centralize all files for a workflow (e.g. a local folder, or an AWS s3 bucket url).
• Batch System : Specifies either a local single-machine or a currently supported HPC environment (lsf, mesos, slurm, torque, htcondor, kubernetes, or grid_engine).
• Provisioner : For running in the cloud only. This specifies which cloud provider provides instances to do the "work" of your workflow.
Job Store
The job store is a storage abstraction which contains all of the information used in a Toil run. This centralizes all of the files used by jobs in the workflow and also the details of the progress of the run. If a workflow crashes or fails, the job store contains all of the information necessary to resume with minimal repetition of work.
Several different job stores are supported, including the file job store and cloud job stores. For information on developing job stores, see Job Store API .
File Job Store
The file job store is for use locally, and keeps the workflow information in a directory on the machine where the workflow is launched. This is the simplest and most convenient job store for testing or for small runs.
For an example that uses the file job store, see Running a basic CWL workflow .
Cloud Job Stores
Toil currently supports the following cloud storage systems as job stores:
• AWS Job Store : An AWS S3 bucket formatted as "aws:<zone>:<bucketname>" where only numbers, letters, and dashes are allowed in the bucket name. Example: aws:us-west-2:my-aws-jobstore-name .
|
• |
Google Job Store : A Google Cloud Storage bucket formatted as "gce:<zone>:<bucketname>" where only numbers, letters, and dashes are allowed in the bucket name. Example: gce:us-west2-a:my-google-jobstore-name . |
These use cloud buckets to house all of the files. This is useful if there are several different worker machines all running jobs that need to access the job store.
Batch System
A Toil batch system is either a local single-machine (one computer) or a currently supported cluster of computers (lsf, mesos, slurm, torque, htcondor, or grid_engine). These environments manage individual worker nodes under a leader node to process the work required in a workflow. The leader and its workers all coordinate their tasks and files through a centralized job store location.
See Batch System API for a more detailed description of different batch systems, or information on developing batch systems.
Provisioner
The Toil provisioner provides a tool set for running a Toil workflow on a particular cloud platform.
The Toil Cluster Utilities are command line tools used to provision nodes in your desired cloud platform. They allow you to launch nodes, ssh to the leader, and rsync files back and forth.
For detailed instructions for using the provisioner, see Running in AWS or Running in Google Compute Engine (GCE).
COMMANDLINE OPTIONS
A quick way to see all of Toil's commandline options is by executing the following on a workflow language front-end:
$ toil-wdl-runner --help
Or a Toil Python workflow:
$ python3 example.py --help
For a basic toil workflow, Toil has one mandatory argument, the job store. All other arguments are optional.
The Config File
Instead of changing the arguments on the command line, Toil offers support for using a configuration file.
Options will be applied with priority:
1. Command line options
2. Environment variables
3. Config file values
   a. Config file provided through --config
   b. Default config file at $HOME/.toil/default.yaml
4. Defaults
You can manually generate an example configuration file to a path you select. To generate a configuration file, run:
$ toil config [filename].yaml
Then uncomment options as necessary and change/provide new values.
After editing the config file, you can run Toil with its settings by passing it on the command line:
$ python3 example.py --config=[filename].yaml
Alternatively, you can edit the default config file, which is located at $HOME/.toil/default.yaml.
If CLI options are used in addition to the configuration file, the CLI options will override the configuration file options. For example:
$ python3 example.py --config=[filename].yaml --defaultMemory 80Gi
This will result in a default memory per job of 80GiB no matter what is in the configuration file provided.
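The priority order described in this section can be sketched with Python's collections.ChainMap, which returns a value from the first mapping that defines the key. This is only an illustration of the lookup order, not Toil's own option-parsing code; the resolve_option helper and all sample values are hypothetical.

```python
from collections import ChainMap


def resolve_option(name, cli, env, config_file, default_config, defaults):
    """Return the value for `name` from the highest-priority source that sets it."""
    # ChainMap searches its mappings left to right, so the argument order
    # encodes the priority: CLI, environment, --config file, default.yaml, defaults.
    layers = ChainMap(cli, env, config_file, default_config, defaults)
    return layers[name]


# Hypothetical sample values, for illustration only.
cli = {"defaultMemory": "80Gi"}
env = {}
config_file = {"defaultMemory": "4Gi", "retryCount": 3}
default_config = {"retryCount": 2}
defaults = {"defaultMemory": "2.0Gi", "retryCount": 1, "defaultCores": 1.0}

sources = (cli, env, config_file, default_config, defaults)
print(resolve_option("defaultMemory", *sources))  # 80Gi (command line wins)
print(resolve_option("retryCount", *sources))     # 3 (from the --config file)
print(resolve_option("defaultCores", *sources))   # 1.0 (built-in default)
```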
The Job Store
Running Toil workflows requires a file path or URL to a central location for all of the intermediate files for the workflow: the job store. For toil-cwl-runner and toil-wdl-runner, a job store can often be selected automatically or can be specified with the --jobStore option; Toil Python workflows generally require the job store as a positional command line argument. For example, to run the Python quickstart example on a node that has a large /scratch volume, you can create the job store there by executing python3 HelloWorld.py /scratch/my-job-store or, more explicitly, python3 HelloWorld.py file:/scratch/my-job-store
Syntax for specifying different job stores:
Local: file:job-store-name
AWS: aws:region-here:job-store-name
Google: google:projectID-here:job-store-name
Different types of job store options can be found below.
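As a rough sketch of how the locator syntax above breaks down, the following function splits a locator string into its parts. It is not Toil's own parser; the function name and the returned tuple layout are hypothetical, and a bare path is treated as a file job store to mirror the two equivalent HelloWorld.py invocations.

```python
def parse_job_store_locator(locator):
    """Split a job store locator into (kind, details).

    Hypothetical helper: `details` is a path for file job stores and a
    (region_or_project, name) tuple for cloud job stores.
    """
    known_kinds = ("file", "aws", "google")
    kind, sep, rest = locator.partition(":")
    if not sep or kind not in known_kinds:
        # No recognized prefix: treat the whole string as a local path.
        return ("file", locator)
    if kind == "file":
        return ("file", rest)
    # Cloud locators carry a region or project ID before the job store name.
    place, sep2, name = rest.partition(":")
    if not sep2 or not place or not name:
        raise ValueError(f"expected {kind}:<region-or-project>:<name>, got {locator!r}")
    return (kind, (place, name))


print(parse_job_store_locator("/scratch/my-job-store"))
print(parse_job_store_locator("file:/scratch/my-job-store"))
print(parse_job_store_locator("aws:us-west-2:my-aws-jobstore-name"))
```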
Commandline Options
Core Toil Options Options to specify the location of the Toil workflow and turn on stats collation about the performance of jobs.
--workDir WORKDIR
Absolute path to directory where temporary files generated during the Toil run should be placed. Standard output and error from batch system jobs (unless --noStdOutErr is set) will be placed in this directory. A cache directory may be placed in this directory. Temp files and folders will be placed in a directory toil-<workflowID> within workDir. The workflowID is generated by Toil and will be reported in the workflow logs. Default is determined by the variables (TMPDIR, TEMP, TMP) via mkdtemp. For CWL, the temporary output directory is used instead (see CWL option --tmp-outdir-prefix). This directory needs to exist on all machines running jobs; if capturing standard output and error from batch system jobs is desired, it will generally need to be on a shared file system. When sharing a cache between containers on a host, this directory must be shared between the containers.
--coordinationDir COORDINATION_DIR
Absolute path to directory where Toil will keep state and lock files. When sharing a cache between containers on a host, this directory must be shared between the containers.
--noStdOutErr
Do not capture standard output and error from batch system jobs.
--stats
Records statistics about the toil workflow to be used by 'toil stats'.
--clean=STATE
Determines the deletion of the jobStore upon completion of the program. Choices: 'always', 'onError', 'never', or 'onSuccess'. The --stats option requires information from the jobStore upon completion, so the jobStore will never be deleted with that flag. If you wish to be able to restart the run, choose 'never' or 'onSuccess'. Default is 'never' if stats is enabled, and 'onSuccess' otherwise.
--cleanWorkDir STATE
Determines deletion of temporary worker directory upon completion of a job. Choices: 'always', 'onError', 'never', or 'onSuccess'. Default = always. WARNING: This option should be changed for debugging only. Running a full pipeline with this option could fill your disk with intermediate data.
--clusterStats FILEPATH
If enabled, writes out JSON resource usage statistics to a file. The default location for this file is the current working directory, but an absolute path can also be passed to specify where this file should be written. This option only applies when using scalable batch systems.
--restart
If --restart is specified, Toil will attempt to restart the existing workflow at the location pointed to by the --jobStore option. An exception is raised if the workflow does not exist.
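The interaction between --clean, --stats, and --restart described in this group of options can be summarized as a small decision table. The sketch below encodes only what the text states; the function names are hypothetical and this is not Toil's implementation.

```python
CLEAN_CHOICES = ("always", "onError", "never", "onSuccess")


def effective_clean(clean=None, stats=False):
    """Return the clean policy in effect for the given --clean and --stats settings."""
    if clean is not None and clean not in CLEAN_CHOICES:
        raise ValueError(f"unknown clean policy: {clean!r}")
    if stats:
        # Per the text: with --stats the job store is never deleted.
        return "never"
    return clean if clean is not None else "onSuccess"


def job_store_deleted(policy, succeeded):
    """Decide whether the job store is removed once the workflow ends."""
    if policy == "always":
        return True
    if policy == "never":
        return False
    if policy == "onSuccess":
        return succeeded
    if policy == "onError":
        return not succeeded
    raise ValueError(f"unknown clean policy: {policy!r}")


def can_restart_after_failure(policy):
    """A failed run can only be resumed with --restart if its job store survives."""
    return not job_store_deleted(policy, succeeded=False)


for policy in CLEAN_CHOICES:
    print(policy, can_restart_after_failure(policy))
```

Only 'never' and 'onSuccess' leave the job store in place after a failure, which is why those are the choices recommended for restartable runs.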
Logging Options Toil hides stdout and stderr by default except in case of job failure. Log levels in toil are based on priority from the logging module:
--logOff
Only CRITICAL log messages are shown. Equivalent to --logLevel=OFF or --logLevel=CRITICAL.
--logCritical
Only CRITICAL log messages are shown. Equivalent to --logLevel=OFF or --logLevel=CRITICAL.
--logError
Only ERROR and CRITICAL log messages are shown. Equivalent to --logLevel=ERROR.
--logWarning
Only WARN, ERROR, and CRITICAL log messages are shown. Equivalent to --logLevel=WARNING.
--logInfo
All non-debugging-related log messages are shown. Equivalent to --logLevel=INFO.
--logDebug
Log messages at DEBUG level and above are shown. Equivalent to --logLevel=DEBUG.
--logTrace
Log messages at TRACE level and above are shown. Equivalent to --logLevel=TRACE.
--logLevel=LOGLEVEL
May be set to: OFF (or CRITICAL), ERROR, WARN (or WARNING), INFO, DEBUG, or TRACE.
--logFile FILEPATH
Specifies a file path to write the logging output to.
--rotatingLogging
Turn on rotating logging, which prevents log files from getting too big (set using --maxLogFileSize BYTESIZE).
--maxLogFileSize BYTESIZE
The maximum size of a job log file to keep (in bytes); log files larger than this will be truncated to the last X bytes. Setting this option to zero will prevent any truncation. Setting this option to a negative value will truncate from the beginning. Default=100MiB. With --rotatingLogging active, this also sets the maximum log file size in bytes.
--log-dir DIRPATH
For CWL and local file system only. Log stdout and stderr (if tool requests stdout/stderr) to the DIRPATH.
--logColors BOOL
Enable or disable colored logging. Default=True.
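The log level names accepted by --logLevel follow the priority ordering of Python's logging module, so a message is shown when its level is at or above the configured threshold. The sketch below illustrates that rule. TRACE is not a standard logging level; its numeric value here (just below DEBUG) is an assumption made for the example, and is_shown is a hypothetical helper.

```python
import logging

# Assumption for illustration: TRACE sits below DEBUG in priority.
TRACE = logging.DEBUG - 5

LEVELS = {
    "OFF": logging.CRITICAL,  # OFF behaves like CRITICAL per the option text
    "CRITICAL": logging.CRITICAL,
    "ERROR": logging.ERROR,
    "WARN": logging.WARNING,
    "WARNING": logging.WARNING,
    "INFO": logging.INFO,
    "DEBUG": logging.DEBUG,
    "TRACE": TRACE,
}


def is_shown(message_level, log_level):
    """Return True if a message at `message_level` passes the `log_level` threshold."""
    return LEVELS[message_level.upper()] >= LEVELS[log_level.upper()]


print(is_shown("ERROR", "WARNING"))  # True
print(is_shown("INFO", "WARNING"))   # False
print(is_shown("CRITICAL", "OFF"))   # True
```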
Batch System Options
--batchSystem BATCHSYSTEM
The type of batch system to run the job(s) with. Default = single_machine.
--disableAutoDeployment
Should auto-deployment of Toil Python workflows be deactivated? If True, the workflow's Python code should be present at the same location on all workers. Default = False.
--maxJobs MAXJOBS
Specifies the maximum number of jobs to submit to the backing scheduler at once. Not supported on Mesos or AWS Batch. Use 0 for unlimited. Defaults to unlimited.
--maxLocalJobs MAXLOCALJOBS
Specifies the maximum number of housekeeping jobs to run simultaneously on the local system. Use 0 for unlimited. Defaults to the number of local cores.
--manualMemArgs
Do not add the default arguments 'hv=MEMORY' and 'h_vmem=MEMORY' to the qsub call, and instead rely on TOIL_GRIDENGINE_ARGS to supply alternative arguments. Requires that TOIL_GRIDENGINE_ARGS be set.
--memoryIsProduct
If the batch system understands requested memory as a product of the requested memory and the number of cores, set this flag to properly allocate memory. This can be fairly common with grid engine clusters (Ex: SGE, PBS, Torque).
--runCwlInternalJobsOnWorkers
Whether to run CWL internal jobs (e.g. CWLScatter) on the worker nodes instead of the primary node. If false (default), then all such jobs are run on the primary node. Setting this to true can speed up the pipeline for very large workflows with many sub-workflows and/or scatters, provided that the worker pool is large enough.
--statePollingWait STATEPOLLINGWAIT
Time, in seconds, to wait before doing a scheduler query for job state. Return cached results if within the waiting period. Only works for grid engine batch systems such as gridengine, htcondor, torque, slurm, and lsf.
--statePollingTimeout STATEPOLLINGTIMEOUT
Time, in seconds, to retry against a broken scheduler. Only works for grid engine batch systems such as gridengine, htcondor, torque, slurm, and lsf.
--batchLogsDir BATCHLOGSDIR
Directory to tell the backing batch system to log into. Should be available on both the leader and the workers, if the backing batch system writes logs to the worker machines' filesystems, as many HPC schedulers do. If unset, the Toil work directory will be used. Only works for grid engine batch systems such as gridengine, htcondor, torque, slurm, and lsf.
--mesosEndpoint MESOSENDPOINT
The host and port of the Mesos server separated by a colon. (default: <leader IP>:5050)
--mesosFrameworkId MESOSFRAMEWORKID
Use a specific Mesos framework ID.
--mesosRole MESOSROLE
Use a Mesos role.
--mesosName MESOSNAME
The Mesos name to use. (default: toil)
--scale SCALE
A scaling factor to change the value of all submitted tasks' submitted cores. Used in single_machine batch system. Useful for running workflows on smaller machines than they were designed for, by setting a value less than 1. (default: 1)
--slurmAllocateMem SLURM_ALLOCATE_MEM
If False, do not use --mem. Used as a workaround for Slurm clusters that reject jobs with memory allocations.
--slurmTime SLURM_TIME
Slurm job time limit, in [DD-]HH:MM:SS format.
--slurmPE SLURM_PE
Special partition to send Slurm jobs to if they ask for more than 1 CPU. Useful for Slurm clusters that do not offer a partition accepting both single-core and multi-core jobs.
--slurmArgs SLURM_ARGS
Extra arguments to pass to Slurm.
--kubernetesHostPath KUBERNETES_HOST_PATH
Path on Kubernetes hosts to use as shared inter-pod temp directory.
--kubernetesOwner KUBERNETES_OWNER
Username to mark Kubernetes jobs with.
--kubernetesServiceAccount KUBERNETES_SERVICE_ACCOUNT
Service account to run jobs as.
--kubernetesPodTimeout KUBERNETES_POD_TIMEOUT
Seconds to wait for a scheduled Kubernetes pod to start running. (default: 120s)
--kubernetesPrivileged BOOL
Whether to allow Kubernetes pods to run as privileged. This can be used to enable FUSE mounts for faster runtimes with Singularity. When launching Toil-managed clusters, this will be set to true by --allowFuse. (default: False)
--awsBatchRegion AWS_BATCH_REGION
The AWS region containing the AWS Batch queue to submit to.
--awsBatchQueue AWS_BATCH_QUEUE
The name or ARN of the AWS Batch queue to submit to.
--awsBatchJobRoleArn AWS_BATCH_JOB_ROLE_ARN
The ARN of an IAM role to run AWS Batch jobs as, so they can e.g. access a job store. Must be assumable by ecs-tasks.amazonaws.com
Data Storage Options Allows configuring Toil's data storage.
--symlinkImports BOOL
When using a filesystem based job store, CWL input files are by default symlinked in. Setting this option to True instead copies the files into the job store, which may protect them from being modified externally. When set to False and as long as caching is enabled, Toil will protect the file automatically by changing the permissions to read-only. (Default=True)
--moveOutputs BOOL
When using a filesystem based job store, output files are by default moved to the output directory, and a symlink to the moved exported file is created at the initial location. Setting this option to True instead copies the files into the output directory. Applies to filesystem-based job stores only. (Default=False)
--caching BOOL
Set caching options. This must be set to "false" to use a batch system that does not support cleanup. Set to "true" if caching is desired.
--symlinkJobStoreReads BOOL
Allow reads and container mounts from a JobStore's shared filesystem directly via symlink. Can be turned off if the shared filesystem can't support the IO load of all the jobs reading from it at once, and you want to use --caching=True to make jobs on each node read from node-local cache storage. (Default=True)
Autoscaling Options Allows the specification of the minimum and maximum number of nodes in an autoscaled cluster, as well as parameters to control the level of provisioning.
--provisioner CLOUDPROVIDER
The provisioner for cluster auto-scaling. This is the main Toil --provisioner option, and defaults to None for running on single_machine and non-auto-scaling batch systems. The currently supported choices are 'aws' or 'gce'.
--nodeTypes NODETYPES
Specifies a list of comma-separated node types, each of which is composed of slash-separated instance types, and an optional spot bid set off by a colon, making the node type preemptible. Instance types may appear in multiple node types, and the same node type may appear as both preemptible and non-preemptible.
Valid argument specifying two node types: c5.4xlarge/c5a.4xlarge:0.42,t2.large
Node types: c5.4xlarge/c5a.4xlarge:0.42 and t2.large
Instance types: c5.4xlarge, c5a.4xlarge, and t2.large
Semantics: Bid $0.42/hour for either c5.4xlarge or c5a.4xlarge instances, treated interchangeably, while they are available at that price, and buy t2.large instances at full price
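The --nodeTypes grammar can be made concrete with a short parser. This is a minimal sketch that follows the description of the syntax, not Toil's own parsing code; the function name and the (instance types, spot bid) tuple layout are hypothetical.

```python
def parse_node_types_sketch(spec):
    """Parse a --nodeTypes style string into (instance_types, spot_bid) pairs.

    A spot_bid of None means the node type is non-preemptible.
    """
    node_types = []
    for node_type in spec.split(","):
        # An optional spot bid is set off by a colon.
        names, sep, bid = node_type.partition(":")
        # Interchangeable instance types are separated by slashes.
        instance_types = frozenset(names.split("/"))
        spot_bid = float(bid) if sep else None
        node_types.append((instance_types, spot_bid))
    return node_types


for instance_types, spot_bid in parse_node_types_sketch("c5.4xlarge/c5a.4xlarge:0.42,t2.large"):
    kind = "preemptible" if spot_bid is not None else "non-preemptible"
    print(sorted(instance_types), spot_bid, kind)
```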
--minNodes MINNODES
Minimum number of nodes of each type in the cluster, if using auto-scaling. This should be provided as a comma-separated list of the same length as the list of node types. default=0
--maxNodes MAXNODES
Maximum number of nodes of each type in the cluster, if using autoscaling, provided as a comma-separated list. The first value is used as a default if the list length is less than the number of nodeTypes. default=10
--targetTime TARGETTIME
Sets how rapidly you aim to complete jobs in seconds. Shorter times mean more aggressive parallelization. The autoscaler attempts to scale up/down so that it expects all queued jobs will complete within targetTime seconds. (Default: 1800)
--betaInertia BETAINERTIA
A smoothing parameter to prevent unnecessary oscillations in the number of provisioned nodes. This controls an exponentially weighted moving average of the estimated number of nodes. A value of 0.0 disables any smoothing, and a value of 0.9 will smooth so much that few changes will ever be made. Must be between 0.0 and 0.9. (Default: 0.1)
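To see what this smoothing does, consider a textbook exponentially weighted moving average in which --betaInertia is the weight given to the previous smoothed value. The exact formula Toil uses is not given here, so the update rule below is an assumption, and the function name is hypothetical.

```python
def smooth_node_estimate(previous, estimate, beta_inertia):
    """Blend the previous smoothed node count with a new raw estimate.

    Assumed update rule: smoothed = beta * previous + (1 - beta) * estimate.
    """
    if not 0.0 <= beta_inertia <= 0.9:
        raise ValueError("beta_inertia must be between 0.0 and 0.9")
    return beta_inertia * previous + (1.0 - beta_inertia) * estimate


# Raw estimates that oscillate between 10 and 2 nodes.
raw_estimates = [10, 2, 10, 2, 10, 2]
for beta in (0.0, 0.1, 0.9):
    smoothed = float(raw_estimates[0])
    history = []
    for estimate in raw_estimates[1:]:
        smoothed = smooth_node_estimate(smoothed, estimate, beta)
        history.append(round(smoothed, 2))
    print(beta, history)
```

With a weight of 0.0 the smoothed value simply tracks each raw estimate, while with 0.9 it moves only slightly on each update, which matches the behavior described for the two extremes.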
--scaleInterval SCALEINTERVAL
The interval (seconds) between assessing if the scale of the cluster needs to change. (Default: 60)
--preemptibleCompensation PREEMPTIBLECOMPENSATION
The preference of the autoscaler to replace preemptible nodes with non-preemptible nodes when preemptible nodes cannot be started for some reason. This value must be between 0.0 and 1.0, inclusive. A value of 0.0 disables such compensation, a value of 0.5 compensates two missing preemptible nodes with one non-preemptible node, and a value of 1.0 replaces every missing preemptible node with a non-preemptible one. Defaults to 0.0.
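The arithmetic behind --preemptibleCompensation can be sketched as a simple proportion of the missing preemptible nodes. Rounding down to a whole node is an assumption made for this example, and the function name is hypothetical; the autoscaler's actual calculation may differ.

```python
import math


def compensating_nodes(missing_preemptible, compensation):
    """Estimate how many non-preemptible nodes stand in for missing preemptible ones.

    Assumption: fractional results are rounded down to a whole node.
    """
    if not 0.0 <= compensation <= 1.0:
        raise ValueError("compensation must be between 0.0 and 1.0, inclusive")
    return math.floor(missing_preemptible * compensation)


print(compensating_nodes(2, 0.5))  # 1: two missing nodes compensated by one
print(compensating_nodes(4, 1.0))  # 4: every missing node replaced
print(compensating_nodes(4, 0.0))  # 0: compensation disabled
```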
--nodeStorage NODESTORAGE
Specify the size, in gigabytes, of the root volume of worker nodes when they are launched. You may want to set this if your jobs require a lot of disk space. The default value is 50.
--nodeStorageOverrides NODESTORAGEOVERRIDES
Comma-separated list of nodeType:nodeStorage that are used to override the default value from --nodeStorage for the specified nodeType(s). This is useful for heterogeneous jobs where some tasks require much more disk than others.
--metrics
Enable the prometheus/grafana dashboard for monitoring CPU/RAM usage, queue size, and issued jobs.
--assumeZeroOverhead
Ignore scheduler and OS overhead and assume jobs can use every last byte of memory and disk on a node when autoscaling.
Service Options Allows the specification of the maximum number of service jobs in a cluster. By keeping this limited we can avoid nodes occupied with services causing deadlocks. (Not for CWL).
--maxServiceJobs MAXSERVICEJOBS
The maximum number of service jobs that can be run concurrently, excluding service jobs running on preemptible nodes. default=9223372036854775807
--maxPreemptibleServiceJobs MAXPREEMPTIBLESERVICEJOBS
The maximum number of service jobs that can run concurrently on preemptible nodes. default=9223372036854775807
--deadlockWait DEADLOCKWAIT
Time, in seconds, to tolerate the workflow running only the same service jobs, with no jobs to use them, before declaring the workflow to be deadlocked and stopping. default=60
--deadlockCheckInterval DEADLOCKCHECKINTERVAL
Time, in seconds, to wait between checks to see if the workflow is stuck running only service jobs, with no jobs to use them. Should be shorter than --deadlockWait. May need to be increased if the batch system cannot enumerate running jobs quickly enough, or if polling for running jobs is placing an unacceptable load on a shared cluster. default=30
Resource Options The options to specify default cores/memory requirements (if not specified by the jobs themselves), and to limit the total amount of memory/cores requested from the batch system.
--defaultMemory INT
The default amount of memory to request for a job. Only applicable to jobs that do not specify an explicit value for this requirement. Standard suffixes like K, Ki, M, Mi, G or Gi are supported. Default is 2.0Gi
--defaultCores FLOAT
The default number of CPU cores to dedicate to a job. Only applicable to jobs that do not specify an explicit value for this requirement. Fractions of a core (for example 0.1) are supported on some batch systems, namely Mesos and single_machine. Default is 1.0
--defaultDisk INT
The default amount of disk space to dedicate to a job. Only applicable to jobs that do not specify an explicit value for this requirement. Standard suffixes like K, Ki, M, Mi, G or Gi are supported. Default is 2.0Gi
--defaultAccelerators ACCELERATOR
The default amount of accelerators to request for a job. Only applicable to jobs that do not specify an explicit value for this requirement. Each accelerator specification can have a type (gpu [default], nvidia, amd, cuda, rocm, opencl, or a specific model like nvidia-tesla-k80), and a count [default: 1]. If both a type and a count are used, they must be separated by a colon. If multiple types of accelerators are used, the specifications are separated by commas. Default is [].
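The accelerator specification syntax can be illustrated with a small parser that applies the stated defaults (type gpu, count 1). This is a sketch written from the description of the syntax, not Toil's own parser. The function name and the (type, count) tuple layout are hypothetical, and because the text does not say which side of the colon holds the count, the sketch accepts either order.

```python
def parse_accelerators_sketch(spec):
    """Parse a comma-separated accelerator spec into (type, count) tuples."""
    parsed = []
    for item in spec.split(","):
        parts = [part.strip() for part in item.split(":") if part.strip()]
        if not parts:
            continue  # ignore empty entries, e.g. an empty spec
        if len(parts) > 2:
            raise ValueError(f"too many fields in accelerator spec: {item!r}")
        accelerator_type, count = "gpu", 1  # defaults from the option text
        for part in parts:
            if part.isdigit():
                count = int(part)
            else:
                accelerator_type = part
        parsed.append((accelerator_type, count))
    return parsed


print(parse_accelerators_sketch("2"))
print(parse_accelerators_sketch("nvidia-tesla-k80:2,amd"))
```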
--defaultPreemptible BOOL
Make all jobs able to run on preemptible (spot) nodes by default.
--maxCores INT
The maximum number of CPU cores to request from the batch system at any one time. Standard suffixes like K, Ki, M, Mi, G or Gi are supported.
--maxMemory INT
The maximum amount of memory to request from the batch system at any one time. Standard suffixes like K, Ki, M, Mi, G or Gi are supported.
--maxDisk INT
The maximum amount of disk space to request from the batch system at any one time. Standard suffixes like K, Ki, M, Mi, G or Gi are supported.
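The size suffixes accepted by the memory and disk options can be converted to bytes as sketched below. The sketch assumes that K, M, and G are decimal multiples (powers of 1000) and that Ki, Mi, and Gi are binary multiples (powers of 1024); the function name is hypothetical and this is not Toil's own conversion routine.

```python
import re

# Assumed multipliers: decimal for K/M/G, binary for Ki/Mi/Gi.
_UNITS = {
    "": 1,
    "K": 1000, "M": 1000 ** 2, "G": 1000 ** 3,
    "Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3,
}

_SIZE_PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(Ki|Mi|Gi|K|M|G)?\s*")


def parse_size(text):
    """Convert a string such as '80Gi' or '2.0Gi' to a whole number of bytes."""
    match = _SIZE_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"cannot parse size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit or ""])


print(parse_size("2.0Gi"))          # 2147483648
print(parse_size("2.0Gi") // 1024)  # 2097152 (the same amount in KiB)
print(parse_size("80Gi"))           # 85899345920
```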
Options for rescuing/killing/restarting jobs. The options for jobs that either run too long/fail or get lost (some batch systems have issues!).
--retryCount INT
Number of times to retry a failing job before giving up and labeling job failed. default=1
--enableUnlimitedPreemptibleRetries
If set, preemptible failures (or any failure due to an instance getting unexpectedly terminated) will not count towards job failures and --retryCount.
--doubleMem
If set, batch jobs which die due to reaching memory limit on batch schedulers will have their memory doubled and they will be retried. The remaining retry count will be reduced by 1. Currently only supported by LSF. default=False.
--maxJobDuration INT
Maximum runtime of a job (in seconds) before we kill it (this is a lower bound, and the actual time before killing the job may be longer).
--rescueJobsFrequency INT
Period of time to wait (in seconds) between checking for missing/overlong jobs, that is jobs which get lost by the batch system. Expert parameter.
--jobStoreTimeout FLOAT
Maximum time (in seconds) to wait for a job's update to the job store before declaring it failed.
Log Management Options
--maxLogFileSize MAXLOGFILESIZE
The maximum size of a job log file to keep (in bytes), log files larger than this will be truncated to the last X bytes. Setting this option to zero will prevent any truncation. Setting this option to a negative value will truncate from the beginning. Default=62.5 K
--writeLogs FILEPATH
Write worker logs received by the leader into their own files at the specified path. Any non-empty standard output and error from failed batch system jobs will also be written into files at this path. The current working directory will be used if a path is not specified explicitly. Note: By default only the logs of failed jobs are returned to leader. Set log level to 'debug' or enable --writeLogsFromAllJobs to get logs back from successful jobs, and adjust --maxLogFileSize to control the truncation limit for worker logs.
--writeLogsGzip FILEPATH
Identical to --writeLogs except the log files are gzipped on the leader.
--writeMessages FILEPATH
File to send messages from the leader's message bus to.
--realTimeLogging BOOL
Enable real-time logging from workers to leader.
Miscellaneous Options
--disableChaining
Disables chaining of jobs (chaining uses one job's resource allocation for its successor job if possible).
--disableJobStoreChecksumVerification
Disables checksum verification for files transferred to/from the job store. Checksum verification is a safety check to ensure the data is not corrupted during transfer. Currently only supported for non-streaming AWS files
--sseKey SSEKEY
Path to file containing 32 character key to be used for server-side encryption on awsJobStore or googleJobStore. SSE will not be used if this flag is not passed.
--setEnv NAME, -e NAME
Set an environment variable early on in the worker. The argument may be given as NAME=VALUE or as NAME alone, with either --setEnv or -e. If VALUE is omitted, it will be looked up in the current environment. Independently of this option, the worker will try to emulate the leader's environment before running a job, except for some variables known to vary across systems. Using this option, a variable can be injected into the worker process itself before it is started.
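The two accepted argument forms for --setEnv can be sketched as follows. The helper is hypothetical and is not Toil's implementation. In particular, returning None for a name that is missing from the current environment is an assumption made for the example.

```python
import os


def parse_set_env(arguments, environ=None):
    """Turn NAME=VALUE or NAME arguments into a dict of variables to inject."""
    environ = os.environ if environ is None else environ
    variables = {}
    for argument in arguments:
        name, sep, value = argument.partition("=")
        if not name:
            raise ValueError(f"missing variable name in {argument!r}")
        if sep:
            # NAME=VALUE: use the value exactly as given (it may contain '=').
            variables[name] = value
        else:
            # NAME alone: look the value up in the current environment.
            # Assumption: an unset variable yields None.
            variables[name] = environ.get(name)
    return variables


# A stand-in environment keeps the example independent of the real one.
print(parse_set_env(["FOO=bar", "HOME"], environ={"HOME": "/home/me"}))
```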
--servicePollingInterval SERVICEPOLLINGINTERVAL
Interval of time service jobs wait between polling for the existence of the keep-alive flag (default=60)
--forceDockerAppliance
Disables sanity checking the existence of the docker image specified by TOIL_APPLIANCE_SELF, which Toil uses to provision mesos for autoscaling.
--statusWait INT
Seconds to wait between reports of running jobs. (default=3600)
--disableProgress
Disables the progress bar shown when standard error is a terminal.
Debug Options Debug options for finding problems or helping with testing.
--debugWorker
Experimental no forking mode for local debugging. Specifically, workers are not forked and stderr/stdout are not redirected to the log. (default=False)
--disableWorkerOutputCapture
Let worker output go to worker's standard out/error instead of per-job logs.
--badWorker BADWORKER
For testing purposes randomly kill --badWorker proportion of jobs using SIGKILL. (Default: 0.0)
--badWorkerFailInterval BADWORKERFAILINTERVAL
When killing the job pick uniformly within the interval from 0.0 to --badWorkerFailInterval seconds after the worker starts. (Default: 0.01)
--kill_polling_interval KILL_POLLING_INTERVAL
Interval of time (in seconds) the leader waits between polling for the kill flag inside the job store set by the "toil kill" command. (default=5)
Restart Option
In the event of failure, Toil can resume the pipeline by adding the argument --restart and rerunning the workflow. Toil Python workflows (but not CWL or WDL workflows) can even be edited and resumed, which is useful for development or troubleshooting.
Running Workflows with Services
Toil supports jobs, or clusters of jobs, that run as services to other accessor jobs. Example services include server databases or Apache Spark clusters. Because service jobs exist to provide services to accessor jobs, their runtime depends on their accessor jobs running concurrently. The dependencies between services and their accessor jobs can create potential deadlock scenarios, where the workflow hangs because only service jobs are running and their accessor jobs cannot be scheduled, as there are too few resources to run both simultaneously. To cope with this situation, Toil attempts to schedule services and accessors intelligently. However, to avoid a deadlock in workflows that run service jobs, it is advisable to use the following parameters:
• --maxServiceJobs: The maximum number of service jobs that can be run concurrently, excluding service jobs running on preemptible nodes.
• --maxPreemptibleServiceJobs: The maximum number of service jobs that can run concurrently on preemptible nodes.
Specifying these parameters so that at maximum cluster size there will be sufficient resources to run accessors in addition to services will ensure that such a deadlock cannot occur.
If too low a limit is specified, then a deadlock can occur in which Toil cannot schedule sufficient service jobs concurrently to complete the workflow. Toil will detect this situation if it occurs and throw a toil.DeadlockException exception. Increasing the cluster size and these limits will resolve the issue.
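One rough way to pick a service job limit is to reserve resources for accessors at the maximum cluster size and divide what remains among the services. The sketch below does this for CPU cores only; memory and disk would need the same treatment. The function and parameter names are hypothetical, and the result is a starting point for --maxServiceJobs rather than a value Toil computes.

```python
def max_safe_service_jobs(total_cores, cores_per_service, cores_reserved_for_accessors):
    """Estimate a service job limit that leaves room for accessor jobs.

    total_cores: cores available at the maximum cluster size.
    cores_per_service: cores requested by each service job.
    cores_reserved_for_accessors: cores to keep free for accessor jobs.
    """
    if cores_per_service <= 0:
        raise ValueError("cores_per_service must be positive")
    available = total_cores - cores_reserved_for_accessors
    # Never return a negative limit when the reservation exceeds the cluster.
    return max(0, int(available // cores_per_service))


# 64 cores in total, 4 cores per service, 16 cores kept free for accessors.
print(max_safe_service_jobs(64, 4, 16))  # 12
```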
Setting Options directly in a Python Workflow
It's good to remember that commandline options can be overridden in the code of a Python workflow. For example, toil.job.Job.Runner.getDefaultOptions() can be used to get the default Toil options, ignoring what was passed on the command line. In this example, this is used to ignore command-line options and always run with the "./toilWorkflow" directory as the jobstore:
options = Job.Runner.getDefaultOptions("./toilWorkflow")  # Get the options object
with Toil(options) as toil:
    toil.start(Job())  # Run the root job
However, each option can be explicitly set within the workflow by modifying the options object. In this example, we are setting logLevel = "DEBUG" (all log statements are shown) and clean="ALWAYS" (always delete the jobstore) like so:
options = Job.Runner.getDefaultOptions("./toilWorkflow")  # Get the options object
options.logLevel = "DEBUG"  # Set the log level to the debug level.
options.clean = "ALWAYS"  # Always delete the jobStore after a run
with Toil(options) as toil:
    toil.start(Job())  # Run the root job
However, the usual incantation is to accept commandline args from the user with the following:
parser = Job.Runner.getDefaultArgumentParser()  # Get the parser
options = parser.parse_args()  # Parse user args to create the options object
with Toil(options) as toil:
    toil.start(Job())  # Run the root job
We can also have code in the workflow to overwrite user supplied arguments:
parser = Job.Runner.getDefaultArgumentParser()  # Get the parser
options = parser.parse_args()  # Parse user args to create the options object
options.logLevel = "DEBUG"  # Set the log level to the debug level.
options.clean = "ALWAYS"  # Always delete the jobStore after a run
with Toil(options) as toil:
    toil.start(Job())  # Run the root job
TOIL UTILITIES
Toil includes some utilities for inspecting or manipulating workflows during and after their execution. (There are additional Toil Cluster Utilities available for working with Toil-managed clusters in the cloud.)
The generic toil subcommand utilities are:
stats --- Reports runtime and resource usage for all jobs in a specified jobstore (workflow must have originally been run using the --stats option).
status --- Inspects a job store to see which jobs have failed, run successfully, etc.
debug-job --- Runs a failing job on your local machine.
clean --- Delete the job store used by a previous Toil workflow invocation.
kill --- Kills any running jobs in a rogue toil.
For information on a specific utility, run it with the --help option:
toil stats --help
Stats Command
To use the stats command, a workflow must first be run using the --stats option. Using this option ensures that Toil does not delete the job store, no matter what other options are specified (normally the option --clean=always would delete the job store, but --stats overrides this).
Running an Example
We can run an example workflow and record stats:
python3 tutorial_stats.py file:my-jobstore --stats
Where tutorial_stats.py is the following:
import math
import time
from multiprocessing import Process

from toil.common import Toil
from toil.job import Job


def think(seconds):
    start = time.time()
    while time.time() - start < seconds:
        # Use CPU
        math.sqrt(123456)


class TimeWaster(Job):
    def __init__(self, time_to_think, time_to_waste, space_to_waste, *args, **kwargs):
        self.time_to_think = time_to_think
        self.time_to_waste = time_to_waste
        self.space_to_waste = space_to_waste
        super().__init__(*args, **kwargs)

    def run(self, fileStore):
        # Waste some space
        file_path = fileStore.getLocalTempFile()
        with open(file_path, "w") as stream:
            for i in range(self.space_to_waste):
                stream.write("X")

        # Do some "useful" compute
        processes = []
        for core_number in range(max(1, self.cores)):
            # Use all the assigned cores to think
            p = Process(target=think, args=(self.time_to_think,))
            p.start()
            processes.append(p)
        for p in processes:
            p.join()

        # Also waste some time
        time.sleep(self.time_to_waste)


def main():
    options = Job.Runner.getDefaultArgumentParser().parse_args()

    job1 = TimeWaster(0, 0, 0, displayName="doNothing")
    job2 = TimeWaster(10, 0, 4096, displayName="efficientJob")
    job3 = TimeWaster(10, 0, 1024, cores=4, displayName="multithreadedJob")
    job4 = TimeWaster(1, 9, 65536, displayName="inefficientJob")

    job1.addChild(job2)
    job1.addChild(job3)
    job3.addChild(job4)

    with Toil(options) as toil:
        if not toil.options.restart:
            toil.start(job1)
        else:
            toil.restart()


if __name__ == "__main__":
    main()
Notice the displayName key, which can rename a job, giving it an alias when it is finally displayed in stats.
Displaying Stats
To see the runtime and resources used for each job when it was run, type
toil stats file:my-jobstore
This should output something like the following:
Batch System: single_machine
Default Cores: 1  Default Memory: 2097152KiB
Max Cores: unlimited
Local CPU Time: 55.54 core·s  Overall Runtime: 26.23 s
Worker
    Count | Real Time (s)* | CPU Time (core·s) | CPU Wait (core·s) | Memory (B) | Disk (B)
    n | min med* ave max total | min med ave max total | min med ave max total | min med ave max total | min med ave max total
    3 | 0.34 10.83 10.80 21.23 32.40 | 0.33 10.43 17.94 43.07 53.83 | 0.01 0.40 14.08 41.85 42.25 | 177168Ki 179312Ki 178730Ki 179712Ki 536192Ki | 0Ki 4Ki 22Ki 64Ki 68Ki
Job
    Worker Jobs | min med ave max
                | 1 1 1.3333 2
    Count | Real Time (s)* | CPU Time (core·s) | CPU Wait (core·s) | Memory (B) | Disk (B)
    n | min med* ave max total | min med ave max total | min med ave max total | min med ave max total | min med ave max total
    4 | 0.33 10.83 8.10 10.85 32.38 | 0.33 10.43 13.46 41.70 53.82 | 0.01 1.68 2.78 9.02 11.10 | 177168Ki 179488Ki 178916Ki 179696Ki 715664Ki | 0Ki 4Ki 18Ki 64Ki 72Ki
multithreadedJob
    Total Cores: 4.0
    Count | Real Time (s)* | CPU Time (core·s) | CPU Wait (core·s) | Memory (B) | Disk (B)
    n | min med* ave max total | min med ave max total | min med ave max total | min med ave max total | min med ave max total
    1 | 10.85 10.85 10.85 10.85 10.85 | 41.70 41.70 41.70 41.70 41.70 | 1.68 1.68 1.68 1.68 1.68 | 179488Ki 179488Ki 179488Ki 179488Ki 179488Ki | 4Ki 4Ki 4Ki 4Ki 4Ki
efficientJob
    Total Cores: 1.0
    Count | Real Time (s)* | CPU Time (core·s) | CPU Wait (core·s) | Memory (B) | Disk (B)
    n | min med* ave max total | min med ave max total | min med ave max total | min med ave max total | min med ave max total
    1 | 10.83 10.83 10.83 10.83 10.83 | 10.43 10.43 10.43 10.43 10.43 | 0.40 0.40 0.40 0.40 0.40 | 179312Ki 179312Ki 179312Ki 179312Ki 179312Ki | 4Ki 4Ki 4Ki 4Ki 4Ki
inefficientJob
    Total Cores: 1.0
    Count | Real Time (s)* | CPU Time (core·s) | CPU Wait (core·s) | Memory (B) | Disk (B)
    n | min med* ave max total | min med ave max total | min med ave max total | min med ave max total | min med ave max total
    1 | 10.38 10.38 10.38 10.38 10.38 | 1.36 1.36 1.36 1.36 1.36 | 9.02 9.02 9.02 9.02 9.02 | 179696Ki 179696Ki 179696Ki 179696Ki 179696Ki | 64Ki 64Ki 64Ki 64Ki 64Ki
doNothing
    Total Cores: 1.0
    Count | Real Time (s)* | CPU Time (core·s) | CPU Wait (core·s) | Memory (B) | Disk (B)
    n | min med* ave max total | min med ave max total | min med ave max total | min med ave max total | min med ave max total
1 | 0.33 0.33 0.33 0.33 0.33 | 0.33 0.33 0.33 0.33 0.33 |
0.01 0.01 0.01 0.01 0.01 | 177168Ki 177168Ki 177168Ki
177168Ki 177168Ki | 0Ki 0Ki 0Ki 0Ki 0Ki
This report gives information on the resources used by your workflow. Note that it currently does NOT track CPU and memory used inside Docker containers, only inside Singularity containers.
There are three parts to this report.
Overall Summary
At the top is a section with overall summary statistics for the run:
Batch System: single_machine
Default Cores: 1  Default Memory: 2097152KiB
Max Cores: unlimited
Local CPU Time: 55.54 core·s  Overall Runtime: 26.23 s
This lists some important settings for the Toil batch system that actually executed jobs. It also lists:
• The CPU time used on the local machine, in core seconds. This includes time used by the Toil leader itself (excluding some startup time), and time used by jobs that run under the leader (which, for the single_machine batch system, is all jobs). It does not include CPU used by jobs that ran on other machines.
• The overall wall-clock runtime of the workflow in seconds, as measured by the leader.
These latter two numbers don't count some startup/shutdown time spent loading and saving files, so you still may want to use the time shell built-in to time your Toil runs overall.
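If you would rather capture the overall wall-clock time from a script than from the shell, a minimal sketch along the following lines could be used. The helper name timed_run is made up for illustration, and the stand-in command is only a placeholder so that the sketch runs anywhere; substitute your own Toil invocation.

```python
import subprocess
import sys
import time


def timed_run(command):
    """Run a command and return (exit_code, wall_clock_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(command)
    return completed.returncode, time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command (an assumption for this sketch). For the example
    # workflow you might instead pass something like:
    # [sys.executable, "tutorial_stats.py", "file:my-jobstore", "--stats"]
    code, seconds = timed_run([sys.executable, "-c", "pass"])
    print(f"exit code {code}, {seconds:.2f} s wall clock")
```

Unlike the figures in the report, this measurement includes all startup and shutdown time of the process.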
Worker Summary
After the overall summary, there is a section with statistics about the Toil worker processes, which Toil used to execute your workflow's jobs:
Worker
    Count | Real Time (s)* | CPU Time (core·s) | CPU Wait (core·s) | Memory (B) | Disk (B)
        n | min med* ave max total | min med ave max total | min med ave max total | min med ave max total | min med ave max total
        3 | 0.34 10.83 10.80 21.23 32.40 | 0.33 10.43 17.94 43.07 53.83 | 0.01 0.40 14.08 41.85 42.25 | 177168Ki 179312Ki 178730Ki 179712Ki 536192Ki | 0Ki 4Ki 22Ki 64Ki 68Ki
• The Count column shows that, to run this workflow, Toil had to submit 3 Toil worker processes to the backing scheduler. (In this case, it ran them all on the local machine.)
• The Real Time column shows statistics about the wall clock times that the worker processes took. All the sub-column values are in seconds.
• The CPU Time column shows statistics about the CPU usage amounts of all the worker processes. All the sub-column values are in core seconds.
• The CPU Wait column shows statistics about CPU time reserved for but not consumed by worker processes. In this example, the max and total are relatively high compared to both real time and CPU time, indicating that a lot of reserved CPU time went unused. This can indicate that the workflow is overestimating its required cores, that small jobs are running in the same resource reservations as large jobs via chaining, or that the workflow is having to wait around for slow disk I/O.
• The Memory column shows the peak memory usage of each worker process and its child processes.
• The Disk column shows the disk usage in each worker. This is polled at the end of each job that is run by the worker, so it may not always reflect the actual peak disk usage.
Job Breakdown
Finally, there is the breakdown of resource usage by jobs. This starts with a table summarizing the counts of jobs that ran on each worker:
Job
 Worker Jobs | min med ave max
             | 1 1 1.3333 2
In this example, most of the workers ran one job each, but one worker managed to run two jobs, via chaining. (Jobs will chain when a job has only one dependent job, which in turn depends on only that first job, and the second job needs no more resources than the first job did.)
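The chaining rule can be made concrete with a small sketch. This is an illustration of the rule as stated here, not Toil's actual implementation; the JobSketch class and can_chain function are hypothetical names, and the default memory and disk figures are arbitrary placeholders.

```python
class JobSketch:
    """Hypothetical stand-in for a job; this is not Toil's Job class."""

    def __init__(self, name, cores=1, memory=2 * 1024**3, disk=2 * 1024**3):
        self.name = name
        self.cores = cores
        self.memory = memory
        self.disk = disk
        self.children = []
        self.parents = []

    def add_child(self, child):
        self.children.append(child)
        child.parents.append(self)


def can_chain(job):
    """Return True if the job's single successor could reuse its reservation."""
    if len(job.children) != 1:
        return False  # Zero or several dependent jobs: nothing to chain into
    successor = job.children[0]
    if len(successor.parents) != 1:
        return False  # The successor also waits on some other job
    # The successor must need no more resources than this job did
    return (successor.cores <= job.cores
            and successor.memory <= job.memory
            and successor.disk <= job.disk)


# Rebuild the shape of the example workflow
do_nothing = JobSketch("doNothing")
efficient = JobSketch("efficientJob")
multithreaded = JobSketch("multithreadedJob", cores=4)
inefficient = JobSketch("inefficientJob")
do_nothing.add_child(efficient)
do_nothing.add_child(multithreaded)
multithreaded.add_child(inefficient)

for job in (do_nothing, efficient, multithreaded, inefficient):
    print(job.name, "chains into its successor:", can_chain(job))
```

Under this rule only multithreadedJob chains (into inefficientJob), which is consistent with four jobs having run on three workers.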
Next, we have statistics for resource usage over all jobs together:
    Count | Real Time (s)* | CPU Time (core·s) | CPU Wait (core·s) | Memory (B) | Disk (B)
        n | min med* ave max total | min med ave max total | min med ave max total | min med ave max total | min med ave max total
        4 | 0.33 10.83 8.10 10.85 32.38 | 0.33 10.43 13.46 41.70 53.82 | 0.01 1.68 2.78 9.02 11.10 | 177168Ki 179488Ki 178916Ki 179696Ki 715664Ki | 0Ki 4Ki 18Ki 64Ki 72Ki
And finally, for each kind of job (as determined by the job's displayName), we have statistics summarizing the resources used by the instances of that kind of job:
 multithreadedJob
    Total Cores: 4.0
    Count | Real Time (s)* | CPU Time (core·s) | CPU Wait (core·s) | Memory (B) | Disk (B)
        n | min med* ave max total | min med ave max total | min med ave max total | min med ave max total | min med ave max total
        1 | 10.85 10.85 10.85 10.85 10.85 | 41.70 41.70 41.70 41.70 41.70 | 1.68 1.68 1.68 1.68 1.68 | 179488Ki 179488Ki 179488Ki 179488Ki 179488Ki | 4Ki 4Ki 4Ki 4Ki 4Ki
 efficientJob
    Total Cores: 1.0
    Count | Real Time (s)* | CPU Time (core·s) | CPU Wait (core·s) | Memory (B) | Disk (B)
        n | min med* ave max total | min med ave max total | min med ave max total | min med ave max total | min med ave max total
        1 | 10.83 10.83 10.83 10.83 10.83 | 10.43 10.43 10.43 10.43 10.43 | 0.40 0.40 0.40 0.40 0.40 | 179312Ki 179312Ki 179312Ki 179312Ki 179312Ki | 4Ki 4Ki 4Ki 4Ki 4Ki
 inefficientJob
    Total Cores: 1.0
    Count | Real Time (s)* | CPU Time (core·s) | CPU Wait (core·s) | Memory (B) | Disk (B)
        n | min med* ave max total | min med ave max total | min med ave max total | min med ave max total | min med ave max total
        1 | 10.38 10.38 10.38 10.38 10.38 | 1.36 1.36 1.36 1.36 1.36 | 9.02 9.02 9.02 9.02 9.02 | 179696Ki 179696Ki 179696Ki 179696Ki 179696Ki | 64Ki 64Ki 64Ki 64Ki 64Ki
 doNothing
    Total Cores: 1.0
    Count | Real Time (s)* | CPU Time (core·s) | CPU Wait (core·s) | Memory (B) | Disk (B)
        n | min med* ave max total | min med ave max total | min med ave max total | min med ave max total | min med ave max total
        1 | 0.33 0.33 0.33 0.33 0.33 | 0.33 0.33 0.33 0.33 0.33 | 0.01 0.01 0.01 0.01 0.01 | 177168Ki 177168Ki 177168Ki 177168Ki 177168Ki | 0Ki 0Ki 0Ki 0Ki 0Ki
For each job, we first list its name, and then the total cores that it asked for, summed across all instances of it. Then we show a table of statistics.
Here the * marker in the table headers becomes relevant; it shows that jobs are being sorted by the median of the real time used. You can control this with the --sortCategory option.
The column meanings are the same as for the workers:
• The Count column shows the number of jobs of each type that ran.
• The Real Time column shows statistics about the wall clock times that instances of the job type took. All the sub-column values are in seconds.
• The CPU Time column shows statistics about the CPU usage amounts of each job. Note that multithreadedJob managed to use CPU time at faster than one core second per second, because it reserved multiple cores and ran multiple threads.
• The CPU Wait column shows statistics about CPU time reserved for but not consumed by jobs. Note that inefficientJob used hardly any of the cores it requested for most of its real time.
• The Memory column shows the peak memory usage of each job.
• The Disk column shows the disk usage at the end of each job. It may not always reflect the actual peak disk usage.
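The per-job figures in this example are consistent with CPU Wait being roughly the reserved cores multiplied by the real time, minus the CPU time actually used. The sketch below checks that relationship against the numbers in the report. The formula is inferred from this example's output and is an assumption, not a statement of how Toil computes the column; cpu_wait is a hypothetical helper.

```python
def cpu_wait(cores, real_time_s, cpu_time_core_s):
    """Estimate reserved-but-unused CPU time in core seconds (assumed formula)."""
    return max(0.0, cores * real_time_s - cpu_time_core_s)


# (reserved cores, real time in s, CPU time in core·s), copied from the report
jobs = {
    "multithreadedJob": (4, 10.85, 41.70),
    "efficientJob": (1, 10.83, 10.43),
    "inefficientJob": (1, 10.38, 1.36),
}

for name, (cores, real, cpu) in jobs.items():
    print(f"{name}: about {cpu_wait(cores, real, cpu):.2f} core·s reserved but unused")
```

The estimates land on or very near the reported 1.68, 0.40, and 9.02 core seconds; the small difference for multithreadedJob is presumably down to rounding in the displayed values.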
Example Cleanup
Once we're done looking at the stats, we can clean up the job store by running:
toil clean file:my-jobstore
Status Command
Continuing the example from the stats section above, if we ran our workflow with the command
python3 tutorial_stats.py file:my-jobstore --stats
We could interrogate our jobstore with the status command, for example:
toil status file:my-jobstore
If the run was successful, this would not return much valuable information, something like
2018-01-11 19:31:29,739 - toil.lib.bioio - INFO - Root logger is at level 'INFO', 'toil' logger at level 'INFO'.
2018-01-11 19:31:29,740 - toil.utils.toilStatus - INFO - Parsed arguments
2018-01-11 19:31:29,740 - toil.utils.toilStatus - INFO - Checking if we have files for Toil
The root job of the job store is absent, the workflow completed successfully.
Otherwise, the toil status command will return something like the following:
Of the 3 jobs considered, there are 1 completely failed jobs, 1 jobs with children, 2 jobs ready to run, 0 zombie jobs, 0 jobs with services, 0 services, and 0 jobs with log files currently in FileJobStore(/Users/anovak/workspace/toil/tree).
The toil status command supports several useful flags, including --perJob to get per-job status information, --logs to print stored worker logs, and --failed to list all failed jobs in the workflow. For more information, run toil status --help.
One use case of toil status is with the --printStatus argument. Running toil status --printStatus file:my-jobstore at any point of the workflow's lifecycle can tell you the progress of the workflow. Note: This command will output all current running jobs but not any finished or failed jobs.
For example, after running workflow.py in another terminal:
$ toil status --printStatus file:my-jobstore
[2024-05-31T13:59:13-0700] [MainThread] [I] [toil.utils.toilStatus] Traversing the job graph gathering jobs. This may take a couple of minutes.
Of the 2 jobs considered, there are 0 completely failed jobs, 1 jobs with children, 1 jobs ready to run, 0 zombie jobs, 0 jobs with services, 0 services, and 0 jobs with log files currently in FileJobStore(/path/to/my-jobstore).
Message bus path: /tmp/tmp9cnaq3bm
Job ID kind-TimeWaster/instance-zvdsdkm_ with name TimeWaster is running on SingleMachineBatchSystem as ID 101349.
Job ID kind-TimeWaster/instance-7clm8cv2 with name TimeWaster is running on SingleMachineBatchSystem as ID 101350.
At this moment in time, two jobs with the name "TimeWaster" are running on my local machine.
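If you want to track these counts from a script, the summary sentence can be picked apart with a regular expression. The following is a minimal sketch that assumes the sentence keeps the wording shown in the examples above; parse_status_summary is a hypothetical helper, not part of Toil, and the job store path in the sample line is a placeholder.

```python
import re

# Category labels exactly as they appear in the summary sentence shown above.
# "jobs with services" is listed before "services" so the longer label wins.
LABELS = (
    "completely failed jobs",
    "jobs with children",
    "jobs ready to run",
    "zombie jobs",
    "jobs with services",
    "services",
    "jobs with log files",
)
COUNT_PATTERN = re.compile(r"(\d+) (" + "|".join(LABELS) + r")")


def parse_status_summary(line):
    """Turn the 'Of the N jobs considered, ...' sentence into a dict of counts."""
    total = re.search(r"Of the (\d+) jobs considered", line)
    if total is None:
        return None
    counts = {"jobs considered": int(total.group(1))}
    for number, label in COUNT_PATTERN.findall(line):
        counts[label] = int(number)
    return counts


sample = (
    "Of the 3 jobs considered, there are 1 completely failed jobs, "
    "1 jobs with children, 2 jobs ready to run, 0 zombie jobs, "
    "0 jobs with services, 0 services, and 0 jobs with log files "
    "currently in FileJobStore(/path/to/my-jobstore)."
)
for label, count in parse_status_summary(sample).items():
    print(f"{label}: {count}")
```

Because this depends on the exact log wording, treat it as a convenience for ad hoc monitoring rather than a stable interface.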
Clean Command
If a Toil pipeline didn't finish successfully, or was run using --clean=never or --stats, the job store will exist until it is deleted. toil clean <jobStore> ensures that all artifacts associated with a job store are removed. This is particularly useful for deleting AWS job stores, which reserve an SDB domain as well as an S3 bucket.
The deletion of the job store can be modified by the --clean argument, which may be set to always, onError, never, or onSuccess (default).
Temporary directories where jobs are running can also be saved from deletion using the --cleanWorkDir option, which takes the same values as --clean. This option should only be used when debugging, as intermediate jobs will fill up disk space.
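The four --clean values can be read as a small decision table. The sketch below spells that reading out; the semantics are inferred from the option names and are an assumption rather than a copy of Toil's logic, and job_store_is_deleted is a hypothetical helper. It deliberately ignores the effect of --stats described above.

```python
def job_store_is_deleted(clean, succeeded):
    """Return True if the job store would be removed when the workflow ends."""
    if clean == "always":
        return True
    if clean == "never":
        return False
    if clean == "onSuccess":
        return succeeded
    if clean == "onError":
        return not succeeded
    raise ValueError(f"unknown --clean value: {clean!r}")


for policy in ("always", "onError", "never", "onSuccess"):
    after_success = job_store_is_deleted(policy, succeeded=True)
    after_failure = job_store_is_deleted(policy, succeeded=False)
    print(f"--clean={policy}: deleted after success={after_success}, "
          f"after failure={after_failure}")
```

Whenever the function returns False, the job store is left behind and toil clean is what removes it.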
Debug Job Command
If a Toil workflow fails, and it wasn't run with --clean=always, the failing job will be waiting in the job store to be debugged. (With WDL or CWL workflows, you may have needed to manually set a --jobStore location you can find again.)
You can use toil debug-job on a job in the job store to run it on your local machine, to locally reproduce any error that may have happened during a remote workflow.
The toil debug-job command takes a job store, and the ID or a name of a job in it. If multiple jobs match a job name, and only one seems to have run out of retries and completely failed, it will run that one.
You can also pass the --printJobInfo flag to dump information about the job instead of running it.
Kill Command
To kill all currently running jobs for a given jobstore, use the command
toil kill file:my-jobstore
TOIL DEBUGGING
Toil has a number of tools to assist in debugging. Here we provide help in working through potential problems that a user might encounter in attempting to run a workflow.
Reading the Log
Usually, at the end of a failed Toil workflow, Toil will reproduce the job logs for the jobs that failed. You can look at the end of your workflow log and use the job logs to identify which jobs are failing and why.
Finding Failed Jobs in the Jobstore
The toil status command (see Status Command) can be used with the --failed option to list all failed jobs in a Toil job store.
You can also use it with the --logs option to retrieve per-job logs from the job store, for failed jobs that left them. These logs might be useful for diagnosing and fixing the problem.
Running a Job Locally
If you have a failing job's ID or name, you can reproduce its failure on your local machine with toil debug-job. See Debug Job Command.
For example, say you have this WDL workflow in test.wdl. This workflow cannot succeed, due to the typo in the echo command:
version 1.0
workflow test {
    call hello
}
task hello {
    input {
    }
    command <<<
        set -e
        echoo "Hello"
    >>>
    output {
    }
}
You could try to run it with:
toil-wdl-runner --jobStore ./store test.wdl --retryCount 0
But it will fail.
If you want to reproduce the failure later, or on another machine, you can first find out what jobs failed with toil status :
toil status --failed --noAggStats ./store
This will produce something like:
[2024-03-14T17:45:15-0400] [MainThread] [I] [toil.utils.toilStatus] Traversing the job graph gathering jobs. This may take a couple of minutes.
Failed jobs:
'WDLTaskJob' test.hello.command kind-WDLTaskJob/instance-r9u6_dcs v6
And we can see a failed job with the display name test.hello.command, which describes the job's location in the WDL workflow as the command section of the hello task called from the test workflow. (If you are writing a Toil Python script, this is the job's displayName.) We can then run that job again locally by name with:
toil debug-job ./store test.hello.command
If there were multiple failed jobs with that name (perhaps because of a WDL scatter), we would need to select one by Toil job ID instead:
toil debug-job ./store kind-WDLTaskJob/instance-r9u6_dcs
And if we know there's only one failed WDL task, we can just tell Toil to rerun the failed WDLTaskJob by Python class name:
toil debug-job ./store WDLTaskJob
Any of these will run the job (including any containers) on the local machine, where its execution can be observed live or monitored with a debugger.
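The three selectors used above (Python class name, display name, and job ID) all appear on the failed-job line printed by toil status --failed. A minimal sketch for pulling them apart follows. It assumes the line layout shown in the example output; parse_failed_job is a hypothetical helper, and reading the trailing v6 as a version counter is an assumption.

```python
import re

# Layout assumed from the example: 'ClassName' display.name kind-.../instance-... vN
FAILED_JOB_PATTERN = re.compile(
    r"^'(?P<class_name>[^']+)' "
    r"(?P<display_name>\S+) "
    r"(?P<job_id>kind-\S+) "
    r"v(?P<version>\d+)$"
)


def parse_failed_job(line):
    """Split one failed-job line into its parts, or return None if it differs."""
    match = FAILED_JOB_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.groupdict()


job = parse_failed_job(
    "'WDLTaskJob' test.hello.command kind-WDLTaskJob/instance-r9u6_dcs v6"
)
print("class name:  ", job["class_name"])
print("display name:", job["display_name"])
print("job ID:      ", job["job_id"])
```

The job ID is the only one of the three that is guaranteed to pick out a single job, so it is the safest argument to pass to toil debug-job from a script.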
Fetching Job Inputs
The --retrieveTaskDirectory option to toil debug-job allows you to send the input files for a job to a directory, and then stop running the job. It works for CWL and WDL jobs, and for Python workflows that call toil.job.Job.files_downloaded_hook() after downloading their files. It will make the worker work in the specified directory, so the job's temporary directory will be at worker/job inside it. For WDL and CWL jobs that mount files into containers, there will also be an inside directory populated with symlinks to the files as they would be visible from the root of the container's filesystem.
For example, say you have a broken WDL workflow named example_alwaysfail_with_files.wdl, like this:
version 1.0
workflow test {
    call make_file as f1
    call make_file as f2
    call hello {
        input:
            name_file=f1.out,
            unused_file=f2.out
    }
}
task make_file {
    input {
    }
    command <<<
        echo "These are the contents" >test.txt
    >>>
    output {
        File out = "test.txt"
    }
}
task hello {
    input {
        File name_file
        File? unused_file
    }
    command <<<
        set -e
        echoo "Hello" "$(cat ~{name_file})"
    >>>
    output {
        File out = stdout()
    }
}
You can try and fail to run it like this:
toil-wdl-runner --jobStore ./store example_alwaysfail_with_files.wdl --retryCount 0
If you then dump the files from the failing job:
toil debug-job ./store WDLTaskJob --retrieveTaskDirectory dumpdir
You will end up with a directory tree that looks, according to tree, something like this:
dumpdir
├── inside
│   └── mnt
│       └── miniwdl_task_container
│           └── work
│               └── _miniwdl_inputs
│                   ├── 0
│                   │   └── test.txt -> ../../../../../../worker/job/2c6b3dc4-1d21-4abf-9937-db475e6a6bc2/test.txt
│                   └── 1
│                       └── test.txt -> ../../../../../../worker/job/e3d724e1-e6cc-4165-97f1-6f62ab0fb1ef/test.txt
└── worker
    └── job
        ├── 2c6b3dc4-1d21-4abf-9937-db475e6a6bc2
        │   └── test.txt
        ├── e3d724e1-e6cc-4165-97f1-6f62ab0fb1ef
        │   └── test.txt
        ├── tmpr2j5yaic
        ├── tmpxqr9__y4
        └── work

15 directories, 4 files
You can see where Toil downloaded the input files for the job to the worker's temporary directory, and how they would be mounted into the container.
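To list that mapping without reading the tree by eye, a short script can walk the inside directory and resolve each symlink. This is a minimal sketch that assumes the layout shown above, with dumpdir as the directory given to --retrieveTaskDirectory; container_view is a hypothetical helper name.

```python
import os


def container_view(dump_dir):
    """Map each in-container path to the real file on the host behind it."""
    inside = os.path.join(dump_dir, "inside")
    mapping = {}
    for root, _dirs, files in os.walk(inside):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                # Paths under "inside" are relative to the container's root
                relative = os.path.relpath(path, inside).replace(os.sep, "/")
                mapping["/" + relative] = os.path.realpath(path)
    return mapping


# Prints nothing if ./dumpdir does not exist
for container_path, host_path in sorted(container_view("dumpdir").items()):
    print(f"{container_path} -> {host_path}")
```

Each printed line pairs the path the task's command would have seen with the downloaded copy under worker/job.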
Interactively Investigating Running Jobs
Say you have a broken WDL workflow that can't complete. Whenever you run tutorial_debugging_hangs.wdl, it hangs:
version 1.1

workflow TutorialDebugging {
    input {
        Array[String] messages = ["Uh-oh!", "Oh dear", "Oops"]
    }
    scatter(message in messages) {
        call WhaleSay {
            input:
                message = message
        }
        call CountLines {
            input:
                to_count = WhaleSay.result
        }
    }
    Array[File] to_compress = flatten([CountLines.result, WhaleSay.result])
    call CompressFiles {
        input:
            files = to_compress
    }
    output {
        File compressed = CompressFiles.result
    }
}

# Draw ASCII art
task WhaleSay {
    input {
        String message
    }
    command <<<
        cowsay "~{message}"
    >>>
    output {
        File result = stdout()
    }
    runtime {
        container: "docker/whalesay"
    }
}

# Count the lines in a file
task CountLines {
    input {
        File to_count
    }
    command <<<
        wc -l ~{to_count}
    >>>
    output {
        File result = stdout()
    }
    runtime {
        container: ["ubuntu:latest", "https://gcr.io/standard-images/ubuntu:latest"]
    }
}

# Compress files into a ZIP
task CompressFiles {
    input {
        Array[File] files
    }
    command <<<
        set -e
        cat >script.py <<'EOF'
        import sys
        from zipfile import ZipFile
        import os

        # Interpret command line arguments
        to_compress = list(reversed(sys.argv[1:]))

        with ZipFile("compressed.zip", "w") as z:
            while to_compress != []:
                # Grab the file to add off the end of the list
                input_filename = to_compress[-1]
                # Now we need to write this to the zip file.
                # What internal filename should we use?
                basename = os.path.basename(input_filename)
                disambiguation_number = 0
                while True:
                    target_filename = str(disambiguation_number) + basename
                    try:
                        z.getinfo(target_filename)
                    except KeyError:
                        # Filename is free
                        break
                    # Otherwise try another name
                    disambiguation_number += 1
                # Now we can actually make the compressed file
                with z.open(target_filename, 'w') as out_stream:
                    with open(input_filename) as in_stream:
                        for line in in_stream:
                            # Prefix each line of text with the original input file
                            # it came from.
                            # Also remember to encode the text as the zip file
                            # stream is in binary mode.
                            out_stream.write(f"{basename}: {line}".encode("utf-8"))
        EOF
        python script.py ~{sep(" ", files)}
    >>>
    output {
        File result = "compressed.zip"
    }
    runtime {
        container: "python:3.11"
    }
}
You can try to run it like this, using Docker containers. Pretend this was actually a run on a large cluster:
$ toil-wdl-runner --jobStore ./store tutorial_debugging_hangs.wdl --container docker
If you run this, it will hang at the TutorialDebugging.CompressFiles.command step:
[2024-06-18T12:12:49-0400] [MainThread] [I] [toil.leader] Issued job 'WDLTaskJob' TutorialDebugging.CompressFiles.command kind-WDLTaskJob/instance-y0ga_907 v1 with job batch system ID: 16 and disk: 2.0 Gi, memory: 2.0 Gi, cores: 1, accelerators: [], preemptible: False
Workflow Progress 94%|██████████▎| 15/16 (0 failures) [00:36<00:02, 0.42 jobs/s]
Say you want to find out why it is stuck. First, you need to kill the workflow. Open a new shell in the same directory and run:
$ toil kill ./store
You can also hit Control+C in its terminal window and wait for it to stop.
Then, you need to use toil debug-job to run the stuck job on your local machine:
$ toil debug-job ./store TutorialDebugging.CompressFiles.command
This produces some more informative logging messages, showing that the Docker container is managing to start up, but that it stays running indefinitely, with a repeating message:
[2024-06-18T12:18:00-0400] [MainThread] [N] [MiniWDLContainers] docker task running :: service: "lhui2bdzmzmg", task: "sg371eb2yk", node: "zyu9drdp6a", message: "started"
[2024-06-18T12:18:01-0400] [MainThread] [D] [MiniWDLContainers] docker task status :: Timestamp: "2024-06-18T16:17:58.545272049Z", State: "running", Message: "started", ContainerStatus: {"ContainerID": "b7210b346637210b49e7b6353dd24108bc3632bbf2ce7479829d450df6ee453a", "PID": 36510, "ExitCode": 0}, PortStatus: {}
[2024-06-18T12:18:03-0400] [MainThread] [D] [MiniWDLContainers] docker task status :: Timestamp: "2024-06-18T16:17:58.545272049Z", State: "running", Message: "started", ContainerStatus: {"ContainerID": "b7210b346637210b49e7b6353dd24108bc3632bbf2ce7479829d450df6ee453a", "PID": 36510, "ExitCode": 0}, PortStatus: {}
[2024-06-18T12:18:04-0400] [MainThread] [D] [MiniWDLContainers] docker task status :: Timestamp: "2024-06-18T16:17:58.545272049Z", State: "running", Message: "started", ContainerStatus: {"ContainerID": "b7210b346637210b49e7b6353dd24108bc3632bbf2ce7479829d450df6ee453a", "PID": 36510, "ExitCode": 0}, PortStatus: {}
...
This also gives you the Docker container ID of the running container, b7210b346637210b49e7b6353dd24108bc3632bbf2ce7479829d450df6ee453a. You can use that to get a shell inside the running container:
$ docker exec -ti b7210b346637210b49e7b6353dd24108bc3632bbf2ce7479829d450df6ee453a bash
root@b7210b346637:/mnt/miniwdl_task_container/work#
Your shell is already in the working directory of the task, so we can inspect the files there to get an idea of how far the task has gotten. Has it managed to create script.py ? Has the script managed to create compressed.zip ? Let's check:
# ls -lah
total 6.1M
drwxrwxr-x 6 root root 192 Jun 18 16:17 .
drwxr-xr-x 3 root root 4.0K Jun 18 16:17 ..
drwxr-xr-x 3 root root 96 Jun 18 16:17 .toil_wdl_runtime
drwxrwxr-x 8 root root 256 Jun 18 16:17 _miniwdl_inputs
-rw-r--r-- 1 root root 6.0M Jun 18 16:23 compressed.zip
-rw-r--r-- 1 root root 1.3K Jun 18 16:17 script.py
So we can see that the script exists, and the zip file also exists. So maybe the script is still running? We can check with ps , but we need the -x option to include processes not under the current shell. We can also include the -u option to get statistics:
# ps -xu
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 2316 808 ? Ss 16:17 0:00 /bin/sh -c /bin/
root 7 0.0 0.0 4208 3056 ? S 16:17 0:00 /bin/bash ../com
root 8 0.1 0.0 4208 1924 ? S 16:17 0:00 /bin/bash ../com
root 20 95.0 0.4 41096 36428 ? R 16:17 7:09 python script.py
root 645 0.0 0.0 4472 3492 pts/0 Ss 16:21 0:00 bash
root 1379 0.0 0.0 2636 764 ? S 16:25 0:00 sleep 1
root 1380 0.0 0.0 8584 3912 pts/0 R+ 16:25 0:00 ps -xu
Here we can see that python is indeed running, and it is using 95% of a CPU core. So we can surmise that Python is probably stuck spinning around in an infinite loop . Let's look at our files again:
# ls -lah
total 8.1M
drwxrwxr-x 6 root root 192 Jun 18 16:17 .
drwxr-xr-x 3 root root 4.0K Jun 18 16:17 ..
drwxr-xr-x 3 root root 96 Jun 18 16:17 .toil_wdl_runtime
drwxrwxr-x 8 root root 256 Jun 18 16:17 _miniwdl_inputs
-rw-r--r-- 1 root root 7.6M Jun 18 2024 compressed.zip
-rw-r--r-- 1 root root 1.3K Jun 18 16:17 script.py
Note that, while we've been investigating, our compressed.zip file has grown from 6.0M to 7.6M . So we now know that, not only is the Python script stuck in a loop, it is also writing to the ZIP file inside that loop.
Let's inspect the inputs:
# ls -lah _miniwdl_inputs/*
_miniwdl_inputs/0:
total 4.0K
drwxrwxr-x 3 root root 96 Jun 18 16:17 .
drwxrwxr-x 8 root root 256 Jun 18 16:17 ..
-rw-r--r-- 1 root root 65 Jun 18 16:15 stdout.txt
_miniwdl_inputs/1:
total 4.0K
drwxrwxr-x 3 root root 96 Jun 18 16:17 .
drwxrwxr-x 8 root root 256 Jun 18 16:17 ..
-rw-r--r-- 1 root root 65 Jun 18 16:15 stdout.txt
_miniwdl_inputs/2:
total 4.0K
drwxrwxr-x 3 root root 96 Jun 18 16:17 .
drwxrwxr-x 8 root root 256 Jun 18 16:17 ..
-rw-r--r-- 1 root root 65 Jun 18 16:15 stdout.txt
_miniwdl_inputs/3:
total 4.0K
drwxrwxr-x 3 root root 96 Jun 18 16:17 .
drwxrwxr-x 8 root root 256 Jun 18 16:17 ..
-rw-r--r-- 1 root root 384 Jun 18 16:15 stdout.txt
_miniwdl_inputs/4:
total 4.0K
drwxrwxr-x 3 root root 96 Jun 18 16:17 .
drwxrwxr-x 8 root root 256 Jun 18 16:17 ..
-rw-r--r-- 1 root root 387 Jun 18 16:15 stdout.txt
_miniwdl_inputs/5:
total 4.0K
drwxrwxr-x 3 root root 96 Jun 18 16:17 .
drwxrwxr-x 8 root root 256 Jun 18 16:17 ..
-rw-r--r-- 1 root root 378 Jun 18 16:15 stdout.txt
These are the files that are meant to be compressed into that ZIP file. But, hang on, there are only six of these files, and none of them is over 400 bytes in size. How did we get a multi-megabyte ZIP file? The script must be putting more data than we expected into the ZIP file it is writing.
Taking what we know, we can now inspect the Python script again and see if we can find a way in which it could get stuck in an infinite loop, writing much more data to the ZIP than is actually in the input files. We can also inspect it for WDL variable substitutions (there aren't any). Let's look at it with line numbers using the nl tool, numbering even blank lines with -b a:
# nl -b a script.py
     1  import sys
     2  from zipfile import ZipFile
     3  import os
     4
     5  # Interpret command line arguments
     6  to_compress = list(reversed(sys.argv[1:]))
     7
     8  with ZipFile("compressed.zip", "w") as z:
     9      while to_compress != []:
    10          # Grab the file to add off the end of the list
    11          input_filename = to_compress[-1]
    12          # Now we need to write this to the zip file.
    13          # What internal filename should we use?
    14          basename = os.path.basename(input_filename)
    15          disambiguation_number = 0
    16          while True:
    17              target_filename = str(disambiguation_number) + basename
    18              try:
    19                  z.getinfo(target_filename)
    20              except KeyError:
    21                  # Filename is free
    22                  break
    23              # Otherwise try another name
    24              disambiguation_number += 1
    25          # Now we can actually make the compressed file
    26          with z.open(target_filename, 'w') as out_stream:
    27              with open(input_filename) as in_stream:
    28                  for line in in_stream:
    29                      # Prefix each line of text with the original input file
    30                      # it came from.
    31                      # Also remember to encode the text as the zip file
    32                      # stream is in binary mode.
    33                      out_stream.write(f"{basename}: {line}".encode("utf-8"))
We have three loops here: while to_compress != [] on line 9, while True on line 16, and for line in in_stream on line 28.
The while True loop is immediately suspicious, but none of the code inside it writes to the ZIP file, so we know we can't be stuck in there.
The for line in in_stream loop contains the only call that writes data to the ZIP, so we must be spending time inside it, but it is constrained to loop over a single file at a time, so it can't be the infinite loop we're looking for.
So then we must be infinitely looping at while to_compress != [] , and indeed we can see that to_compress is never modified , so it can never become [] .
So now we have a theory as to what the problem is, and we can exit out of our shell in the container, and stop toil debug-job with Control+C . Then we can make the following change to our workflow, adding code to the script to actually pop the handled files off the end of the list:
--- tutorial_debugging_hangs.wdl  2024-06-18 12:03:53
+++ tutorial_debugging_works.wdl  2024-06-18 12:03:32
@@ -112,6 +112,9 @@
                             # Also remember to encode the text as the zip file
                             # stream is in binary mode.
                             out_stream.write(f"{basename}: {line}".encode("utf-8"))
+                # Even though we got distracted by zip file manipulation, remember
+                # to pop off the file we just did.
+                to_compress.pop()
         EOF
         python script.py ~{sep(" ", files)}
     >>>
If we apply that change and produce a new file, tutorial_debugging_works.wdl, we can clean up from the old failed run and run a new one:
$ toil clean ./store
$ toil-wdl-runner --jobStore ./store tutorial_debugging_works.wdl --container docker
This will produce a successful log, ending with something like:
[2024-06-18T12:42:20-0400] [MainThread] [I] [toil.leader] Finished toil run successfully.
Workflow Progress 100%|███████████| 17/17 (0 failures) [00:24<00:00, 0.72 jobs/s]
{"TutorialDebugging.compressed": "/Users/anovak/workspace/toil/src/toil/test/docs/scripts/wdl-out-u7fkgqbe/f5e16468-0cf6-4776-a5c1-d93d993c4db2/compressed.zip"}
[2024-06-18T12:42:20-0400] [MainThread] [I] [toil.common] Successfully deleted the job store: FileJobStore(/Users/anovak/workspace/toil/src/toil/test/docs/scripts/store)
Note the line to standard output giving us the path on disk where the TutorialDebugging.compressed output from the workflow is. If you look at that ZIP file, you can see it contains the expected files, such as 3stdout.txt, which should contain this suitably prefixed dismayed whale:
stdout.txt:  ________
stdout.txt: < Uh-oh! >
stdout.txt:  --------
stdout.txt:     \
stdout.txt:      \
stdout.txt:       \
stdout.txt:                     ##        .
stdout.txt:               ## ## ##       ==
stdout.txt:            ## ## ## ##      ===
stdout.txt:        /""""""""""""""""___/ ===
stdout.txt:   ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
stdout.txt:        \______ o          __/
stdout.txt:         \    \        __/
stdout.txt:           \____\______/
When we're done inspecting the output, and satisfied that the workflow now works, we might want to clean up all the auto-generated WDL output directories from the successful and failed run(s):
$ rm -Rf wdl-out-*
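The corrected loop can also be exercised outside of WDL and Docker, which is a quick way to confirm that it terminates. The following sketch wraps the same logic in a function. The name compress_files and the temporary input files are made up for illustration, and the list is consumed with pop() at the top of the loop rather than at the bottom as in the patch, which is equivalent here.

```python
import os
import tempfile
from zipfile import ZipFile


def compress_files(input_filenames, zip_path):
    """Write each input into the ZIP under a unique, number-prefixed name."""
    to_compress = list(reversed(input_filenames))
    with ZipFile(zip_path, "w") as z:
        while to_compress:
            # Popping here is what guarantees the loop ends
            input_filename = to_compress.pop()
            basename = os.path.basename(input_filename)
            disambiguation_number = 0
            while True:
                target_filename = str(disambiguation_number) + basename
                try:
                    z.getinfo(target_filename)
                except KeyError:
                    # Filename is free
                    break
                disambiguation_number += 1
            with z.open(target_filename, "w") as out_stream:
                with open(input_filename) as in_stream:
                    for line in in_stream:
                        out_stream.write(f"{basename}: {line}".encode("utf-8"))
    with ZipFile(zip_path) as z:
        return z.namelist()


with tempfile.TemporaryDirectory() as tmp:
    inputs = []
    for index in range(2):
        # Two inputs sharing one basename, as the workflow's stdout.txt files do
        directory = os.path.join(tmp, str(index))
        os.makedirs(directory)
        path = os.path.join(directory, "stdout.txt")
        with open(path, "w") as stream:
            stream.write(f"message {index}\n")
        inputs.append(path)
    names = compress_files(inputs, os.path.join(tmp, "compressed.zip"))
    print(names)
```

Running it should list one archive member per input, with the numeric prefix keeping identically named inputs apart.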
Introspecting the Job Store
Note: Currently these features are only implemented for use locally (single machine) with the fileJobStore.
To view what files currently reside in the jobstore, run the following command:
$ toil debug-file file:path-to-jobstore-directory \
    --listFilesInJobStore
When run from the commandline, this should generate a file containing the contents of the job store (in addition to displaying a series of log messages to the terminal). This file is named "jobstore_files.txt" by default and will be generated in the current working directory.
If one wishes to copy any of these files to a local directory, one can run for example:
$ toil debug-file file:path-to-jobstore \
    --fetch overview.txt *.bam *.fastq \
    --localFilePath=/home/user/localpath
This fetches overview.txt and all .bam and .fastq files. It can be used to recover previously used input and output files for debugging or reuse in other workflows, or in general debugging to ensure that certain outputs were imported into the jobStore.
Stats and Status
See Stats Command and Status Command for more about gathering statistics about job success, runtime, and resource usage from workflows.
Using a Python debugger
If you execute a workflow using the --debugWorker flag, or if you use toil debug-job, Toil will run the job in the process you started from the command line. This means you can use either pdb or an IDE that supports debugging Python to interact with the Python process as it runs your job. Note that the --debugWorker flag will only work with the single_machine batch system (the default), and not any of the custom job schedulers.
RUNNING IN THE CLOUD
Toil supports Amazon Web Services (AWS) and Google Compute Engine (GCE) in the cloud and has autoscaling capabilities that can adapt to the size of your workflow, whether your workflow requires 10 instances or 20,000.
Toil does this by creating a virtual cluster running Kubernetes. Kubernetes requires a leader node to coordinate the workflow, and worker nodes to execute the various tasks within the workflow. As the workflow runs, Kubernetes will "autoscale", creating and terminating workers as needed to meet the demands of the workflow. Historically, Toil has spun up clusters with Apache Mesos, but this is no longer recommended.
Once a user is familiar with the basics of running Toil locally (specifying a jobStore , and how to write a workflow), they can move on to the guides below to learn how to translate these workflows into cloud ready workflows.
Managing a Cluster of Virtual Machines (Provisioning)
Toil can use the provisioner to launch and manage a cluster of virtual machines, and to run a workflow distributed over several nodes. The provisioner can also automatically scale the size of the cluster up or down to handle dynamic changes in computational demand (autoscaling). Currently we have working provisioners for AWS and GCE (Azure support has been deprecated).
Toil uses Kubernetes as the Batch System .
See here for instructions for Running in AWS .
See here for instructions for Running in Google Compute Engine (GCE) .
Toil offers a suite of commands for using the provisioners to manage clusters.
Toil Cluster Utilities
In addition to the generic Toil Utilities , there are several utilities used for starting and managing a Toil cluster using the AWS or GCE provisioners. They are installed via the [aws] or [google] extra. For installation details see Toil Provisioner .
The toil cluster subcommands are:
destroy-cluster --- For autoscaling. Terminates the specified cluster and associated resources.
launch-cluster --- For autoscaling. This is used to launch a toil leader instance with the specified provisioner.
rsync-cluster --- For autoscaling. Used to transfer files to a cluster launched with toil launch-cluster .
ssh-cluster --- SSHs into the toil appliance container running on the leader of the cluster.
For information on a specific utility, run it with the --help option:
toil launch-cluster --help
The cluster utilities can be used for Running in Google Compute Engine (GCE) and Running in AWS .
TIP:
By default, all of the cluster utilities expect to be running on AWS. To run with Google you will need to specify the --provisioner gce option for each utility.
NOTE:
Boto must be configured with AWS credentials before using cluster utilities.
Running in Google Compute Engine (GCE) contains instructions for
Launch-Cluster Command
Running toil launch-cluster starts up a leader for a cluster. Workers can be added to the initial cluster by specifying the -w option. An example would be
$ toil launch-cluster my-cluster \
    --leaderNodeType t2.small -z us-west-2a \
    --keyPairName your-AWS-key-pair-name \
    --nodeTypes m3.large,t2.micro -w 1,4
Options are listed below. These can also be displayed by running
$ toil launch-cluster --help
launch-cluster's main positional argument is the clusterName. This is simply the name of your cluster. If it does not exist yet, Toil will create it for you.
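When such commands are generated from a script, it can help to check that --nodeTypes and -w line up before anything is launched. The following is a minimal sketch using only the options from the example above. The helper name launch_cluster_command is hypothetical, and the length check reflects the documented requirement that a worker count be given for each node type, not Toil's own validation code.

```python
def launch_cluster_command(cluster_name, leader_node_type, zone, key_pair_name,
                           node_types=(), workers=()):
    """Build the argument list for `toil launch-cluster`."""
    if len(node_types) != len(workers):
        raise ValueError("--nodeTypes and --workers need one entry per node type")
    argv = [
        "toil", "launch-cluster", cluster_name,
        "--leaderNodeType", leader_node_type,
        "-z", zone,
        "--keyPairName", key_pair_name,
    ]
    if node_types:
        argv += ["--nodeTypes", ",".join(node_types),
                 "-w", ",".join(str(count) for count in workers)]
    return argv


argv = launch_cluster_command(
    "my-cluster", "t2.small", "us-west-2a", "your-AWS-key-pair-name",
    node_types=["m3.large", "t2.micro"], workers=[1, 4],
)
print(" ".join(argv))
# To actually launch: subprocess.run(argv, check=True)
```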
Launch-Cluster Options
--help
-h also accepted. Displays this help menu.
--tempDirRoot TEMPDIRROOT
Path to the temporary directory where all temp files are created, by default uses the current working directory as the base.
--version
Display version.
--provisioner CLOUDPROVIDER
-p CLOUDPROVIDER also accepted. The provisioner for cluster auto-scaling. Both AWS and GCE are currently supported.
--zone ZONE
-z ZONE also accepted. The availability zone of the leader. This parameter can also be set via the TOIL_AWS_ZONE or TOIL_GCE_ZONE environment variables, or by the ec2_region_name parameter in your .boto file if using AWS, or derived from the instance metadata if using this utility on an existing EC2 instance.
--leaderNodeType LEADERNODETYPE
Non-preemptable node type to use for the cluster leader.
--keyPairName KEYPAIRNAME
The name of the AWS or ssh key pair to include on the instance.
--owner OWNER
The owner tag for all instances. If not given, the value in TOIL_OWNER_TAG will be used, or else the value of --keyPairName .
--boto BOTOPATH
The path to the boto credentials directory. This is transferred to all nodes in order to access the AWS jobStore from non-AWS instances.
--tag KEYVALUE
KEYVALUE is specified as KEY=VALUE. -t KEY=VALUE also accepted. Tags are added to the AWS cluster for this node and all of its children. Tags are of the form: -t key1=value1 --tag key2=value2 . Multiple tags are allowed and each tag needs its own flag. By default the cluster is tagged with: { "Name": clusterName, "Owner": IAM username }.
--vpcSubnet VPCSUBNET
VPC subnet ID to launch cluster leader in. Uses default subnet if not specified. This subnet needs to have auto assign IPs turned on.
--nodeTypes NODETYPES
Comma-separated list of node types to create while launching the leader. The syntax for each node type depends on the provisioner used. For the AWS provisioner this is the name of an EC2 instance type followed by a colon and the price in dollars to bid for a spot instance, for example 'c3.8xlarge:0.42'. Must also provide the --workers argument to specify how many workers of each node type to create.
--workers WORKERS
-w WORKERS also accepted. Comma-separated list of the number of workers of each node type to launch alongside the leader when the cluster is created. This can be useful if running toil without auto-scaling but with need of more hardware support.
--leaderStorage LEADERSTORAGE
Specify the size (in gigabytes) of the root volume for the leader instance. This is an EBS volume.
--nodeStorage NODESTORAGE
Specify the size (in gigabytes) of the root volume for any worker instances created when using the -w flag. This is an EBS volume.
--nodeStorageOverrides NODESTORAGEOVERRIDES
Comma-separated list of nodeType:nodeStorage that are used to override the default value from --nodeStorage for the specified nodeType(s). This is useful for heterogeneous jobs where some tasks require much more disk than others.
--allowFuse BOOL
Whether to allow FUSE mounts for faster runtimes with Singularity. Note: This will result in the Toil container running as privileged. For Kubernetes, pods will be asked to run as privileged. If this is not allowed, Singularity containers will use sandbox directories instead.
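The --nodeTypes syntax described above for the AWS provisioner (an instance type, optionally followed by a colon and a spot bid) can be illustrated with a small parser. This is a sketch of one reading of that syntax, not Toil's own parsing code. The names NodeType and parse_node_types are hypothetical, and treating an entry without a colon as having no spot bid is an assumption.

```python
from typing import List, NamedTuple, Optional


class NodeType(NamedTuple):
    instance_type: str
    spot_bid: Optional[float]  # None means no spot bid was given (assumption)


def parse_node_types(spec: str) -> List[NodeType]:
    """Split a --nodeTypes value such as 'c3.8xlarge:0.42,t2.micro'."""
    parsed = []
    for item in spec.split(","):
        name, separator, bid = item.strip().partition(":")
        if not name:
            raise ValueError(f"empty node type in {spec!r}")
        # float() raises ValueError if a colon is present but the bid is not a number.
        parsed.append(NodeType(name, float(bid) if separator else None))
    return parsed


for node_type in parse_node_types("c3.8xlarge:0.42,t2.micro"):
    print(node_type)
```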
Logging Options
--logOff
Same as --logCritical .
--logCritical
Turn on logging at level CRITICAL and above. (default is INFO)
--logError
Turn on logging at level ERROR and above. (default is INFO)
--logWarning
Turn on logging at level WARNING and above. (default is INFO)
--logInfo
Turn on logging at level INFO and above. (default is INFO)
--logDebug
Turn on logging at level DEBUG and above. (default is INFO)
--logTrace
Turn on logging at level TRACE and above. (default is INFO)
--logLevel LOGLEVEL
Log at given level (may be either OFF (or CRITICAL), ERROR, WARN (or WARNING), INFO, DEBUG, or TRACE). (default is INFO)
--logFile LOGFILE
File to log in.
--rotatingLogging
Turn on rotating logging, which prevents log files getting too big.
Ssh-Cluster Command
Toil provides the ability to ssh into the leader of the cluster. This can be done as follows:
$ toil ssh-cluster CLUSTER-NAME-HERE
This will open a shell on the Toil leader, and is typically used to start a run as described in Running a Workflow with Autoscaling. Issues with Docker prevent using screen and tmux when sshing into the cluster: the shell doesn't know that it is a TTY, which prevents it from allocating a new screen session. This can be worked around via
$ script
$ screen
Simply running screen within script will get things working properly again.
Finally, you can execute remote commands with the following syntax:
$ toil ssh-cluster CLUSTER-NAME-HERE remoteCommand
It is not advised that you run your Toil workflow using remote execution like this unless a tool like nohup is used to ensure the process does not die if the SSH connection is interrupted.
For an example usage, see Running a Workflow with Autoscaling .
Rsync-Cluster Command
The most frequent use case for the rsync-cluster utility is deploying your workflow code to the Toil leader. Note that the syntax is the same as traditional rsync with the exception of the hostname before the colon. This is not needed in toil rsync-cluster since the hostname is automatically determined by Toil.
Here is an example of its usage:
$ toil rsync-cluster CLUSTER-NAME-HERE \
    ~/localFile :/remoteDestination
Destroy-Cluster Command
The destroy-cluster command is the advised way to get rid of any Toil cluster launched using the Launch-Cluster Command. It ensures that all attached nodes, volumes, security groups, etc. are deleted. If a node or cluster is shut down using Amazon's online portal, residual resources may still be in use in the background. To delete a cluster run
$ toil destroy-cluster CLUSTER-NAME-HERE
Storage (Toil jobStore)
Toil can make use of cloud storage such as AWS or Google buckets to take care of storage needs.
This is useful when running Toil in single machine mode on any cloud platform since it allows you to make use of their integrated storage systems.
For an overview of the job store see Job Store .
For instructions configuring a particular job store see:
• AWS Job Store
• Google Job Store
CLOUD PLATFORMS
Running on Kubernetes
Kubernetes is a very popular container orchestration tool that has become a de facto cross-cloud-provider API for accessing cloud resources. Major cloud providers like Amazon, Microsoft, Kubernetes owner Google, and DigitalOcean have invested heavily in making Kubernetes work well on their platforms, by writing their own deployment documentation and developing provider-managed Kubernetes-based products. Using minikube, Kubernetes can even be run on a single machine.
Toil supports running Toil workflows against a Kubernetes cluster, either in the cloud or deployed on user-owned hardware.
Preparing your Kubernetes environment
1. Get a Kubernetes cluster
To run Toil workflows on Kubernetes, you need to have a Kubernetes cluster set up. This will not be covered here, but there are many options available, and which one you choose will depend on which cloud ecosystem if any you use already, and on pricing. If you are just following along with the documentation, use minikube on your local machine.
Alternatively, Toil can set up a Kubernetes cluster for you with the Toil provisioner . Follow this guide to get started with a Toil-managed Kubernetes cluster on AWS.
Note that currently the only way to run a Toil workflow on Kubernetes is to use the AWS Job Store, so your Kubernetes workflow will currently have to store its data in Amazon's cloud regardless of where you run it. This can result in significant egress charges from Amazon if you run it outside of Amazon.
Kubernetes Cluster Providers:
• Your own institution
• Amazon EKS
• Microsoft Azure AKS
• Google GKE
• DigitalOcean Kubernetes
• minikube
2. Get a Kubernetes context on your local machine
There are two main ways to run Toil workflows on Kubernetes. You can either run the Toil leader on a machine outside the cluster, with jobs submitted to and run on the cluster, or you can submit the Toil leader itself as a job and have it run inside the cluster. Either way, you will need to configure your own machine to be able to submit jobs to the Kubernetes cluster. Generally, this involves creating and populating a file named .kube/config in your user's home directory, and specifying the cluster to connect to, the certificate and token information needed for mutual authentication, and the Kubernetes namespace within which to work. However, Kubernetes configuration can also be picked up from other files in the .kube directory, environment variables, and the enclosing host when running inside a Kubernetes-managed container.
You will have to do different things here depending on where you got your Kubernetes cluster:
• Configuring for Amazon EKS
• Configuring for Microsoft Azure AKS
• Configuring for Google GKE
• Configuring for DigitalOcean Kubernetes Clusters
• Configuring for minikube
Toil's internal Kubernetes configuration logic mirrors that of the kubectl command. Toil workflows will use the current kubectl context to launch their Kubernetes jobs.
3. If running the Toil leader in the cluster, get a service account
If you are going to run your workflow's leader within the Kubernetes cluster (see Option 1: Running the Leader Inside Kubernetes ), you will need a service account in your chosen Kubernetes namespace. Most namespaces should have a service account named default which should work fine. If your cluster requires you to use a different service account, you will need to obtain its name and use it when launching the Kubernetes job containing the Toil leader.
4. Set up appropriate permissions
Your local Kubernetes context and/or the service account you are using to run the leader in the cluster will need to have certain permissions in order to run the workflow. Toil needs to be able to interact with jobs and pods in the cluster, and to retrieve pod logs. You as a user may need permission to set up an AWS credentials secret, if one is not already available. Additionally, it is very useful for you as a user to have permission to interact with nodes, and to shell into pods.
The appropriate permissions may already be available to you and your service account by default, especially in managed or ease-of-use-optimized setups such as EKS or minikube.
However, if the appropriate permissions are not already available, you or your cluster administrator will have to grant them manually. The following Role (toil-user) and ClusterRole (node-reader), to be applied with kubectl apply -f filename.yaml, should grant sufficient permissions to run Toil workflows when bound to your account and the service account used by Toil workflows. Be sure to replace YOUR_NAMESPACE_HERE with the namespace you are running your workflows in:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: YOUR_NAMESPACE_HERE
  name: toil-user
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["explain", "get", "watch", "list", "describe", "logs", "attach", "exec", "port-forward", "proxy", "cp", "auth"]
- apiGroups: ["batch"]
  resources: ["*"]
  verbs: ["get", "watch", "list", "create", "run", "set", "delete"]
- apiGroups: [""]
  resources: ["secrets", "pods", "pods/attach", "podtemplates", "configmaps", "events", "services"]
  verbs: ["patch", "get", "update", "watch", "list", "create", "run", "set", "delete", "exec"]
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "describe"]
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["get", "list", "describe"]
- apiGroups: ["metrics.k8s.io"]
  resources: ["*"]
  verbs: ["*"]
To bind a user or service account to the Role or ClusterRole and actually grant the permissions, you will need a RoleBinding and a ClusterRoleBinding , respectively. Make sure to fill in the namespace, username, and service account name, and add more user stanzas if your cluster is to support multiple Toil users.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: toil-developer-member
  namespace: toil
subjects:
- kind: User
  name: YOUR_KUBERNETES_USERNAME_HERE
  apiGroup: rbac.authorization.k8s.io
- kind: ServiceAccount
  name: YOUR_SERVICE_ACCOUNT_NAME_HERE
  namespace: YOUR_NAMESPACE_HERE
roleRef:
  kind: Role
  name: toil-user
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-nodes
subjects:
- kind: User
  name: YOUR_KUBERNETES_USERNAME_HERE
  apiGroup: rbac.authorization.k8s.io
- kind: ServiceAccount
  name: YOUR_SERVICE_ACCOUNT_NAME_HERE
  namespace: YOUR_NAMESPACE_HERE
roleRef:
  kind: ClusterRole
  name: node-reader
  apiGroup: rbac.authorization.k8s.io
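For a cluster with several Toil users, writing one user stanza per person by hand gets repetitive. The following is a minimal sketch that generates the RoleBinding with one User subject per name and prints it as JSON, which kubectl apply -f also accepts. The function name role_binding and the user names alice and bob are hypothetical; the binding and role names are the ones used in the manifests above.

```python
import json


def role_binding(namespace, users, service_account):
    """Return a RoleBinding that binds the toil-user Role to several subjects."""
    subjects = [
        {"kind": "User", "name": user, "apiGroup": "rbac.authorization.k8s.io"}
        for user in users
    ]
    subjects.append(
        {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
    )
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "toil-developer-member", "namespace": namespace},
        "subjects": subjects,
        "roleRef": {
            "kind": "Role",
            "name": "toil-user",
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


manifest = role_binding("toil", ["alice", "bob"], "default")
# Save the output to a file such as binding.json, then: kubectl apply -f binding.json
print(json.dumps(manifest, indent=2))
```

The ClusterRoleBinding can be produced the same way by changing kind, the metadata (no namespace) and roleRef.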
AWS Job Store for Kubernetes
Toil uses a job store to exchange data between jobs. Currently, the only job store that works with jobs running on Kubernetes is the AWS Job Store. This requires that the Toil leader and Kubernetes jobs be able to connect to and use Amazon S3 and Amazon SimpleDB. It also requires that you have an Amazon Web Services account.
1. Get access to AWS S3 and SimpleDB
In your AWS account, you need to create an AWS access key. First go to the IAM dashboard; for "us-west-1", the link would be:
https://console.aws.amazon.com/iam/home?region=us-west-1#/home
Then create an access key, and save the Access Key ID and the Secret Key. As documented in the AWS documentation :
    1. On the IAM Dashboard page, choose your account name in the navigation bar, and then choose My Security Credentials.
    2. Expand the Access keys (access key ID and secret access key) section.
    3. Choose Create New Access Key. Then choose Download Key File to save the access key ID and secret access key to a file on your computer. After you close the dialog box, you can't retrieve this secret access key again.
Make sure that, if your AWS infrastructure requires your user to authenticate with a multi-factor authentication (MFA) token, you obtain a second secret key and access key that don't have this requirement. The secret key and access key used to populate the Kubernetes secret that allows the jobs to contact the job store need to be usable without human intervention.
2. Configure AWS access from the local machine
This only really needs to happen if you run the leader on the local machine. But we need the files in place to fill in the secret in the next step. Run:
$ aws configure
Then when prompted, enter your secret key and access key. This should create a file ~/.aws/credentials that looks like this:
[default]
aws_access_key_id = BLAH
aws_secret_access_key = blahblahblah
3. Create a Kubernetes secret to give jobs access to AWS
Go into the directory where the credentials file is:
$ cd ~/.aws
Then, create a Kubernetes secret that contains it. We'll call it aws-credentials :
$ kubectl create secret generic aws-credentials --from-file credentials
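Before creating the secret, it can be worth checking that the credentials file actually contains both keys. The following is a minimal sketch that reads the file with configparser and builds an equivalent kubectl command. The function names are hypothetical. The --from-file=credentials=PATH form is used so that the file is stored under the key credentials without first changing into ~/.aws.

```python
import configparser
from pathlib import Path

REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_credential_keys(path, profile="default"):
    """Return the required keys that are absent from an AWS credentials file."""
    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is silently treated as empty
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS
            if not parser.get(profile, key, fallback="")]


def create_secret_command(secret_name, credentials_path):
    """Build the kubectl command that stores the file under the key `credentials`."""
    return ["kubectl", "create", "secret", "generic", secret_name,
            f"--from-file=credentials={credentials_path}"]


credentials = Path.home() / ".aws" / "credentials"
print("missing keys:", missing_credential_keys(credentials))
print(create_secret_command("aws-credentials", credentials))
```

The check only reports which key names are missing; it never prints the key values.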
Configuring Toil for your Kubernetes environment
To configure your workflow to run on Kubernetes, you will have to configure several environment variables, in addition to passing the --batchSystem kubernetes option. Doing the research to figure out what values to give these variables may require talking to your cluster provider.
1. TOIL_AWS_SECRET_NAME is the most important, and must be set to the secret that contains your AWS credentials file, if your cluster nodes don't otherwise have access to S3 and SimpleDB (such as through IAM roles). This is required for the AWS job store to work, which is currently the only job store that can be used on Kubernetes. In this example we are using aws-credentials.
2. TOIL_KUBERNETES_HOST_PATH can be set to allow Toil jobs on the same physical host to share a cache. It should be set to a path on the host where the shared cache should be stored. It will be mounted as /var/lib/toil, or at TOIL_WORKDIR if specified, inside the container. This path must already exist on the host, and must have as much free space as your Kubernetes node offers to jobs. In this example, we are using /data/scratch. To actually make use of caching, make sure not to use --disableCaching.
3. TOIL_KUBERNETES_OWNER should be set to the username of the user running the Toil workflow. The jobs that Toil creates will include this username, so they can be more easily recognized, and cleaned up by the user if anything happens to the Toil leader. In this example we are using demo-user.
4. TOIL_KUBERNETES_PRIVILEGED can be set to True or False. When true, this allows pods to run as privileged, enabling FUSE mounts for Singularity for faster runtimes. If this is not set to true, Singularity will extract images to sandbox directories. This is unset/False by default except in Toil-managed clusters.
Note that Docker containers cannot be run inside of unprivileged Kubernetes pods (which are themselves containers). The Docker daemon does not (yet) support this. Other tools, such as Singularity in its user-namespace mode, are able to run containers from within containers. If using Singularity to run containerized tools, and you want downloaded container images to persist between Toil jobs, some setup may be required:
On non-Toil managed clusters: You will also want to set TOIL_KUBERNETES_HOST_PATH , and make sure that Singularity is downloading its containers under the Toil work directory ( /var/lib/toil by default) by setting SINGULARITY_CACHEDIR .
On Toil-managed clusters: On clusters created with the launch-cluster command, no setup is required. TOIL_KUBERNETES_HOST_PATH is already set to /var/lib/toil . SINGULARITY_CACHEDIR is set to /var/lib/toil/singularity which is a shared location; however, you may need to implement Singularity locking as shown below or change the Singularity cache location to somewhere else.
If using toil-wdl-runner, all the necessary locking for Singularity is already in place and no work should be necessary. Otherwise, for both Toil-managed and non-Toil-managed clusters, you will need to make sure that no two jobs try to download the same container at the same time: Singularity has no synchronization or locking around its cache, and the cache is not safe for simultaneous access by multiple Singularity invocations. Some Toil workflows use their own custom workaround logic for this problem; for example, see this section in toil-wdl-runner.
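One common way to make sure that no two jobs download the same container at once is an exclusive file lock held around the download. The following is a minimal sketch of that general technique using fcntl.flock. It is not the logic used by toil-wdl-runner or by Singularity, and it assumes the cache directory is on a local filesystem shared by the jobs on one host. The names fetch_once and fake_download are hypothetical; fake_download stands in for a real image pull.

```python
import fcntl
import os
import tempfile


def fetch_once(cache_dir, image_name, download):
    """Call download(path) for an image at most once across concurrent jobs."""
    os.makedirs(cache_dir, exist_ok=True)
    target = os.path.join(cache_dir, image_name)
    with open(target + ".lock", "w") as lock_file:
        # Blocks until every other process holding this lock has released it.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            if not os.path.exists(target):
                # Download to a temporary name and rename afterwards, so that
                # a partially written image is never visible under `target`.
                partial = target + ".partial"
                download(partial)
                os.replace(partial, target)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return target


def fake_download(path):
    """Hypothetical stand-in for pulling a container image."""
    with open(path, "w") as handle:
        handle.write("image bytes")


cache = tempfile.mkdtemp()
print(fetch_once(cache, "tool.sif", fake_download))
```

A second call with the same image name finds the file already in place and returns without downloading again.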
Running workflows
To run the workflow, you will need to run the Toil leader process somewhere. It can either be run inside Kubernetes as a Kubernetes job, or outside Kubernetes as a normal command.
Option 1: Running the Leader Inside Kubernetes
Once you have determined a set of environment variable values for your workflow run, write a YAML file that defines a Kubernetes job to run your workflow with that configuration. Some configuration items (such as your username, and the name of your AWS credentials secret) need to be written into the YAML so that they can be used from the leader as well.
Note that the leader pod will need your workflow, its other dependencies, and Toil all installed. An easy way to get Toil installed is to start with the Toil appliance image for the version of Toil you want to use. In this example, we use quay.io/ucsc_cgl/toil:5.5.0 .
Here's an example YAML file to run a test workflow:
apiVersion: batch/v1
kind: Job
metadata:
  # It is good practice to include your username in your job name.
  # Also specify it in TOIL_KUBERNETES_OWNER
  name: demo-user-toil-test
# Do not try and rerun the leader job if it fails
spec:
  backoffLimit: 0
  template:
    spec:
      # Do not restart the pod when the job fails, but keep it around so the
      # log can be retrieved
      restartPolicy: Never
      volumes:
      - name: aws-credentials-vol
        secret:
          # Make sure the AWS credentials are available as a volume.
          # This should match TOIL_AWS_SECRET_NAME
          secretName: aws-credentials
      # You may need to replace this with a different service account name as
      # appropriate for your cluster.
      serviceAccountName: default
      containers:
      - name: main
        image: quay.io/ucsc_cgl/toil:5.5.0
        env:
        # Specify your username for inclusion in job names
        - name: TOIL_KUBERNETES_OWNER
          value: demo-user
        # Specify where to find the AWS credentials to access the job store with
        - name: TOIL_AWS_SECRET_NAME
          value: aws-credentials
        # Specify where per-host caches should be stored, on the Kubernetes hosts.
        # Needs to be set for Toil's caching to be efficient.
        - name: TOIL_KUBERNETES_HOST_PATH
          value: /data/scratch
        volumeMounts:
        # Mount the AWS credentials volume
        - mountPath: /root/.aws
          name: aws-credentials-vol
        resources:
          # Make sure to set these resource limits to values large enough
          # to accommodate the work your workflow does in the leader
          # process, but small enough to fit on your cluster.
          #
          # Since no request values are specified, the limits are also used
          # for the requests.
          limits:
            cpu: 2
            memory: "4Gi"
            ephemeral-storage: "10Gi"
        command:
        - /bin/bash
        - -c
        - |
          # This Bash script will set up Toil and the workflow to run, and run them.
          set -e
          # We make sure to create a work directory; Toil can't hot-deploy a
          # Python file from the root of the filesystem, which is where we start.
          mkdir /tmp/work
          cd /tmp/work
          # We make a virtual environment to allow workflow dependencies to be
          # hot-deployed.
          #
          # We don't really make use of it in this example, but for workflows
          # that depend on PyPI packages we will need this.
          #
          # We use --system-site-packages so that the Toil installed in the
          # appliance image is still available.
          virtualenv --python python3 --system-site-packages venv
          . venv/bin/activate
          # Now we install the workflow. Here we're using a demo workflow
          # from Toil itself.
          wget https://raw.githubusercontent.com/DataBiosphere/toil/releases/4.1.0/src/toil/test/docs/scripts/tutorial_helloworld.py
          # Now we run the workflow. We make sure to use the Kubernetes batch
          # system and an AWS job store, and we set some generally useful
          # logging options. We also make sure to enable caching.
          python3 tutorial_helloworld.py \
              aws:us-west-2:demouser-toil-test-jobstore \
              --batchSystem kubernetes \
              --realTimeLogging \
              --logInfo
You can save this YAML as leader.yaml , and then run it on your Kubernetes installation with:
$ kubectl apply -f leader.yaml
To monitor the progress of the leader job, you will want to read its logs. If you are using a Kubernetes dashboard such as k9s , you can simply find the pod created for the job in the dashboard, and view its logs there. If not, you will need to locate the pod by hand.
Monitoring and Debugging Kubernetes Jobs and Pods
The following techniques are most useful for looking at the pod which holds the Toil leader, but they can also be applied to individual Toil jobs on Kubernetes, even when the leader is outside the cluster.
Kubernetes names pods for jobs by appending a short random string to the name of the job. You can find the name of the pod for your job by doing:
$ kubectl get pods | grep demo-user-toil-test
demo-user-toil-test-g5496 1/1 Running 0 2m
Assuming you have set TOIL_KUBERNETES_OWNER correctly, you should be able to find all of your workflow's pods by searching for your username:
$ kubectl get pods | grep demo-user
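The same lookup can be scripted. The following is a minimal sketch that picks the pods belonging to one job out of captured kubectl get pods text, relying only on the naming rule described above (job name, a hyphen, then a short random string). The function name pods_for_job is hypothetical, and the header row and the second pod in the sample are made up for illustration.

```python
def pods_for_job(kubectl_output, job_name):
    """Return pod names from `kubectl get pods` text that belong to job_name."""
    names = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # The pod name is the first column; job pods start with "<job name>-".
        if fields and fields[0].startswith(job_name + "-"):
            names.append(fields[0])
    return names


sample = """NAME                        READY   STATUS    RESTARTS   AGE
demo-user-toil-test-g5496   1/1     Running   0          2m
other-user-job-abcde        1/1     Running   0          5m
"""
print(pods_for_job(sample, "demo-user-toil-test"))
```

In practice the text would come from something like subprocess.run(["kubectl", "get", "pods"], capture_output=True, text=True).stdout.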
If the status of a pod is anything other than Pending , you will be able to view its logs with:
$ kubectl logs demo-user-toil-test-g5496
This will dump the pod's logs from the beginning to now and terminate. To follow along with the logs from a running pod, add the -f option:
$ kubectl logs -f demo-user-toil-test-g5496
A status of ImagePullBackOff suggests that you have requested to use an image that is not available. Check the image section of your YAML if you are looking at a leader, or the value of TOIL_APPLIANCE_SELF if you are dealing with a worker job. You also might want to check your Kubernetes node's Internet connectivity and DNS function; in Kubernetes, DNS depends on system-level pods which can be terminated or evicted in cases of resource oversubscription, just like user workloads.
If your pod seems to be stuck in Pending or ContainerCreating, you can get information on what is wrong with it by using kubectl describe pod:
$ kubectl describe pod demo-user-toil-test-g5496
Pay particular attention to the Events: section at the end of the output. An indication that a job is too big for the available nodes on your cluster, or that your cluster is too busy for your jobs, is FailedScheduling events:
Type     Reason            Age                  From               Message
----     ------            ----                 ----               -------
Warning  FailedScheduling  13s (x79 over 100m)  default-scheduler  0/4 nodes are available: 1 Insufficient cpu, 1 Insufficient ephemeral-storage, 4 Insufficient memory.
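To see at a glance which resource is the bottleneck, the counts can be pulled out of the event message. The following is a minimal sketch based only on the message format shown above; other Kubernetes versions may word the message differently, and resource names containing characters other than letters, digits, underscores and hyphens are not handled. The function name insufficient_resources is hypothetical.

```python
import re


def insufficient_resources(message):
    """Map each resource named in a FailedScheduling message to its node count."""
    return {
        resource: int(count)
        for count, resource in re.findall(r"(\d+) Insufficient ([\w-]+)", message)
    }


message = ("0/4 nodes are available: 1 Insufficient cpu, "
           "1 Insufficient ephemeral-storage, 4 Insufficient memory.")
print(insufficient_resources(message))
```

Here every one of the four nodes lacks memory, which points at the job's memory request rather than at a busy cluster.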
If a pod is running but seems to be behaving erratically, or seems stuck, you can shell into it and look around:
$ kubectl exec -ti demo-user-toil-test-g5496 -- /bin/bash
One common cause of stuck pods is attempting to use more memory than allowed by Kubernetes (or by the Toil job's memory resource requirement), but in a way that does not trigger the Linux OOM killer to terminate the pod's processes. In these cases, the pod can remain stuck at nearly 100% memory usage more or less indefinitely, and attempting to shell into the pod (which needs to start a process within the pod, using some of its memory) will fail. In these cases, the recommended solution is to kill the offending pod and increase its (or its Toil job's) memory requirement, or reduce its memory needs by adapting user code.
When Things Go Wrong
The Toil Kubernetes batch system includes cleanup code to terminate worker jobs when the leader shuts down. However, if the leader pod is removed by Kubernetes, is forcibly killed or otherwise suffers a sudden existence failure, it can go away while its worker jobs live on. It is not recommended to restart a workflow in this state, as jobs from the previous invocation will remain running and will be trying to modify the job store concurrently with jobs from the new invocation.
To clean up dangling jobs, you can use the following snippet:
$ kubectl get jobs | grep demo-user | cut -f1 -d' ' | xargs -n10 kubectl delete job
This will delete all jobs with demo-user 's username in their names, in batches of 10. You can also use the UUID that Toil assigns to a particular workflow invocation in the filter, to clean up only the jobs pertaining to that workflow invocation.
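The same cleanup can be written in Python when more control over the filter is needed. The following is a minimal sketch that mirrors the pipeline above: it takes captured kubectl get jobs text, keeps the jobs whose names contain the owner, and builds one kubectl delete job command per batch of 10. Unlike grep, it matches only against the job name column. The function name delete_commands and the sample job names are hypothetical.

```python
def delete_commands(kubectl_jobs_output, owner, batch_size=10):
    """Build `kubectl delete job` commands for jobs whose names contain owner."""
    names = []
    for line in kubectl_jobs_output.splitlines():
        fields = line.split()
        # The job name is the first column of each row.
        if fields and owner in fields[0]:
            names.append(fields[0])
    return [
        ["kubectl", "delete", "job", *names[start:start + batch_size]]
        for start in range(0, len(names), batch_size)
    ]


# Twelve made-up rows standing in for real `kubectl get jobs` output.
sample = "\n".join(f"demo-user-job-{i}   1/1   30s   2m" for i in range(12))
for command in delete_commands(sample, "demo-user"):
    print(len(command) - 3, "jobs:", " ".join(command[:4]), "...")
# To actually delete: subprocess.run(command, check=True) for each command
```

Passing the workflow's UUID instead of the username as owner restricts the cleanup to a single workflow invocation.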
Option 2: Running the Leader Outside Kubernetes
If you don't want to run your Toil leader inside Kubernetes, you can run it locally instead. This can be useful when developing a workflow; files can be hot-deployed from your local machine directly to Kubernetes. However, your local machine will have to have (ideally role-assumption- and MFA-free) access to AWS, and access to Kubernetes. Real time logging will not work unless your local machine is able to listen for incoming UDP packets on arbitrary ports on the address it uses to contact the IPv4 Internet; Toil does no NAT traversal or detection.
Note that if you set TOIL_WORKDIR when running your workflow like this, it will need to be a directory that exists both on the host and in the Toil appliance.
Here is an example of running our test workflow leader locally, outside of Kubernetes:
$ export TOIL_KUBERNETES_OWNER=demo-user  # This defaults to your local username if not set
$ export TOIL_AWS_SECRET_NAME=aws-credentials
$ export TOIL_KUBERNETES_HOST_PATH=/data/scratch
$ virtualenv --python python3 --system-site-packages venv
$ . venv/bin/activate
$ wget https://raw.githubusercontent.com/DataBiosphere/toil/releases/4.1.0/src/toil/test/docs/scripts/tutorial_helloworld.py
$ python3 tutorial_helloworld.py \
    aws:us-west-2:demouser-toil-test-jobstore \
    --batchSystem kubernetes \
    --realTimeLogging \
    --logInfo
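If the leader is started from a Python launcher script rather than an interactive shell, the three variables can be set on a copy of the environment and passed to the child process. The following is a minimal sketch using the values from the example above. The function names leader_environment and workflow_command are hypothetical, and nothing is executed until the subprocess.run line is uncommented.

```python
import os


def leader_environment(owner, secret_name, host_path, base=None):
    """Return a copy of the environment with the Toil Kubernetes variables set."""
    env = dict(os.environ if base is None else base)
    env.update({
        "TOIL_KUBERNETES_OWNER": owner,
        "TOIL_AWS_SECRET_NAME": secret_name,
        "TOIL_KUBERNETES_HOST_PATH": host_path,
    })
    return env


def workflow_command(script, job_store):
    """Build the workflow invocation with the options used in the example."""
    return ["python3", script, job_store,
            "--batchSystem", "kubernetes", "--realTimeLogging", "--logInfo"]


env = leader_environment("demo-user", "aws-credentials", "/data/scratch")
argv = workflow_command("tutorial_helloworld.py",
                        "aws:us-west-2:demouser-toil-test-jobstore")
print(argv)
# To launch: subprocess.run(argv, env=env, check=True)
```

Copying the environment instead of modifying os.environ keeps the settings scoped to the one workflow run.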
Running CWL Workflows
Running CWL workflows on Kubernetes can be challenging, because executing CWL can require toil-cwl-runner to orchestrate containers of its own, within a Kubernetes job running in the Toil appliance container.
Normally, running a CWL workflow should Just Work, as long as the workflow's Docker containers are able to be executed with Singularity, your Kubernetes cluster does not impose extra capability-based confinement (e.g. SELinux, AppArmor) that interferes with Singularity's use of user-mode namespaces, and you make sure to configure Toil so that its workers know where to store their data within the Kubernetes pods (which would be done for you if using a Toil-managed cluster). For example, you should be able to run a CWL workflow like this:
$ export TOIL_KUBERNETES_OWNER=demo-user # This defaults to your local username if not set
$ export TOIL_AWS_SECRET_NAME=aws-credentials
$ export TOIL_KUBERNETES_HOST_PATH=/data/scratch
$ virtualenv --python python3 --system-site-packages venv
$ . venv/bin/activate
$ pip install toil[kubernetes,cwl]==5.8.0
$ toil-cwl-runner \
--jobStore aws:us-west-2:demouser-toil-test-jobstore \
--batchSystem kubernetes \
--realTimeLogging \
--logInfo \
--disableCaching \
path/to/cwl/workflow \
path/to/cwl/input/object
Additional cwltool options that your workflow might require, such as --no-match-user, can be passed to toil-cwl-runner, which inherits most cwltool options.
AppArmor and Singularity
Kubernetes clusters based on Ubuntu hosts often will have AppArmor enabled on the host. AppArmor is a capability-based security enhancement system that integrates with the Linux kernel to enforce lists of things which programs may or may not do, called profiles. For example, an AppArmor profile could be applied to a web server process to stop it from using the mount() system call to manipulate the filesystem, because it has no business doing that under normal circumstances but might attempt to do it if compromised by hackers.
Kubernetes clusters also often use Docker as the backing container runtime, to run pod containers. When AppArmor is enabled, Docker will load an AppArmor profile and apply it to all of its containers by default, with the ability for the profile to be overridden on a per-container basis. This profile unfortunately prevents some of the mount() system calls that Singularity uses to set up user-mode containers from working inside the pod, even though these calls would be allowed for an unprivileged user under normal circumstances.
On the UCSC Kubernetes cluster, we configure our Ubuntu hosts with an alternative default AppArmor profile for Docker containers which allows these calls. Other solutions include turning off AppArmor on the host, configuring Kubernetes with a container runtime other than Docker, or using Kubernetes's AppArmor integration to apply a more permissive profile or the unconfined profile to pods that Toil launches.
Toil does not yet have a way to apply a container.apparmor.security.beta.kubernetes.io/runner-container: unconfined annotation to its pods, as described in the Kubernetes AppArmor documentation. This feature is tracked in issue #4331.
Running in AWS
Toil jobs can be run on a variety of cloud platforms. Of these, Amazon Web Services (AWS) is currently the best-supported solution. Toil provides the Toil Cluster Utilities to conveniently create AWS clusters, connect to the leader of the cluster, and then launch a workflow. The leader handles distributing the jobs over the worker nodes and autoscaling to optimize costs.
The Running a Workflow with Autoscaling section details how to create a cluster and run a workflow that will dynamically scale depending on the workflow's needs.
The Static Provisioning section explains how a static cluster (one that won't automatically change in size) can be created and provisioned (grown, shrunk, destroyed, etc.).
Preparing your AWS environment
To use Amazon Web Services (AWS) to run Toil or to just use S3 to host the files during the computation of a workflow, first set up and configure an account with AWS:
1. If necessary, create and activate an AWS account.
2. Next, generate a key pair for AWS with the following command (do NOT generate your key pair in the AWS browser console):
$ ssh-keygen -t rsa
3. This should prompt you for a location to save your key. Please save it in ~/.ssh/id_rsa.
4. Now add the public key to the list of authorized keys on your machine:
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
5. Next, you'll need to add your key to the ssh-agent:
$ eval `ssh-agent -s`
$ ssh-add
If your key has a passphrase, you will be prompted to enter it here once.
6. You'll also need to restrict the permissions on your private key (good practice, but also enforced by AWS):
$ chmod 400 ~/.ssh/id_rsa
7. Now you'll need to add the key to AWS via the browser. For example, for us-west-1, the key pair page is accessible at:
https://us-west-1.console.aws.amazon.com/ec2/v2/home?region=us-west-1#KeyPairs:sort=keyName
8. Now click on the "Import Key Pair" button to add your key:
[image: Adding an Amazon Key Pair]
9. Next, you need to create an AWS access key. First go to the IAM dashboard; again, for us-west-1, the example link would be:
https://console.aws.amazon.com/iam/home?region=us-west-1#/home
10. The directions (transcribed from https://docs.aws.amazon.com/general/latest/gr/managing-aws-access-keys.html) are now:
    1. On the IAM Dashboard page, choose your account name in the navigation bar, and then choose My Security Credentials.
    2. Expand the Access keys (access key ID and secret access key) section.
    3. Choose Create New Access Key. Then choose Download Key File to save the access key ID and secret access key to a file on your computer. After you close the dialog box, you can't retrieve this secret access key again.
11. |
Now you should have a newly generated "AWS Access Key ID" and "AWS Secret Access Key". We can now install the AWS CLI and make sure that it has the proper credentials:
$ pip install awscli --upgrade --user
|
12. |
Now configure your AWS credentials with: |
$ aws configure
|
13. |
Add your "AWS Access Key ID" and "AWS Secret Access Key" from earlier and your region and output format: |
" AWS
Access Key ID [****************Q65Q]: "
" AWS Secret Access Key [****************G0ys]: "
" Default region name [us-west-1]: "
" Default output format [json]: "
This will create the files ˜/.aws/config and ˜/.aws/credentials .
14. If not done already, install Toil (this example uses version 5.12.0, but we recommend the latest release):
$ virtualenv venv
$ source venv/bin/activate
$ pip install toil[all]==5.12.0
15. Now that Toil is installed and you are running in a virtualenv, you can launch a Toil leader node as follows (the cluster uses the Toil appliance matching your installed version, unless you set TOIL_APPLIANCE_SELF to a different one):
$ toil launch-cluster <cluster-name> \
--clusterType kubernetes \
--leaderNodeType t2.medium \
--nodeTypes t2.medium -w 1 \
--zone us-west-1a \
--keyPairName id_rsa
To further break down each of these commands:
toil launch-cluster --- Base command in toil to launch a cluster.
<cluster-name> --- Just choose a name for your cluster.
--clusterType kubernetes --- Specify the type of cluster to coordinate and execute your workflow. Kubernetes is the recommended option.
--leaderNodeType t2.medium --- Specify the leader node type. This example uses a t2.medium (2 vCPUs; 4 GiB RAM; $0.0464/hour). List of available AWS instance types: https://aws.amazon.com/ec2/pricing/on-demand/
--nodeTypes t2.medium -w 1 --- Specify the worker node type and the number of worker nodes to launch. The Kubernetes cluster requires at least 1 worker node.
--zone us-west-1a --- Specify the AWS zone you want to launch the instance in. Must have the same prefix as the zone in your awscli credentials (which, in the example of this tutorial is: "us-west-1").
--keyPairName id_rsa --- The name of your key pair, which should be "id_rsa" if you've followed this tutorial.
NOTE:
You can set the TOIL_AWS_TAGS environment variable to a JSON object to specify arbitrary tags for AWS resources. For example, if you export TOIL_AWS_TAGS='{"project-name": "variant-calling"}' in your shell before using Toil, AWS resources created by Toil will be tagged with a project-name tag with the value variant-calling.
You can also set the TOIL_APPLIANCE_SELF environment variable to one of the Toil project's Docker images, if you would like to launch a cluster using a different version of Toil than the one you have installed.
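If you launch Toil from a Python script rather than from a shell, the TOIL_AWS_TAGS value can be produced with the standard json module instead of hand-written quoting. This is a minimal sketch; the tag names and values are illustrative assumptions, and the variable only reaches Toil if Toil runs in this process or in a child process started afterwards.

```python
import json
import os

# Illustrative tags only; choose keys and values that suit your project.
tags = {"project-name": "variant-calling", "owner": "demo-user"}

# TOIL_AWS_TAGS must hold a JSON object, so serialize the dict
# rather than assembling the string by hand.
os.environ["TOIL_AWS_TAGS"] = json.dumps(tags)

print(os.environ["TOIL_AWS_TAGS"])
```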
AWS Job Store
Using the AWS job store is straightforward after you've finished Preparing your AWS environment; all you need to do is specify the prefix for the job store name.
To run the sort example with the AWS job store you would type:
$ python3 sort.py aws:us-west-2:my-aws-sort-jobstore
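The job store argument above has three colon-separated parts: the provider, the AWS region, and the name prefix. The sketch below splits such a string so a wrapper script can validate it before launching a workflow. The function and class names are hypothetical and are not part of Toil's API.

```python
from typing import NamedTuple


class AwsJobStoreLocator(NamedTuple):
    """Pieces of an 'aws:<region>:<name>' job store string."""
    provider: str
    region: str
    name: str


def parse_aws_locator(locator: str) -> AwsJobStoreLocator:
    # Hypothetical helper: checks only the overall shape of the string.
    parts = locator.split(":")
    if len(parts) != 3 or parts[0] != "aws" or not all(parts):
        raise ValueError(f"expected 'aws:<region>:<name>', got {locator!r}")
    return AwsJobStoreLocator(*parts)


print(parse_aws_locator("aws:us-west-2:my-aws-sort-jobstore"))
```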
Toil Provisioner
The Toil provisioner is the component responsible for creating resources in Amazon's cloud. It is included in Toil alongside the [aws] extra and allows us to spin up a cluster.
Getting started with the provisioner is simple:
1. Make sure you have Toil installed with the AWS extras. For detailed instructions see Installing Toil with Extra Features.
2. You will need an AWS account and you will need to save your AWS credentials on your local machine. For help with both steps, see Preparing your AWS environment.
The Toil provisioner makes heavy use of the Toil Appliance, a Docker image that bundles Toil and all its requirements (e.g. Kubernetes). This makes deployment simple across platforms, and you can even simulate a cluster locally (see Developing with Docker for details).
Choosing Toil Appliance Image
When using the Toil provisioner, the appliance image will be automatically chosen based on the pip-installed version of Toil on your system. That choice can be overridden by setting the environment variables TOIL_DOCKER_REGISTRY and TOIL_DOCKER_NAME or TOIL_APPLIANCE_SELF. See Environment Variables for more information on these variables. If you are developing with autoscaling and want to test and build your own appliance, have a look at Developing with Docker.
For information on using the Toil Provisioner have a look at Running a Workflow with Autoscaling.
Details about Launching a Cluster in AWS
Using the provisioner to launch a Toil leader instance is simple using the launch-cluster command. For example, to launch a Kubernetes cluster named "my-cluster" with a t2.medium leader in the us-west-2a zone, run
(venv) $ toil launch-cluster my-cluster \
--clusterType kubernetes \
--leaderNodeType t2.medium \
--nodeTypes t2.medium -w 1 \
--zone us-west-2a \
--keyPairName <AWS-key-pair-name>
The cluster name is used to uniquely identify your cluster and will be used to populate the instance's Name tag. Also, the Toil provisioner will automatically tag your cluster with an Owner tag that corresponds to your keypair name to facilitate cost tracking. In addition, the ToilNodeType tag can be used to filter "leader" vs. "worker" nodes in your cluster.
The leaderNodeType is an EC2 instance type. This only affects the leader node.
The --zone parameter specifies which EC2 availability zone to launch the cluster in. Alternatively, you can specify this option via the TOIL_AWS_ZONE environment variable. Note: the zone is different from an EC2 region. A region corresponds to a geographical area like us-west-2 (Oregon), and availability zones are partitions of this area like us-west-2a.
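Because a job store string needs a region while --zone needs an availability zone, it can help to derive one from the other. The sketch below assumes the common naming scheme in which a zone is its region name plus one trailing letter; it is a hypothetical helper, not a Toil function, and it does not handle other zone naming schemes.

```python
import re


def zone_to_region(zone: str) -> str:
    """Return the region for a standard availability zone name."""
    # Assumption: zone == region + a single lowercase letter,
    # e.g. 'us-west-2a' belongs to region 'us-west-2'.
    match = re.fullmatch(r"([a-z]{2}(?:-[a-z]+)+-\d+)[a-z]", zone)
    if match is None:
        raise ValueError(f"not a standard availability zone name: {zone!r}")
    return match.group(1)


print(zone_to_region("us-west-2a"))
```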
By default, Toil creates an IAM role for each cluster with sufficient permissions to perform cluster operations (e.g. full S3, EC2, and SDB access). If the default permissions are not sufficient for your use case (e.g. if you need access to ECR), you may create a custom IAM role with all necessary permissions and set the --awsEc2ProfileArn parameter when launching the cluster. Note that your custom role must at least have these permissions in order for the Toil cluster to function properly.
In addition, Toil creates a new security group with the same name as the cluster name with default rules (e.g. opens port 22 for SSH access). If you require additional security groups, you may use the --awsEc2ExtraSecurityGroupId parameter when launching the cluster. Note: Do not use the same name as the cluster name for the extra security groups as any security group matching the cluster name will be deleted once the cluster is destroyed.
For more information on options try:
(venv) $ toil launch-cluster --help
Static Provisioning
Toil can be used to manage a cluster in the cloud by using the Toil Cluster Utilities. The cluster utilities also make it easy to run a Toil workflow directly on this cluster. We call this static provisioning because the size of the cluster does not change. This is in contrast with Running a Workflow with Autoscaling.
To launch worker nodes alongside the leader we use the -w option:
(venv) $ toil launch-cluster my-cluster \
--clusterType kubernetes \
--leaderNodeType t2.small \
--keyPairName <AWS-key-pair-name> \
--nodeTypes m3.large,t2.micro -w 1,4 \
--zone us-west-2a
This will spin up a leader node of type t2.small with five additional workers --- one m3.large instance and four t2.micro.
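The --nodeTypes and -w lists are matched by position: the first count applies to the first node type, and so on. The sketch below reproduces that pairing for fixed counts, to show how the five workers in the example are arrived at. The helper name is hypothetical and the parsing is an illustration, not Toil's own implementation.

```python
from typing import Dict


def pair_node_counts(node_types: str, counts: str) -> Dict[str, int]:
    """Match comma-separated node types to comma-separated fixed counts."""
    types = node_types.split(",")
    numbers = [int(c) for c in counts.split(",")]
    if len(types) != len(numbers):
        raise ValueError("need exactly one count per node type")
    return dict(zip(types, numbers))


workers = pair_node_counts("m3.large,t2.micro", "1,4")
print(workers, "total workers:", sum(workers.values()))
```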
Currently static provisioning is only possible during the cluster's creation. The ability to add new nodes and remove existing nodes via the native provisioner is in development. Of course the cluster can always be deleted with the Destroy-Cluster Command utility.
Uploading Workflows
Now that our cluster is launched, we use the Rsync-Cluster Command utility to copy the workflow to the leader. For a simple workflow in a single file this might look like
(venv) $ toil rsync-cluster -z us-west-2a my-cluster toil-workflow.py :/
NOTE:
If your Toil workflow has dependencies, have a look at the Auto-Deployment section for a detailed explanation of how to include them.
Running a Workflow with Autoscaling
Toil can create an autoscaling Kubernetes cluster for you using the AWS provisioner. Autoscaling is a feature of running Toil in a cloud whereby additional cloud instances are launched as needed to run the workflow.
NOTE:
Make sure you've done the AWS setup in Preparing your AWS environment.
To set up a Kubernetes cluster, simply use the --clusterType=kubernetes command line option to toil launch-cluster. To make it autoscale, specify a range of possible node counts for a node type (such as -w 1-4). The cluster will automatically add and remove nodes, within that range, depending on how many seem to be needed to run the jobs submitted to the cluster.
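A worker count such as 1-4 is a lower and upper bound, while a plain number such as 1 fixes the count. The following sketch shows one way to read such a value; the helper is hypothetical and only illustrates the notation, it is not how Toil parses the option internally.

```python
from typing import Tuple


def parse_worker_range(spec: str) -> Tuple[int, int]:
    """Turn '1-4' into (1, 4) and a fixed count like '3' into (3, 3)."""
    low_text, sep, high_text = spec.partition("-")
    low = int(low_text)
    high = int(high_text) if sep else low
    if low < 0 or high < low:
        raise ValueError(f"invalid worker range: {spec!r}")
    return low, high


print(parse_worker_range("1-4"))
print(parse_worker_range("1"))
```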
For example, to launch a Toil cluster with a Kubernetes scheduler, run:
(venv) $ toil launch-cluster <cluster-name> \
--provisioner=aws \
--clusterType kubernetes \
--zone us-west-2a \
--keyPairName <AWS-key-pair-name> \
--leaderNodeType t2.medium \
--leaderStorage 50 \
--nodeTypes t2.medium -w 1-4 \
--nodeStorage 20 \
--logDebug
Behind the scenes, Toil installs kubeadm and configures the kubelet on the Toil leader and all worker nodes. This Toil cluster can then schedule jobs using Kubernetes.
NOTE:
You should set at least one worker node, otherwise Kubernetes would not be able to schedule any jobs. It is also normal for this step to take a while.
As a demonstration, we will use sort.py again, but run it on a Toil cluster with Kubernetes. First, download this file and put it in the current working directory.
We then need to copy over the workflow file and SSH into the cluster:
(venv) $ toil rsync-cluster -z us-west-2a <cluster-name> sort.py :/root
(venv) $ toil ssh-cluster -z us-west-2a <cluster-name>
Remember to replace <cluster-name> with your actual cluster name, and feel free to use your own cluster configuration and/or workflow files. For more information on this step, see the corresponding section of the Static Provisioning tutorial.
IMPORTANT:
Some important caveats about starting a toil run through an ssh session are explained in the Ssh-Cluster Command section.
Now that we are inside the cluster, a Kubernetes environment should already be configured and running. To verify this, simply run:
$ kubectl get nodes
You should see a leader node with the Ready status. Depending on the number of worker nodes you set to create upfront, you should also see them displayed here.
Additionally, you can also verify that the metrics server is running:
$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
If there is a JSON response (similar to the output below), and you are not seeing any errors, that means the metrics server is set up and running, and you are good to start running workflows.
{"kind":"NodeMetricsList","apiVersion":"metrics.k8s.io/v1beta1", ...}
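The check described above can also be scripted. The sketch below takes the text returned by the kubectl command and reports whether it looks like a metrics response; it assumes only what the sample output shows, namely a JSON object whose kind is NodeMetricsList. The function name and the sample strings are illustrative.

```python
import json


def metrics_server_ready(response_text: str) -> bool:
    """Return True if the text is a JSON NodeMetricsList object."""
    try:
        payload = json.loads(response_text)
    except json.JSONDecodeError:
        # An error message from kubectl is not valid JSON.
        return False
    return isinstance(payload, dict) and payload.get("kind") == "NodeMetricsList"


sample = '{"kind":"NodeMetricsList","apiVersion":"metrics.k8s.io/v1beta1","items":[]}'
print(metrics_server_ready(sample))
print(metrics_server_ready("Error from server (NotFound)"))
```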
NOTE:
It'll take a while for all nodes to get set up and running, so you might not be able to see all nodes running at first. You can start running workflows already, but Toil might complain until the necessary resources are set up and running.
Now we can run the workflow:
$ python3 sort.py \
--batchSystem kubernetes \
aws:<region>:<job-store-name>
Make sure to replace <region> and <job-store-name>. It is required to use a cloud-accessible job store like AWS or Google when using the Kubernetes batch system.
The sort workflow should start running on the Kubernetes cluster set up by Toil. This workflow would take a while to execute, so you could put the job in the background and monitor the Kubernetes cluster using kubectl. For example, you can check out the pods that are running:
$ kubectl get pods
You should see an output like:
NAME                                                      READY   STATUS              RESTARTS   AGE
root-toil-a864e1b0-2e1f-48db-953c-038e5ad293c7-11-4cwdl   0/1     ContainerCreating   0          85s
root-toil-a864e1b0-2e1f-48db-953c-038e5ad293c7-14-5dqtk   0/1     Completed           0          18s
root-toil-a864e1b0-2e1f-48db-953c-038e5ad293c7-7-gkwc9    0/1     ContainerCreating   0          107s
root-toil-a864e1b0-2e1f-48db-953c-038e5ad293c7-9-t7vsb    1/1     Running             0          96s
If a pod failed for whatever reason or if you want to make sure a pod isn't stuck, you can use kubectl describe pod <pod-name> or kubectl logs <pod-name> to inspect the pod.
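When many pods are listed, grouping them by status makes stuck or failed pods easier to spot. The sketch below groups the rows of a pod listing like the one shown above; the sample text uses shortened, made-up pod names, and the function is a hypothetical helper. For scripting against a real cluster, structured output from kubectl get pods -o json is likely to be more robust than parsing the table.

```python
from collections import defaultdict
from typing import Dict, List

SAMPLE = """\
NAME                 READY   STATUS              RESTARTS   AGE
root-toil-11-4cwdl   0/1     ContainerCreating   0          85s
root-toil-14-5dqtk   0/1     Completed           0          18s
root-toil-7-gkwc9    0/1     ContainerCreating   0          107s
root-toil-9-t7vsb    1/1     Running             0          96s
"""


def pods_by_status(table: str) -> Dict[str, List[str]]:
    """Group pod names by the STATUS column of a 'kubectl get pods' table."""
    lines = [line for line in table.splitlines() if line.strip()]
    header = lines[0].split()
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    grouped: Dict[str, List[str]] = defaultdict(list)
    for line in lines[1:]:
        fields = line.split()
        grouped[fields[status_col]].append(fields[name_col])
    return dict(grouped)


for status, names in pods_by_status(SAMPLE).items():
    print(status, len(names))
```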
If everything is successful, you should be able to see an output file from the sort workflow:
$ head sortedFile.txt
You can now run your own workflows!
Preemptibility
Toil can run on a heterogeneous cluster of both preemptible and non-preemptible nodes. Being a preemptible node simply means that the node may be shut down at any time, while jobs are running. These jobs can then be restarted later somewhere else.
A node type can be specified as preemptible by adding a spot bid in dollars, after a colon, to its entry in the list of node types provided with the --nodeTypes flag. If spot instance prices rise above your bid, the preemptible nodes will be shut down.
For example, this cluster will have both preemptible and non-preemptible nodes:
(venv) $ toil launch-cluster <cluster-name> \
--provisioner=aws \
--clusterType kubernetes \
--zone us-west-2a \
--keyPairName <AWS-key-pair-name> \
--leaderNodeType t2.medium \
--leaderStorage 50 \
--nodeTypes t2.medium -w 1-4 \
--nodeTypes t2.large:0.20 -w 1-4 \
--nodeStorage 20 \
--logDebug
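In the command above, t2.large:0.20 marks the t2.large nodes as preemptible with a spot bid of $0.20, while t2.medium has no bid and is therefore non-preemptible. The sketch below splits an entry of that form; the helper name is hypothetical and the code illustrates the notation only, not Toil's internal parsing.

```python
from typing import Optional, Tuple


def parse_node_type(spec: str) -> Tuple[str, Optional[float]]:
    """Split 'type[:bid]' into the instance type and an optional spot bid."""
    name, sep, bid = spec.partition(":")
    if not name:
        raise ValueError(f"missing instance type in {spec!r}")
    # A bid after the colon means the node type is preemptible.
    return name, (float(bid) if sep else None)


for entry in ["t2.medium", "t2.large:0.20"]:
    node_type, spot_bid = parse_node_type(entry)
    kind = "preemptible" if spot_bid is not None else "non-preemptible"
    print(node_type, kind, spot_bid)
```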
Individual jobs can explicitly specify whether they should be run on preemptible nodes via the boolean preemptible resource requirement in Toil's Python API. In CWL, this is exposed as a hint UsePreemptible in the http://arvados.org/cwl# namespace (usually imported as arv). In WDL, this is exposed as a runtime attribute preemptible as recognized by Cromwell. Toil's Kubernetes batch system will prefer to schedule preemptible jobs on preemptible nodes.
If a job is not specified to be preemptible, the job will not run on preemptible nodes even if preemptible nodes are available, unless the workflow is run with the --defaultPreemptible flag. The --defaultPreemptible flag will allow jobs without an explicit preemptible requirement to run on preemptible machines. For example:
$ python3 /root/sort.py aws:us-west-2:<my-jobstore-name> \
--batchSystem kubernetes \
--defaultPreemptible
Specify Preemptibility Carefully
Ensure that your choices for --nodeTypes and --maxNodes <> make sense for your workflow and won't cause it to hang. You should make sure the provisioner is able to create nodes large enough to run the largest job in the workflow, and that non-preemptible node types are allowed if there are non-preemptible jobs in the workflow.
Using MinIO and S3-Compatible object stores
Toil can be configured to access files stored in an S3-compatible object store such as MinIO. The following environment variables can be used to configure the S3 connection:
• TOIL_S3_HOST: the IP address or hostname to use for connecting to S3
• TOIL_S3_PORT: the port number to use for connecting to S3, if needed
• TOIL_S3_USE_SSL: enable or disable the usage of SSL for connecting to S3 (True by default)
Examples:
TOIL_S3_HOST=127.0.0.1
TOIL_S3_PORT=9010
TOIL_S3_USE_SSL=False
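To see how the three variables combine, the sketch below builds the endpoint URL they describe. It is an illustration only: the function name is hypothetical, and the assumption that the literal value False (in any letter case) disables SSL is mine rather than something stated above. Toil reads these variables itself, so you do not need code like this to use MinIO.

```python
import os
from typing import Mapping, Optional


def s3_endpoint(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Build the endpoint URL implied by the TOIL_S3_* variables."""
    host = env.get("TOIL_S3_HOST")
    if not host:
        return None  # No override configured; the default endpoint applies.
    # Assumption: only the literal value 'False' turns SSL off.
    use_ssl = env.get("TOIL_S3_USE_SSL", "True").strip().lower() != "false"
    scheme = "https" if use_ssl else "http"
    port = env.get("TOIL_S3_PORT")
    return f"{scheme}://{host}:{port}" if port else f"{scheme}://{host}"


example = {
    "TOIL_S3_HOST": "127.0.0.1",
    "TOIL_S3_PORT": "9010",
    "TOIL_S3_USE_SSL": "False",
}
print(s3_endpoint(example))
```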
In-Workflow Autoscaling with Mesos
Instead of the normal Kubernetes-based autoscaling, you can also use Toil's old Mesos-based autoscaling method, where the scaling logic runs inside the Toil workflow. With this approach, a Toil cluster can only run one workflow at a time. This method also does not work on the ARM architecture.
In this mode, the --preemptibleCompensation flag can be used to handle cases where preemptible nodes may not be available but are required for your workflow. With this flag enabled, the autoscaler will attempt to compensate for a shortage of preemptible nodes of a certain type by creating non-preemptible nodes of that type, if non-preemptible nodes of that type were specified in --nodeTypes .
NOTE:
This approach is deprecated, because the Mesos project is no longer publishing up-to-date builds.
1. Download sort.py.
2. Launch a Mesos leader node in AWS using the Launch-Cluster Command, without using any ranges of node counts:
(venv) $ toil launch-cluster <cluster-name> \
--clusterType mesos \
--keyPairName <AWS-key-pair-name> \
--leaderNodeType t2.medium \
--zone us-west-2a
3. Copy the sort.py workflow up to the leader node:
(venv) $ toil rsync-cluster -z us-west-2a <cluster-name> sort.py :/root
4. Log in to the leader node:
(venv) $ toil ssh-cluster -z us-west-2a <cluster-name>
5. Run the workflow with in-workflow autoscaling, specifying a provisioner and node types and counts as workflow arguments:
$ python3 /root/sort.py aws:us-west-2:<my-jobstore-name> \
--provisioner aws \
--nodeTypes c3.large \
--maxNodes 2 \
--batchSystem mesos
NOTE:
In this example, the autoscaling Toil code creates up to two instances of type c3.large and launches Mesos agent containers inside them. The containers are then available to run jobs defined by the sort.py workflow. Toil also creates the job store in S3, under the name you supplied as aws:us-west-2:<my-jobstore-name>, to store intermediate job results.
The Toil autoscaler can also provision multiple different node types, which is useful for workflows that have jobs with varying resource requirements. For example, one could execute the workflow with --nodeTypes c3.large,r3.xlarge --maxNodes 5,1, which would allow the provisioner to create up to five c3.large nodes and one r3.xlarge node for memory-intensive jobs. In this situation, the autoscaler would avoid creating the more expensive r3.xlarge node until needed, running most jobs on the c3.large nodes.
6. View the generated file to sort:
$ head fileToSort.txt
7. View the sorted file:
$ head sortedFile.txt
Dashboard
Toil provides a dashboard for viewing the RAM and CPU usage of each node, the number of issued jobs of each type, the number of failed jobs, and the size of the jobs queue. To launch this dashboard for a Toil workflow, pass the --metrics flag on the workflow's command line. The dashboard can then be viewed in your browser at localhost:3000 while connected to the leader node through toil ssh-cluster.
To change the default port number, you can use the --grafana_port argument:
(venv) $ toil ssh-cluster -z us-west-2a --grafana_port 8000 <cluster-name>
On AWS, the dashboard keeps track of every node in the cluster to monitor CPU and RAM usage, but it can also be used while running a workflow on a single machine. The dashboard uses Grafana as the front end for displaying real-time plots, and Prometheus for tracking metrics exported by toil: [image]
In order to use the dashboard for a non-released toil version, you will have to build the containers locally with make docker , since the prometheus, grafana, and mtail containers used in the dashboard are tied to a specific toil version.
Running in Google Compute Engine (GCE)
Toil supports a provisioner with Google, and a Google Job Store. To get started, follow instructions for Preparing your Google environment.
Preparing your Google environment
Toil supports using the Google Cloud Platform. Setting this up is easy!
1. Make sure that the google extra (Installing Toil with Extra Features) is installed.
2. Follow Google's Instructions to download credentials and set the GOOGLE_APPLICATION_CREDENTIALS environment variable.
3. Create a new ssh key with the proper format. To create a new ssh key run the command:
$ ssh-keygen -t rsa -f ~/.ssh/id_rsa -C [USERNAME]
where [USERNAME] is something like jane@example.com. Make sure to leave the passphrase blank.
WARNING:
This command could overwrite an old ssh key you may be using. If you have an existing ssh key you would like to use, it will need to be called id_rsa and it needs to have no passphrase set.
Make sure only you can read the SSH keys:
$ chmod 400 ~/.ssh/id_rsa ~/.ssh/id_rsa.pub
4. Add your newly formatted public key to Google. To do this, log into your Google Cloud account and go to the metadata section under the Compute tab. [image]
Near the top of the screen click on 'SSH Keys', then edit, add item, and paste the key. Then save: [image]
For more details look at Google's instructions for adding SSH keys.
Google Job Store
To use the Google Job Store you will need to set the GOOGLE_APPLICATION_CREDENTIALS environment variable by following Google's instructions.
Then to run the sort example with the Google job store you would type
$ python3 sort.py google:my-project-id:my-google-sort-jobstore
Running a Workflow with Autoscaling
WARNING:
Google Autoscaling is in beta!
The steps to run a GCE workflow are similar to those of AWS (Running a Workflow with Autoscaling), except you will need to explicitly specify the --provisioner gce option, which otherwise defaults to aws.
1. Download sort.py.
2. Launch the leader node in GCE using the Launch-Cluster Command:
(venv) $ toil launch-cluster <CLUSTER-NAME> \
--provisioner gce \
--leaderNodeType n1-standard-1 \
--keyPairName <SSH-KEYNAME> \
--zone us-west1-a
The --keyPairName option is for an SSH key that was added to the Google account. <SSH-KEYNAME> is the first part of the [USERNAME] used when setting up your ssh key. For example, if [USERNAME] was jane@example.com, <SSH-KEYNAME> should be jane.
3. Upload the sort example and ssh into the leader:
(venv) $ toil rsync-cluster --provisioner gce <CLUSTER-NAME> sort.py :/root
(venv) $ toil ssh-cluster --provisioner gce <CLUSTER-NAME>
4. Run the workflow:
$ python3 /root/sort.py google:<PROJECT-ID>:<JOBSTORE-NAME> \
--provisioner gce \
--batchSystem mesos \
--nodeTypes n1-standard-2 \
--maxNodes 2
5. Clean up:
$ exit # this exits the ssh from the leader node
(venv) $ toil destroy-cluster --provisioner gce <CLUSTER-NAME>
HPC ENVIRONMENTS
Toil is a flexible framework that can be leveraged in a variety of environments, including high-performance computing (HPC) environments. Toil provides support for a number of batch systems, including Grid Engine, Slurm, Torque and LSF, which are popular schedulers used in these environments. Toil also supports HTCondor, which is a popular scheduler for high-throughput computing (HTC). To use one of these batch systems, specify the --batchSystem argument to the workflow.
Due to the cost and complexity of maintaining support for these schedulers we currently consider all but Slurm to be "community supported", that is the core development team does not regularly test or develop support for these systems. However, there are members of the Toil community currently deploying Toil in a wide variety of HPC environments and we welcome external contributions.
Developing support for a new or existing batch system involves extending the abstract batch system class toil.batchSystems.abstractBatchSystem.AbstractBatchSystem.
Running on Slurm
When running Toil workflows on Slurm, you usually want to run the workflow itself from the head node. Toil will take care of running all the required sbatch commands for you. You probably do not want to submit the Toil workflow as a Slurm job with sbatch (although you can if you have a large number of workflows to run). You also probably do not want to manually allocate resources with salloc.
To run a Toil workflow on Slurm, include --batchSystem slurm in your command line arguments. Generally Slurm clusters have shared filesystems, meaning the file job store would be appropriate. You want to make sure to use a job store location that is shared across your Slurm cluster. Additionally, you will likely want to provide another shared directory with the --batchLogsDir option, to allow the Slurm job logs to be retrieved by Toil in case something goes wrong with a job.
For example, to run the sort example on Slurm, assuming you are currently in a shared directory, you would type, on the cluster head node:
$ mkdir -p logs
$ python3 sort.py ./store --batchSystem slurm --batchLogsDir ./logs
Slurm Tips
1. If using Toil workflows that run containers with Singularity on Slurm (such as WDL workflows), you will want to make sure that Singularity caching, and Toil's MiniWDL caching, use a shared directory across your cluster nodes. By default, Toil will configure Singularity to cache per-workflow and per-node, but in Slurm a shared filesystem is almost always available. Assuming your home directory is shared, to set this up, you can:
$ echo 'export SINGULARITY_CACHEDIR="${HOME}/.singularity/cache"' >>~/.bashrc
$ echo 'export MINIWDL__SINGULARITY__IMAGE_CACHE="${HOME}/.cache/miniwdl"' >>~/.bashrc
Then make sure to log out and back in again for the setting to take effect.
2. If your home directory is not shared across the cluster nodes, make sure that you have installed Toil in such a way that it is in your PATH on the cluster nodes.
3. Slurm sandboxing and resource limitation do not apply to Docker containers, because there is no relationship between the sandbox cgroup that your Toil job runs in and the sandbox cgroup that the Docker daemon creates to run the Docker container your job requested to run. If you want your Toil jobs' containers to actually be inside their Slurm job resource allocations, you should make sure to run containers with Singularity or another user-mode or daemon-less containerization system.
4. Slurm can sometimes report that a job has finished before that job's changes to the cluster's shared filesystem are visible to other nodes or to the head node. Toil tries to anticipate and compensate for this situation, but no amount of waiting or retrying can guarantee correct behavior, because in theory the shared filesystem could be days or months behind. In practice, the delay is usually no more than a few seconds, and Toil can handle it. But if you are seeing odd behavior from Toil related to files not existing when they should or still existing when they shouldn't, your problem could be that your cluster's filesystem is unusually slow to reach consistency across nodes.
5. If you see warnings about XDG_RUNTIME_DIR, your Slurm cluster might not be managing XDG login sessions correctly for Slurm jobs. Toil can work around this, but as a result of the workaround it might have trouble finding an appropriate "coordination directory" where it can store state files local to each Slurm node. If you are seeing unusual behavior like Toil jobs on one node waiting for operations on a different node, you can try giving Toil a path to a per-node, writable directory with the --coordinationDir option, to tell it where to put those files explicitly.
6. With a shared filesystem, Toil's caching system is not necessarily going to help your workflow. Try running and timing test workflows with --caching true and with --caching false, to determine whether it is worth it for your workload to copy files from the shared filesystem to local storage on each node.
7. If running CWL workflows on Slurm, with a shared filesystem, you can try the --bypass-file-store option to toil-cwl-runner. It may speed up your workflow, but you may also need to make sure to change Toil's work directory to a shared directory provided with the --workDir option in order for it to work properly across machines.
Standard Output/Error from Batch System Jobs
Standard output and error from batch system jobs (except for the Mesos batch system) are redirected to files in the toil-<workflowID> directory created within the temporary directory specified by the --workDir option; see Commandline Options . Each file is named as follows: toil_job_<Toil job ID>_batch_<name of batch system>_<job ID from batch system>_<file description>.log , where <file description> is std_output for standard output, and std_error for standard error. HTCondor will also write job event log files with <file description> = job_events .
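The naming scheme above can be made concrete with a short sketch. The helper name batch_log_name and the sample IDs are hypothetical; they are not part of Toil and only show how the pieces of the file name fit together.

```python
def batch_log_name(toil_job_id: str, batch_system: str, batch_job_id: str, description: str) -> str:
    """Assemble a log file name following the documented pattern (illustrative only)."""
    # The three file descriptions named in the text; job_events is HTCondor-specific.
    allowed = {"std_output", "std_error", "job_events"}
    if description not in allowed:
        raise ValueError(f"unknown file description: {description}")
    return f"toil_job_{toil_job_id}_batch_{batch_system}_{batch_job_id}_{description}.log"


if __name__ == "__main__":
    # Made-up IDs, for illustration only.
    print(batch_log_name("42", "slurm", "123456", "std_error"))
```

A helper like this can be handy when searching a shared --workDir for the output of one particular batch job.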
If capturing standard output and error is desired, --workDir will generally need to be on a shared file system; otherwise if these are written to local temporary directories on each node (e.g. /tmp ) Toil will not be able to retrieve them. Alternatively, the --noStdOutErr option forces Toil to discard all standard output and error from batch system jobs.
WORKFLOW EXECUTION SERVICE (WES)
The GA4GH Workflow Execution Service (WES) is a standardized API for submitting and monitoring workflows. Toil has experimental support for setting up a WES server and executing CWL, WDL, and Toil workflows using the WES API. More information can be found in the WES API specification.
To get started with the Toil WES server, make sure that the server extra ( Installing Toil with Extra Features ) is installed.
Preparing your WES environment
The WES server requires Celery to distribute and execute workflows. To set up Celery:
1. Start RabbitMQ, which is the broker between the WES server and Celery workers:
docker run -d --name wes-rabbitmq -p 5672:5672 rabbitmq:3.9.5
2. Start Celery workers:
celery -A toil.server.celery_app worker --loglevel=INFO
Starting a WES server
To start a WES server on the default port 8080, run the Toil command:
$ toil server
The WES API will be hosted on the following URL:
http://localhost:8080/ga4gh/wes/v1
To use another port, e.g.: 3000, you can specify the --port argument:
$ toil server --port 3000
There are many other command line options. Help information can be found by using this command:
$ toil server --help
Below is a detailed summary of all server-specific options:
--debug
Enable debug mode.
--bypass_celery
Skip sending workflows to Celery and just run them under the server. For testing.
--host HOST
The host interface that the Toil server binds on. (default: "127.0.0.1").
--port PORT
The port that the Toil server listens on. (default: 8080).
--swagger_ui
If True, the swagger UI will be enabled and hosted on the {api_base_path}/ui endpoint. (default: False)
--cors
Enable Cross Origin Resource Sharing (CORS). This should only be turned on if the server is intended to be used by a website or domain. (default: False).
--cors_origins CORS_ORIGIN
Ignored if --cors is False. This sets the allowed origins for CORS. For details about CORS and its security risks, see the GA4GH docs on CORS. (default: "*").
--workers WORKERS , -w WORKERS
Ignored if --debug is True. The number of worker processes launched by the WSGI server. (default: 2).
--work_dir WORK_DIR
The directory where workflows should be stored. This directory should be empty or only contain previous workflows. (default: './workflows').
--state_store STATE_STORE
The local path or S3 URL where workflow state metadata should be stored. (default: in --work_dir )
--opt OPT , -o OPT
Specify the default parameters to be sent to the workflow engine for each run. Options taking arguments must use = syntax. Accepts multiple values. Example: --opt=--logLevel=CRITICAL --opt=--workDir=/tmp .
--dest_bucket_base DEST_BUCKET_BASE
Direct CWL workflows to save output files to dynamically generated unique paths under the given URL. Supports AWS S3.
--wes_dialect DIALECT
Restrict WES responses to a dialect compatible with clients that do not fully implement the WES standard. (default: 'standard')
Running the Server with docker-compose
Instead of manually setting up the server components ( toil server , RabbitMQ, and Celery), you can use the following docker-compose.yml file to orchestrate and link them together.
Make sure to change the credentials for basic authentication by updating the traefik.http.middlewares.auth.basicauth.users label. The passwords can be generated with tools such as htpasswd. (Note that single $ signs need to be replaced with $$ in the YAML file.)
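The $ to $$ replacement can be done mechanically. Here is a minimal sketch; the function name escape_for_compose and the sample entry are hypothetical, and the entry is a placeholder rather than a real password hash.

```python
def escape_for_compose(htpasswd_entry: str) -> str:
    """Double every '$' so docker-compose does not treat it as variable interpolation."""
    return htpasswd_entry.replace("$", "$$")


if __name__ == "__main__":
    # Placeholder "user:hash" entry, for illustration only.
    print(escape_for_compose("user:$2y$12$examplehash"))
```

The escaped string is what goes into the basicauth.users label.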
When running on a host other than localhost, make sure to change the Host to your target host in the traefik.http.routers.wes.rule and traefik.http.routers.wespublic.rule labels.
You can also change /tmp/toil-workflows if you want Toil workflows to live somewhere else, and create the directory before starting the server.
In order to run workflows that require Docker, the docker.sock socket must be mounted as a volume for Celery. Additionally, the TOIL_WORKDIR directory (defaults to: /var/lib/toil) and /var/lib/cwl (if running CWL workflows with DockerRequirement) should exist on the host and also be mounted as volumes.
Also make sure to run it behind a firewall; it opens up the Toil server on port 8080 to anyone who connects.
# docker-compose.yml
version: "3.8"
services:
  rabbitmq:
    image: rabbitmq:3.9.5
    hostname: rabbitmq
  celery:
    image: ${TOIL_APPLIANCE_SELF}
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - /var/lib/docker:/var/lib/docker
      - /var/lib/toil:/var/lib/toil
      - /var/lib/cwl:/var/lib/cwl
      - /tmp/toil-workflows:/tmp/toil-workflows
    command: celery --broker=amqp://guest:guest@rabbitmq:5672// -A toil.server.celery_app worker --loglevel=INFO
    depends_on:
      - rabbitmq
  wes-server:
    image: ${TOIL_APPLIANCE_SELF}
    volumes:
      - /tmp/toil-workflows:/tmp/toil-workflows
    environment:
      - TOIL_WES_BROKER_URL=amqp://guest:guest@rabbitmq:5672//
    command: toil server --host 0.0.0.0 --port 8000 --work_dir /tmp/toil-workflows
    expose:
      - 8000
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.wes.rule=Host(`localhost`)"
      - "traefik.http.routers.wes.entrypoints=web"
      - "traefik.http.routers.wes.middlewares=auth"
      - "traefik.http.middlewares.auth.basicauth.users=test:$$2y$$12$$ci.4U63YX83CwkyUrjqxAucnmi2xXOIlEF6T/KdP9824f1Rf1iyNG"
      - "traefik.http.routers.wespublic.rule=Host(`localhost`) && Path(`/ga4gh/wes/v1/service-info`)"
    depends_on:
      - rabbitmq
      - celery
  traefik:
    image: traefik:v2.2
    command:
      - "--providers.docker"
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:8080"
    ports:
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
Further customization can also be made as needed. For example, if you have a domain, you can set up HTTPS with Let's Encrypt .
Once everything is configured, simply run docker-compose up to start the containers. Run docker-compose down to stop and remove all containers.
NOTE:
docker-compose is not installed on the Toil appliance by default. See the following section to set up the WES server on a Toil cluster.
Running on a Toil cluster
To run the server on a Toil leader instance on EC2:
1. Launch a Toil cluster with the toil launch-cluster command with the AWS provisioner.
2. SSH into your cluster with the --sshOption=-L8080:localhost:8080 option to forward port 8080.
3. Install Docker Compose by running the following commands from the Docker docs:
curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
# check installation
docker-compose --version
or, install a different version of Docker Compose by changing "1.29.2" to another version.
4. Copy the docker-compose.yml file from Running the Server with docker-compose to an empty directory, and modify the configuration as needed.
5. Now, run docker-compose up -d to start the WES server in detached mode on the Toil appliance.
6. To stop the server, run docker-compose down.
WES API Endpoints
As defined by the GA4GH WES API specification, the following endpoints with base path ga4gh/wes/v1/ are supported by Toil:
When running the WES server with the docker-compose setup above, most endpoints (except GET /service-info ) will be protected with basic authentication. Make sure to set the Authorization header with the correct credentials when submitting or retrieving a workflow.
Submitting a Workflow
Now that the WES API is up and running, we can submit and monitor workflows remotely using the WES API endpoints. A workflow can be submitted for execution using the POST /runs endpoint.
As a quick example, we can submit the example CWL workflow from Running a basic CWL workflow to our WES API:
# example.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
stdout: output.txt
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs:
  output:
    type: stdout
using cURL:
$ curl --location --request POST 'http://localhost:8080/ga4gh/wes/v1/runs' \
    --user test:test \
    --form 'workflow_url="example.cwl"' \
    --form 'workflow_type="cwl"' \
    --form 'workflow_type_version="v1.0"' \
    --form 'workflow_params="{\"message\": \"Hello world!\"}"' \
    --form 'workflow_attachment=@"./toil_test_files/example.cwl"'
{
  "run_id": "4deb8beb24894e9eb7c74b0f010305d1"
}
Note that the --user argument is used to attach the basic authentication credentials along with the request. Make sure to change test:test to the username and password you configured for your WES server. Alternatively, you can also set the Authorization header manually as "Authorization: Basic base64_encoded_auth" .
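For clients that set the header by hand, the base64_encoded_auth value is the base64 encoding of "username:password". A minimal sketch using only the Python standard library follows; the function name basic_auth_header is hypothetical, and test:test matches the example credentials used in this section.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    # Same credentials as the curl example; replace with your own.
    print("Authorization:", basic_auth_header("test", "test"))
```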
If the workflow is submitted successfully, a JSON object containing a run_id will be returned. The run_id is a unique identifier of your requested workflow, which can be used to monitor or cancel the run.
There are a few required parameters that have to be set for all workflow submissions, which are the following:
Additionally, the following optional parameters are also available:
For more details about these parameters, refer to the Run Workflow section in the WES API spec.
Upload multiple files
Looking at the body of the request of the previous example, note that the workflow_url is a relative URL that refers to the example.cwl file uploaded from the local path ./toil_test_files/example.cwl .
To specify the file name (or subdirectory) of the remote destination file, set the filename field in the Content-Disposition header. You could also upload more than one file by providing the workflow_attachment parameter multiple times with different files.
This can be shown by the following example:
$ curl --location --request POST 'http://localhost:8080/ga4gh/wes/v1/runs' \
    --user test:test \
    --form 'workflow_url="example.cwl"' \
    --form 'workflow_type="cwl"' \
    --form 'workflow_type_version="v1.0"' \
    --form 'workflow_params="{\"message\": \"Hello world!\"}"' \
    --form 'workflow_attachment=@"./toil_test_files/example.cwl"' \
    --form 'workflow_attachment=@"./toil_test_files/2.fasta";filename=inputs/test.fasta' \
    --form 'workflow_attachment=@"./toil_test_files/2.fastq";filename=inputs/test.fastq'
On the server, the execution directory would have the following structure from the above request:
execution/
├── example.cwl
├── inputs
│   ├── test.fasta
│   └── test.fastq
└── wes_inputs.json
Specify Toil options
To pass Toil-specific parameters to the workflow, you can include the workflow_engine_parameters parameter along with your request.
For example, to set the logging level to INFO , and change the working directory of the workflow, simply include the following as workflow_engine_parameters :
{"--logLevel": "INFO", "--workDir": "/tmp/"}
These options are appended to the end of the existing parameters when the command is constructed, so they override the default parameters where both are provided. (Default parameters that can be passed multiple times are not overridden.)
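The effect of appending can be sketched as follows. This is not Toil's implementation: build_engine_args and effective_options are hypothetical helpers, and the stand-in argparse parser only demonstrates the general rule that, for a single-valued option, the last occurrence on the command line wins.

```python
import argparse
from typing import Dict, List


def build_engine_args(default_opts: List[str], engine_params: Dict[str, str]) -> List[str]:
    """Append request-supplied options after the server defaults (illustrative only)."""
    args = list(default_opts)
    for flag, value in engine_params.items():
        args.append(f"{flag}={value}")
    return args


def effective_options(args: List[str]) -> argparse.Namespace:
    """Parse with a stand-in parser; for single-valued options the last occurrence wins."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--logLevel")
    parser.add_argument("--workDir")
    return parser.parse_args(args)


if __name__ == "__main__":
    defaults = ["--logLevel=CRITICAL"]  # e.g. a server started with --opt=--logLevel=CRITICAL
    request_params = {"--logLevel": "INFO", "--workDir": "/tmp/"}
    args = build_engine_args(defaults, request_params)
    print(args)
    print(effective_options(args).logLevel)
```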
Monitoring a Workflow
With the run_id returned when submitting the workflow, we can check the status or get the full logs of the workflow run.
Checking the state
The GET /runs/{run_id}/status endpoint can be used to get a simple result with the overall state of your run:
$ curl --user test:test http://localhost:8080/ga4gh/wes/v1/runs/4deb8beb24894e9eb7c74b0f010305d1/status
{
  "run_id": "4deb8beb24894e9eb7c74b0f010305d1",
  "state": "RUNNING"
}
The possible states here are: QUEUED , INITIALIZING , RUNNING , COMPLETE , EXECUTOR_ERROR , SYSTEM_ERROR , CANCELING , and CANCELED .
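A client typically polls this endpoint until the run stops changing. The sketch below keeps the HTTP call out of the loop by accepting any callable that returns the current state string. Treating COMPLETE, EXECUTOR_ERROR, SYSTEM_ERROR, and CANCELED as final is an assumption made for this example, and wait_for_run is a hypothetical helper, not part of Toil.

```python
import time
from typing import Callable

# Assumed final states: a run in one of these is not expected to change again.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def wait_for_run(get_state: Callable[[], str], interval: float = 5.0, max_polls: int = 100) -> str:
    """Call get_state() repeatedly until it reports a terminal state, then return it."""
    for _ in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)
    raise TimeoutError("run did not reach a terminal state")


if __name__ == "__main__":
    # Simulated sequence of responses standing in for GET /runs/{run_id}/status.
    responses = iter(["QUEUED", "INITIALIZING", "RUNNING", "COMPLETE"])
    print(wait_for_run(lambda: next(responses), interval=0))
```

In real use, get_state would issue the authenticated GET request and return the "state" field of the JSON response.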
Getting the full logs
To get the detailed information about a workflow run, use the GET /runs/{run_id} endpoint:
$ curl --user test:test http://localhost:8080/ga4gh/wes/v1/runs/4deb8beb24894e9eb7c74b0f010305d1
{
  "run_id": "4deb8beb24894e9eb7c74b0f010305d1",
  "request": {
    "workflow_attachment": [
      "example.cwl"
    ],
    "workflow_url": "example.cwl",
    "workflow_type": "cwl",
    "workflow_type_version": "v1.0",
    "workflow_params": {
      "message": "Hello world!"
    }
  },
  "state": "RUNNING",
  "run_log": {
    "cmd": [
      "toil-cwl-runner --outdir=/home/toil/workflows/4deb8beb24894e9eb7c74b0f010305d1/outputs --jobStore=file:/home/toil/workflows/4deb8beb24894e9eb7c74b0f010305d1/toil_job_store /home/toil/workflows/4deb8beb24894e9eb7c74b0f010305d1/execution/example.cwl /home/workflows/4deb8beb24894e9eb7c74b0f010305d1/execution/wes_inputs.json"
    ],
    "start_time": "2021-08-30T17:35:50Z",
    "end_time": null,
    "stdout": null,
    "stderr": null,
    "exit_code": null
  },
  "task_logs": [],
  "outputs": {}
}
Canceling a run
To cancel a workflow run, use the POST /runs/{run_id}/cancel endpoint:
$ curl --location --request POST 'http://localhost:8080/ga4gh/wes/v1/runs/4deb8beb24894e9eb7c74b0f010305d1/cancel' \
    --user test:test
{
  "run_id": "4deb8beb24894e9eb7c74b0f010305d1"
}
DEVELOPING A PYTHON WORKFLOW
This tutorial walks through the features of Toil necessary for developing a workflow using the Toil Python API.
Scripting Quick Start
To begin, consider this short Toil Python workflow which illustrates defining a workflow:
import os

from toil.common import Toil
from toil.job import Job
from toil.lib.io import mkdtemp


def helloWorld(message):
    return f"Hello, world!, here's a message: {message}"


if __name__ == "__main__":
    jobstore: str = mkdtemp("tutorial_quickstart")
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = "OFF"
    options.clean = "always"

    hello_job = Job.wrapFn(helloWorld, "Woot")

    with Toil(options) as toil:
        print(toil.start(hello_job))  # prints "Hello, world!, ..."
The workflow consists of a single job. The resource requirements for that job are (optionally) specified by keyword arguments (memory, cores, disk). The options for the run are created with toil.job.Job.Runner.getDefaultOptions(), and the workflow is run with toil.common.Toil.start(). Below we explain the components of this code in detail.
Job Basics
The atomic unit of work in a Toil workflow is a Job . User code extends this base class, or uses helper methods like toil.job.Job.addChildJobFn() , to define units of work. For example, here is a more long-winded class-based version of the job in the quick start example:
from toil.job import Job


class HelloWorld(Job):
    def __init__(self, message):
        Job.__init__(self, memory="2G", cores=2, disk="3G")
        self.message = message

    def run(self, fileStore):
        return f"Hello, world! Here's a message: {self.message}"
In the example a class, HelloWorld, is defined. The constructor requests 2 gigabytes of memory, 2 cores and 3 gigabytes of local disk to complete the work.
The toil.job.Job.run() method is the function the user overrides to get work done. Here it just returns a message.
It is also possible to log a message using toil.job.Job.log() , which will be registered in the log output of the leader process of the workflow:
...
    def run(self, fileStore):
        self.log(f"Hello, world! Here's a message: {self.message}")
Invoking a Workflow
We can add to the previous example to turn it into a complete workflow by adding the necessary function calls to create an instance of HelloWorld and to run this as a workflow containing a single job. For example:
import os

from toil.common import Toil
from toil.job import Job
from toil.lib.io import mkdtemp


class HelloWorld(Job):
    def __init__(self, message):
        Job.__init__(self)
        self.message = message

    def run(self, fileStore):
        return f"Hello, world!, here's a message: {self.message}"


if __name__ == "__main__":
    jobstore: str = mkdtemp("tutorial_invokeworkflow")
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = "OFF"
    options.clean = "always"

    hello_job = HelloWorld("Woot")

    with Toil(options) as toil:
        print(toil.start(hello_job))
NOTE:
Do not include a . in the name of your Python script (besides .py at the end). This is to allow Toil to import the types and functions defined in your file while starting a new process.
This uses the toil.common.Toil class, which is used to run and resume Toil workflows. It is used as a context manager and allows for preliminary setup, such as staging of files into the job store on the leader node. An instance of the class is initialized by specifying an options object. The actual workflow is then invoked by calling the toil.common.Toil.start() method, passing the root job of the workflow, or, if a workflow is being restarted, toil.common.Toil.restart() should be used. The body of the context manager should have explicit if/else branches for the restart and non-restart cases; the boolean that selects between them is toil.options.restart.
For example:
import os

from toil.common import Toil
from toil.job import Job
from toil.lib.io import mkdtemp


class HelloWorld(Job):
    def __init__(self, message):
        Job.__init__(self)
        self.message = message

    def run(self, fileStore):
        return f"Hello, world!, I have a message: {self.message}"


if __name__ == "__main__":
    jobstore: str = mkdtemp("tutorial_invokeworkflow2")
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = "INFO"
    options.clean = "always"

    with Toil(options) as toil:
        if not toil.options.restart:
            job = HelloWorld("Woot!")
            output = toil.start(job)
        else:
            output = toil.restart()
    print(output)
The call to toil.job.Job.Runner.getDefaultOptions() creates a set of default options for the workflow. The only argument is a description of how to store the workflow's state in what we call a job-store. Here the job-store is a local directory whose path is generated by mkdtemp(); the empty directory is then removed with os.rmdir(), leaving the path free for Toil to create the job-store there. Alternatively this string can encode other ways to store the necessary state, e.g. an S3 bucket object store location. By default the job-store is deleted if the workflow completes successfully.
The workflow is executed inside the with block, which creates an instance of HelloWorld and runs it as a workflow. Note all Toil workflows start from a single starting job, referred to as the root job. The return value of the root job is returned as the result of the completed workflow (see promises below to see how this is a useful feature!).
Specifying Commandline Arguments
To allow command line control of the options we can use the toil.job.Job.Runner.getDefaultArgumentParser() method to create an argparse.ArgumentParser object which can be used to parse command line options for a Toil Python workflow. For example:
from toil.common import Toil
from toil.job import Job


class HelloWorld(Job):
    def __init__(self, message):
        Job.__init__(self)
        self.message = message

    def run(self, fileStore):
        return "Hello, world!, here's a message: %s" % self.message


if __name__ == "__main__":
    parser = Job.Runner.getDefaultArgumentParser()
    options = parser.parse_args()
    options.logLevel = "OFF"
    options.clean = "always"

    hello_job = HelloWorld("Woot")

    with Toil(options) as toil:
        print(toil.start(hello_job))
This creates a fully fledged Toil Python workflow with all the options Toil exposes as command line arguments. Running this program with --help will print the full list of options.
Alternatively an existing argparse.ArgumentParser object can have Toil command line options added to it with the toil.job.Job.Runner.addToilOptions() method.
Resuming a Workflow
In the event that a workflow fails, either because of programmatic error within the jobs being run, or because of node failure, the workflow can be resumed. The only case in which a workflow cannot be reliably resumed is when the job-store itself becomes corrupt.
Critical to resumption is that jobs can be rerun, even if they have apparently completed successfully. Put succinctly, a user defined job should not corrupt its input arguments. That way, regardless of node, network or leader failure the job can be restarted and the workflow resumed.
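As a toy, in-memory analogy (plain Python, not Toil code), compare a function that consumes its input with one that leaves it untouched. Both function names are hypothetical. Only the second gives the same answer when it is run again on the same input, which is the property a rerun job needs.

```python
from typing import List


def total_unsafe(values: List[int]) -> int:
    """Sums the values but empties the caller's list while doing so."""
    total = 0
    while values:
        total += values.pop()
    return total


def total_safe(values: List[int]) -> int:
    """Sums the values without modifying the input."""
    return sum(values)


if __name__ == "__main__":
    data = [1, 2, 3]
    print(total_safe(data), total_safe(data))      # same result on a rerun
    print(total_unsafe(data), total_unsafe(data))  # the rerun sees an emptied list
```

The same reasoning applies to files and other external inputs: a job that deletes or rewrites what it was given cannot be rerun safely.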
To resume a workflow, set the "restart" option in the options object used to create the toil.common.Toil context manager and call toil.common.Toil.restart() instead of toil.common.Toil.start(). If node failures are expected it can also be useful to use the integer "retryCount" option, which will attempt to rerun a job retryCount number of times before marking it fully failed.
In the common scenario that a small subset of jobs fail (including retry attempts) within a workflow Toil will continue to run other jobs until it can do no more, at which point toil.common.Toil.start() will raise a toil.exceptions.FailedJobsException exception. Typically at this point the user can decide to fix the script and resume the workflow or delete the job-store manually and rerun the complete workflow.
Functions and Job Functions
Defining jobs by creating class definitions generally involves the boilerplate of creating a constructor. To avoid this the classes toil.job.FunctionWrappingJob and toil.job.JobFunctionWrappingJob allow functions to be directly converted to jobs. For example, the quick start example (repeated here):
import os

from toil.common import Toil
from toil.job import Job
from toil.lib.io import mkdtemp


def helloWorld(message):
    return f"Hello, world!, here's a message: {message}"


if __name__ == "__main__":
    jobstore: str = mkdtemp("tutorial_quickstart")
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = "OFF"
    options.clean = "always"

    hello_job = Job.wrapFn(helloWorld, "Woot")

    with Toil(options) as toil:
        print(toil.start(hello_job))  # prints "Hello, world!, ..."
Is equivalent to the previous example, but using a function to define the job.
The function call:
Job.wrapFn(helloWorld, "Woot")
Creates the instance of toil.job.FunctionWrappingJob that wraps the function.
The keyword arguments memory , cores and disk allow resource requirements to be specified as before. Even if they are not included as keyword arguments within a function header they can be passed as arguments when wrapping a function as a job and will be used to specify resource requirements.
We can also use the function wrapping syntax to wrap a job function, a function whose first argument is a reference to the wrapping job. Just like a self argument in a class, this allows access to the methods of the wrapping job, see toil.job.JobFunctionWrappingJob. For example:
import os

from toil.common import Toil
from toil.job import Job
from toil.lib.io import mkdtemp


def helloWorld(job, message):
    job.log(f"Hello world, I have a message: {message}")


if __name__ == "__main__":
    jobstore: str = mkdtemp("tutorial_jobfunctions")
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = "INFO"
    options.clean = "always"

    hello_job = Job.wrapJobFn(helloWorld, "Woot!")

    with Toil(options) as toil:
        toil.start(hello_job)
Here helloWorld() is a job function. It uses toil.job.Job.log() to log a message that will be printed to the output console. Here the only subtle difference to note is the line:
hello_job = Job.wrapJobFn(helloWorld, "Woot!")
Which uses the function toil.job.Job.wrapJobFn() to wrap the job function instead of toil.job.Job.wrapFn() which wraps a vanilla function.
Workflows with Multiple Jobs
A parent job can have child jobs and follow-on jobs. These relationships are specified by methods of the job class, e.g. toil.job.Job.addChild() and toil.job.Job.addFollowOn() .
If we consider a set of jobs to be the nodes of a job graph, and the child and follow-on relationships to be its directed edges, then a job B that lies on a directed path of child/follow-on edges from a job A is a successor of A; similarly, A is a predecessor of B.
A parent job's child jobs are run directly after the parent job has completed, and in parallel. The follow-on jobs of a job are run after its child jobs and their successors have completed. They are also run in parallel. Follow-ons allow the easy specification of cleanup tasks that happen after a set of parallel child tasks. The following shows a simple example that uses the earlier helloWorld() job function:
from toil.common import Toil
from toil.job import Job


def helloWorld(job, message):
    job.log(f"Hello world, I have a message: {message}")


if __name__ == "__main__":
    parser = Job.Runner.getDefaultArgumentParser()
    options = parser.parse_args()
    options.logLevel = "INFO"
    options.clean = "always"

    j1 = Job.wrapJobFn(helloWorld, "first")
    j2 = Job.wrapJobFn(helloWorld, "second or third")
    j3 = Job.wrapJobFn(helloWorld, "second or third")
    j4 = Job.wrapJobFn(helloWorld, "last")

    j1.addChild(j2)
    j1.addChild(j3)
    j1.addFollowOn(j4)

    with Toil(options) as toil:
        toil.start(j1)
In the example four jobs are created, first j1 is run, then j2 and j3 are run in parallel as children of j1 , finally j4 is run as a follow-on of j1 .
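The ordering rule (a job first, then its children and their successors, then its follow-ons) can be modelled in a few lines of plain Python. This is a toy model for tree-shaped graphs only: the Node class and serial_order function are hypothetical, and the function produces one valid serial order rather than modelling Toil's parallel scheduling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A stand-in for a job with child and follow-on edges."""
    name: str
    children: List["Node"] = field(default_factory=list)
    follow_ons: List["Node"] = field(default_factory=list)


def serial_order(node: Node) -> List[str]:
    """Return one valid execution order: the job, its children's subtrees, then its follow-ons."""
    order = [node.name]
    for child in node.children:
        order.extend(serial_order(child))
    for follow_on in node.follow_ons:
        order.extend(serial_order(follow_on))
    return order


if __name__ == "__main__":
    j2, j3, j4 = Node("second"), Node("third"), Node("last")
    j1 = Node("first", children=[j2, j3], follow_ons=[j4])
    print(serial_order(j1))
```

In a real run the two children may execute in either order or at the same time; only their position between the parent and the follow-on is fixed.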
There are multiple short hand functions to achieve the same workflow, for example:
from toil.common import Toil
from toil.job import Job


def helloWorld(job, message):
    job.log(f"Hello world, I have a message: {message}")


if __name__ == "__main__":
    parser = Job.Runner.getDefaultArgumentParser()
    options = parser.parse_args()
    options.logLevel = "INFO"
    options.clean = "always"

    j1 = Job.wrapJobFn(helloWorld, "first")
    j2 = j1.addChildJobFn(helloWorld, "second or third")
    j3 = j1.addChildJobFn(helloWorld, "second or third")
    j4 = j1.addFollowOnJobFn(helloWorld, "last")

    with Toil(options) as toil:
        toil.start(j1)
Equivalently defines the workflow, where the functions toil.job.Job.addChildJobFn() and toil.job.Job.addFollowOnJobFn() are used to create job functions as children or follow-ons of an earlier job.
Job graphs are not limited to trees, and can express arbitrary directed acyclic graphs. For a precise definition of legal graphs see toil.job.Job.checkJobGraphForDeadlocks(). The previous example could be specified as a DAG as follows:
from toil.common import Toil
from toil.job import Job


def helloWorld(job, message):
    job.log(f"Hello world, I have a message: {message}")


if __name__ == "__main__":
    parser = Job.Runner.getDefaultArgumentParser()
    options = parser.parse_args()
    options.logLevel = "INFO"
    options.clean = "always"

    j1 = Job.wrapJobFn(helloWorld, "first")
    j2 = j1.addChildJobFn(helloWorld, "second or third")
    j3 = j1.addChildJobFn(helloWorld, "second or third")
    j4 = j2.addChildJobFn(helloWorld, "last")
    j3.addChild(j4)

    with Toil(options) as toil:
        toil.start(j1)
Note the use of an extra child edge to make j4 a child of both j2 and j3 .
Dynamic Job Creation
The previous examples show a workflow being defined outside of a job. However, Toil also allows jobs to be created dynamically within jobs. For example:
import os

from toil.common import Toil
from toil.job import Job
from toil.lib.io import mkdtemp


def binaryStringFn(job, depth, message=""):
    if depth > 0:
        job.addChildJobFn(binaryStringFn, depth - 1, message + "0")
        job.addChildJobFn(binaryStringFn, depth - 1, message + "1")
    else:
        job.log(f"Binary string: {message}")


if __name__ == "__main__":
    jobstore: str = mkdtemp("tutorial_dynamic")
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = "INFO"
    options.clean = "always"

    with Toil(options) as toil:
        toil.start(Job.wrapJobFn(binaryStringFn, depth=5))
The job function binaryStringFn logs all possible binary strings of length n (here n=5), creating a total of 2^(n+1) - 1 jobs dynamically and recursively (63 jobs for n=5, one per node of a complete binary tree with n+1 levels). Static and dynamic creation of jobs can be mixed in a Toil workflow, with jobs defined within a job or job function being created at run time.
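A quick way to check the number of jobs is to count the nodes of the same recursion in plain Python. The function count_jobs is a hypothetical helper that mirrors the branching of binaryStringFn without using Toil: every call stands for one job, and each job with depth > 0 adds two children.

```python
def count_jobs(depth: int) -> int:
    """Count the jobs created by a binaryStringFn-style recursion of the given depth."""
    if depth > 0:
        # This job plus the two subtrees rooted at its children.
        return 1 + 2 * count_jobs(depth - 1)
    # A leaf job that only logs its string.
    return 1


if __name__ == "__main__":
    print(count_jobs(5))  # 63 jobs in total, 32 of them leaves
```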
Promises
The previous example of dynamic job creation shows variables from a parent job being passed to a child job. Such forward variable passing is naturally specified by recursive invocation of successor jobs within parent jobs. This can also be achieved statically by passing around references to the return variables of jobs. In Toil this is achieved with promises, as illustrated in the following example:
import os

from toil.common import Toil
from toil.job import Job
from toil.lib.io import mkdtemp


def fn(job, i):
    job.log("i is: %s" % i, level=100)
    return i + 1


if __name__ == "__main__":
    jobstore: str = mkdtemp("tutorial_promises")
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = "INFO"
    options.clean = "always"

    j1 = Job.wrapJobFn(fn, 1)
    j2 = j1.addChildJobFn(fn, j1.rv())
    j3 = j1.addFollowOnJobFn(fn, j2.rv())

    with Toil(options) as toil:
        toil.start(j1)
Running this workflow results in three log messages from the jobs: i is: 1 from j1, i is: 2 from j2 and i is: 3 from j3.
The return value from the first job is promised to the second job by the call to toil.job.Job.rv() in the following line:
j2 = j1.addChildJobFn(fn, j1.rv())
The value of j1.rv() is a promise, rather than the actual return value of the function, because j1 for the given input has at that point not been evaluated. A promise (toil.job.Promise) is essentially a pointer to the return value that is replaced by the actual return value once it has been evaluated. Therefore, when j2 is run the promise becomes 2.
Promises also support indexing of return values:
from toil.job import Job


def parent(job):
    indexable = Job.wrapJobFn(fn)
    job.addChild(indexable)
    job.addFollowOnFn(raiseWrap, indexable.rv(2))


def raiseWrap(arg):
    raise RuntimeError(arg)  # raises "2"


def fn(job):
    return (0, 1, 2, 3)
Promises can be quite useful. For example, we can combine dynamic job creation with promises to achieve a job creation process that mimics the functional patterns possible in many programming languages:
import os

from toil.common import Toil
from toil.job import Job
from toil.lib.io import mkdtemp


def binaryStrings(job, depth, message=""):
    if depth > 0:
        s = [
            job.addChildJobFn(binaryStrings, depth - 1, message + "0").rv(),
            job.addChildJobFn(binaryStrings, depth - 1, message + "1").rv(),
        ]
        return job.addFollowOnFn(merge, s).rv()
    return [message]


def merge(strings):
    return strings[0] + strings[1]


if __name__ == "__main__":
    jobstore: str = mkdtemp("tutorial_promises2")
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = "OFF"
    options.clean = "always"

    with Toil(options) as toil:
        print(toil.start(Job.wrapJobFn(binaryStrings, depth=5)))
The return value of the workflow, printed by the final line, is a list of all binary strings of length 5, computed recursively. Although a toy example, it demonstrates how closely Toil workflows can mimic typical programming patterns.
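For comparison, here is the same recursion written as an ordinary Python function, with direct calls in place of child jobs and list concatenation in place of the merge follow-on. The name binary_strings is hypothetical; the sketch only shows the shape of the result the workflow computes for depth=5.

```python
from typing import List


def binary_strings(depth: int, message: str = "") -> List[str]:
    """Plain-Python counterpart of the binaryStrings job function."""
    if depth > 0:
        left = binary_strings(depth - 1, message + "0")   # role of the first child job
        right = binary_strings(depth - 1, message + "1")  # role of the second child job
        return left + right                               # role of the merge follow-on
    return [message]


if __name__ == "__main__":
    result = binary_strings(5)
    print(len(result), result[0], result[-1])
```

In the Toil version each recursive call is a separate job and each intermediate list is passed along as a promise.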
Promised Requirements
Promised requirements are a special case of Promises that allow a job's return value to be used as another job's resource requirements.
This is useful when, for example, a job's storage requirement is determined by a file staged to the job store by an earlier job:
import os

from toil.common import Toil
from toil.job import Job, PromisedRequirement
from toil.lib.io import mkdtemp


def parentJob(job):
    downloadJob = Job.wrapJobFn(
        stageFn,
        "file://" + os.path.realpath(__file__),
        cores=0.1,
        memory="32M",
        disk="1M",
    )
    job.addChild(downloadJob)

    analysis = Job.wrapJobFn(
        analysisJob,
        fileStoreID=downloadJob.rv(0),
        disk=PromisedRequirement(downloadJob.rv(1)),
    )
    job.addFollowOn(analysis)


def stageFn(job, url):
    importedFile = job.fileStore.import_file(url)
    return importedFile, importedFile.size


def analysisJob(job, fileStoreID):
    # now do some analysis on the file
    pass


if __name__ == "__main__":
    jobstore: str = mkdtemp("tutorial_requirements")
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = "INFO"
    options.clean = "always"

    with Toil(options) as toil:
        toil.start(Job.wrapJobFn(parentJob))
Note that this also makes use of the size attribute of the FileID object. This promised requirements mechanism can also be used in combination with an aggregator for multiple jobs' output values:
def parentJob(job):
    aggregator = []
    for fileNum in range(0, 10):
        downloadJob = Job.wrapJobFn(
            stageFn,
            "file://" + os.path.realpath(__file__),
            cores=0.1,
            memory='32M',
            disk='1M',
        )
        job.addChild(downloadJob)
        aggregator.append(downloadJob)

    analysis = Job.wrapJobFn(
        analysisJob,
        fileStoreID=downloadJob.rv(0),
        disk=PromisedRequirement(lambda xs: sum(xs), [j.rv(1) for j in aggregator]),
    )
    job.addFollowOn(analysis)
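The aggregating callable in the example is an ordinary function of the resolved values, so it can be written and tested on its own before being handed to PromisedRequirement. The sketch below is one hypothetical variant that pads the summed sizes; the name total_disk and the 10 percent headroom are assumptions made for illustration and are not part of Toil.

```python
def total_disk(sizes, headroom_percent=10):
    """Sum per-file sizes in bytes and add integer headroom on top."""
    total = sum(sizes)
    # Integer arithmetic keeps the result an exact byte count.
    return total + total * headroom_percent // 100


print(total_disk([1000, 2500, 4000]))
```

Such a function could take the place of the lambda in the example above, assuming it receives the list of resolved sizes in the same way.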
Limitations
Just like regular promises, the return value must be determined prior to scheduling any job that depends on it. In the examples above, notice that the dependent jobs are follow-ons of the parent, while the promising jobs are children of the parent. This ordering ensures that all promises are properly fulfilled.
FileID
The toil.fileStores.FileID class is a small wrapper around Python's builtin string class. It is used to represent a file's ID in the file store, and has a size attribute that is the file's size in bytes. This object is returned by importFile and writeGlobalFile.
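The idea of a string that also carries a size can be illustrated with a few lines of plain Python. The class below is a hypothetical stand-in named SizedID, written only to show the pattern; it is not Toil's FileID and omits whatever else the real class provides.

```python
class SizedID(str):
    """Hypothetical stand-in: a string ID that also carries a size in bytes."""

    def __new__(cls, file_id, size):
        # str is immutable, so the extra attribute is attached in __new__.
        obj = super().__new__(cls, file_id)
        obj.size = size
        return obj


fid = SizedID("example-id", 2048)
print(fid, fid.size)
```

Because the object is still a str, it can be passed anywhere a plain file ID string is expected, while code that needs the size reads the attribute.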
Managing files within a workflow
It is frequently the case that a workflow will want to create files, both persistent and temporary, during its run. The toil.fileStores.abstractFileStore.AbstractFileStore class is used by jobs to manage these files in a manner that guarantees cleanup and resumption on failure.
The toil.job.Job.run() method has a file store instance as an argument. The following example shows how this can be used to create temporary files that persist for the length of the job, are placed on a specified local disk of the node, and are cleaned up, regardless of failure, when the job finishes:
import os

from toil.common import Toil
from toil.job import Job
from toil.lib.io import mkdtemp

class LocalFileStoreJob(Job):
    def run(self, fileStore):
        # self.tempDir will always contain the name of a directory within the allocated disk space reserved for the job
        scratchDir = self.tempDir

        # Similarly create a temporary file.
        scratchFile = fileStore.getLocalTempFile()

if __name__ == "__main__":
    jobstore: str = mkdtemp("tutorial_managing")
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = "INFO"
    options.clean = "always"

    # Create an instance of LocalFileStoreJob which will have at least 2 gigabytes of storage space.
    j = LocalFileStoreJob(disk="2G")

    # Run the workflow
    with Toil(options) as toil:
        toil.start(j)
Job functions can also access the file store for the job. The equivalent of the LocalFileStoreJob class is:
def localFileStoreJobFn(job):
    scratchDir = job.tempDir
    scratchFile = job.fileStore.getLocalTempFile()
Note that the fileStore attribute is accessed as an attribute of the job argument.
In addition to temporary files that exist for the duration of a job, the file store allows the creation of files in a global store, which persist for the duration of the workflow and are globally accessible (hence the name) between jobs. For example:
import os

from toil.common import Toil
from toil.job import Job
from toil.lib.io import mkdtemp

def globalFileStoreJobFn(job):
    job.log(
        "The following example exercises all the methods provided "
        "by the toil.fileStores.abstractFileStore.AbstractFileStore class"
    )

    # Create a local temporary file.
    scratchFile = job.fileStore.getLocalTempFile()

    # Write something in the scratch file.
    with open(scratchFile, "w") as fH:
        fH.write("What a tangled web we weave")

    # Write a copy of the file into the file-store; fileID is the key that can be used to retrieve the file.
    # This write is asynchronous by default
    fileID = job.fileStore.writeGlobalFile(scratchFile)

    # Write another file using a stream; fileID2 is the
    # key for this second file.
    with job.fileStore.writeGlobalFileStream(cleanup=True) as (fH, fileID2):
        fH.write(b"Out brief candle")

    # Now read the first file; scratchFile2 is a local copy of the file that is read-only by default.
    scratchFile2 = job.fileStore.readGlobalFile(fileID)

    # Read the second file to a desired location: scratchFile3.
    scratchFile3 = os.path.join(job.tempDir, "foo.txt")
    job.fileStore.readGlobalFile(fileID2, userPath=scratchFile3)

    # Read the second file again using a stream.
    with job.fileStore.readGlobalFileStream(fileID2) as fH:
        print(fH.read())  # This prints "Out brief candle"

    # Delete the first file from the global file-store.
    job.fileStore.deleteGlobalFile(fileID)

    # It is unnecessary to delete the file keyed by fileID2 because we used the cleanup flag,
    # which removes the file after this job and all its successors have run (if the file still exists)

if __name__ == "__main__":
    jobstore: str = mkdtemp("tutorial_managing2")
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = "INFO"
    options.clean = "always"

    with Toil(options) as toil:
        toil.start(Job.wrapJobFn(globalFileStoreJobFn))
The example demonstrates the global read, write and delete functionality of the file-store, using both local copies of the files and streams to read and write the files. It covers all the methods provided by the file store interface.
What is obvious is that the file-store provides no functionality to update an existing "global" file, meaning that files are, barring deletion, immutable. Also worth noting is that there is no file system hierarchy for files in the global file store. These limitations allow us to fairly easily support different object stores and to use caching to limit the amount of network file transfer between jobs.
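The two properties described here, no in-place update and no directory hierarchy, can be pictured with a toy in-memory model. This is purely illustrative: the class ToyGlobalStore and its methods are hypothetical and have nothing to do with how Toil's job stores are implemented.

```python
import uuid


class ToyGlobalStore:
    """Hypothetical flat store: opaque IDs map to immutable byte strings."""

    def __init__(self):
        self._files = {}  # flat mapping, no directories

    def write(self, data):
        file_id = uuid.uuid4().hex  # every write gets a fresh ID
        self._files[file_id] = bytes(data)
        return file_id

    def read(self, file_id):
        return self._files[file_id]

    def delete(self, file_id):
        del self._files[file_id]


store = ToyGlobalStore()
first = store.write(b"v1")
second = store.write(b"v2")  # a "changed" file is simply a new file with a new ID
```

Since an ID always refers to the same bytes until it is deleted, a cached copy of a file never goes stale, which is the property that makes caching straightforward.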
Staging of Files into the Job Store
External files can be imported into or exported out of the job store prior to running a workflow when the toil.common.Toil context manager is used on the leader. The context manager provides the methods toil.common.Toil.importFile() and toil.common.Toil.exportFile() for this purpose. The destination and source locations of such files are described with URLs passed to the two methods. Local files can be imported and exported as relative paths, which are interpreted relative to the directory from which the Toil workflow is initially run.
Using absolute paths and an appropriate scheme where possible (prefixing with "file://" or "s3://", for example) makes imports and exports less ambiguous and is recommended.
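One way to turn a local path into an absolute file URL before passing it to an import or export call is the standard library's pathlib. The helper name to_file_uri below is an assumption made for this sketch and is not part of Toil.

```python
from pathlib import Path


def to_file_uri(path):
    """Resolve a possibly relative local path and render it as a file:// URI."""
    # resolve() makes the path absolute relative to the current directory.
    return Path(path).resolve().as_uri()


relative_uri = to_file_uri("in.txt")
print(relative_uri)
```

The resulting string no longer depends on the directory the workflow is later run from, which is the ambiguity the recommendation is meant to avoid.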
A list of the currently supported URLs can be found at toil.jobStores.abstractJobStore.AbstractJobStore.importFile(). To import an external file into the job store as a shared file, pass the optional sharedFileName parameter to that method.
If a workflow fails for any reason, an imported file acts as any other file in the job store. If the workflow was configured so that the job store is not cleaned up on a failed run, the file will persist in the job store and need not be staged again when the workflow is resumed.
Example:
import os

from toil.common import Toil
from toil.job import Job
from toil.lib.io import mkdtemp
from toil.test import get_data

class HelloWorld(Job):
    def __init__(self, id):
        Job.__init__(self)
        self.inputFileID = id

    def run(self, fileStore):
        with fileStore.readGlobalFileStream(self.inputFileID, encoding="utf-8") as fi:
            with fileStore.writeGlobalFileStream(encoding="utf-8") as (
                fo,
                outputFileID,
            ):
                fo.write(fi.read() + "World!")
            return outputFileID

if __name__ == "__main__":
    jobstore: str = mkdtemp("tutorial_staging")
    tmp: str = mkdtemp("tutorial_staging_tmp")
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = "INFO"
    options.clean = "always"

    with Toil(options) as toil:
        if not toil.options.restart:
            inputFileID = toil.importFile(
                "file://"
                + get_data("toil/test/docs/scripts/stagingExampleFiles/in.txt")
            )
            outputFileID = toil.start(HelloWorld(inputFileID))
        else:
            outputFileID = toil.restart()

        # Join the directory and file name as separate components so the
        # output lands inside tmp rather than next to it.
        toil.exportFile(
            outputFileID,
            "file://" + os.path.join(tmp, "out.txt"),
        )
Using Docker Containers in Toil
Docker containers are commonly used with Toil. The combination of Toil and Docker allows for pipelines to be fully portable between any platform that has both Toil and Docker installed. Docker eliminates the need for the user to do any other tool installation or environment setup.
In order to use Docker containers with Toil, Docker must be installed on all workers of the cluster. Instructions for installing Docker can be found on the Docker website.
When using Toil-based autoscaling, Docker will be automatically set up on the cluster's worker nodes, so no additional installation steps are necessary. Further information on using Toil-based autoscaling can be found in the Running a Workflow with Autoscaling documentation.
In order to use Docker containers in a Toil workflow, the container can be built locally or downloaded in real time from an online Docker repository like Quay. If the container is not in a repository, the container's layers must be accessible on each node of the cluster.
When invoking Docker containers from within a Toil workflow, it is strongly recommended that you use dockerCall(), a Toil job function provided in toil.lib.docker. dockerCall leverages Docker's own Python API and provides container cleanup on job failure. When Docker containers are run without this feature, failed jobs can result in resource leaks. Docker's API can be found at docker-py.
In order to use dockerCall, your installation of Docker must be set up to run without sudo. Instructions for setting this up can be found in the Docker documentation.
An example of a basic dockerCall is below:
dockerCall(job=job,
           tool='quay.io/ucsc_cgl/bwa',
           workDir=job.tempDir,
           parameters=['index', '/data/reference.fa'])
Note the assumption that the reference.fa file is located in /data. This is Toil's standard convention as a mount location to reduce boilerplate when calling dockerCall. Users can choose their own mount locations by supplying a volumes kwarg to dockerCall, such as: volumes={working_dir: {'bind': '/data', 'mode': 'rw'}}, where working_dir is an absolute path on the user's filesystem.
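The volumes mapping is a plain dictionary, so it can be built and inspected before it is passed along. The helper below is a sketch; the name make_volumes and its defaults are assumptions for illustration, and only the dictionary shape is taken from the text above.

```python
import os


def make_volumes(host_dir, container_dir="/data", mode="rw"):
    """Build a mapping of one absolute host directory to a container mount point."""
    working_dir = os.path.abspath(host_dir)  # the key must be an absolute path
    return {working_dir: {"bind": container_dir, "mode": mode}}


volumes = make_volumes(".")
print(volumes)
```

The result has the same shape as the volumes value shown above and would be supplied as the volumes keyword argument.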
Docker calls can also be added to workflows like any other job function. The following example wraps apiDockerCall from toil.lib.docker:
import os

from toil.common import Toil
from toil.job import Job
from toil.lib.docker import apiDockerCall
from toil.lib.io import mkdtemp

align = Job.wrapJobFn(
    apiDockerCall, image="ubuntu", working_dir=os.getcwd(), parameters=["ls", "-lha"]
)

if __name__ == "__main__":
    jobstore: str = mkdtemp("tutorial_docker")
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = "INFO"
    options.clean = "always"

    with Toil(options) as toil:
        toil.start(align)
cgl-docker-lib contains dockerCall-compatible Dockerized tools that are commonly used in bioinformatics analysis.
The documentation provides guidelines for developing your own Docker containers that can be used with Toil and dockerCall. In order for a container to be compatible with dockerCall, it must have an ENTRYPOINT set to a wrapper script, as described in the cgl-docker-lib containerization standards. This can be set by passing in the optional keyword argument 'entrypoint'. Example:
entrypoint=["/bin/bash","-c"]
dockerCall currently supports the 75 keyword arguments found in the Python Docker API under the 'run' command.
Services
It is sometimes desirable to run services, such as a database or server, concurrently with a workflow. The toil.job.Job.Service class provides a simple mechanism for spawning such a service within a Toil workflow, allowing precise specification of the start and end time of the service, and providing start and end methods to use for initialization and cleanup. The following simple, conceptual example illustrates how services work:
import os

from toil.common import Toil
from toil.job import Job
from toil.lib.io import mkdtemp

class DemoService(Job.Service):
    def start(self, fileStore):
        # Start up a database/service here
        # Return a value that enables another process to connect to the database
        return "loginCredentials"

    def check(self):
        # A function that if it returns False causes the service to quit
        # If it raises an exception the service is killed and an error is reported
        return True

    def stop(self, fileStore):
        # Cleanup the database here
        pass

j = Job()
s = DemoService()
loginCredentialsPromise = j.addService(s)

def dbFn(loginCredentials):
    # Use the login credentials returned from the service's start method to connect to the service
    pass

j.addChildFn(dbFn, loginCredentialsPromise)

if __name__ == "__main__":
    jobstore: str = mkdtemp("tutorial_services")
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = "INFO"
    options.clean = "always"

    with Toil(options) as toil:
        toil.start(j)
In this example the DemoService starts a database in the start method, returning an object from the start method indicating how a client job would access the database. The service's stop method cleans up the database, while the service's check method is polled periodically to check the service is alive.
A DemoService instance is added as a service of the root job j, with resource requirements specified. The return value from toil.job.Job.addService() is a promise to the return value of the service's start method. When the promise is fulfilled it will represent how to connect to the database. The promise is passed to a child job of j, which uses it to make a database connection. The services of a job are started before any of its successors have been run and stopped after all the successors of the job have completed successfully.
Multiple services can be created per job, all run in parallel. Additionally, services can define sub-services using toil.job.Job.Service.addChild() . This allows complex networks of services to be created, e.g. Apache Spark clusters, within a workflow.
Checkpoints
Services complicate resuming a workflow after failure, because they can create complex dependencies between jobs. For example, consider a service that provides a database that multiple jobs update. If the database service fails and loses state, it is not clear that just restarting the service will allow the workflow to be resumed, because jobs that created that state may have already finished. To get around this problem Toil supports checkpoint jobs, specified as the boolean keyword argument checkpoint to a job or wrapped function, e.g.:
j = Job(checkpoint=True)
A checkpoint job is rerun if one or more of its successors fails its retry attempts, until it itself has exhausted its retry attempts. Upon restarting a checkpoint job all its existing successors are first deleted, and then the job is rerun to define new successors. By checkpointing a job that defines a service, upon failure of the service the database and the jobs that access the service can be redefined and rerun.
To make the implementation of checkpoint jobs simple, a job can only be a checkpoint if when first defined it has no successors, i.e. it can only define successors within its run method.
Encapsulation
Let A be a root job potentially with children and follow-ons. Without an encapsulated job, the simplest way to specify a job B which runs after A and all its successors is to create a parent of A, call it Ap, and then make B a follow-on of Ap, e.g.:
import os

from toil.common import Toil
from toil.job import Job
from toil.lib.io import mkdtemp

if __name__ == "__main__":
    # A is a job with children and follow-ons, for example:
    A = Job()
    A.addChild(Job())
    A.addFollowOn(Job())

    # B is a job which needs to run after A and its successors
    B = Job()

    # The way to do this without encapsulation is to make a parent of A, Ap, and make B a follow-on of Ap.
    Ap = Job()
    Ap.addChild(A)
    Ap.addFollowOn(B)

    jobstore: str = mkdtemp("tutorial_encapsulations")
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = "INFO"
    options.clean = "always"

    with Toil(options) as toil:
        print(toil.start(Ap))
An encapsulated job E(A) of A saves making Ap; instead we can write:
import os

from toil.common import Toil
from toil.job import Job
from toil.lib.io import mkdtemp

if __name__ == "__main__":
    # A
    A = Job()
    A.addChild(Job())
    A.addFollowOn(Job())

    # Encapsulate A
    A = A.encapsulate()

    # B is a job which needs to run after A and its successors
    B = Job()

    # With encapsulation A and its successor subgraph appear to be a single job, hence:
    A.addChild(B)

    jobstore: str = mkdtemp("tutorial_encapsulations2")
    os.rmdir(jobstore)
    options = Job.Runner.getDefaultOptions(jobstore)
    options.logLevel = "INFO"
    options.clean = "always"

    with Toil(options) as toil:
        print(toil.start(A))
Note that the call to toil.job.Job.encapsulate() creates the toil.job.Job.EncapsulatedJob.
Depending on Toil
If you are packaging your workflow(s) as a pip-installable distribution on PyPI, you might be tempted to declare Toil as a dependency in your setup.py, via the install_requires keyword argument to setup(). Unfortunately, this does not work, for two reasons. First, Toil uses Setuptools' extras mechanism to manage its own optional dependencies. If you explicitly declared a dependency on Toil, you would have to hard-code a particular combination of extras (or no extras at all), robbing the user of the choice of which Toil extras to install. Second, and more importantly, declaring a dependency on Toil would only lead to Toil being installed on the leader node of a cluster, but not on the worker nodes. Auto-deployment does not work here because Toil cannot auto-deploy itself: the classic "Which came first, chicken or egg?" problem.
In other words, you shouldn't explicitly depend on Toil. Document the dependency instead (as in "This workflow needs Toil version X.Y.Z to be installed") and optionally add a version check to your setup.py . Refer to the check_version() function in the toil-lib project's setup.py for an example. Alternatively, you can also just depend on toil-lib and you'll get that check for free.
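A version check of this kind can be sketched with the standard library alone. The code below is not the check_version() function from toil-lib; the helper names, the simplistic version parsing, and the minimum version shown are all assumptions made for illustration.

```python
from importlib import metadata

REQUIRED = (5, 0, 0)  # hypothetical minimum version, chosen only for illustration


def parse_version(text):
    """Turn '5.12.0' into (5, 12, 0); a non-numeric suffix such as 'a1' is dropped."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_installed(package, required):
    """Return an error message if `package` is missing or too old, else None."""
    wanted = ".".join(str(n) for n in required)
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed; version {wanted} or later is needed"
    if parse_version(installed) < required:
        return f"{package} {installed} is older than the required {wanted}"
    return None
```

A setup.py could call check_installed("toil", REQUIRED) and abort with the returned message when it is not None, which documents the requirement without declaring Toil in install_requires.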
If your workflow depends on a dependency of Toil, consider not making that dependency explicit either. If you do, you risk a version conflict between your project and Toil. The pip utility may silently ignore that conflict, breaking either Toil or your workflow. It is safest to simply assume that Toil installs that dependency for you. The only downside is that you are locked into the exact version of that dependency that Toil declares. But such is life with Python, which, unlike Java, has no means of isolating dependencies belonging to different software components within the same process, and whose favored software distribution utility is incapable of properly resolving overlapping dependencies and detecting conflicts.
Best Practices for Dockerizing Toil Workflows
Computational Genomics Lab's Dockstore-based production system provides workflow authors a way to run Dockerized versions of their pipeline in an automated, scalable fashion. To be compatible with this system, a workflow should meet the following requirements. In addition to the Docker container, a Common Workflow Language (CWL) descriptor file is needed. For inputs:
• Only command line arguments should be used for configuring the workflow. If the workflow relies on a configuration file, like Toil-RNAseq or ProTECT, a wrapper script inside the Docker container can be used to parse the CLI and generate the necessary configuration file.
• All inputs to the pipeline should be explicitly enumerated rather than implicit. For example, don't rely on one FASTQ read's path to discover the location of its pair. This is necessary since all inputs are mapped to their own isolated directories when the Docker container is called via Dockstore.
• All inputs must be documented in the CWL descriptor file. Examples of this file can be seen in both Toil-RNAseq and ProTECT.
For outputs:
• All outputs should be written to a local path rather than S3.
• Take care to package outputs in a local and user-friendly way. For example, don't tar up all output if there are specific files that users will care to see individually.
• All output file names should be deterministic and predictable. For example, don't prepend the name of an output file with PASS/FAIL depending on the outcome of the pipeline.
• All outputs must be documented in the CWL descriptor file. Examples of this file can be seen in both Toil-RNAseq and ProTECT.
TOIL CLASS API
The Toil class configures and starts a Toil run.
class toil.common.Toil(options)
A context manager that represents a Toil workflow. Specifically, the batch system, job store, and its configuration.
Parameters
options ( Namespace )
__init__(options)
Initialize a Toil object from the given options.
Note that this is very light-weight and that the bulk of the work is done when the context is entered.
Parameters
options ( Namespace ) -- command line options specified by the user
Return type
None
start(rootJob)
Invoke a Toil workflow with the given job as the root for an initial run.
This method must be called in the body of a with Toil(...) as toil: statement. This method should not be called more than once for a workflow that has not finished.
Parameters
rootJob ( Job ) -- The root job of the workflow
Return type
Any
Returns
The root job's return value
restart()
Restarts a workflow that has been interrupted.
Return type
Any
Returns
The root job's return value
classmethod getJobStore(locator)
Create an instance of the concrete job store implementation that matches the given locator.
Parameters
locator ( str ) -- The location of the job store to be represented by the instance
Return type
AbstractJobStore
Returns
an instance of a concrete subclass of AbstractJobStore
static createBatchSystem(config)
Create an instance of the batch system specified in the given config.
Parameters
config ( Config ) -- the current configuration
Return type
AbstractBatchSystem
Returns
an instance of a concrete subclass of AbstractBatchSystem
import_file(src_uri, shared_file_name=None, symlink=True, check_existence=True)
Import the file at the given URL into the job store.
If check_existence is false, returns None if the file does not exist.
Parameters
• check_existence ( bool ) -- If true, raise FileNotFoundError if the file does not exist. If false, return None when the file does not exist.
• src_uri ( str )
• shared_file_name ( Optional [ str ])
• symlink ( bool )
Return type
Optional [ FileID ]
See toil.jobStores.abstractJobStore.AbstractJobStore.importFile() for a full description.
export_file(file_id, dst_uri)
Export file to destination pointed at by the destination URL.
See toil.jobStores.abstractJobStore.AbstractJobStore.exportFile() for a full description.
Parameters
• file_id ( FileID )
• dst_uri ( str )
Return type
None
static normalize_uri(uri, check_existence=False)
Given a URI, if it has no scheme, prepend "file:".
Parameters
• check_existence ( bool ) -- If set, raise FileNotFoundError if a URI points to a local file that does not exist.
• uri ( str )
Return type
str
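The behaviour described for normalize_uri can be approximated with the standard library as follows. This is a sketch of the idea, not Toil's implementation: the function name normalize_uri_sketch is hypothetical, POSIX-style paths are assumed, and details such as how existing file: URIs are treated are left out.

```python
import os
from urllib.parse import urlparse


def normalize_uri_sketch(uri, check_existence=False):
    """Give scheme-less local paths a file scheme; leave other URIs alone."""
    if urlparse(uri).scheme == "":
        abs_path = os.path.abspath(uri)
        if check_existence and not os.path.exists(abs_path):
            raise FileNotFoundError(abs_path)
        return "file://" + abs_path
    return uri


print(normalize_uri_sketch("s3://bucket/key"))
print(normalize_uri_sketch("/tmp/example.txt"))
```

The existence check only makes sense for local paths, which is why it sits inside the scheme-less branch in this sketch.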
static getToilWorkDir(configWorkDir=None)
Return a path to a writable directory under which per-workflow directories exist.
This directory is always required to exist on a machine, even if the Toil worker has not run yet. If your workers and leader have different temp directories, you may need to set TOIL_WORKDIR.
Parameters
configWorkDir ( Optional [ str ]) -- Value passed to the program using the --workDir flag
Return type
str
Returns
Path to the Toil work directory, constant across all machines
classmethod get_toil_coordination_dir(config_work_dir, config_coordination_dir)
Return a path to a writable directory, which will be in memory if convenient. Ought to be used for file locking and coordination.
Parameters
• config_work_dir ( Optional [ str ]) -- Value passed to the program using the --workDir flag
• config_coordination_dir ( Optional [ str ]) -- Value passed to the program using the --coordinationDir flag
• workflow_id -- Used if a tmpdir_prefix exists to create full directory paths unique per workflow
Return type
str
Returns
Path to the Toil coordination directory. Ought to be on a POSIX filesystem that allows directories containing open files to be deleted.
static get_workflow_path_component(workflow_id)
Get a safe filesystem path component for a workflow.
Will be consistent for all processes on a given machine, and different for all processes on different machines.
Parameters
workflow_id ( str ) -- The ID of the current Toil workflow.
Return type
str
classmethod getLocalWorkflowDir(workflowID, configWorkDir=None)
Return the directory where worker directories and the cache will be located for this workflow on this machine.
Parameters
• configWorkDir ( Optional [ str ]) -- Value passed to the program using the --workDir flag
• workflowID ( str )
Return type
str
Returns
Path to the local workflow directory on this machine
classmethod get_local_workflow_coordination_dir(workflow_id, config_work_dir, config_coordination_dir)
Return the directory where coordination files should be located for this workflow on this machine. These include internal Toil databases and lock files for the machine.
If an in-memory filesystem is available, it is used. Otherwise, the local workflow directory, which may be on a shared network filesystem, is used.
Parameters
• workflow_id ( str ) -- Unique ID of the current workflow.
• config_work_dir ( Optional [ str ]) -- Value used for the work directory in the current Toil Config.
• config_coordination_dir ( Optional [ str ]) -- Value used for the coordination directory in the current Toil Config.
Return type
str
Returns
Path to the local workflow coordination directory on this machine.
JOB STORE API
The job store interface is an abstraction layer that hides the specific details of file storage, for example standard file systems, S3, etc. The AbstractJobStore API is implemented to support a given file store, e.g. S3. Implement this API to support a new file store.
class toil.jobStores.abstractJobStore.AbstractJobStore(locator)
Represents the physical storage for the jobs and files in a Toil workflow.
JobStores are responsible for storing toil.job.JobDescription (which relate jobs to each other) and files.
Actual toil.job.Job objects are stored in files, referenced by JobDescriptions. All the non-file CRUD methods the JobStore provides deal in JobDescriptions and not full, executable Jobs.
To actually get ahold of a toil.job.Job, use toil.job.Job.loadJob() with a JobStore and the relevant JobDescription.
Parameters
locator ( str )
__init__(locator)
Create an instance of the job store.
The instance will not be fully functional until either initialize() or resume() is invoked. Note that the destroy() method may be invoked on the object with or without prior invocation of either of these two methods.
Takes and stores the locator string for the job store, which will be accessible via self.locator.
Parameters
locator ( str )
Return type
None
initialize(config)
Initialize this job store.
Create the physical storage for this job store, allocate a workflow ID and persist the given Toil configuration to the store.
Parameters
config ( Config ) -- the Toil configuration to initialize this job store with. The given configuration will be updated with the newly allocated workflow ID.
Raises
JobStoreExistsException -- if the physical storage for this job store already exists
Return type
None
write_config()
Persists the value of the AbstractJobStore.config attribute to the job store, so that it can be retrieved later by other instances of this class.
Return type
None
resume()
Connect this instance to the physical storage it represents and load the Toil configuration into the AbstractJobStore.config attribute.
Raises
NoSuchJobStoreException -- if the physical storage for this job store doesn't exist
Return type
None
property config: Config
Return the Toil configuration associated with this job store.
Return type
toil.common.Config
property locator: str
Get the locator that defines the job store, which can be used to connect to it.
Return type
str
setRootJob(rootJobStoreID)
Set the root job of the workflow backed by this job store.
Parameters
rootJobStoreID ( FileID )
Return type
None
set_root_job(job_id)
Set the root job of the workflow backed by this job store.
Parameters
job_id ( FileID ) -- The ID of the job to set as root
Return type
None
load_root_job()
Loads the JobDescription for the root job in the current job store.
Raises
toil.job.JobException -- If no root job is set or if the root job doesn't exist in this job store
Return type
JobDescription
Returns
The root job.
create_root_job(job_description)
Create the given JobDescription and set it as the root job in this job store.
Parameters
job_description ( JobDescription ) -- JobDescription to save and make the root job.
Return type
JobDescription
get_root_job_return_value()
Parse the return value from the root job.
Raises an exception if the root job hasn't fulfilled its promise yet.
Return type
Any
import_file(src_uri, shared_file_name=None, hardlink=False, symlink=True)
Imports the file at the given URL into the job store. The ID of the newly imported file is returned. If a shared file name is provided, the file will be imported as such and None is returned. If an executable file on the local filesystem is uploaded, its executability will be preserved when it is downloaded.
Currently supported schemes are:
• 's3' for objects in Amazon S3, e.g. s3://bucket/key
• 'file' for local files, e.g. file:///local/file/path
• 'http', e.g. http://someurl.com/path
• 'gs', e.g. gs://bucket/file
Raises FileNotFoundError if the file does not exist.
Parameters
• src_uri ( str ) -- URL that points to a file or object in the storage mechanism of a supported URL scheme e.g. a blob in an AWS s3 bucket. It must be a file, not a directory or prefix.
• shared_file_name ( Optional [ str ]) -- Optional name to assign to the imported file within the job store
• hardlink ( bool )
• symlink ( bool )
Returns
The jobStoreFileID of the imported file or None if shared_file_name was given
Return type
toil.fileStores.FileID or None
export_file(file_id, dst_uri)
Exports file to destination pointed at by the destination URL. The exported file will be executable if and only if it was originally uploaded from an executable file on the local filesystem.
Refer to AbstractJobStore.import_file() documentation for currently supported URL schemes.
Note that the helper method _exportFile is used to read from the source and write to destination. To implement any optimizations that circumvent this, the _exportFile method should be overridden by subclasses of AbstractJobStore.
Parameters
• file_id ( FileID ) -- The id of the file in the job store that should be exported.
• dst_uri ( str ) -- URL that points to a file or object in the storage mechanism of a supported URL scheme e.g. a blob in an AWS s3 bucket. May also be a local path.
Return type
None
classmethod url_exists(src_uri)
Return True if the file at the given URI exists, and False otherwise.
May raise an error if file existence cannot be determined.
Parameters
src_uri ( str ) -- URL that points to a file or object in the storage mechanism of a supported URL scheme e.g. a blob in an AWS s3 bucket.
Return type
bool
classmethod get_size(src_uri)
Get the size in bytes of the
file at the given URL, or None if it cannot be obtained.
Parameters
src_uri ( str ) -- URL that points to a file or object in the storage mechanism of a supported URL scheme e.g. a blob in an AWS s3 bucket.
Return type
Optional [ int ]
classmethod get_is_directory(src_uri)
Return True if the thing at the
given URL is a directory, and False if it is a file. The URL
may or may not end in '/'.
Parameters
src_uri ( str )
Return type
bool
classmethod list_url(src_uri)
List the directory at the given URL. Returned path components can be joined with '/' onto the passed URL to form new URLs. Those that end in '/' correspond to directories. The provided URL may or may not end with '/'.
Currently supported schemes are:
• 's3' for objects in Amazon S3
e.g. s3://bucket/prefix/
• 'file' for local files
e.g. file:///local/dir/path/
Parameters
src_uri ( str ) -- URL that points to a directory or prefix in the storage mechanism of a supported URL scheme e.g. a prefix in an AWS s3 bucket.
Return type
list [ str ]
Returns
A list of URL components in the given directory, already URL-encoded.
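The joining rule described for list_url() can be sketched without Toil. This is a minimal illustration, assuming a hypothetical helper join_listing() and a made-up listing; it is not part of the Toil API.

```python
from __future__ import annotations

from urllib.parse import quote


def join_listing(base_url: str, components: list[str]) -> dict[str, bool]:
    """Map each child URL to True if it names a directory, False if a file."""
    # The base URL may or may not end in '/', so normalise it first.
    prefix = base_url if base_url.endswith("/") else base_url + "/"
    # Components ending in '/' correspond to directories.
    return {prefix + component: component.endswith("/") for component in components}


# Hypothetical listing, already URL-encoded as the documentation describes.
listing = [quote("sample 1.txt"), "reads/"]
children = join_listing("s3://bucket/prefix", listing)
```

Because the components are already URL-encoded, they are appended as-is rather than being quoted a second time.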
classmethod read_from_url(src_uri, writable)
Read the given URL and write its content into the given writable stream.
Raises
FileNotFoundError if the URL doesn't exist.
Return type
tuple [ int , bool ]
Returns
The size of the file in bytes and whether the executable permission bit is set
Parameters
• src_uri ( str )
• writable ( IO [ bytes ])
classmethod open_url(src_uri)
Read from the given URI.
Raises FileNotFoundError if the URL doesn't exist.
Has a readable stream interface, unlike read_from_url() which takes a writable stream.
Parameters
src_uri ( str )
Return type
IO [ bytes ]
abstract destroy()
The inverse of initialize(), this method deletes the physical storage represented by this instance. While not atomic, this method is at least idempotent, as a means to counteract potential issues with eventual consistency exhibited by the underlying storage mechanisms. This means that if the method fails (raises an exception), it may (and should) be invoked again. If the underlying storage mechanism is eventually consistent, even a successful invocation is not an ironclad guarantee that the physical storage vanished completely and immediately. A successful invocation only guarantees that the deletion will eventually happen. It is therefore recommended not to immediately reuse the same job store location for a new Toil workflow.
Return type
None
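Since destroy() is described as idempotent and safe to invoke again after a failure, a caller might wrap it in a retry loop. The following is only a sketch under that assumption: destroy_with_retries() and FlakyStore are hypothetical names, and FlakyStore merely imitates a store whose first deletion attempt fails.

```python
import time


def destroy_with_retries(destroy, attempts=3, delay=0.0):
    """Call an idempotent destroy() until it succeeds or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            destroy()
            return attempt  # number of calls that were needed
        except Exception:
            if attempt == attempts:
                raise  # give up and surface the last error
            time.sleep(delay)


class FlakyStore:
    """Hypothetical stand-in for a job store whose first destroy() call fails."""

    def __init__(self):
        self.calls = 0
        self.exists = True

    def destroy(self):
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("transient storage error")
        self.exists = False


store = FlakyStore()
attempts_used = destroy_with_retries(store.destroy)
```

Retrying is only safe because repeating the call cannot cause harm; the same loop around a non-idempotent operation would not be appropriate.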
get_env()
Returns a dictionary of
environment variables that this job store requires to be set
in order to function properly on a worker.
Return type
dict [ str , str ]
clean(jobCache=None)
Function to clean up the state of a job store after a restart.
Fixes jobs that might have been partially updated. Resets the try counts and removes jobs that are not successors of the current root job.
Parameters
jobCache ( Optional [ dict [ Union [ str , TemporaryID ], JobDescription ]]) -- if given, it must be a dict from job ID keys to JobDescription object values. Jobs will be loaded from the cache (which can be downloaded from the job store in a batch) instead of piecemeal when recursed into.
Return type
JobDescription
abstract assign_job_id(job_description)
Gets a new jobStoreID to be used by the described job, and assigns it to the JobDescription.
Files associated with the assigned ID will be accepted even if the JobDescription has never been created or updated.
Parameters
job_description ( JobDescription ) -- The JobDescription to give an ID to
Return type
None
batch()
If supported by the batch
system, calls to create() with this context manager active
will be performed in a batch after the context manager is
released.
Return type
Iterator [ None ]
abstract create_job(job_description)
Writes the given JobDescription to the job store. The job must have an ID assigned already.
Must call
jobDescription.pre_update_hook()
Returns
The JobDescription passed.
Return type
toil.job.JobDescription
Parameters
job_description ( JobDescription )
abstract job_exists(job_id)
Indicates whether a description
of the job with the specified jobStoreID exists in the job
store
Return type
bool
Parameters
job_id ( str )
abstract get_public_url(file_name)
Returns a publicly accessible URL to the given file in the job store. The returned URL may expire as early as 1h after it has been returned. Throws an exception if the file does not exist.
Parameters
file_name ( str ) -- the jobStoreFileID of the file to generate a URL for
Raises
NoSuchFileException -- if the specified file does not exist in this job store
Return type
str
abstract get_shared_public_url(shared_file_name)
Differs from getPublicUrl() in that this method is for generating URLs for shared files written by writeSharedFileStream() .
Returns a publicly accessible URL to the given file in the job store. The returned URL starts with 'http:', 'https:' or 'file:'. The returned URL may expire as early as 1h after it has been returned. Throws an exception if the file does not exist.
Parameters
shared_file_name ( str ) -- The name of the shared file to generate a publicly accessible URL for.
Raises
NoSuchFileException -- raised if the specified file does not exist in the store
Return type
str
abstract load_job(job_id)
Loads the description of the job referenced by the given ID, assigns it the job store's config, and returns it.
May declare the job to have failed (see toil.job.JobDescription.setupJobAfterFailure()) if there is evidence of a failed update attempt.
Parameters
job_id ( str ) -- the ID of the job to load
Raises
NoSuchJobException -- if there is no job with the given ID
Return type
JobDescription
abstract update_job(job_description)
Persists changes to the state of the given JobDescription in this store atomically.
Must call
jobDescription.pre_update_hook()
Parameters
job_description ( JobDescription ) -- the job to write to this job store
Return type
None
abstract delete_job(job_id)
Removes the JobDescription from the store atomically. You may not then subsequently call load(), write(), update(), etc. with the same jobStoreID or any JobDescription bearing it.
This operation is idempotent, i.e. deleting a job twice or deleting a non-existent job will succeed silently.
Parameters
job_id ( str ) -- the ID of the job to delete from this job store
Return type
None
jobs()
Best-effort attempt to return an iterator over the JobDescriptions for all jobs in the store. The iterator may not return all jobs and may also contain orphaned jobs that have already finished successfully and should not be rerun. To guarantee you get any and all jobs that can be run, construct a more expensive ToilState object instead.
Returns
An iterator over the jobs in the store. The iterator may or may not contain all jobs and may contain invalid jobs.
Return type
Iterator[toil.job.JobDescription]
abstract write_file(local_path, job_id=None, cleanup=False)
Takes a file (as a path) and places it in this job store. Returns an ID that can be used to retrieve the file at a later time. The file is written in an atomic manner. It will not appear in the jobStore until the write has successfully completed.
Parameters
• local_path ( str ) -- the path to the local file that will be uploaded to the job store. The last path component (basename of the file) will remain associated with the file in the file store, if supported, so that the file can be searched for by name or name glob.
• job_id ( Optional [ str ]) -- the id of a job, or None. If specified, the file may be associated with that job in a job-store-specific way. This may influence the returned ID.
• cleanup ( bool ) -- Whether to attempt to delete the file when the job whose ID was given as job_id is deleted with jobStore.delete(job). If job_id was not given, does nothing.
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchJobException -- if the job specified via job_id does not exist (FIXME: some implementations may not raise this)
Returns
an ID referencing the newly created file that can be used to read the file in the future.
Return type
str
abstract write_file_stream(job_id=None, cleanup=False, basename=None, encoding=None, errors=None)
Similar to writeFile, but returns a context manager yielding a tuple of 1) a file handle which can be written to and 2) the ID of the resulting file in the job store. The yielded file handle does not need to and should not be closed explicitly. The file is written in an atomic manner. It will not appear in the jobStore until the write has successfully completed.
Parameters
• job_id ( Optional [ str ]) -- the id of a job, or None. If specified, the file may be associated with that job in a job-store-specific way. This may influence the returned ID.
• cleanup ( bool ) -- Whether to attempt to delete the file when the job whose ID was given as job_id is deleted with jobStore.delete(job). If job_id was not given, does nothing.
• basename ( Optional [ str ]) -- If supported by the implementation, use the given file basename so that when searching the job store with a query matching that basename, the file will be detected.
• encoding ( Optional [ str ]) -- the name of the encoding used to encode the file. Encodings are the same as for encode(). Defaults to None which represents binary mode.
• errors ( Optional [ str ]) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchJobException -- if the job specified via job_id does not exist (FIXME: some implementations may not raise this)
Returns
a context manager yielding a file handle which can be written to and an ID that references the newly created file and can be used to read the file in the future.
Return type
Iterator [ tuple [ IO [ bytes ], str ]]
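The contract described for write_file_stream() (a context manager that yields a handle and an ID, with the file appearing only after a successful write) can be illustrated on a local directory. This is a sketch of one common way to get that behaviour, using a temporary file and a rename; toy_write_file_stream() is a hypothetical name and the approach is an assumption, not a description of how any Toil job store is implemented.

```python
import os
import tempfile
import uuid
from contextlib import contextmanager


@contextmanager
def toy_write_file_stream(store_dir):
    """Yield (handle, file_id); publish the file only if the write succeeds."""
    file_id = uuid.uuid4().hex
    # Write to a temporary file in the same directory so the rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=store_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            yield handle, file_id
        # The handle is closed here; the caller never closes it explicitly.
        os.replace(tmp_path, os.path.join(store_dir, file_id))
    except BaseException:
        os.unlink(tmp_path)  # discard the partial file on any failure
        raise


store_dir = tempfile.mkdtemp()
with toy_write_file_stream(store_dir) as (handle, file_id):
    handle.write(b"hello")
    visible_during_write = os.path.exists(os.path.join(store_dir, file_id))
visible_after_write = os.path.exists(os.path.join(store_dir, file_id))
```

The final name does not exist while the block is still writing, which mirrors the statement that the file will not appear until the write has completed.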
abstract get_empty_file_store_id(job_id=None, cleanup=False, basename=None)
Creates an empty file in the job store and returns its ID. Call to fileExists(getEmptyFileStoreID(jobStoreID)) will return True.
Parameters
• job_id ( Optional [ str ]) -- the id of a job, or None. If specified, the file may be associated with that job in a job-store-specific way. This may influence the returned ID.
• cleanup ( bool ) -- Whether to attempt to delete the file when the job whose ID was given as job_id is deleted with jobStore.delete(job). If job_id was not given, does nothing.
• basename ( Optional [ str ]) -- If supported by the implementation, use the given file basename so that when searching the job store with a query matching that basename, the file will be detected.
Returns
a jobStoreFileID that references the newly created file and can be used to reference the file in the future.
Return type
str
abstract read_file(file_id, local_path, symlink=False)
Copies or hard links the file referenced by jobStoreFileID to the given local file path. The version will be consistent with the last copy of the file written/updated. If the file in the job store is later modified via updateFile or updateFileStream, it is implementation-defined whether those writes will be visible at localFilePath. The file is copied in an atomic manner. It will not appear in the local file system until the copy has completed.
The file at the given local path may not be modified after this method returns!
Note! Implementations of readFile need to respect/provide the executable attribute on FileIDs.
Parameters
• file_id ( str ) -- ID of the file to be copied
• local_path ( str ) -- the local path indicating where to place the contents of the given file in the job store
• symlink ( bool ) -- whether the reader can tolerate a symlink. If set to true, the job store may create a symlink instead of a full copy of the file or a hard link.
Return type
None
abstract read_file_stream(file_id, encoding=None, errors=None)
Similar to readFile, but
returns a context manager yielding a file handle which can
be read from. The yielded file handle does not need to and
should not be closed explicitly.
Parameters
• file_id ( Union [ FileID , str ]) -- ID of the file to get a readable file handle for
• encoding ( Optional [ str ]) -- the name of the encoding used to decode the file. Encodings are the same as for decode(). Defaults to None which represents binary mode.
• errors ( Optional [ str ]) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Returns
a context manager yielding a file handle which can be read from
Return type
Iterator[Union[IO[ bytes ], IO[ str ]]]
abstract delete_file(file_id)
Deletes the file with the given
ID from this job store. This operation is idempotent, i.e.
deleting a file twice or deleting a non-existent file will
succeed silently.
Parameters
file_id ( str ) -- ID of the file to delete
Return type
None
fileExists(jobStoreFileID)
Determine whether a file exists
in this job store.
Parameters
jobStoreFileID ( str )
Return type
bool
abstract file_exists(file_id)
Determine whether a file exists
in this job store.
Parameters
file_id ( str ) -- an ID referencing the file to be checked
Return type
bool
getFileSize(jobStoreFileID)
Get the size of the given file
in bytes.
Parameters
jobStoreFileID ( str )
Return type
int
abstract get_file_size(file_id)
Get the size of the given file in bytes, or 0 if it does not exist when queried.
Note that job
stores which encrypt files might return overestimates of
file sizes, since the encrypted file may have been padded to
the nearest block, augmented with an initialization vector,
etc.
Parameters
file_id ( str ) -- an ID referencing the file to be checked
Return type
int
updateFile(jobStoreFileID, localFilePath)
Replaces the existing version
of a file in the job store.
Parameters
• jobStoreFileID ( str )
• localFilePath ( str )
Return type
None
abstract update_file(file_id, local_path)
Replaces the existing version of a file in the job store.
Throws an exception if the file does not exist.
Parameters
• file_id ( str ) -- the ID of the file in the job store to be updated
• local_path ( str ) -- the local path to a file that will overwrite the current version in the job store
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchFileException -- if the specified file does not exist
Return type
None
abstract update_file_stream(file_id, encoding=None, errors=None)
Replaces the existing version
of a file in the job store. Similar to writeFile, but
returns a context manager yielding a file handle which can
be written to. The yielded file handle does not need to and
should not be closed explicitly.
Parameters
• file_id ( str ) -- the ID of the file in the job store to be updated
• encoding ( Optional [ str ]) -- the name of the encoding used to encode the file. Encodings are the same as for encode(). Defaults to None which represents binary mode.
• errors ( Optional [ str ]) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchFileException -- if the specified file does not exist
Return type
Iterator [ IO [ Any ]]
abstract write_shared_file_stream(shared_file_name, encrypted=None, encoding=None, errors=None)
Returns a context manager yielding a writable file handle to the global file referenced by the given name. File will be created in an atomic manner.
Parameters
• shared_file_name ( str ) -- A file name matching AbstractJobStore.fileNameRegex, unique within this job store
• encrypted ( Optional [ bool ]) -- True if the file must be encrypted, None if it may be encrypted or False if it must be stored in the clear.
• encoding ( Optional [ str ]) -- the name of the encoding used to encode the file. Encodings are the same as for encode(). Defaults to None which represents binary mode.
• errors ( Optional [ str ]) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Raises
ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
Returns
a context manager yielding a writable file handle
Return type
Iterator[IO[ bytes ]]
abstract read_shared_file_stream(shared_file_name, encoding=None, errors=None)
Returns a context manager yielding a readable file handle to the global file referenced by the given name.
Parameters
• shared_file_name ( str ) -- A file name matching AbstractJobStore.fileNameRegex, unique within this job store
• encoding ( Optional [ str ]) -- the name of the encoding used to decode the file. Encodings are the same as for decode(). Defaults to None which represents binary mode.
• errors ( Optional [ str ]) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Returns
a context manager yielding a readable file handle
Return type
Iterator[IO[ bytes ]]
abstract write_logs(msg)
Stores a message as a log in
the jobstore.
Parameters
msg ( str ) -- the string to be written
Raises
ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
Return type
None
abstract read_logs(callback, read_all=False)
Reads logs accumulated by the write_logs() method. For each log this method calls the given callback function with the message as an argument (rather than returning logs directly, this method must be supplied with a callback which will process log messages).
Only unread logs will be read unless the read_all parameter is set.
Parameters
• callback ( Callable [ ... , Any ]) -- a function to be applied to each of the stats file handles found
• read_all ( bool ) -- a boolean indicating whether to read the already processed stats files in addition to the unread stats files
Raises
ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
Returns
the number of stats files processed
Return type
int
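The callback style of read_logs(), together with the unread/read_all distinction, can be modelled in memory. The class below is a hypothetical toy (ToyLogStore is not a Toil class) that only imitates the behaviour described above: each message is passed to the callback, unread messages are marked as read, and the count of processed entries is returned.

```python
class ToyLogStore:
    """In-memory imitation of the write_logs()/read_logs() contract."""

    def __init__(self):
        self._unread = []
        self._read = []

    def write_logs(self, msg):
        self._unread.append(msg)

    def read_logs(self, callback, read_all=False):
        # With read_all, replay already processed entries before the unread ones.
        batch = (self._read + self._unread) if read_all else list(self._unread)
        for msg in batch:
            callback(msg)
        # Everything that was unread has now been processed.
        self._read.extend(self._unread)
        self._unread.clear()
        return len(batch)


store = ToyLogStore()
store.write_logs("first")
store.write_logs("second")

seen = []
first_count = store.read_logs(seen.append)
second_count = store.read_logs(seen.append)
replay_count = store.read_logs(seen.append, read_all=True)
```

Passing a callback rather than returning a list lets the caller process each entry as it is read, which is the design the description above calls out.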
write_leader_pid()
Write the pid of this process to a file in the job store.
Overwriting the
current contents of pid.log is a feature, not a bug of this
method. Other methods will rely on always having the most
current pid available. So far there is no reason to store
any old pids.
Return type
None
read_leader_pid()
Read the pid of the leader process from a file in the job store.
Raises
NoSuchFileException -- If the PID file doesn't exist.
Return type
int
write_leader_node_id()
Write the leader node id to the
job store. This should only be called by the leader.
Return type
None
read_leader_node_id()
Read the leader node id stored in the job store.
Raises
NoSuchFileException -- If the node ID file doesn't exist.
Return type
str
write_kill_flag(kill=False)
Write a file inside the job store that serves as a kill flag.
The initialized file contains the characters "NO". This should only be changed when the user runs the "toil kill" command.
Changing this
file to a "YES" triggers a kill of the leader
process. The workers are expected to be cleaned up by the
leader.
Parameters
kill ( bool )
Return type
None
read_kill_flag()
Read the kill flag from the job
store, and return True if the leader has been killed. False
otherwise.
Return type
bool
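The kill-flag protocol described above (a file holding "NO" until a kill is requested, then "YES") is small enough to sketch on a local directory. The function names are reused here only for readability and the file name is hypothetical; this is an illustration of the protocol, not Toil's implementation.

```python
import os
import tempfile

FLAG_NAME = "kill_flag"  # hypothetical file name, not taken from Toil


def write_kill_flag(store_dir, kill=False):
    """Write "YES" to request a kill, "NO" otherwise."""
    with open(os.path.join(store_dir, FLAG_NAME), "w") as flag:
        flag.write("YES" if kill else "NO")


def read_kill_flag(store_dir):
    """Return True if a kill has been requested."""
    with open(os.path.join(store_dir, FLAG_NAME)) as flag:
        return flag.read() == "YES"


store_dir = tempfile.mkdtemp()
write_kill_flag(store_dir)             # initial state: "NO"
before = read_kill_flag(store_dir)
write_kill_flag(store_dir, kill=True)  # what a kill request would do
after = read_kill_flag(store_dir)
```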
default_caching()
Jobstore's preference as to whether it likes caching or doesn't care about it. Some jobstores benefit from caching, however on some local configurations it can be flaky.
see
https://github.com/DataBiosphere/toil/issues/4218
Return type
bool
TOIL JOB API
Functions to wrap jobs and return values (promises).
FunctionWrappingJob
The subclass of
Job for wrapping user functions.
class toil.job.FunctionWrappingJob(userFunction, *args, **kwargs)
Job used to wrap a function. In its run method the wrapped function is called.
__init__(userFunction, *args, **kwargs)
Parameters
userFunction ( callable ) -- The function to wrap. It will be called with *args and **kwargs as arguments.
The keywords memory, cores, disk, accelerators, preemptible and checkpoint are reserved keyword arguments that, if specified, will be used to determine the resources required for the job, as in toil.job.Job.__init__(). If they are keyword arguments to the function they will be extracted from the function definition, but may be overridden by the user (as you would expect).
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
JobFunctionWrappingJob
The subclass of
FunctionWrappingJob for wrapping user job functions.
class toil.job.JobFunctionWrappingJob(userFunction, *args,
**kwargs)
A job function is a function whose first argument is a Job instance that is the wrapping job for the function. This can be used to add successor jobs for the function and perform all the functions the Job class provides.
To enable the job function to get access to the toil.fileStores.abstractFileStore.AbstractFileStore instance (see toil.job.Job.run() ), it is made a variable of the wrapping job called fileStore.
To specify a job's resource requirements the following default keyword arguments can be specified:
• memory
• disk
• cores
• accelerators
• preemptible
For example to wrap a function into a job we would call:
Job.wrapJobFn(myJob, memory='100k', disk='1M', cores=0.1)
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
EncapsulatedJob
The subclass of Job for encapsulating a job, allowing a subgraph of jobs to be treated as a single job.
class toil.job.EncapsulatedJob(job, unitName=None)
A convenience Job class used to make a job subgraph appear to be a single job.
Let A be the root job of a job subgraph and B be another job we'd like to run after A and all its successors have completed. For this, use encapsulate:
# Job A and subgraph, Job B
A, B = A(), B()
Aprime = A.encapsulate()
Aprime.addChild(B)
# B will run after A and all its successors have completed, A and its subgraph of
# successors in effect appear to be just one job.
If the job being encapsulated has predecessors (i.e. is not the root job), then the encapsulated job will inherit these predecessors. If predecessors are added to the job being encapsulated after the encapsulated job is created then the encapsulating job will NOT inherit these predecessors automatically. Care should be exercised to ensure the encapsulated job has the proper set of predecessors.
The return value of an encapsulated job (as accessed by the toil.job.Job.rv() function) is the return value of the root job, e.g. A().encapsulate().rv() and A().rv() will resolve to the same value after A or A.encapsulate() has been run.
__init__(job, unitName=None)
Parameters
• job ( toil.job.Job ) -- the job to encapsulate.
• unitName ( str ) -- human-readable name to identify this job instance.
addChild(childJob)
Add a childJob to be run as child of this job.
Child jobs will be run directly after this job's toil.job.Job.run() method has completed.
Returns
childJob: for call chaining
addService(service, parentService=None)
Add a service.
The toil.job.Job.Service.start() method of the service will be called after the run method has completed but before any successors are run. The service's toil.job.Job.Service.stop() method will be called once the successors of the job have been run.
Services allow things like databases and servers to be started and accessed by jobs in a workflow.
Raises
toil.job.JobException -- If service has already been made the child of a job or another service.
Parameters
• service -- Service to add.
• parentService -- Service that will be started before 'service' is started. Allows trees of services to be established. parentService must be a service of this job.
Returns
a promise that will be replaced with the return value from toil.job.Job.Service.start() of service in any successor of the job.
addFollowOn(followOnJob)
Add a follow-on job.
Follow-on jobs
will be run after the child jobs and their successors have
been run.
Returns
followOnJob for call chaining
rv(*path)
Create a promise ( toil.job.Promise ).
The "promise" representing a return value of the job's run method, or, in case of a function-wrapping job, the wrapped function's return value.
Parameters
path ( (Any) ) -- Optional path for selecting a component of the promised return value. If absent or empty, the entire return value will be used. Otherwise, the first element of the path is used to select an individual item of the return value. For that to work, the return value must be a list, dictionary or of any other type implementing the __getitem__() magic method. If the selected item is yet another composite value, the second element of the path can be used to select an item from it, and so on. For example, if the return value is [6,{'a':42}], .rv(0) would select 6, .rv(1) would select {'a':42} while .rv(1,'a') would select 42. To select a slice from a return value that is sliceable, e.g. tuple or list, the path element should be a slice object. For example, assuming that the return value is [6, 7, 8, 9] then .rv(slice(1, 3)) would select [7, 8]. Note that slicing really only makes sense at the end of path.
Return type
Promise
Returns
A promise representing the return value of this job's toil.job.Job.run() method.
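The path semantics described for rv() amount to repeated item lookup. The helper below is a hypothetical stand-in (select_path() is not part of Toil) that applies a path to a concrete value, which makes it easy to check what a given path would select once the promise has been resolved.

```python
def select_path(value, *path):
    """Apply each path element in turn using item lookup (__getitem__)."""
    for key in path:
        value = value[key]
    return value


result = [6, {"a": 42}]
whole = select_path(result)            # empty path: the entire return value
first = select_path(result, 0)         # 6
second = select_path(result, 1)        # {'a': 42}
nested = select_path(result, 1, "a")   # 42
sliced = select_path([6, 7, 8, 9], slice(1, 3))  # [7, 8]
```

A slice yields another sequence rather than a single item, which is why slicing is most useful as the last element of a path.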
prepareForPromiseRegistration(jobStore)
Set up to allow this job's promises to register themselves.
Prepare this job (the promisor) so that its promises can register themselves with it, when the jobs they are promised to (promisees) are serialized.
The promisee holds the reference to the promise (usually as part of the job arguments) and when it is being pickled, so will the promises it refers to. Pickling a promise triggers it to be registered with the promisor.
Promise
The class used
to reference return values of jobs/services not yet
run/started.
class toil.job.Promise(job, path)
References a return value from a method as a promise before the method itself is run.
References a return value from a toil.job.Job.run() or toil.job.Job.Service.start() method as a promise before the method itself is run.
Let T be a job. Instances of Promise (termed a promise) are returned by T.rv(), which is used to reference the return value of T's run function. When the promise is passed to the constructor (or as an argument to a wrapped function) of a different, successor job the promise will be replaced by the actual referenced return value. This mechanism allows a return value from one job's run method to be an input argument to another job before the former job's run function has been executed.
Parameters
• job ( Job )
• path ( Any )
Return type
Promise
filesToDelete = {}
A set of IDs of files containing promised values when we know we won't need them anymore
__init__(job, path)
Initialize this promise.
Parameters
• job ( Job ) -- the job whose return value this promise references
• path ( Any ) -- see Job.rv()
class toil.job.PromisedRequirement(valueOrCallable, *args)
Class for dynamically allocating job function resource requirements (involving toil.job.Promise instances).
Use when resource requirements depend on the return value of a parent function. PromisedRequirements can be modified by passing a function that takes the Promise as input.
For example, let f, g, and h be functions. Then a Toil workflow can be defined as follows:
A = Job.wrapFn(f)
B = A.addChildFn(g, cores=PromisedRequirement(A.rv()))
C = B.addChildFn(h, cores=PromisedRequirement(lambda x: 2*x, B.rv()))
__init__(valueOrCallable, *args)
Initialize this PromisedRequirement.
Parameters
• valueOrCallable -- A single Promise instance or a function that takes args as input parameters.
• args ( int or .Promise ) -- variable length argument list
getValue()
Return PromisedRequirement value.
static convertPromises(kwargs)
Return True if reserved resource keyword is a Promise or PromisedRequirement instance.
Converts Promise instance to PromisedRequirement.
Parameters
kwargs ( dict [ str , Any ]) -- function keyword arguments
Return type
bool
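The valueOrCallable convention of PromisedRequirement can be pictured with plain values standing in for resolved promises. ToyRequirement below is a hypothetical class written for illustration only; how it handles extra arguments without a callable is an assumption, and it does not reproduce how Toil serializes or resolves promises.

```python
class ToyRequirement:
    """Toy model: either a plain value, or a function applied to its arguments."""

    def __init__(self, value_or_callable, *args):
        if callable(value_or_callable):
            self._func, self._args = value_or_callable, args
        else:
            if args:
                # Assumed behaviour: extra arguments only make sense with a function.
                raise ValueError("extra arguments need a callable first argument")
            self._func, self._args = (lambda x: x), (value_or_callable,)

    def get_value(self):
        return self._func(*self._args)


# Plain integers stand in for promises that have already been resolved.
cores_for_b = ToyRequirement(2).get_value()
cores_for_c = ToyRequirement(lambda x: 2 * x, cores_for_b).get_value()
```

This mirrors the workflow example in the class description: the second requirement is derived from the first by a function, so the value is only known once the upstream result exists.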
JOB METHODS API
Jobs are the
units of work in Toil which are composed into workflows.
class toil.job.Job(memory=None, cores=None, disk=None, accelerators=None, preemptible=None, preemptable=None, unitName='', checkpoint=False, displayName='', descriptionClass=None, local=None, files=None)
Represents a unit of work in Toil.
Parameters
• memory ( Union [ str , int , None ])
• cores ( Union [ str , int , float , None ])
• disk ( Union [ str , int , None ])
• accelerators ( Union [ str , int , Mapping [ str , Any ], AcceleratorRequirement , Sequence [ Union [ str , int , Mapping [ str , Any ], AcceleratorRequirement ]], None ])
• preemptible ( Union [ str , int , bool , None ])
• preemptable ( Union [ str , int , bool , None ])
• unitName ( Optional [ str ])
• checkpoint ( Optional [ bool ])
• displayName ( Optional [ str ])
• descriptionClass ( Optional [ type ])
• local ( Optional [ bool ])
• files ( Optional [ set [ FileID ]])
__init__(memory=None, cores=None, disk=None, accelerators=None, preemptible=None, preemptable=None, unitName='', checkpoint=False, displayName='', descriptionClass=None, local=None, files=None)
Job initializer.
This method must be called by any overriding constructor.
Parameters
• memory ( int or string convertible by toil.lib.conversions.human2bytes to an int ) -- the maximum number of bytes of memory the job will require to run.
• cores ( float, int, or string convertible by toil.lib.conversions.human2bytes to an int ) -- the number of CPU cores required.
• disk ( int or string convertible by toil.lib.conversions.human2bytes to an int ) -- the amount of local disk space required by the job, expressed in bytes.
• accelerators ( int, string, dict, or list of those. Strings and dicts must be parseable by parse_accelerator. ) -- the computational accelerators required by the job. If a string, can be a string of a number, or a string specifying a model, brand, or API (with optional colon-delimited count).
• preemptible ( bool, int in {0, 1}, or string in {'false', 'true'} in any case ) -- if the job can be run on a preemptible node.
• preemptable ( Union [ str , int , bool , None ]) -- legacy preemptible parameter, for backwards compatibility with workflows not using the preemptible keyword
• unitName ( str ) -- Human-readable name for this instance of the job.
• checkpoint ( bool ) -- if any of this job's successor jobs completely fails, exhausting all their retries, remove any successor jobs and rerun this job to restart the subtree. Job must be a leaf vertex in the job graph when initially defined, see toil.job.Job.checkNewCheckpointsAreCutVertices() .
• displayName ( str ) -- Human-readable job type display name.
• descriptionClass ( class ) -- Override for the JobDescription class used to describe the job.
• local ( Optional [ bool ]) -- if the job can be run on the leader.
• files ( Optional [ set [ FileID ]]) -- Set of Files that the job will want to use.
Return type
None
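The accepted forms of the preemptible argument (bool, int in {0, 1}, or 'true'/'false' in any case) can be illustrated with a small normalizer. This is a minimal sketch of the documented input rules, not Toil's own parsing code; parse_preemptible is a hypothetical helper name, and its error behaviour is an assumption.

```python
from typing import Union


def parse_preemptible(value: Union[str, int, bool]) -> bool:
    """Normalize a preemptible setting to a bool (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return value
    if isinstance(value, int):
        if value in (0, 1):
            return bool(value)
        raise ValueError(f"preemptible int must be 0 or 1, got {value}")
    if isinstance(value, str):
        lowered = value.lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        raise ValueError(f"preemptible string must be 'true' or 'false', got {value!r}")
    raise TypeError(f"unsupported preemptible value: {value!r}")
```

Checking bool first matters: without it, True and False would fall into the int branch and the two cases could not be told apart.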
check_initialized()
Ensure that Job.__init__() has been called by any subclass __init__().
This uses the fact that the self._description instance variable should always be set after __init__().
If __init__()
has not been called, raise an error.
Return type
None
property jobStoreID: str | TemporaryID
Get the ID of this Job.
Return type
Union[ str , TemporaryID ]
property description: JobDescription
Expose the JobDescription that
describes this job.
Return type
JobDescription
property disk: int
The maximum number of bytes of
disk the job will require to run.
Return type
int
property memory
The maximum number of bytes of memory the job will require to run.
property cores: int | float
The number of CPU cores
required.
Return type
Union[ int , float ]
property accelerators: list [ AcceleratorRequirement ]
Any accelerators, such as GPUs,
that are needed.
Return type
list [ AcceleratorRequirement ]
property preemptible: bool
Whether the job can be run on a
preemptible node.
Return type
bool
property checkpoint: bool
Determine if the job is a
checkpoint job or not.
Return type
bool
assignConfig(config)
Assign the given config object.
It will be used
by various actions implemented inside the Job class.
Parameters
config ( Config ) -- Config object to query
Return type
None
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore ( AbstractFileStore ) -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Return type
Any
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
addChild(childJob)
Add a childJob to be run as child of this job.
Child jobs will
be run directly after this job's
toil.job.Job.run()
method has completed.
Return type
Job
Returns
childJob: for call chaining
Parameters
childJob ( Job )
hasChild(childJob)
Check if childJob is already a
child of this job.
Return type
bool
Returns
True if childJob is a child of the job, else False.
Parameters
childJob ( Job )
addFollowOn(followOnJob)
Add a follow-on job.
Follow-on jobs
will be run after the child jobs and their successors have
been run.
Return type
Job
Returns
followOnJob for call chaining
Parameters
followOnJob ( Job )
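The ordering rule for children and follow-ons can be sketched without Toil. The function below is an illustrative model only: it walks a tree of job names serially, whereas a real run may execute sibling jobs in parallel. The names execution_order, children and follow_ons are hypothetical.

```python
from typing import Dict, List


def execution_order(job: str,
                    children: Dict[str, List[str]],
                    follow_ons: Dict[str, List[str]]) -> List[str]:
    """Return one valid serial order for a tree of jobs rooted at `job`."""
    order = [job]  # the job's own run() comes first
    # Children, together with all of their successors, run next.
    for child in children.get(job, []):
        order.extend(execution_order(child, children, follow_ons))
    # Follow-ons run only after every child subtree has finished.
    for follow_on in follow_ons.get(job, []):
        order.extend(execution_order(follow_on, children, follow_ons))
    return order
```

For a root with children c1 and c2 and follow-on f1, where c1 has its own child g1 and follow-on cf, the model places f1 last because it must wait for both child subtrees.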
hasPredecessor(job)
Check if a given job is already
a predecessor of this job.
Parameters
job ( Job )
Return type
bool
hasFollowOn(followOnJob)
Check if given job is already a
follow-on of this job.
Return type
bool
Returns
True if the followOnJob is a follow-on of this job, else False.
Parameters
followOnJob ( Job )
addService(service, parentService=None)
Add a service.
The toil.job.Job.Service.start() method of the service will be called after the run method has completed but before any successors are run. The service's toil.job.Job.Service.stop() method will be called once the successors of the job have been run.
Services allow things like databases and servers to be started and accessed by jobs in a workflow.
Raises
toil.job.JobException -- If service has already been made the child of a job or another service.
Parameters
• service ( Service ) -- Service to add.
• parentService ( Optional [ Service ]) -- Service that will be started before 'service' is started. Allows trees of services to be established. parentService must be a service of this job.
Return type
Promise
Returns
a promise that will be replaced with the return value from toil.job.Job.Service.start() of service in any successor of the job.
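The service lifecycle described here (start after run, stop after the successors) can be modelled in a few lines. This is a sketch under stated assumptions, not Toil code: SketchService and run_job_with_service are hypothetical names, and the start value is passed to successors directly rather than through a promise.

```python
from typing import Callable, List


class SketchService:
    """Hypothetical stand-in for a service such as a database."""

    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log

    def start(self) -> str:
        self.log.append(f"start {self.name}")
        return f"{self.name}://localhost"  # value handed to successors

    def stop(self) -> None:
        self.log.append(f"stop {self.name}")


def run_job_with_service(run: Callable[[], None],
                         service: SketchService,
                         successors: List[Callable[[str], None]]) -> None:
    """Run the job, start the service, run successors, then stop the service."""
    run()
    address = service.start()
    try:
        for successor in successors:
            successor(address)
    finally:
        # The service is stopped once the successors are done.
        service.stop()
```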
hasService(service)
Return True if the given
Service is a service of this job, and False otherwise.
Parameters
service ( Service )
Return type
bool
addChildFn(fn, *args, **kwargs)
Add a function as a child job.
Parameters
fn ( Callable ) -- Function to be run as a child job with *args and **kwargs as arguments to this function. See toil.job.FunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Return type
FunctionWrappingJob
Returns
The new child job that wraps fn.
addFollowOnFn(fn, *args, **kwargs)
Add a function as a follow-on
job.
Parameters
fn ( Callable ) -- Function to be run as a follow-on job with *args and **kwargs as arguments to this function. See toil.job.FunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Return type
FunctionWrappingJob
Returns
The new follow-on job that wraps fn.
addChildJobFn(fn, *args, **kwargs)
Add a job function as a child job.
See
toil.job.JobFunctionWrappingJob
for a definition of a
job function.
Parameters
fn ( Callable ) -- Job function to be run as a child job with *args and **kwargs as arguments to this function. See toil.job.JobFunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Return type
FunctionWrappingJob
Returns
The new child job that wraps fn.
addFollowOnJobFn(fn, *args, **kwargs)
Add a follow-on job function.
See
toil.job.JobFunctionWrappingJob
for a definition of a
job function.
Parameters
fn ( Callable ) -- Job function to be run as a follow-on job with *args and **kwargs as arguments to this function. See toil.job.JobFunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Return type
FunctionWrappingJob
Returns
The new follow-on job that wraps fn.
property tempDir: str
Shortcut to calling job.fileStore.getLocalTempDir() .
The temporary directory is
created on the first call; that call and all later calls
return its path. See
job.fileStore.getLocalTempDir
Return type
str
log(text, level=20)
Log using
fileStore.log_to_leader()
.
Parameters
text ( str )
Return type
None
static wrapFn(fn, *args, **kwargs)
Makes a Job out of a function.
Convenience
function for constructor of
toil.job.FunctionWrappingJob
.
Parameters
fn -- Function to be run with *args and **kwargs as arguments. See toil.job.JobFunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Return type
FunctionWrappingJob
Returns
The new function that wraps fn.
static wrapJobFn(fn, *args, **kwargs)
Makes a Job out of a job function.
Convenience
function for constructor of
toil.job.JobFunctionWrappingJob
.
Parameters
fn -- Job function to be run with *args and **kwargs as arguments. See toil.job.JobFunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Return type
JobFunctionWrappingJob
Returns
The new job function that wraps fn.
encapsulate(name=None)
Encapsulates the job, see
toil.job.EncapsulatedJob
. Convenience function for
constructor of
toil.job.EncapsulatedJob
.
Parameters
name ( Optional [ str ]) -- Human-readable name for the encapsulated job.
Return type
EncapsulatedJob
Returns
an encapsulated version of this job.
rv(*path)
Create a promise ( toil.job.Promise ).
The
"promise" representing a return value of the job's
run method, or, in case of a function-wrapping job, the
wrapped function's return value.
Parameters
path ( (Any) ) -- Optional path for selecting a component of the promised return value. If absent or empty, the entire return value will be used. Otherwise, the first element of the path is used to select an individual item of the return value. For that to work, the return value must be a list, dictionary or of any other type implementing the __getitem__() magic method. If the selected item is yet another composite value, the second element of the path can be used to select an item from it, and so on. For example, if the return value is [6,{'a':42}] , .rv(0) would select 6 , .rv(1) would select {'a':42} while .rv(1,'a') would select 42 . To select a slice from a return value that is sliceable, e.g. tuple or list, the path element should be a slice object. For example, assuming that the return value is [6, 7, 8, 9] then .rv(slice(1, 3)) would select [7, 8] . Note that slicing really only makes sense at the end of path.
Return type
Promise
Returns
A promise representing the return value of this job's toil.job.Job.run() method.
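How a path selects a component of the return value can be shown with plain indexing. The helper below is a minimal sketch of the selection rule only; select is a hypothetical name, and it operates on an already available value rather than on a promise.

```python
from typing import Any


def select(return_value: Any, *path: Any) -> Any:
    """Apply each path element in turn via __getitem__."""
    selected = return_value
    for element in path:
        selected = selected[element]
    return selected
```

With an empty path the whole value is returned, which mirrors calling rv() without arguments.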
prepareForPromiseRegistration(jobStore)
Set up to allow this job's promises to register themselves.
Prepare this job (the promisor) so that its promises can register themselves with it, when the jobs they are promised to (promisees) are serialized.
The promisee
holds the reference to the promise (usually as part of the
job arguments) and when it is being pickled, so will the
promises it refers to. Pickling a promise triggers it to be
registered with the promisor.
Parameters
jobStore ( AbstractJobStore )
Return type
None
checkJobGraphForDeadlocks()
Ensures that a graph of Jobs (that hasn't yet been saved to the JobStore) doesn't contain any pathological relationships between jobs that would result in deadlocks if we tried to run the jobs.
See toil.job.Job.checkJobGraphConnected() , toil.job.Job.checkJobGraphAcyclic() and toil.job.Job.checkNewCheckpointsAreLeafVertices() for more info.
Raises
toil.job.JobGraphDeadlockException -- if the job graph is cyclic, contains multiple roots or contains checkpoint jobs that are not leaf vertices when defined (see toil.job.Job.checkNewCheckpointsAreLeafVertices() ).
getRootJobs()
Return the set of root job objects that contain this job.
A root job is a job with no predecessors (i.e. which are not children, follow-ons, or services).
Only deals with
jobs created here, rather than loaded from the job store.
Return type
set [ Job ]
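The definition of a root job (a job with no predecessors) can be sketched over a plain adjacency mapping. This is an illustration, not Toil's implementation: root_jobs is a hypothetical helper, it treats every successor edge alike, and it looks at the whole mapping rather than at the component containing one job.

```python
from typing import Dict, List, Set


def root_jobs(successors: Dict[str, List[str]]) -> Set[str]:
    """Return the jobs that never appear as anyone's successor."""
    all_jobs: Set[str] = set(successors)
    with_predecessor: Set[str] = set()
    for targets in successors.values():
        all_jobs.update(targets)
        with_predecessor.update(targets)
    return all_jobs - with_predecessor
```

A result with more than one element corresponds to the multiple-root situation that the connectivity check rejects.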
checkJobGraphConnected()
Raises
toil.job.JobGraphDeadlockException -- if toil.job.Job.getRootJobs() does not contain exactly one root job.
As execution always starts from one root job, having multiple root jobs will cause a deadlock to occur.
Only deals with jobs created here, rather than loaded from the job store.
checkJobGraphAcylic()
Raises
toil.job.JobGraphDeadlockException -- if the connected component of jobs containing this job contains any cycles of child/followOn dependencies in the augmented job graph (see below). Such cycles are not allowed in valid job graphs.
A follow-on edge (A, B) between two jobs A and B is equivalent to adding a child edge to B from (1) A, (2) each child of A, and (3) the successors of each child of A. We call each such edge an "implied" edge. The augmented job graph is a job graph including all the implied edges.
For a job graph G = (V, E) the algorithm is O(|V|^2) . It is O(|V| + |E|) for a graph with no follow-ons. The former follow-on case could be improved!
Only deals with jobs created here, rather than loaded from the job store.
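The augmented job graph can be made concrete with a small model. The code below is a sketch under assumptions, not Toil's algorithm: jobs are plain strings, "successors of each child" is read as everything reachable from that child, and the names descendants, augmented_edges and has_cycle are hypothetical.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def descendants(job: str, children: Graph, follow_ons: Graph) -> Set[str]:
    """All jobs reachable from `job` via child or follow-on edges."""
    seen: Set[str] = set()
    stack = [job]
    while stack:
        current = stack.pop()
        for nxt in children.get(current, []) + follow_ons.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def augmented_edges(children: Graph, follow_ons: Graph) -> Dict[str, Set[str]]:
    """Child edges plus the implied edges of every follow-on edge (A, B)."""
    edges: Dict[str, Set[str]] = {}
    for parent, kids in children.items():
        edges.setdefault(parent, set()).update(kids)
    for a, targets in follow_ons.items():
        sources = {a}  # (1) A itself
        for child in children.get(a, []):
            sources.add(child)  # (2) each child of A
            sources.update(descendants(child, children, follow_ons))  # (3)
        for b in targets:
            for source in sources:
                edges.setdefault(source, set()).add(b)
    return edges


def has_cycle(edges: Dict[str, Set[str]]) -> bool:
    """Depth-first search for a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}

    def visit(node: str) -> bool:
        color[node] = GREY
        for nxt in edges.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY or (state == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    nodes = set(edges) | {n for targets in edges.values() for n in targets}
    return any(color.get(n, WHITE) == WHITE and visit(n) for n in nodes)
```

In this model, if A has child C and follow-on B, and B in turn has child C, the implied edge from C to B closes a cycle with the child edge from B to C.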
checkNewCheckpointsAreLeafVertices()
A checkpoint job is a job that is restarted if either it fails, or if any of its successors completely fails, exhausting their retries.
A job is a leaf if it has no successors.
A checkpoint job must be a leaf when initially added to the job graph. When its run method is invoked it can then create direct successors. This restriction is made to simplify implementation.
Only works on connected components of jobs not yet added to the JobStore.
Raises
toil.job.JobGraphDeadlockException -- if there exists a job being added to the graph for which checkpoint=True and which is not a leaf.
Return type
None
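The leaf restriction on checkpoint jobs amounts to a simple check over the jobs being added. The following is a minimal sketch with hypothetical names (SketchDeadlockError, check_new_checkpoints_are_leaves); it stands in for the documented exception and method and is not Toil's implementation.

```python
from typing import Dict, List, Set


class SketchDeadlockError(Exception):
    """Hypothetical stand-in for a job graph deadlock exception."""


def check_new_checkpoints_are_leaves(successors: Dict[str, List[str]],
                                     checkpoints: Set[str]) -> None:
    """Raise if any checkpoint job already has successors."""
    for job in sorted(checkpoints):
        if successors.get(job):
            raise SketchDeadlockError(
                f"checkpoint job {job!r} is not a leaf: {successors[job]}"
            )
```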
defer(function, *args, **kwargs)
Register a deferred function, i.e. a callable that will be invoked after the current attempt at running this job concludes. A job attempt is said to conclude when the job function (or the toil.job.Job.run() method for class-based jobs) returns, raises an exception or after the process running it terminates abnormally. A deferred function will be called on the node that attempted to run the job, even if a subsequent attempt is made on another node. A deferred function should be idempotent because it may be called multiple times on the same node or even in the same process. More than one deferred function may be registered per job attempt by calling this method repeatedly with different arguments. If the same function is registered twice with the same or different arguments, it will be called twice per job attempt.
Examples for
deferred functions are ones that handle cleanup of resources
external to Toil, like Docker containers, files outside the
work directory, etc.
Parameters
• function ( callable ) -- The function to be called after this job concludes.
• args ( list ) -- The arguments to the function
• kwargs ( dict ) -- The keyword arguments to the function
Return type
None
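The in-process part of the deferred-function contract can be sketched with try/finally. This is an illustrative model only: SketchJobAttempt is a hypothetical class, and it does not cover the case where the worker process terminates abnormally, which needs support outside the process.

```python
from typing import Any, Callable, Dict, List, Tuple


class SketchJobAttempt:
    """Collects deferred callables and runs them when the attempt concludes."""

    def __init__(self) -> None:
        self._deferred: List[Tuple[Callable[..., Any], Tuple[Any, ...], Dict[str, Any]]] = []

    def defer(self, function: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        # Registering the same function twice means it is called twice.
        self._deferred.append((function, args, kwargs))

    def run(self, body: Callable[["SketchJobAttempt"], Any]) -> Any:
        try:
            return body(self)
        finally:
            # Runs whether the body returned or raised.
            for function, args, kwargs in self._deferred:
                function(*args, **kwargs)
            self._deferred.clear()
```

Because a deferred function may run more than once, cleanup code such as removing a container should tolerate the resource already being gone.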
getTopologicalOrderingOfJobs()
Return type
list [ Job ]
Returns
a list of jobs such that for all pairs of indices i, j for which i < j, the job at index i can be run before the job at index j.
Only considers jobs in this job's subgraph that are newly added, not loaded from the job store.
Ignores service jobs.
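The ordering property stated above (every job appears before the jobs that depend on it) is that of a topological sort. The sketch below uses the standard library's graphlib (Python 3.9 or later) on plain job names; topological_order is a hypothetical helper and not the method's actual implementation.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def topological_order(successors: Dict[str, List[str]]) -> List[str]:
    """Order jobs so that each job precedes all of its successors."""
    sorter: TopologicalSorter = TopologicalSorter()
    for job, targets in successors.items():
        sorter.add(job)
        for target in targets:
            # `job` is declared as a predecessor of `target`.
            sorter.add(target, job)
    return list(sorter.static_order())
```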
saveBody(jobStore)
Save the execution data for just this job to the JobStore, and fill in the JobDescription with the information needed to retrieve it.
The Job's JobDescription must have already had a real jobStoreID assigned to it.
Does not save
the JobDescription.
Parameters
jobStore ( AbstractJobStore ) -- The job store to save the job body into.
Return type
None
saveAsRootJob(jobStore)
Save this job to the given
jobStore as the root job of the workflow.
Return type
JobDescription
Returns
the JobDescription describing this job.
Parameters
jobStore ( AbstractJobStore )
classmethod loadJob(job_store, job_description)
Retrieves a
toil.job.Job
instance from a JobStore
Parameters
• job_store ( AbstractJobStore ) -- The job store.
• job_description ( JobDescription ) -- the JobDescription of the job to retrieve.
Return type
Job
Returns
The job referenced by the JobDescription.
set_debug_flag(flag)
Enable the given debug option
on the job.
Parameters
flag ( str )
Return type
None
has_debug_flag(flag)
Return true if the given debug
flag is set.
Parameters
flag ( str )
Return type
bool
files_downloaded_hook(host_and_job_paths=None)
Function that subclasses can call when they have downloaded their input files.
Will abort the job if the "download_only" debug flag is set.
Can be hinted a
list of file path pairs outside and inside the job
container, in which case the container environment can be
reconstructed.
Parameters
host_and_job_paths ( Optional [ list [ tuple [ str , str ]]])
Return type
None
JobDescription
The class used
to store all the information that the Toil Leader ever needs
to know about a Job.
class toil.job.JobDescription(requirements, jobName,
unitName='',
displayName='', local=None, files=None)
Stores all the information that
the Toil Leader ever needs to know about a Job.
This includes:
• Resource requirements.
• Which jobs are children or follow-ons or predecessors of this job.
• A reference to the Job object in the job store.
Can be obtained from an actual (i.e. executable) Job object, and can be used to obtain the Job object from the JobStore.
Never contains other Jobs or JobDescriptions: all reference is by ID.
Subclassed into
variants for checkpoint jobs and service jobs that have
their specific parameters.
Parameters
• requirements ( Mapping [ str , Union [ int , str , bool ]])
• jobName ( str )
• unitName ( Optional [ str ])
• displayName ( Optional [ str ])
• local ( Optional [ bool ])
• files ( Optional [ set [ FileID ]])
__init__(requirements,
jobName, unitName='', displayName='',
local=None, files=None)
Create a new JobDescription.
Parameters
• requirements ( Mapping [ str , Union [ int , str , bool ]]) -- Dict from string to number, string, or bool describing the resource requirements of the job. 'cores', 'memory', 'disk', and 'preemptible' fields, if set, are parsed and broken out into properties. If unset, the relevant property will be unspecified, and will be pulled from the assigned Config object if queried (see toil.job.Requirer.assignConfig() ).
• jobName ( str ) -- Name of the kind of job this is. May be used in job store IDs and logging. Also used to let the cluster scaler learn a model for how long the job will take. Ought to be the job class's name if no real user-defined name is available.
• unitName ( Optional [ str ]) -- Name of this instance of this kind of job. May appear with jobName in logging.
• displayName ( Optional [ str ]) -- A human-readable name to identify this particular job instance. Ought to be the job class's name if no real user-defined name is available.
• local ( Optional [ bool ]) -- If True, the job is meant to use minimal resources but is sensitive to execution latency, and so should be executed by the leader.
• files ( Optional [ set [ FileID ]]) -- Set of FileID objects that the job plans to use.
Return type
None
get_names()
Get the names and ID of this
job as a named tuple.
Return type
Names
get_chain()
Get all the jobs that executed in this job's chain, in order.
For each job, produces a named tuple with its various names and its original job store ID. The jobs in the chain are in execution order.
If the job
hasn't run yet or it didn't chain, produces a one-item list.
Return type
list [ Names ]
serviceHostIDsInBatches()
Find all batches of service host job IDs that can be started at the same time.
(in the order
they need to start in)
Return type
Iterator [ list [ str ]]
successorsAndServiceHosts()
Get an iterator over all child,
follow-on, and service job IDs.
Return type
Iterator [ str ]
allSuccessors()
Get an iterator over all child, follow-on, and chained, inherited successor job IDs.
Follow-ons will
come before children.
Return type
Iterator [ str ]
successors_by_phase()
Get an iterator over all child/follow-on/chained inherited successor job IDs, along with their phase number on the stack.
Phases execute
from higher numbers to lower numbers.
Return type
Iterator [ tuple [ int , str ]]
property services
Get a collection of the IDs of service host jobs for this job, in arbitrary order.
Will be empty if the job has no unfinished services.
has_body()
Returns True if we have a job
body associated, and False otherwise.
Return type
bool
attach_body(file_store_id, user_script)
Attach a job body to this JobDescription.
Takes the file store ID that the body is stored at, and the required user script module.
The file store
ID can also be "firstJob" for the root job, stored
as a shared file instead.
Parameters
• file_store_id ( str )
• user_script ( ModuleDescriptor )
Return type
None
detach_body()
Drop the body reference from a
JobDescription.
Return type
None
get_body()
Get the information needed to
load the job body.
Return type
tuple [ str , ModuleDescriptor ]
Returns
a file store ID (or magic shared file name "firstJob") and a user script module.
Fails if no body is attached; check has_body() first.
nextSuccessors()
Return the collection of job IDs for the successors of this job that are ready to run.
If those jobs have multiple predecessor relationships, they may still be blocked on other jobs.
Returns None
when at the final phase (all successors done), and an empty
collection if there are more phases but they can't be
entered yet (e.g. because we are waiting for the job itself
to run).
Return type
Optional [ set [ str ]]
filterSuccessors(predicate)
Keep only successor jobs for which the given predicate function approves.
The predicate function is called with the job's ID.
Treats all
other successors as complete and forgets them.
Parameters
predicate ( Callable [[ str ], bool ])
Return type
None
filterServiceHosts(predicate)
Keep only services for which the given predicate approves.
The predicate function is called with the service host job's ID.
Treats all
other services as complete and forgets them.
Parameters
predicate ( Callable [[ str ], bool ])
Return type
None
clear_nonexistent_dependents(job_store)
Remove all references to child, follow-on, and associated service jobs that do not exist.
That is to say,
all those that have been completed and removed.
Parameters
job_store ( AbstractJobStore )
Return type
None
clear_dependents()
Remove all references to
successor and service jobs.
Return type
None
is_subtree_done()
Check if the subtree is done.
Return type
bool
Returns
True if the job appears to be done, and all related child, follow-on, and service jobs appear to be finished and removed.
replace(other)
Take on the ID of another JobDescription, retaining our own state and type.
When updated in the JobStore, we will save over the other JobDescription.
Useful for chaining jobs: the chained-to job can replace the parent job.
Merges cleanup
state and successors other than this job from the job being
replaced into this one.
Parameters
other ( JobDescription ) -- Job description to replace.
Return type
None
assert_is_not_newer_than(other)
Make sure this JobDescription
is not newer than a prospective new version of the
JobDescription.
Parameters
other ( JobDescription )
Return type
None
is_updated_by(other)
Return True if the passed
JobDescription is a distinct, newer version of this one.
Parameters
other ( JobDescription )
Return type
bool
addChild(childID)
Make the job with the given ID
a child of the described job.
Parameters
childID ( str )
Return type
None
addFollowOn(followOnID)
Make the job with the given ID
a follow-on of the described job.
Parameters
followOnID ( str )
Return type
None
addServiceHostJob(serviceID, parentServiceID=None)
Make the ServiceHostJob with the given ID a service of the described job.
If a parent ServiceHostJob ID is given, that parent service will be started first, and must have already been added.
hasChild(childID)
Return True if the job with the
given ID is a child of the described job.
Parameters
childID ( str )
Return type
bool
hasFollowOn(followOnID)
Test if the job with the given
ID is a follow-on of the described job.
Parameters
followOnID ( str )
Return type
bool
hasServiceHostJob(serviceID)
Test if the ServiceHostJob is a
service of the described job.
Return type
bool
renameReferences(renames)
Apply the given dict of ID renames to all references to jobs.
Does not modify
our own ID or those of finished predecessors. IDs not
present in the renames dict are left as-is.
Parameters
renames ( dict [ TemporaryID , str ]) -- Rename operations to apply.
Return type
None
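The rename rule (apply known renames, leave other IDs as they are) reduces to a dictionary lookup with a default. The snippet is a minimal sketch over a plain list of IDs; rename_references is a hypothetical free function, not the method itself, and the ID strings are made up.

```python
from typing import Dict, List


def rename_references(job_ids: List[str], renames: Dict[str, str]) -> List[str]:
    """Replace each ID found in `renames`; keep all other IDs unchanged."""
    return [renames.get(job_id, job_id) for job_id in job_ids]
```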
addPredecessor()
Notify the JobDescription that
a predecessor has been added to its Job.
Return type
None
onRegistration(jobStore)
Perform setup work that requires the JobStore.
Called by the Job saving logic when this JobDescription meets the JobStore and has its ID assigned.
Overridden to
perform setup work (like hooking up flag files for service
jobs) that requires the JobStore.
Parameters
jobStore ( AbstractJobStore ) -- The job store we are being placed into
Return type
None
setupJobAfterFailure(exit_status=None, exit_reason=None)
Configure job after a failure.
Reduce the remainingTryCount if greater than zero and set the memory to be at least as big as the default memory (in case of exhaustion of memory, which is common).
Requires a
configuration to have been assigned (see
toil.job.Requirer.assignConfig()
).
Parameters
• exit_status ( Optional [ int ]) -- The exit code from the job.
• exit_reason ( Optional [ BatchJobExitReason ]) -- The reason the job stopped, if available from the batch system.
Return type
None
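The two adjustments described above (use up one try, and raise memory to at least the default) can be written out as follows. This is a sketch under assumptions: SketchDescription and its field names are hypothetical, and default_memory stands in for the value that would come from the assigned config.

```python
from dataclasses import dataclass


@dataclass
class SketchDescription:
    """Hypothetical stand-in holding only the fields touched after a failure."""
    memory: int
    remaining_try_count: int


def setup_after_failure(desc: SketchDescription, default_memory: int) -> None:
    # Reduce the try count only while tries remain.
    if desc.remaining_try_count > 0:
        desc.remaining_try_count -= 1
    # Never go below the default; a larger explicit request is kept.
    desc.memory = max(desc.memory, default_memory)
```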
getLogFileHandle(jobStore)
Create a context manager that yields a file handle to the log file.
Assumes logJobStoreFileID is set.
property remainingTryCount
Get the number of tries remaining.
The try count set on the JobDescription, or the default based on the retry count from the config if none is set.
clearRemainingTryCount()
Clear remainingTryCount and set
it back to its default value.
Return type
bool
Returns
True if a modification to the JobDescription was made, and False otherwise.
reserve_versions(count)
Reserve a job version number
for later, for journaling asynchronously.
Parameters
count ( int )
Return type
None
pre_update_hook()
Run before pickling and saving a created or updated version of this job.
Called by the
job store.
Return type
None
JOB.RUNNER API
The Runner
contains the methods needed to configure and start a Toil
run.
class Job.Runner
Used to set up and run a Toil
workflow.
static
getDefaultArgumentParser(jobstore_as_flag=False)
Get argument parser with added
toil workflow options.
Parameters
jobstore_as_flag ( bool ) -- make the job store option a --jobStore flag instead of a required jobStore positional argument.
Return type
ArgumentParser
Returns
The argument parser used by a toil workflow with added Toil options.
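The effect of jobstore_as_flag can be pictured with argparse alone. The parser below is a hypothetical miniature, not the parser Toil builds: it has only the job store argument, and the None default for the flag form is an assumption.

```python
import argparse


def build_parser(jobstore_as_flag: bool = False) -> argparse.ArgumentParser:
    """Build a toy parser with the job store as a flag or a positional."""
    parser = argparse.ArgumentParser(prog="workflow")
    if jobstore_as_flag:
        # Optional flag form: --jobStore PATH
        parser.add_argument("--jobStore", dest="jobStore", default=None)
    else:
        # Required positional form: PATH
        parser.add_argument("jobStore")
    return parser
```

Either way the parsed value ends up in the same jobStore attribute, so code that reads the options does not need to know which form was used.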
static getDefaultOptions(jobStore=None, jobstore_as_flag=False)
Get default options for a toil
workflow.
Parameters
• jobStore ( Optional [ str ]) -- A string describing the jobStore for the workflow.
• jobstore_as_flag ( bool ) -- make the job store option a --jobStore flag instead of a required jobStore positional argument.
Return type
Namespace
Returns
The options used by a toil workflow.
static addToilOptions(parser, jobstore_as_flag=False)
Adds the default toil options
to an
optparse
or
argparse
parser object.
Parameters
• parser ( Union [ OptionParser , ArgumentParser ]) -- Options object to add toil options to.
• jobstore_as_flag ( bool ) -- make the job store option a --jobStore flag instead of a required jobStore positional argument.
Return type
None
static startToil(job, options)
Run the toil workflow using the given options (see
Job.Runner.getDefaultOptions and Job.Runner.addToilOptions),
starting with this job.
Deprecated by toil.common.Toil.start.
Parameters
job ( Job ) -- root job of the workflow
Raises
toil.exceptions.FailedJobsException -- if at the end of function there remain failed jobs.
Return type
Any
Returns
The return value of the root job's run function.
JOB.FILESTORE API
The
AbstractFileStore is an abstraction of a Toil run's shared
storage.
class
toil.fileStores.abstractFileStore.AbstractFileStore(jobStore,
jobDesc, file_store_dir, waitForPreviousCommit)
Interface used to allow user code run by Toil to read and write files.
Also provides the interface to other Toil facilities used by user code, including:
• normal (non-real-time) logging
• finding the correct temporary directory for scratch work
• importing and exporting files into and out of the workflow
Stores user files in the jobStore, but keeps them separate from actual jobs.
May implement caching.
Passed as argument to the toil.job.Job.run() method.
Access to files is only permitted inside the context manager provided by toil.fileStores.abstractFileStore.AbstractFileStore.open() .
Also
responsible for committing completed jobs back to the job
store with an update operation, and allowing that commit
operation to be waited for.
Parameters
• jobStore ( AbstractJobStore )
• jobDesc ( JobDescription )
• file_store_dir ( str )
• waitForPreviousCommit ( Callable [[], Any ])
__init__(jobStore, jobDesc,
file_store_dir,
waitForPreviousCommit)
Create a new file store object.
Parameters
• jobStore ( AbstractJobStore ) -- the job store in use for the current Toil run.
• jobDesc ( JobDescription ) -- the JobDescription object for the currently running job.
• file_store_dir ( str ) -- the per-worker local temporary directory where the file store should store local files. Per-job directories will be created under here by the file store.
• waitForPreviousCommit ( Callable [[], Any ]) -- the waitForCommit method of the previous job's file store, when jobs are running in sequence on the same worker. Used to prevent this file store's startCommit and the previous job's startCommit methods from running at the same time and racing. If they did race, it might be possible for the later job to be fully marked as completed in the job store before the earlier job was.
Return type
None
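The role of waitForPreviousCommit can be illustrated with two threads. The class below is a sketch, not Toil's file store: SketchFileStore, start_commit and wait_for_commit are hypothetical names, and the sleep only simulates a slow commit. Each commit thread first waits for the previous store's commit, so commits complete in job order even when the earlier one is slower.

```python
import threading
import time
from typing import Callable, List, Optional


class SketchFileStore:
    """Commits in a background thread, after the previous commit finishes."""

    def __init__(self, name: str,
                 wait_for_previous_commit: Callable[[], None],
                 log: List[str]) -> None:
        self.name = name
        self._wait_for_previous_commit = wait_for_previous_commit
        self._log = log
        self._thread: Optional[threading.Thread] = None

    def start_commit(self, work_seconds: float) -> None:
        def commit() -> None:
            # Block until the earlier job's commit is fully done.
            self._wait_for_previous_commit()
            time.sleep(work_seconds)  # simulated upload time
            self._log.append(self.name)

        self._thread = threading.Thread(target=commit)
        self._thread.start()

    def wait_for_commit(self) -> None:
        if self._thread is not None:
            self._thread.join()
```

Chaining is done by passing the earlier store's wait_for_commit as the later store's wait_for_previous_commit; the first store in a sequence can be given a no-op.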
static
createFileStore(jobStore, jobDesc, file_store_dir,
waitForPreviousCommit, caching)
Create a concrete FileStore.
Parameters
• jobStore ( AbstractJobStore )
• jobDesc ( JobDescription )
• file_store_dir ( str )
• waitForPreviousCommit ( Callable [[], Any ])
• caching ( Optional [ bool ])
Return type
Union [ NonCachingFileStore , CachingFileStore ]
static
shutdownFileStore(workflowID, config_work_dir,
config_coordination_dir)
Carry out any necessary filestore-specific cleanup.
This is a destructive operation and it is important to ensure that there are no other running processes on the system that are modifying or using the file store for this workflow.
This is
intended to be the last call to the file store in a Toil
run, called by the batch system cleanup function upon batch
system shutdown.
Parameters
• workflowID ( str ) -- The workflow ID for this invocation of the workflow
• config_work_dir ( Optional [ str ]) -- The path to the work directory in the Toil Config.
• config_coordination_dir ( Optional [ str ]) -- The path to the coordination directory in the Toil Config.
Return type
None
open(job)
Create a context manager that wraps the tasks performed before and after a job is run.
File operations are only permitted inside the context manager.
Implementations
must only yield from within
with super().open(job):
.
Parameters
job ( Job ) -- The job instance of the toil job to run.
Return type
Generator [ None , None , None ]
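The rule that implementations yield only from within with super().open(job): can be sketched with contextlib. The two classes are hypothetical stand-ins, not Toil's file stores; they record setup and teardown steps in a list so the nesting order is visible.

```python
from contextlib import contextmanager
from typing import Generator, List


class SketchBaseFileStore:
    def __init__(self, log: List[str]) -> None:
        self.log = log

    @contextmanager
    def open(self, job: str) -> Generator[None, None, None]:
        self.log.append(f"base setup for {job}")
        try:
            yield
        finally:
            self.log.append(f"base teardown for {job}")


class SketchCachingFileStore(SketchBaseFileStore):
    @contextmanager
    def open(self, job: str) -> Generator[None, None, None]:
        self.log.append("cache setup")
        try:
            # Yield only inside the parent's context manager, so the
            # parent's setup and teardown bracket the job as well.
            with super().open(job):
                yield
        finally:
            self.log.append("cache teardown")
```

Yielding inside the parent's with block keeps the teardown steps in reverse order of setup, even if the job body raises.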
get_disk_usage()
Get the number of bytes of disk used by the last job run under open().
Disk usage is
measured at the end of the job. TODO: Sample periodically
and record peak usage.
Return type
Optional [ int ]
getLocalTempDir()
Get a new local temporary directory in which to write files.
The directory
will only persist for the duration of the job.
Return type
str
Returns
The absolute path to a new local temporary directory. This directory will exist for the duration of the job only, and is guaranteed to be deleted once the job terminates, removing all files it contains recursively.
getLocalTempFile(suffix=None, prefix=None)
Get a new local temporary file
that will persist for the duration of the job.
Parameters
• suffix ( Optional [ str ]) -- If not None, the file name will end with this string. Otherwise, default value ".tmp" will be used
• prefix ( Optional [ str ]) -- If not None, the file name will start with this string. Otherwise, default value "tmp" will be used
Return type
str
Returns
The absolute path to a local temporary file. This file will exist for the duration of the job only, and is guaranteed to be deleted once the job terminates.
getLocalTempFileName(suffix=None, prefix=None)
Get a valid name for a new
local file. Don't actually create a file at the path.
Parameters
|
• |
suffix ( Optional [ str ]) -- If not None, the file name will end with this string. Otherwise, default value ".tmp" will be used |
||
|
• |
prefix ( Optional [ str ]) -- If not None, the file name will start with this string. Otherwise, default value "tmp" will be used |
Return type
str
Returns
Path to valid file
abstract writeGlobalFile(localFileName, cleanup=False)
Upload a file (as a path) to the job store.
If the file is in a FileStore-managed temporary directory (i.e. from toil.fileStores.abstractFileStore.AbstractFileStore.getLocalTempDir() ), it will become a local copy of the file, eligible for deletion by toil.fileStores.abstractFileStore.AbstractFileStore.deleteLocalFile() .
If an
executable file on the local filesystem is uploaded, its
executability will be preserved when it is downloaded again.
Parameters
|
• |
localFileName ( str ) -- The path to the local file to upload. The last path component (basename of the file) will remain associated with the file in the file store, if supported by the backing JobStore, so that the file can be searched for by name or name glob. |
||
|
• |
cleanup ( bool ) -- if True then the copy of the global file will be deleted once the job and all its successors have completed running. If not the global file must be deleted manually. |
Return type
FileID
Returns
an ID that can be used to retrieve the file.
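As an illustration, a job might combine getLocalTempFile() and writeGlobalFile() roughly as follows. This is a minimal sketch: the helper name save_text is hypothetical, and file_store is assumed to be any object exposing the AbstractFileStore methods documented here.

```python
def save_text(file_store, text):
    """Write text to a job-local temp file and upload it to the job store."""
    # Assumption: file_store exposes the AbstractFileStore methods
    # documented in this section (hypothetical helper, not part of Toil).
    local_path = file_store.getLocalTempFile(suffix=".txt")
    with open(local_path, "w", encoding="utf-8") as handle:
        handle.write(text)
    # The returned ID can be handed to other jobs, which can fetch the
    # file again with readGlobalFile().
    return file_store.writeGlobalFile(local_path)
```

Because the local file comes from the file store's own temporary area, it is removed when the job ends; only the uploaded copy persists.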
writeGlobalFileStream(cleanup=False,
basename=None,
encoding=None, errors=None)
Similar to writeGlobalFile, but
allows the writing of a stream to the job store. The yielded
file handle does not need to and should not be closed
explicitly.
Parameters
|
• |
encoding ( Optional [ str ]) -- The name of the encoding used to decode the file. Encodings are the same as for decode(). Defaults to None which represents binary mode. |
||
|
• |
errors ( Optional [ str ]) -- Specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified. |
||
|
• |
cleanup ( bool ) -- is as in toil.fileStores.abstractFileStore.AbstractFileStore.writeGlobalFile() . |
||
|
• |
basename ( Optional [ str ]) -- If supported by the backing JobStore, use the given file basename so that when searching the job store with a query matching that basename, the file will be detected. |
Return type
Iterator [ tuple [ WriteWatchingStream , FileID ]]
Returns
A context manager yielding a tuple of 1) a file handle which can be written to and 2) the toil.fileStores.FileID of the resulting file in the job store.
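The streaming variant might be used as sketched below. The helper name save_lines is hypothetical, and file_store is assumed to expose writeGlobalFileStream() with the signature shown above.

```python
def save_lines(file_store, lines):
    """Stream text lines into the job store without a local temp file."""
    # encoding="utf-8" requests a text-mode handle; the default (None)
    # would give a binary handle instead.
    with file_store.writeGlobalFileStream(encoding="utf-8") as (handle, file_id):
        for line in lines:
            handle.write(line + "\n")
    # The context manager closes the handle; it is not closed explicitly here.
    return file_id
```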
logAccess(fileStoreID, destination=None)
Record that the given file was read by the job (to be announced if the job fails).
If destination is not None, it gives the path that the file was downloaded to. Otherwise, assumes that the file was streamed.
Must be called by readGlobalFile() and readGlobalFileStream() implementations.
Parameters
|
• |
fileStoreID ( Union [ FileID , str ]) |
|||
|
• |
destination ( Optional [ str ]) |
Return type
None
abstract
readGlobalFile(fileStoreID, userPath=None, cache=True,
mutable=False, symlink=False)
Make the file associated with fileStoreID available locally.
If mutable is True, then a copy of the file will be created locally so that the original is not modified and does not change the file for other jobs. If mutable is False, then a link can be created to the file, saving disk resources. The file that is downloaded will be executable if and only if it was originally uploaded from an executable file on the local filesystem.
If a user path is specified, it is used as the destination. If a user path isn't specified, the file is stored in the local temp directory with an encoded name.
The destination file must not be deleted by the user; it can only be deleted through deleteLocalFile.
Implementations must call logAccess() to report the download.
Parameters
|
• |
fileStoreID ( str ) -- job store id for the file |
||
|
• |
userPath ( Optional [ str ]) -- the path of the file to which the global file will be copied or hard-linked (see below). |
||
|
• |
cache ( bool ) -- Described in toil.fileStores.CachingFileStore.readGlobalFile() |
||
|
• |
mutable ( bool ) -- Described in toil.fileStores.CachingFileStore.readGlobalFile() |
||
|
• |
symlink ( bool ) -- True if caller can accept symlink, False if caller can only accept a normal file or hardlink |
Return type
str
Returns
An absolute path to a local, temporary copy of the file keyed by fileStoreID.
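The effect of mutable=True can be sketched as follows. The helper name append_note is hypothetical, and file_store is assumed to expose readGlobalFile() and writeGlobalFile() as documented here; uploading the edited copy as a new file is one possible pattern, not the only one.

```python
def append_note(file_store, file_id, note):
    """Fetch a private copy of a global file, edit it, and upload the result."""
    # mutable=True asks for a real copy, so editing it cannot change the
    # file seen by other jobs.
    local_path = file_store.readGlobalFile(file_id, mutable=True)
    with open(local_path, "a", encoding="utf-8") as handle:
        handle.write(note)
    # The edited copy is uploaded as a new file; the original ID still
    # refers to the unmodified content.
    return file_store.writeGlobalFile(local_path)
```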
abstract
readGlobalFileStream(fileStoreID, encoding=None,
errors=None)
Read a stream from the job store; similar to readGlobalFile.
The yielded
file handle does not need to and should not be closed
explicitly.
Parameters
|
• |
encoding ( Optional [ str ]) -- the name of the encoding used to decode the file. Encodings are the same as for decode(). Defaults to None which represents binary mode. |
||
|
• |
errors ( Optional [ str ]) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified. |
||
|
• |
fileStoreID ( str ) |
Implementations must call logAccess() to report the download.
Return type
ContextManager [ Union [ IO [ bytes ], IO [ str ]], bool | None ]
Returns
a context manager yielding a file handle which can be read from.
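Reading through a stream might look like the sketch below. The helper name count_lines is hypothetical, and file_store is assumed to expose readGlobalFileStream() as documented above.

```python
def count_lines(file_store, file_id):
    """Count lines in a global file by streaming it from the job store."""
    # encoding="utf-8" yields a text handle; without it the handle is binary.
    with file_store.readGlobalFileStream(file_id, encoding="utf-8") as handle:
        return sum(1 for _ in handle)
```

Streaming avoids materialising a local copy, which can help when a job only needs a single pass over a large file.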
getGlobalFileSize(fileStoreID)
Get the size of the file pointed to by the given ID, in bytes.
If given a FileID or another object with a non-None 'size' field, returns that value.
Otherwise, asks the job store to poll the file's size.
Note that the
job store may overestimate the file's size, for example if
it is encrypted and had to be augmented with an IV or other
encryption framing.
Parameters
fileStoreID ( Union [ FileID , str ]) -- File ID for the file
Return type
int
Returns
File's size in bytes, as stored in the job store
abstract deleteLocalFile(fileStoreID)
Delete local copies of files associated with the provided job store ID.
Raises an OSError with an errno of errno.ENOENT if no such local copies exist. Thus, cannot be called multiple times in succession.
The files
deleted are all those previously read from this file ID via
readGlobalFile by the current job into the job's
file-store-provided temp directory, plus the file that was
written to create the given file ID, if it was written by
the current job from the job's file-store-provided temp
directory.
Parameters
fileStoreID ( Union [ FileID , str ]) -- File Store ID of the file to be deleted.
Return type
None
abstract deleteGlobalFile(fileStoreID)
Delete local files and then permanently delete them from the job store.
To ensure that
the job can be restarted if necessary, the delete will not
happen until after the job's run method has completed.
Parameters
fileStoreID ( Union [ FileID , str ]) -- the File Store ID of the file to be deleted.
Return type
None
log_to_leader(text, level=20)
Send a logging message to the
leader. The message will also be logged by the worker at the
same level.
Parameters
|
• |
text ( str ) -- The string to log. |
|||
|
• |
level ( int ) -- The logging level. |
Return type
None
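The default level of 20 matches logging.INFO in Python's standard logging module. A minimal sketch of choosing a level explicitly, where the helper name report_progress is hypothetical and file_store is assumed to expose log_to_leader():

```python
import logging


def report_progress(file_store, done, total):
    """Send a progress line to the leader; warn when nothing has completed."""
    # logging.INFO is 20 (the documented default); logging.WARNING is 30.
    level = logging.INFO if done else logging.WARNING
    file_store.log_to_leader(f"processed {done}/{total} items", level=level)
```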
log_user_stream(name, stream)
Send a stream of UTF-8 text to the leader as a named log stream.
Useful for
things like the error logs of Docker containers. The leader
will show it to the user or organize it appropriately for
user-level log information.
Parameters
|
• |
name ( str ) -- A hierarchical, .-delimited string. |
||
|
• |
stream ( IO [ bytes ]) -- A stream of encoded text. Encoding errors will be tolerated. |
Return type
None
abstract startCommit(jobState=False)
Update the status of the job on the disk.
May bump the version number of the job.
May start an
asynchronous process. Call waitForCommit() to wait on that
process. You must waitForCommit() before committing any
further updates to the job. During the asynchronous process,
it is safe to modify the job; modifications after this call
will not be committed until the next call.
Parameters
jobState ( bool ) -- If True, commit the state of the FileStore's job, and file deletes. Otherwise, commit only file creates/updates.
Return type
None
abstract waitForCommit()
Blocks while startCommit is running.
This function is called by this job's successor to ensure that it does not begin modifying the job store until after this job has finished doing so.
Might be called
when startCommit is never called on a particular instance,
in which case it does not block.
Return type
bool
Returns
Always returns True
abstract classmethod shutdown(shutdown_info)
Shut down the file store on this node.
This is
intended to be called on batch system shutdown.
Parameters
shutdown_info ( Any ) -- The implementation-specific shutdown information, for shutting down the file store and removing all its state and all job local temp directories from the node.
Return type
None
class toil.fileStores.FileID(fileStoreID, size, executable=False)
A small wrapper around Python's builtin string class.
It is used to represent a file's ID in the file store, and has a size attribute that is the file's size in bytes. This object is returned by importFile and writeGlobalFile.
Calls into the
file store can use bare strings; size will be queried from
the job store if unavailable in the ID.
Parameters
|
• |
fileStoreID ( str ) |
|||
|
• |
size ( int ) |
|||
|
• |
executable ( bool ) |
|||
|
• |
args ( Any ) |
Return type
FileID
__init__(fileStoreID, size, executable=False)
Parameters
|
• |
fileStoreID ( str ) |
|||
|
• |
size ( int ) |
|||
|
• |
executable ( bool ) |
Return type
None
|
pack() |
Pack the FileID into a string so it can be passed through external code. |
Return type
str
classmethod unpack(packedFileStoreID)
Unpack the result of pack()
into a FileID object.
Parameters
packedFileStoreID ( str )
Return type
FileID
BATCH SYSTEM API
The batch system interface is used by Toil to abstract over different ways of running batches of jobs, for example on Slurm clusters, Kubernetes clusters, or a single node. The toil.batchSystems.abstractBatchSystem.AbstractBatchSystem API is implemented to run jobs using a given job management system.
Batch System Environment Variables
Environment variables allow scheduler-specific parameters to be passed.
For SLURM there are two environment variables: the first applies to all jobs, while the second defines the partition to use for parallel jobs:
export TOIL_SLURM_ARGS="-t 1:00:00 -q fatq"
export TOIL_SLURM_PE='multicore'
For TORQUE there are two environment variables: one for everything but the resource requirements, and another for the resource requirements (without the -l prefix):
export TOIL_TORQUE_ARGS="-q fatq"
export TOIL_TORQUE_REQS="walltime=1:00:00"
For GridEngine (SGE, UGE), there is an additional environment variable to define the parallel environment for running multicore jobs:
export TOIL_GRIDENGINE_PE='smp'
export TOIL_GRIDENGINE_ARGS='-q batch.q'
For HTCondor, additional parameters can be included in the submit file passed to condor_submit:
export TOIL_HTCONDOR_PARAMS='requirements = TARGET.has_sse4_2 == true; accounting_group = test'
The environment variable is parsed as a semicolon-separated string of parameter = value pairs.
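The parsing rule can be illustrated with a short sketch. This is not Toil's own parser; parse_condor_params is a hypothetical helper that simply shows how a semicolon-separated string of parameter = value pairs splits, including a value that itself contains == .

```python
def parse_condor_params(raw):
    """Split 'name = value; name = value' into a dict of stripped strings."""
    params = {}
    for entry in raw.split(";"):
        if not entry.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only, so values such as 'a == true' survive.
        name, _, value = entry.partition("=")
        params[name.strip()] = value.strip()
    return params


print(parse_condor_params(
    "requirements = TARGET.has_sse4_2 == true; accounting_group = test"
))
# prints {'requirements': 'TARGET.has_sse4_2 == true', 'accounting_group': 'test'}
```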
Batch System API
class toil.batchSystems.abstractBatchSystem.AbstractBatchSystem
An abstract base class to
represent the interface the batch system must provide to
Toil.
abstract classmethod supportsAutoDeployment()
Whether this batch system supports auto-deployment of the user script itself.
If it does, setUserScript() can be invoked to set the resource object representing the user script.
Note to
implementors: If your implementation returns True here, it
should also override
Return type
bool
abstract classmethod supportsWorkerCleanup()
Whether this batch system supports worker cleanup.
Indicates
whether this batch system invokes
BatchSystemSupport.workerCleanup()
after the last job
for a particular workflow invocation finishes. Note that the
term
worker
refers to an entire node, not just a
worker process. A worker process may run more than one job
sequentially, and more than one concurrent worker process
may exist on a worker node, for the same workflow. The batch
system is said to
shut down
after the last worker
process terminates.
Return type
bool
setUserScript(userScript)
Set the user script for this workflow.
This method
must be called before the first job is issued to this batch
system, and only if
supportsAutoDeployment()
returns
True, otherwise it will raise an exception.
Parameters
userScript ( Resource ) -- the resource object representing the user script or module and the modules it depends on.
Return type
None
set_message_bus(message_bus)
Give the batch system an
opportunity to connect directly to the message bus, so that
it can send informational messages about the jobs it is
running to other Toil components.
Parameters
message_bus ( MessageBus )
Return type
None
abstract issueBatchJob(command, job_desc, job_environment=None)
Issues a job with the specified
command to the batch system and returns a unique job ID
number.
Parameters
|
• |
command ( str ) -- the command to execute somewhere to run the Toil worker process |
||
|
• |
job_desc ( JobDescription ) -- the JobDescription for the job being run |
||
|
• |
job_environment ( Optional [ dict [ str , str ]]) -- a collection of job-specific environment variables to be set on the worker. |
Return type
int
Returns
a unique job ID number that can be used to reference the newly issued job
abstract killBatchJobs(jobIDs)
Kills the given job IDs. After
returning, the killed jobs will not appear in the results of
getRunningBatchJobIDs. The killed job will not be returned
from getUpdatedBatchJob.
Parameters
jobIDs ( list [ int ]) -- list of IDs of jobs to kill
Return type
None
abstract getIssuedBatchJobIDs()
Gets all currently issued jobs
Return type
list [ int ]
Returns
A list of jobs (as job ID numbers) currently issued (may be running, or may be waiting to be run). Despite the result being a list, the ordering should not be depended upon.
abstract getRunningBatchJobIDs()
Gets a map of jobs as job ID
numbers that are currently running (not just waiting) and
how long they have been running, in seconds.
Return type
dict [ int , float ]
Returns
dictionary with currently running job ID number keys and how many seconds they have been running as the value
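For example, the returned mapping could be used to spot jobs that have been running for a long time. The helper name long_running_jobs and the threshold are hypothetical; batch_system is assumed to be any object implementing getRunningBatchJobIDs().

```python
def long_running_jobs(batch_system, limit_seconds):
    """Return IDs of jobs that have been running longer than limit_seconds."""
    running = batch_system.getRunningBatchJobIDs()  # {job ID: seconds running}
    return sorted(job_id for job_id, seconds in running.items()
                  if seconds > limit_seconds)
```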
abstract getUpdatedBatchJob(maxWait)
Returns information about a job that has updated its status (i.e. ceased running, either successfully or with an error). Each such job will be returned exactly once.
Does not return
info for jobs killed by killBatchJobs, although they may
cause None to be returned earlier than maxWait.
Parameters
maxWait ( int ) -- the number of seconds to block, waiting for a result
Return type
Optional [ UpdatedBatchJobInfo ]
Returns
If a result is available, returns UpdatedBatchJobInfo. Otherwise it returns None. wallTime is the number of seconds (a strictly positive float) in wall-clock time the job ran for, or None if this batch system does not support tracking wall time.
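A caller might poll this method in a loop, as in the sketch below. The helper name wait_for_jobs is hypothetical, and the jobID and exitStatus attribute names on the returned object are assumptions that are not confirmed by the text above.

```python
def wait_for_jobs(batch_system, job_ids, max_wait=1):
    """Block until every job in job_ids has reported; return exit statuses."""
    pending = set(job_ids)
    statuses = {}
    while pending:
        update = batch_system.getUpdatedBatchJob(max_wait)
        if update is None:
            continue  # nothing finished within max_wait seconds; poll again
        # Assumption: the update exposes jobID and exitStatus attributes.
        statuses[update.jobID] = update.exitStatus
        pending.discard(update.jobID)
    return statuses
```

Since jobs killed by killBatchJobs are never reported, a loop like this should not wait on IDs that have been killed.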
getSchedulingStatusMessage()
Get a log message fragment for the user about anything that might be going wrong in the batch system, if available.
If no useful message is available, return None.
This can be
used to report what resource is the limiting factor when
scheduling jobs, for example. If the leader thinks the
workflow is stuck, the message can be displayed to the user
to help them diagnose why it might be stuck.
Return type
Optional [ str ]
Returns
User-directed message about scheduling state.
abstract shutdown()
Called at the completion of a
toil invocation. Should cleanly terminate all worker
threads.
Return type
None
setEnv(name, value=None)
Set an environment variable for the worker process before it is launched.
The worker process will typically inherit the environment of the machine it is running on but this method makes it possible to override specific variables in that inherited environment before the worker is launched. Note that this mechanism is different to the one used by the worker internally to set up the environment of a job. A call to this method affects all jobs issued after this method returns. Note to implementors: This means that you would typically need to copy the variables before enqueuing a job.
If no value is
provided it will be looked up from the current environment.
Parameters
|
• |
name ( str ) |
|||
|
• |
value ( Optional [ str ]) |
Return type
None
classmethod add_options(parser)
If this batch system provides
any command line options, add them to the given parser.
Parameters
parser ( Union [ ArgumentParser , _ArgumentGroup ])
Return type
None
classmethod setOptions(setOption)
Process command line or
configuration options relevant to this batch system.
Parameters
setOption ( OptionSetter ) -- A function with signature setOption(option_name, parsing_function=None, check_function=None, default=None, env=None) returning nothing, used to update run configuration as a side effect.
Return type
None
getWorkerContexts()
Get a list of picklable context manager objects to wrap worker work in, in order.
Can be used to
ask the Toil worker to do things in-process (such as
configuring environment variables, hot-deploying user
scripts, or cleaning up a node) that would otherwise require
a wrapping "executor" process.
Return type
list [ ContextManager [ Any , bool | None ]]
JOB.SERVICE API
The Service
class allows databases and servers to be spawned within a
Toil workflow.
class Job.Service(memory=None, cores=None, disk=None,
accelerators=None, preemptible=None, unitName=None)
Abstract class used to define the interface to a service.
Should be subclassed by the user to define services.
Is not executed
as a job; runs within a ServiceHostJob.
__init__(memory=None, cores=None, disk=None,
accelerators=None,
preemptible=None, unitName=None)
Memory, core and disk requirements are specified identically to those in toil.job.Job.__init__() .
abstract start(job)
Start the service.
Parameters
job ( Job ) -- The underlying host job that the service is being run in. Can be used to register deferred functions, or to access the fileStore for creating temporary files.
Return type
Any
Returns
An object describing how to access the service. The object must be pickleable and will be used by jobs to access the service (see toil.job.Job.addService() ).
abstract stop(job)
Stops the service. Function can
block until complete.
Parameters
job ( Job ) -- The underlying host job that the service is being run in. Can be used to register deferred functions, or to access the fileStore for creating temporary files.
Return type
None
check()
Checks the service is still running.
|
Raises |
exceptions.RuntimeError -- If the service failed, this will cause the service job to be labeled failed. |
Return type
bool
Returns
True if the service is still running, else False. If False then the service job will be terminated, and considered a success. Important point: if the service job exits due to a failure, it should raise a RuntimeError, not return False!
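The three outcomes of check() can be sketched for a service backed by a child process. The helper name check_process is hypothetical, and process is assumed to offer a poll() method like subprocess.Popen; a real Service subclass would call something similar from its own check() override.

```python
def check_process(process):
    """Map a child process's state onto the check() contract described above."""
    # Assumption: process.poll() returns None while the process is running
    # and its exit code once it has ended (as subprocess.Popen does).
    returncode = process.poll()
    if returncode is None:
        return True   # still running
    if returncode != 0:
        # A failure must raise rather than return False.
        raise RuntimeError(f"service process exited with status {returncode}")
    return False      # clean exit: the service job ends and counts as a success
```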
EXCEPTIONS API
Toil specific
exceptions.
exception toil.job.JobException(message)
General job exception.
Parameters
message ( str )
Return type
None
__init__(message)
Parameters
message ( str )
Return type
None
exception toil.job.JobGraphDeadlockException(string)
An exception raised in the
event that a workflow contains an unresolvable dependency,
such as a cycle. See
toil.job.Job.checkJobGraphForDeadlocks()
.
__init__(string)
exception
toil.jobStores.abstractJobStore.ConcurrentFileModificationException(jobStoreFileID)
Indicates that multiple processes attempted to
modify the file at once.
Parameters
jobStoreFileID ( FileID )
__init__(jobStoreFileID)
Parameters
jobStoreFileID ( FileID ) -- the ID of the file that was modified by multiple workers or processes concurrently
exception
toil.jobStores.abstractJobStore.JobStoreExistsException(locator,
prefix)
Indicates that the specified
job store already exists.
Parameters
|
• |
locator ( str ) |
|||
|
• |
prefix ( str ) |
__init__(locator, prefix)
Parameters
|
• |
locator ( str ) -- The location of the job store |
|||
|
• |
prefix ( str ) |
exception
toil.jobStores.abstractJobStore.NoSuchFileException(jobStoreFileID,
customName=None, *extra)
Indicates that the specified
file does not exist.
Parameters
|
• |
jobStoreFileID ( FileID ) |
|||
|
• |
customName ( Optional [ str ]) |
|||
|
• |
extra ( Any ) |
__init__(jobStoreFileID, customName=None, *extra)
Parameters
|
• |
jobStoreFileID ( FileID ) -- the ID of the file that was mistakenly assumed to exist |
||
|
• |
customName ( Optional [ str ]) -- optionally, an alternate name for the nonexistent file |
||
|
• |
extra ( Any ) -- optional extra information to add to the error message |
exception
toil.jobStores.abstractJobStore.NoSuchJobException(jobStoreID)
Indicates that the specified
job does not exist.
Parameters
jobStoreID ( FileID )
__init__(jobStoreID)
Parameters
|
• |
jobStoreID ( FileID ) -- the jobStoreID that was mistakenly assumed to exist |
exception
toil.jobStores.abstractJobStore.NoSuchJobStoreException(locator,
prefix)
Indicates that the specified
job store does not exist.
Parameters
|
• |
locator ( str ) |
|||
|
• |
prefix ( str ) |
__init__(locator, prefix)
Parameters
|
• |
locator ( str ) -- The location of the job store |
|||
|
• |
prefix ( str ) |
RUNNING TESTS
Test make targets, invoked as $ make <target> , subject to which environment variables are set (see Running Integration Tests ).
Before running tests for the first time, initialize your virtual environment following the steps in Installing Plugins .
Run all tests (including slow tests):
$ make test
Run only quick tests (as of Jul 25, 2018, this took about 20 minutes):
$ export TOIL_TEST_QUICK=True; make test
Run an individual test with:
$ make test tests=src/toil/test/sort/sortTest.py::SortTest::testSort
The default value for tests is "src" which includes all tests in the src/ subdirectory of the project root. Tests that require a particular feature will be skipped implicitly. If you want to explicitly skip tests that depend on a currently installed feature , use
$ make test tests="-m 'not aws' src"
This will run only the tests that don't depend on the aws extra, even if that extra is currently installed. Note the distinction between the terms feature and extra . Every extra is a feature but there are features that are not extras, such as the gridengine feature. To skip tests involving both the gridengine feature and the aws extra, use the following:
$ make test tests="-m 'not aws and not gridengine' src"
Running Tests with pytest
Often it is simpler to use pytest directly, instead of calling the make wrapper. This usually works as expected, but some tests need some manual preparation. To run a specific test with pytest, use the following:
python3 -m pytest src/toil/test/sort/sortTest.py::SortTest::testSort
For more information, see the pytest documentation .
Running Integration Tests
These tests are generally only run in our CI workflow due to their resource requirements and cost. However, they can be made available for local testing:
|
• |
Running tests that make use of Docker (e.g. autoscaling tests and Docker tests) requires an appliance image to be hosted. First, make sure you have gone through the setup found in Using Docker with Quay. Then, to build and host the appliance image, run the make target push_docker. |
$ make push_docker
|
• |
Running integration tests requires activation via an environment variable as well as exporting information relevant to the desired tests. Enable the integration tests: |
$ export TOIL_TEST_INTEGRATIVE=True
|
• |
Finally, set the environment variables for keyname and desired zone: |
$ export TOIL_X_KEYNAME=[Your Keyname]
$ export TOIL_X_ZONE=[Desired Zone]
Where X is one of our currently supported cloud providers ( GCE , AWS ).
|
• |
See the above sections for guidance on running tests. |
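Before launching the integration tests, it may help to confirm that the variables listed above are set. The following sketch uses a hypothetical helper, missing_integration_vars, which is not part of Toil; provider stands for the X in the variable names (GCE or AWS).

```python
import os


def missing_integration_vars(provider, environ=os.environ):
    """List the integration-test variables that are not yet set for provider."""
    required = [
        "TOIL_TEST_INTEGRATIVE",
        f"TOIL_{provider}_KEYNAME",
        f"TOIL_{provider}_ZONE",
    ]
    # A variable counts as missing if it is absent or set to an empty string.
    return [name for name in required if not environ.get(name)]
```

Calling missing_integration_vars("AWS") returns an empty list once all three variables are exported.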
Test Environment Variables
Partial install and failing tests
Some tests may fail with an ImportError if the required extras are not installed. Install Toil with all of the extras to prevent such errors.
Using Docker with Quay
Docker is needed for some of the tests. Follow the appropriate installation instructions for your system on their website to get started.
When running make test you might still get the following error:
$ make test
Please set TOIL_DOCKER_REGISTRY, e.g. to quay.io/USER.
To resolve this, create an account with Quay and specify it like so:
$ TOIL_DOCKER_REGISTRY=quay.io/USER make test
where USER is your Quay username.
For convenience you may want to add this variable to your bashrc by running
$ echo 'export TOIL_DOCKER_REGISTRY=quay.io/USER' >> $HOME/.bashrc
Running Mesos Tests
If you're running Toil's Mesos tests, be sure to create the virtualenv with --system-site-packages to include the Mesos Python bindings. Verify this by activating the virtualenv and running pip list | grep mesos . On macOS, this may come up empty. To fix it, run the following:
for i in /usr/local/lib/python2.7/site-packages/*mesos*; do ln -snf $i venv/lib/python2.7/site-packages/; done
DEVELOPING WITH DOCKER
To develop on features reliant on the Toil Appliance (the docker image toil uses for AWS autoscaling), you should consider setting up a personal registry on Quay or Docker Hub . Because the Toil Appliance images are tagged with the Git commit they are based on and because only commits on our master branch trigger an appliance build on Quay, as soon as a developer makes a commit or dirties the working copy they will no longer be able to rely on Toil to automatically detect the proper Toil Appliance image. Instead, developers wishing to test any appliance changes in autoscaling should build and push their own appliance image to a personal Docker registry. This is described in the next section.
Making Your Own Toil Docker Image
Note! Toil checks if the docker image specified by TOIL_APPLIANCE_SELF exists prior to launching by using the docker v2 schema. This should be valid for any major docker repository, but there is an option to override this if desired using the option: --forceDockerAppliance .
Here is a general workflow (similar instructions apply when using Docker Hub):
|
1. |
Make some changes to the provisioner of your local version of Toil |
|||
|
2. |
Go to the location where you installed the Toil source code and run |
$ make docker
to automatically build a docker image that can now be uploaded to your personal Quay account. On Docker Desktop, containerd may have to be enabled . If you have not installed Toil source code yet see Installing Plugins .
|
3. |
If you have not already done so, install Docker and log into Quay. Also make sure that your Quay account is public. |
||
|
4. |
Set the environment variable TOIL_DOCKER_REGISTRY to your Quay account. If you find yourself doing this often you may want to add |
export TOIL_DOCKER_REGISTRY=quay.io/<MY_QUAY_USERNAME>
to your .bashrc or equivalent.
|
5. |
Now you can run |
$ make push_docker
which will upload the docker image to your Quay account. Take note of the image's tag for the next step.
|
6. |
Finally you will need to tell Toil from where to pull the Appliance image you've created (it uses the Toil release you have installed by default). To do this set the environment variable TOIL_APPLIANCE_SELF to the url of your image. For more info see Environment Variables . |
||
|
7. |
Now you can launch your cluster! For more information see Running a Workflow with Autoscaling . |
Running a Cluster Locally
The Toil Appliance container can also be useful as a test environment since it can simulate a Toil cluster locally. An important caveat for this is autoscaling, since autoscaling will only work on an EC2 instance and cannot (at this time) be run on a local machine.
To spin up a local cluster, start by using the following Docker run command to launch a Toil leader container:
docker run \
--entrypoint=mesos-master \
--net=host \
-d \
--name=leader \
--volume=/home/jobStoreParentDir:/jobStoreParentDir \
quay.io/ucsc_cgl/toil:3.6.0 \
--registry=in_memory \
--ip=127.0.0.1 \
--port=5050 \
--allocation_interval=500ms
A couple notes on this command: the -d flag tells Docker to run in daemon mode so the container will run in the background. To verify that the container is running you can run docker ps to see all containers. If you want to run your own container rather than the official UCSC container you can simply replace the quay.io/ucsc_cgl/toil:3.6.0 parameter with your own container name.
Also note that we are not mounting the job store directory itself, but rather the location where the job store will be written. Due to complications with running Docker on macOS, we recommend only mounting directories within your home directory. The next command will launch the Toil worker container with similar parameters:
docker run \
--entrypoint=mesos-slave \
--net=host \
-d \
--name=worker \
--volume=/home/jobStoreParentDir:/jobStoreParentDir \
quay.io/ucsc_cgl/toil:3.6.0 \
--work_dir=/var/lib/mesos \
--master=127.0.0.1:5050 \
--ip=127.0.0.1 \
--attributes=preemptable:False \
--resources=cpus:2
Note here that we are specifying 2 CPUs and a non-preemptable worker. We can easily change either or both of these in a logical way. To change the number of cores we can change the 2 to whatever number you like, and to change the worker to be preemptable we change preemptable:False to preemptable:True . Also note that the same volume is mounted into the worker. This is needed since both the leader and worker write and read from the job store. Now that your cluster is running, you can run
docker exec -it leader bash
to get a shell in your leader 'node'. You can also replace the leader parameter with worker to get shell access in your worker.
Docker-in-Docker issues
If you want to run Docker inside this Docker cluster (Dockerized tools, perhaps), you should also mount in the Docker socket via -v /var/run/docker.sock:/var/run/docker.sock . This will give the Docker client inside the Toil Appliance access to the Docker engine on the host. Client/engine version mismatches have been known to cause issues, so we recommend using Docker version 1.12.3 on the host to be compatible with the Docker client installed in the Appliance. Finally, be careful where you write files inside the Toil Appliance - 'child' Docker containers launched in the Appliance will actually be siblings to the Appliance since the Docker engine is located on the host. This means that the 'child' container can only mount in files from the Appliance if the files are located in a directory that was originally mounted into the Appliance from the host - that way the files are accessible to the sibling container. Note: if Docker can't find the file/directory on the host it will silently fail and mount in an empty directory.
Enabling FUSE
When running toil-wdl-runner with Singularity, Singularity will decompress images to sandbox directories by default. This can take time if a workflow has lots of images. To avoid this, access to FUSE can be given to the Docker container at startup. There are 2 main ways to do this. Either run all the Docker containers in privileged mode:
docker run \
-d \
--name=toil_leader \
--privileged \
quay.io/ucsc_cgl/toil:6.2.0
Or pass through the /dev/fuse device node into the container:
docker run \
-d \
--name=toil_leader \
--device=/dev/fuse \
quay.io/ucsc_cgl/toil:6.2.0
toil-wdl-runner will handle the logic from there.
MAINTAINER’S GUIDELINES
In general, as developers and maintainers of the code, we adhere to the following guidelines:
• We strive to never break the build on master. All development should be done on branches, in either the main Toil repository or in developers' forks.
• Pull requests should be used for any and all changes (except truly trivial ones).
• Pull requests should be in response to issues. If you find yourself making a pull request without an issue, you should create the issue first.
Naming Conventions
• Commit messages should be great. Most importantly, they must:
  • Have a short subject line. If in need of more space, drop down two lines and write a body to explain what is changing and why it has to change.
  • Write the subject line as a command: Destroy all humans, not All humans destroyed.
  • Reference the issue being fixed in a GitHub-parseable format, such as (resolves #1234) at the end of the subject line, or This will fix #1234. somewhere in the body. If no single commit on its own fixes the issue, the cross-reference must appear in the pull request title or body instead.
• Branches in the main Toil repository must start with issues/, followed by the issue number (or numbers, separated by a dash), followed by a short, lowercase, hyphenated description of the change. (There can be many open pull requests with their associated branches at any given point in time and this convention ensures that we can easily identify branches.)
  Say there is an issue numbered #123 titled Foo does not work. The branch name would be issues/123-fix-foo and the title of the commit would be Fix foo in case of bar (resolves #123).
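The branch naming convention can be sketched in a few lines of Python. The helper name branch_name and the slug rules (lowercasing, turning every run of non-alphanumeric characters into a hyphen) are assumptions made for illustration; they are not part of Toil's tooling.

```python
import re


def branch_name(issue_numbers, description):
    """Build a branch name such as issues/123-fix-foo (hypothetical helper)."""
    if not issue_numbers:
        raise ValueError("at least one issue number is required")
    # Several issue numbers are joined with dashes, per the convention.
    numbers = "-".join(str(n) for n in issue_numbers)
    # Assumption: lowercase the description and hyphenate anything else.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    return f"issues/{numbers}-{slug}"


print(branch_name([123], "Fix foo"))
```

A branch covering two issues would come out as, for example, issues/12-34-update-docs.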
Pull Requests
• All pull requests must be reviewed by a person other than the request's author. Review the PR by following the Reviewing Pull Requests checklist.
• Modified pull requests must be re-reviewed before merging. Note that GitHub does not enforce this!
• Merge pull requests by following the Merging Pull Requests checklist.
• When merging a pull request, make sure to update the Draft Changelog on the GitHub wiki, which we will use to produce the changelog for the next release. The PR template tells you to do this, so don't forget. New entries should go at the bottom.
• Pull requests will not be merged unless CI tests pass. GitLab tests are only run on code in the main Toil repository on some branch, so it is the responsibility of the approving reviewer to make sure that pull requests from outside repositories are copied to branches in the main repository. This can be accomplished with (from a Toil clone):
  ./contrib/admin/test-pr theirusername their-branch issues/123-fix-description-here
  This must be repeated every time the PR submitter updates their PR, after checking to see that the update is not malicious.
  If there is no issue corresponding to the PR, after which the branch can be named, the reviewer of the PR should first create the issue.
  Developers who have push access to the main Toil repository are encouraged to make their pull requests from within the repository, to avoid this step.
• Prefer using "Squash and merge" when merging pull requests to master, especially when the PR contains a "single unit" of work (i.e. if one were to rewrite the PR from scratch with all the fixes included, they would have one commit for the entire PR). This makes the commit history on master more readable and easier to debug in case of a breakage.
  When squashing a PR from multiple authors, please add Co-authored-by to give credit to all contributing authors.
  See Issue #2816 for more details.
Publishing a Release
These are the steps to take to publish a Toil release:
• Determine the release version X.Y.Z. This should follow semantic versioning; if user-workflow-breaking changes are made, X should be incremented, and Y and Z should be zero. If non-breaking changes are made but new functionality is added, X should remain the same as the last release, Y should be incremented, and Z should be zero. If only patches are released, X and Y should be the same as the last release and Z should be incremented.
• If it does not exist already, create a release branch in the Toil repo named X.Y.x, where x is a literal lower-case "x". For patch releases, find the existing branch and make sure it is up to date with the patch commits that are to be released. They may be cherry-picked over from master.
• On the release branch, edit version_template.py in the root of the repository. Find the line that looks like this (slightly different for patch releases):
  baseVersion = 'X.Y.0a1'
  Make it look like this instead:
  baseVersion = 'X.Y.Z'
  Commit your change to the branch.
• Tag the current state of the release branch as releases/X.Y.Z.
• Make the GitHub release, referencing that tag. For a non-patch release, fill in the description with the changelog from the wiki page, which you should then clear. For a patch release, just describe the patch.
• For a non-patch release, set up the main branch so that development builds will declare themselves to be alpha versions of what the next release will probably be. Edit version_template.py in the root of the repository on the main branch to set baseVersion like this:
  baseVersion = 'X.Y+1.0a1'
  Make sure to replace X and Y+1 with actual numbers.
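The version arithmetic in the steps above can be written down as a small sketch. The function names and the change labels "breaking", "feature", and "patch" are assumptions for illustration only; Toil does not ship such a helper.

```python
def next_release(last, change):
    """Return the next X.Y.Z for a given kind of change.

    `change` is "breaking", "feature", or "patch" (labels assumed here).
    """
    x, y, z = (int(part) for part in last.split("."))
    if change == "breaking":
        return f"{x + 1}.0.0"
    if change == "feature":
        return f"{x}.{y + 1}.0"
    if change == "patch":
        return f"{x}.{y}.{z + 1}"
    raise ValueError(f"unknown change kind: {change}")


def next_dev_base_version(released):
    """Return the alpha baseVersion to set after a non-patch release."""
    x, y, _ = (int(part) for part in released.split("."))
    # Corresponds to the 'X.Y+1.0a1' template with actual numbers filled in.
    return f"{x}.{y + 1}.0a1"


print(next_release("6.1.2", "feature"), next_dev_base_version("6.2.0"))
```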
Using Git Hooks
In the contrib/hooks directory, there are two scripts, mypy-after-commit.py and mypy-before-push.py, that can be set up as Git hooks to make sure you don't accidentally push commits that would immediately fail type-checking. These are supposed to eliminate the need to run make mypy constantly. You can install them into your Git working copy like this:
ln -rs ./contrib/hooks/mypy-after-commit.py .git/hooks/post-commit
ln -rs ./contrib/hooks/mypy-before-push.py .git/hooks/pre-push
After you make a commit, the post-commit script will start type-checking it, and if it takes too long re-launch the process in the background. When you push, the pre-push script will see if the commit you are pushing type-checked successfully, and if it hasn't been type-checked but is currently checked out, it will be type-checked. If type-checking fails, the push will be aborted.
Type-checking will only be performed if you are in a Toil development virtual environment. If you aren't, the scripts won't do anything.
To bypass or override the pre-push hook, if it is wrong or if you need to push something that doesn't type-check, you can run git push --no-verify. If the scripts get confused about whether a commit actually type-checks, you can clear out the type-checking result cache, which is in /var/run/user/<your UID>/.mypy_toil_result_cache on Linux and in .mypy_toil_result_cache in the Toil repo on Mac.
To uninstall the scripts, delete .git/hooks/post-commit and .git/hooks/pre-push .
Adding Retries to a Function
See toil.lib.retry .
retry() can be used to decorate any function based on the list of errors one wishes to retry on.
This list of errors can contain normal Exception objects, and/or ErrorCondition objects wrapping Exceptions to include additional conditions.
For example, retrying on one Exception (HTTPError):
from requests import get
from requests.exceptions import HTTPError
from toil.lib.retry import retry

@retry(errors=[HTTPError])
def update_my_wallpaper():
    return get('https://www.deviantart.com/')
Or:
from requests import get
from requests.exceptions import HTTPError
from toil.lib.retry import retry

@retry(errors=[HTTPError, ValueError])
def update_my_wallpaper():
    return get('https://www.deviantart.com/')
The examples above will retry for the default interval on any errors specified in the "errors=" arg list.
To retry on specifically 500/502/503/504 errors, you could specify an ErrorCondition object instead, for example:
from requests import get
from requests.exceptions import HTTPError
from toil.lib.retry import ErrorCondition, retry

@retry(errors=[
    ErrorCondition(
        error=HTTPError,
        error_codes=[500, 502, 503, 504]
    )])
def update_my_wallpaper():
    return get('https://www.deviantart.com/')
To retry on specifically errors containing the phrase "NotFound":
from requests import get
from requests.exceptions import HTTPError
from toil.lib.retry import ErrorCondition, retry

@retry(errors=[
    ErrorCondition(
        error=HTTPError,
        error_message_must_include="NotFound"
    )])
def update_my_wallpaper():
    return get('https://www.deviantart.com/')
To retry on all HTTPError errors EXCEPT an HTTPError containing the phrase "NotFound":
from requests import get
from requests.exceptions import HTTPError
from toil.lib.retry import ErrorCondition, retry

@retry(errors=[
    HTTPError,
    ErrorCondition(
        error=HTTPError,
        error_message_must_include="NotFound",
        retry_on_this_condition=False
    )])
def update_my_wallpaper():
    return get('https://www.deviantart.com/')
To retry on boto3's specific status errors, an example of the implementation is:
import boto3
from botocore.exceptions import ClientError
from toil.lib.retry import ErrorCondition, retry

@retry(errors=[
    ErrorCondition(
        error=ClientError,
        boto_error_codes=["BucketNotFound"]
    )])
def boto_bucket(bucket_name):
    boto_session = boto3.session.Session()
    s3_resource = boto_session.resource('s3')
    return s3_resource.Bucket(bucket_name)
Any combination of these will also work, provided the codes are matched to the correct exceptions. A ValueError will not return a 404, for example.
The retry function as a decorator should make retrying functions easier and clearer. It also encourages smaller independent functions, as opposed to lumping many different things that may need to be retried on different conditions in the same function.
The ErrorCondition object tries to take some of the heavy lifting of writing specific retry conditions and boil it down to an API that covers all common use-cases without the user having to write any new bespoke functions.
Use-cases covered currently:
1. Retrying on a normal error, like a KeyError.
2. Retrying on HTTP error codes (use ErrorCondition).
3. Retrying on boto's specific status errors, like "BucketNotFound" (use ErrorCondition).
4. Retrying when an error message contains a certain phrase (use ErrorCondition).
5. Explicitly NOT retrying on a condition (use ErrorCondition).
If new functionality is needed, it's currently best practice in Toil to add functionality to the ErrorCondition itself rather than making a new custom retry method.
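To make the matching rules concrete, here is a minimal standalone sketch of the decision a retry decorator has to make. It is not the code in toil.lib.retry: the names Condition, should_retry, and simple_retry are hypothetical, and the fixed attempt count with no delay between attempts is a simplifying assumption.

```python
import functools
from dataclasses import dataclass
from typing import Optional, Type


@dataclass
class Condition:
    """Toy stand-in for an error condition (hypothetical, not Toil's class)."""
    error: Type[Exception]
    message_must_include: Optional[str] = None
    retry_on_this_condition: bool = True

    def matches(self, exc):
        if not isinstance(exc, self.error):
            return False
        phrase = self.message_must_include
        return phrase is None or phrase in str(exc)


def should_retry(exc, conditions):
    matching = [c for c in conditions if c.matches(exc)]
    # Any matching "do not retry" condition vetoes the retry.
    if any(not c.retry_on_this_condition for c in matching):
        return False
    return bool(matching)


def simple_retry(conditions, attempts=3):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    last_try = attempt == attempts - 1
                    if last_try or not should_retry(exc, conditions):
                        raise
        return wrapper
    return decorator


calls = []


@simple_retry([Condition(ConnectionError)])
def flaky():
    # Fails twice, then succeeds, to exercise the retry loop.
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("temporarily unavailable")
    return "ok"


result = flaky()
```

Keeping the "should this be retried?" decision in one small function mirrors the advice above: new matching rules go into the condition object, not into a new custom retry loop.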
PULL REQUEST CHECKLISTS
This document contains checklists for dealing with PRs. More general PR information is available at Pull Requests .
Reviewing Pull Requests
This checklist is to be kept in sync with the checklist in the pull request template.
When reviewing a PR, do the following:
• Make sure it is coming from issues/XXXX-fix-the-thing in the Toil repo, or from an external repo.
  • If it is coming from an external repo, make sure to pull it in for CI with:
    contrib/admin/test-pr otheruser theirbranchname issues/XXXX-fix-the-thing
  • If there is no associated issue, create one.
• Read through the code changes. Make sure that it doesn't have:
  • Addition of trailing whitespace.
  • New variable or member names in camelCase that want to be in snake_case.
  • New functions without type hints.
  • New functions or classes without informative docstrings.
  • Changes to semantics not reflected in the relevant docstrings.
  • New or changed command line options for Toil workflows that are not reflected in docs/running/cliOptions.rst
  • New features without tests.
• Comment on the lines of code where problems exist with a review comment. You can shift-click the line numbers in the diff to select multiple lines.
• Finish the review with an overall description of your opinion.
Merging Pull Requests
This checklist is to be kept in sync with the checklist in the pull request template.
When merging a PR, do the following:
• Make sure the PR passed tests, including the GitLab tests, for the most recent commit in its branch.
• Make sure the PR has been reviewed. If not, review it. If it has been reviewed and any requested changes seem to have been addressed, proceed.
• Merge with the GitHub "Squash and merge" feature.
  • If there are multiple authors' commits, add Co-authored-by to give credit to all contributing authors.
• Copy its recommended changelog entry to the Draft Changelog.
  • Append the issue number in parentheses to the changelog entry.
TOIL ARCHITECTURE
The following diagram lays out the software architecture of Toil.
[image: Toil's architecture is composed of the leader, the job store, the worker processes, the batch system, the node provisioner, and the stats and logging monitor.] Figure 1: The basic components of Toil's architecture.
These components are described below:
• the leader: The leader is responsible for deciding which jobs should be run. To do this it traverses the job graph. Currently this is a single-threaded process, but we take aggressive steps to prevent it becoming a bottleneck (see Read-only Leader described below).
• the job-store: Handles all files shared between the components. Files in the job-store are the means by which the state of the workflow is maintained. Each job is backed by a file in the job store, and atomic updates to this state are used to ensure the workflow can always be resumed upon failure. The job-store can also store all user files, allowing them to be shared between jobs. The job-store is defined by the AbstractJobStore class. Multiple implementations of this class allow Toil to support different back-end file stores, e.g.: S3, network file systems, Google file store, etc.
• workers: The workers are temporary processes responsible for running jobs, one at a time per worker. Each worker process is invoked with a job argument that it is responsible for running. The worker monitors this job and reports back success or failure to the leader by editing the job's state in the file-store. If the job defines successor jobs the worker may choose to immediately run them (see Job Chaining below).
• the batch-system: Responsible for scheduling the jobs given to it by the leader, running a worker command for each job. The batch-system is defined by the AbstractBatchSystem class. Toil uses multiple existing batch systems to schedule jobs, including Apache Mesos, GridEngine and a multi-process single node implementation that allows workflows to be run without any of these frameworks. Toil can therefore fairly easily be made to run a workflow using an existing cluster.
• the node provisioner: Creates worker nodes in which the batch system schedules workers. It is defined by the AbstractProvisioner class.
• the statistics and logging monitor: Monitors logging and statistics produced by the workers and reports them. Uses the job-store to gather this information.
Jobs and JobDescriptions
As noted in Job Basics , a job is the atomic unit of work in a Toil workflow. Workflows extend the Job class to define units of work. These jobs are pickled and stored in the job-store by the leader, and are retrieved and un-pickled by the worker when they are scheduled to run.
During scheduling, Toil does not work with the actual Job objects. Instead, JobDescription objects are used to store all the information that the Toil Leader ever needs to know about the Job. This includes requirements information, dependency information, body object to run, worker command to issue, etc.
Internally, the JobDescription object is referenced by its jobStoreID, which is often not human readable. However, the Job and JobDescription objects contain several human-readable names that are useful for logging and identification:
Statistics and Logging
Toil's statistics and logging system is implemented in a joint class StatsAndLogging . The class can be instantiated and run as a thread on the leader, where it polls for new log files in the job store with the read_logs() method. These are JSON files, which contain structured data. Structured log messages from user Python code, stored under workers.logs_to_leader , from the file store's log_to_leader() method, will be logged at the appropriate level. The text output that the worker captured for all its chained jobs, in logs.messages , will be logged at debug level in the worker's output. If --writeLogs or --writeLogsGzip is provided, the received worker logs will also be stored by the StatsAndLogging thread into per-job files inside the job store, using writeLogFiles() .
Note that the worker only fills this in if running with debug logging on, or if --writeLogsFromAllJobs is set. Otherwise, logs from successful jobs are not persisted. Logs from failed jobs are persisted differently; they are written to the file store, and the log file is made available through toil.job.JobDescription.getLogFileHandle() . The leader thread retrieves these logs and calls back into StatsAndLogging to print or locally save them as appropriate.
The CWL and WDL interpreters use log_user_stream() to inject CWL and WDL task-level logs into the stats and logging system. The full text of those logs gets stored in the JSON stats files, and when the StatsAndLogging thread sees them it reports and saves them, similarly to how it treats Toil job logs.
To ship the statistics and the non-failed-job logs around, the job store has a logs mailbox system: the write_logs() method deposits a string, and the read_logs() method on the leader passes the strings to a callback. It tracks a concept of new and old, based on whether the string has been read already by anyone, and one can read only the new values, or all values observed. The stats and logging system uses this to pass around structured JSON holding both log data and worker-measured stats, and expects the StatsAndLogging thread to be the only live reader.
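The mailbox behaviour described above (deposit strings, deliver them to a callback, and distinguish new from already-read values) can be illustrated with an in-memory toy. The class name ToyLogMailbox, the read_all parameter name, and the returned count are assumptions for this sketch; a real job store persists the strings in its backing storage.

```python
class ToyLogMailbox:
    """In-memory stand-in for the job store's logs mailbox (illustrative only)."""

    def __init__(self):
        self._new = []  # strings nobody has read yet
        self._old = []  # strings that have already been read

    def write_logs(self, msg):
        """Deposit one string, as a worker would."""
        self._new.append(msg)

    def read_logs(self, callback, read_all=False):
        """Pass strings to callback and return how many were delivered."""
        pending = (self._old + self._new) if read_all else list(self._new)
        for msg in pending:
            callback(msg)
        # Everything delivered so far now counts as old.
        self._old.extend(self._new)
        self._new.clear()
        return len(pending)


mailbox = ToyLogMailbox()
mailbox.write_logs('{"example": "payload"}')  # hypothetical JSON payload
received = []
first = mailbox.read_logs(received.append)
second = mailbox.read_logs(received.append)
```

Because reading flips values from new to old, two concurrent readers would each see only part of the stream, which is why a single live reader is expected.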
Optimizations
Toil implements lots of optimizations designed for scalability. Here we detail some of the key optimizations.
Read-only leader
The leader process is currently implemented as a single thread. Most of the leader's tasks revolve around processing the state of jobs, each stored as a file within the job-store. To minimise the load on this thread, each worker does as much work as possible to manage the state of the job it is running. As a result, with a couple of minor exceptions, the leader process never needs to write or update the state of a job within the job-store. For example, when a job is complete and has no further successors the responsible worker deletes the job from the job-store, marking it complete. The leader then only has to check for the existence of the file when it receives a signal from the batch-system to know that the job is complete. This off-loading of state management is orthogonal to future parallelization of the leader.
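The completion check described above can be sketched with ordinary files standing in for job store entries. The function names and the file name job-0001 are hypothetical; real job stores may be backed by object storage, not a local directory.

```python
import os
import tempfile


def worker_finish_job(job_path):
    """Worker side: deleting the job's file marks the job complete."""
    os.remove(job_path)


def leader_job_is_complete(job_path):
    """Leader side: a read-only existence check, with no write to the job store."""
    return not os.path.exists(job_path)


with tempfile.TemporaryDirectory() as job_store:
    job_path = os.path.join(job_store, "job-0001")  # hypothetical file name
    with open(job_path, "w") as handle:
        handle.write("job state")
    before = leader_job_is_complete(job_path)
    worker_finish_job(job_path)
    after = leader_job_is_complete(job_path)

print(before, after)
```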
Job chaining
The scheduling of successor jobs is partially managed by the worker, reducing the number of individual jobs the leader needs to process. Currently this is very simple: if there is a single next successor job to run, and its resources fit within and closely match the resources of the current job, then the job is run immediately on the worker without returning to the leader. Further extensions of this strategy are possible, but for many workflows which define a series of serial successors (e.g. map sequencing reads, post-process mapped reads, etc.) this pattern is very effective at reducing leader workload.
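The chaining rule can be expressed as a predicate. This is a sketch under stated assumptions: the Resources fields, the function name can_chain, and in particular the slack threshold used to decide what "closely match" means are all invented for illustration and do not reflect Toil's actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cores: float
    memory: int  # bytes
    disk: int    # bytes


def can_chain(current, successors, slack=0.5):
    """Return True if the sole successor can run in place on the same worker.

    `slack` is an assumed threshold: the successor must need at least this
    fraction of each of the current job's resources to count as a close match.
    """
    if len(successors) != 1:
        return False
    nxt = successors[0]
    pairs = [
        (nxt.cores, current.cores),
        (nxt.memory, current.memory),
        (nxt.disk, current.disk),
    ]
    fits = all(need <= have for need, have in pairs)
    close = all(need >= slack * have for need, have in pairs)
    return fits and close


current = Resources(cores=2, memory=4000, disk=10000)
print(can_chain(current, [Resources(cores=2, memory=3000, disk=8000)]))
```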
Preemptable node support
Critical to running at large scale is dealing with intermittent node failures. Toil is therefore designed to always be resumable, provided the job-store does not become corrupt. This robustness allows Toil to run on preemptable nodes, which are only available when others are not willing to pay more to use them. Designing workflows that divide into many short individual jobs that can use preemptable nodes allows for workflows to be efficiently scheduled and executed.
Caching
Running bioinformatic pipelines often requires passing large datasets between jobs. Toil caches the results from jobs such that child jobs running on the same node can directly use the same file objects, thereby eliminating the need for an intermediary transfer to the job store. Caching also reduces the burden on the local disks, because multiple jobs can share a single file. The resulting drop in I/O allows pipelines to run faster, and, by the sharing of files, allows users to run more jobs in parallel by reducing overall disk requirements.
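One way several jobs on a node can share a single on-disk copy of a file is through hard links, which give each job its own path to the same inode. The sketch below only demonstrates that mechanism; it is not Toil's caching code, the file and directory names are hypothetical, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as node_dir:
    # One cached copy of a (tiny, fake) input file on the node.
    cached = os.path.join(node_dir, "cached.bam")
    with open(cached, "wb") as handle:
        handle.write(b"x" * 1024)

    # Each job gets its own path, but no additional data is written.
    job_copies = []
    for job in ("job_a", "job_b"):
        job_dir = os.path.join(node_dir, job)
        os.mkdir(job_dir)
        link = os.path.join(job_dir, "input.bam")
        os.link(cached, link)
        job_copies.append(link)

    inodes = {os.stat(path).st_ino for path in [cached] + job_copies}
    link_count = os.stat(cached).st_nlink

print(len(inodes), link_count)
```

All three paths resolve to one inode, so disk usage stays at one copy however many jobs read the file.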
To demonstrate the efficiency of caching, we ran an experimental internal pipeline on 3 samples from the TCGA Lung Squamous Carcinoma (LUSC) dataset. The pipeline takes the tumor and normal exome fastqs and the tumor RNA fastq as input, and predicts MHC presented neoepitopes in the patient that are potential targets for T-cell based immunotherapies. The pipeline was run individually on the samples on c3.8xlarge machines on AWS (60 GB RAM, 600 GB SSD storage, 32 cores). The pipeline aligns the data to hg19-based references, predicts MHC haplotypes using PHLAT, calls mutations using 2 callers (MuTect and RADIA) and annotates them using SnpEff, then predicts MHC:peptide binding using the IEDB suite of tools before running an in-house rank boosting algorithm on the final calls.
To optimize time taken, the pipeline is written such that mutations are called on a per-chromosome basis from the whole-exome BAMs and are merged into a complete VCF. Running MuTect in parallel on whole-exome BAMs requires each MuTect job to download the complete tumor and normal BAMs to its working directory, an operation that quickly fills the disk and limits the parallelizability of jobs. The workflow was run in Toil, with and without caching, and Figure 2 shows that the workflow finishes faster in the cached case while using less disk on average than the uncached run. We believe that the benefits of caching arising from file transfers will be much higher on magnetic disk-based storage systems than on the SSD systems we tested this on.
[image: Graph outlining the efficiency gain from caching.] Figure 2: Efficiency gain from caching. The lower half of each plot describes the disk used by the pipeline recorded every 10 minutes over the duration of the pipeline, and the upper half shows the corresponding stage of the pipeline that is being processed. Since jobs requesting the same file shared the same inode, the effective load on the disk is considerably lower than in the uncached case where every job downloads a personal copy of every file it needs. We see that in all cases, the uncached run uses almost 300-400 GB more than the cached run in the resource-heavy mutation calling step. We also see a benefit in terms of wall time for each stage since we eliminate the time taken for file transfers.
Toil support for Common Workflow Language
The CWL document and input document are loaded using the 'cwltool.load_tool' module. This performs normalization and URI expansion (for example, relative file references are turned into absolute file URIs), validates the document against the CWL schema, initializes Python objects corresponding to major document elements (command line tools, workflows, workflow steps), and performs static type checking that sources and sinks have compatible types.
Input files referenced by the CWL document and input document are imported into the Toil file store. CWL documents may use any URI scheme supported by Toil file store, including local files and object storage.
The 'location' field of each File reference is updated to reflect the import token returned by the Toil file store.
For directory inputs, the directory listing is stored in a Directory object. Each individual file is imported into the Toil file store.
An initial workflow Job is created from the toplevel CWL document. Then, control passes to the Toil engine which schedules the initial workflow job to run.
When the toplevel workflow job runs, it traverses the CWL workflow and creates a toil job for each step. The dependency graph is expressed by making downstream jobs children of upstream jobs, and initializing the child jobs with an input object containing the promises of output from upstream jobs.
Because Toil jobs have a single output, but CWL permits steps to have multiple output parameters that may feed into multiple other steps, the input to a CWLJob is expressed with an "indirect dictionary". This is a dictionary of input parameters, where each entry value is a tuple of a promise and a promise key. When the job runs, the indirect dictionary is turned into a concrete input object by resolving each promise into its actual value (which is always a dict), and then looking up the promise key to get the actual value for the input parameter.
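Resolving an indirect dictionary can be sketched as follows. ToyPromise is a hypothetical stand-in for a Toil promise, and resolve_indirect, the parameter names, and the file names are invented for illustration; the point is only the two-step lookup (resolve the promise to a dict, then index it by the promise key).

```python
class ToyPromise:
    """Stand-in for a promise whose resolved value is always a dict."""

    def __init__(self, outputs):
        self._outputs = outputs

    def resolve(self):
        return self._outputs


def resolve_indirect(indirect):
    """Turn {param: (promise, key)} into {param: concrete value}."""
    concrete = {}
    for param, (promise, key) in indirect.items():
        concrete[param] = promise.resolve()[key]
    return concrete


# One upstream step with two outputs feeding two inputs of a downstream step.
align = ToyPromise({"bam": "sample.bam", "log": "align.log"})
inputs = resolve_indirect({"reads": (align, "bam"), "report": (align, "log")})
print(inputs)
```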
If a workflow step specifies a scatter, then a scatter job is created and connected into the workflow graph as described above. When the scatter step runs, it creates a child job for each parameterization of the scatter. A gather job is added as a follow-on to gather the outputs into arrays.
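The scatter and gather shape can be illustrated with plain functions. This sketch runs the parameterizations sequentially in one process and handles scattering over a single input only; in Toil each parameterization is a child job and the gather is a follow-on job. The name scatter_gather and the example step are assumptions.

```python
def scatter_gather(step, base_inputs, scatter_key):
    """Run `step` once per element of base_inputs[scatter_key], then gather."""
    results = []
    for value in base_inputs[scatter_key]:
        # Each parameterization sees one element in place of the whole array.
        job_inputs = dict(base_inputs, **{scatter_key: value})
        results.append(step(job_inputs))
    # Gather: collect each output parameter into an array, in scatter order.
    gathered = {}
    for result in results:
        for name, value in result.items():
            gathered.setdefault(name, []).append(value)
    return gathered


def scale(inputs):
    """Hypothetical step with one output parameter."""
    return {"out": inputs["x"] * inputs["k"]}


outputs = scatter_gather(scale, {"x": [1, 2, 3], "k": 10}, "x")
print(outputs)
```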
When running a command line tool, the job first creates output and temporary directories under the Toil local temp dir. It runs the command line tool using the single_job_executor from CWLTool, providing a Toil-specific constructor for filesystem access, and overriding the default PathMapper to use ToilPathMapper.
The ToilPathMapper keeps track of a file's symbolic identifier (the Toil FileID), its local path on the host (the value returned by readGlobalFile) and the location of the file inside the Docker container.
After executing single_job_executor from CWLTool, the job gets back the output object and status. If the underlying job failed, it raises an exception. Files from the output object are added to the file store using writeGlobalFile, and the 'location' field of each File reference is updated to reflect the token returned by the Toil file store.
When the workflow completes, it returns an indirect dictionary linking to the outputs of the job steps that contribute to the final output. This is the value returned by toil.start() or toil.restart(). This is resolved to get the final output object. The files in this object are exported from the file store to 'outdir' on the host file system, and the 'location' field of each File reference is updated to reflect the final exported location of the output files.
MINIMUM AWS IAM PERMISSIONS
Toil requires at least the following permissions in an IAM role to operate on a cluster. These are added by default when launching a cluster. However, ensure that they are present if creating a custom IAM role when launching a cluster with the --awsEc2ProfileArn parameter.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:*",
"s3:*",
"sdb:*",
"iam:PassRole"
],
"Resource": "*"
}
]
}
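When supplying a custom IAM role, a quick sanity check of the policy document can catch an omitted action. The sketch below uses only the standard library and compares action strings literally; it does not evaluate wildcards, resources, or conditions the way AWS does, and the function name missing_actions is hypothetical.

```python
import json

# The four actions listed in the policy above.
REQUIRED_ACTIONS = {"ec2:*", "s3:*", "sdb:*", "iam:PassRole"}


def missing_actions(policy_json):
    """Return required actions not granted by any Allow statement (literal match)."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    granted = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        granted.update(actions)
    return REQUIRED_ACTIONS - granted


EXAMPLE_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["ec2:*", "s3:*", "sdb:*", "iam:PassRole"],
        "Resource": "*",
    }],
})

print(missing_actions(EXAMPLE_POLICY))
```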
AUTO-DEPLOYMENT
If you want to run a Toil Python workflow in a distributed environment, on multiple worker machines, either in the cloud or on a bare-metal cluster, the Python code needs to be made available to those other machines. If the workflow's main module imports other modules, those modules also need to be made available on the workers. Toil can automatically do that for you, with a little help on your part. We call this feature auto-deployment of a workflow.
Let's first examine various scenarios of auto-deploying a workflow and then look at deploying Toil itself, which, as we'll see shortly, cannot be auto-deployed. Lastly, we'll deal with the issue of declaring Toil as a dependency of a workflow that is packaged as a setuptools distribution.
Toil can be easily deployed to a remote host. First, assuming you've followed our Preparing your AWS environment section to install Toil and use it to create a remote leader node on (in this example) AWS, you can now log into it using the Ssh-Cluster Command and, once on the remote host, create and activate a virtualenv (making sure to use the --system-site-packages option!):
$ virtualenv --system-site-packages venv
$ . venv/bin/activate
Note the --system-site-packages option, which ensures that globally-installed packages are accessible inside the virtualenv. Do not (re)install Toil after this! The --system-site-packages option has already transferred Toil and the dependencies from your local installation of Toil for you.
From here, you can install a project and its dependencies:
$ tree
.
├── util
│   ├── __init__.py
│   └── sort
│       ├── __init__.py
│       └── quick.py
└── workflow
    ├── __init__.py
    └── main.py

3 directories, 5 files
$ pip install matplotlib
$ cp -R workflow util venv/lib/python3.9/site-packages
Ideally, your project would have a setup.py file (see setuptools ) which streamlines the installation process:
$ tree
.
├── util
│   ├── __init__.py
│   └── sort
│       ├── __init__.py
│       └── quick.py
├── workflow
│   ├── __init__.py
│   └── main.py
└── setup.py

3 directories, 6 files
$ pip install .
Or, if your project has been published to PyPI:
$ pip install my-project
In each case, we have created a virtualenv with the --system-site-packages flag in the venv subdirectory then installed the matplotlib distribution from PyPI along with the two packages that our project consists of. (Again, both Python and Toil are assumed to be present on the leader and all worker nodes.)
We can now run our workflow:
$ python3 main.py --batchSystem=kubernetes …
IMPORTANT:
If the workflow's external dependencies contain native code (i.e. are not pure Python) then they must be manually installed on each worker.
WARNING:
Neither python3 setup.py develop nor pip install -e . can be used in this process as, instead of copying the source files, they create .egg-link files that Toil can't auto-deploy. Similarly, python3 setup.py install doesn't work either as it installs the project as a Python .egg which is also not currently supported by Toil (though it could be in the future).
Also note that using the --single-version-externally-managed flag with setup.py will prevent the installation of your package as an .egg . It will also disable the automatic installation of your project's dependencies.
Auto Deployment with Sibling Python Files
This scenario applies if a Python workflow imports files that are its siblings:
$ cd my_project
$ ls
userScript.py utilities.py
$ ./userScript.py --batchSystem=kubernetes …
Here userScript.py imports additional functionality from utilities.py . Toil detects that userScript.py has sibling Python files and copies them to the workers, alongside the main Python file. Note that sibling Python files will be auto-deployed regardless of whether they are actually imported by the workflow: all .py files residing in the same directory as the main workflow Python file will automatically be auto-deployed.
This structure is a suitable method of organizing the source code of reasonably complicated workflows.
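The sibling rule (every .py file next to the main workflow file is deployed, imported or not) can be sketched with pathlib. This only lists the files that the rule would select; the function name sibling_python_files is hypothetical and it is not how Toil implements auto-deployment.

```python
import tempfile
from pathlib import Path


def sibling_python_files(main_script):
    """List every .py file in the main script's directory, imported or not."""
    directory = Path(main_script).resolve().parent
    return sorted(p.name for p in directory.glob("*.py"))


with tempfile.TemporaryDirectory() as project:
    # Recreate the layout from the example, plus one non-Python file.
    for name in ("userScript.py", "utilities.py", "notes.txt"):
        Path(project, name).write_text("")
    deployed = sibling_python_files(Path(project, "userScript.py"))

print(deployed)
```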
Auto-Deploying a Package Hierarchy
Recall that in Python, a package is a directory containing one or more .py files, one of which must be called __init__.py , and optionally other packages. For more involved workflows that contain a significant amount of code, this is the recommended way of organizing the source code. Because we use a package hierarchy, the main workflow file is actually a Python module. It is merely one of the modules in the package hierarchy. We need to inform Toil that we want to use a package hierarchy by invoking Python's -m option. This enables Toil to identify the entire set of modules belonging to the workflow and copy all of them to each worker. Note that while using the -m option is optional in the scenarios above, it is mandatory in this one.
The following shell session illustrates this:
$ cd my_project
$ tree
.
├── util
│   ├── __init__.py
│   └── sort
│       ├── __init__.py
│       └── quick.py
└── workflow
    ├── __init__.py
    └── main.py

3 directories, 5 files
$ python3 -m workflow.main --batchSystem=kubernetes …
Here the workflow entry point module main.py does not reside in the current directory, but is part of a package called workflow, in a subdirectory of the current directory. Additional functionality is in a separate module called util.sort.quick, which corresponds to util/sort/quick.py. Because we invoke the workflow via python3 -m workflow.main, Toil can determine the root directory of the hierarchy (my_project in this case) and copy all Python modules underneath it to each worker. The -m option is documented in the Python command-line documentation.
When -m is passed, Python adds the current working directory to sys.path , the list of root directories to be considered when resolving a module name like workflow.main . Without that added convenience we'd have to run the workflow as PYTHONPATH="$PWD" python3 -m workflow.main . This also means that Toil can detect the root directory of the invoked module's package hierarchy even if it isn't the current working directory. In other words we could do this:
$ cd my_project
$ export PYTHONPATH="$PWD"
$ cd /some/other/dir
$ python3 -m workflow.main --batchSystem=kubernetes
…
Also note that the root directory itself must not be a package, i.e. must not contain an __init__.py .
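The relationship between a module name and the root directory can be illustrated with a short sketch. This is not Toil's implementation; the helper name package_root and the example paths are hypothetical. It only shows why the root is found by walking up one directory per package level of the dotted module name.

```python
from pathlib import PurePosixPath


def package_root(module_name: str, module_file: str) -> str:
    """Return the directory that must be on sys.path to import module_name.

    Hypothetical helper for illustration only: it walks up one directory
    for each enclosing package in the dotted module name.
    """
    # workflow.main lives in <root>/workflow/main.py, so start from the
    # directory holding the file, then strip one directory per package.
    root = PurePosixPath(module_file).parent
    for _ in module_name.split(".")[:-1]:
        root = root.parent
    return str(root)


root = package_root("workflow.main", "/home/user/my_project/workflow/main.py")
print(root)
```

Under these assumptions the sketch reports my_project as the root for workflow.main, which is the directory that must not itself contain an __init__.py.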
Relying on Shared Filesystems
Bare-metal clusters typically mount a shared file system like NFS on each node. If every node has that file system mounted at the same path, you can place your project on that shared filesystem and run your Python workflow from there. Additionally, you can clone the Toil source tree into a directory on that shared file system and you won't even need to install Toil on every worker. Be sure to add both your project directory and the Toil clone to PYTHONPATH . Toil replicates PYTHONPATH from the leader to every worker.
Using a shared filesystem
Toil currently only supports a tempdir set to a local, non-shared directory.
Toil Appliance
The term Toil Appliance refers to the Ubuntu-based Docker image that Toil uses for the machines in Toil-managed clusters, and for executing jobs on Kubernetes. It's easily deployed, only needs Docker, and allows a consistent environment on all Toil clusters. To specify a different image, see the Toil Environment Variables section. For more information on the Toil Appliance, see the Running in AWS section.
ENVIRONMENT VARIABLES
There are several environment variables that affect the way Toil runs.
API REFERENCE
This page contains auto-generated API reference documentation [1].
toil
Submodules
toil.batchSystems
Submodules
toil.batchSystems.abstractBatchSystem
Attributes
Exceptions
Classes
Module Contents
toil.batchSystems.abstractBatchSystem.logger
toil.batchSystems.abstractBatchSystem.EXIT_STATUS_UNAVAILABLE_VALUE = 255
class toil.batchSystems.abstractBatchSystem.BatchJobExitReason
Bases: enum.IntEnum
Enum where members are also (and must be) ints
FINISHED = 1
Successfully finished.
FAILED = 2
Job finished, but failed.
LOST = 3
Preemptable failure (job's executing host went away).
KILLED = 4
Job killed before finishing.
ERROR = 5
Internal error.
MEMLIMIT = 6
Job hit batch system imposed memory limit.
MISSING = 7
Job disappeared from the scheduler without actually stopping, so Toil killed it.
MAXJOBDURATION = 8
Job ran longer than --maxJobDuration, so Toil killed it.
PARTITION = 9
Job was not able to talk to the leader via the job store, so Toil declared it failed.
classmethod to_string(value)
Convert to human-readable string.
Given an int that may be a member of the enum, or may merely be equal to one of its values, produce the string value of its matching enum entry, or a stringified int.
Parameters
value ( int )
Return type
str
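A minimal sketch of the conversion described above. ExitReasonSketch is a hypothetical stand-in carrying only the first three documented values, and the sketch assumes that the "string value" of an entry is its member name; the real class may differ.

```python
from enum import IntEnum


class ExitReasonSketch(IntEnum):
    """Stand-in with the first few documented values; not the Toil class."""

    FINISHED = 1
    FAILED = 2
    LOST = 3

    @classmethod
    def to_string(cls, value: int) -> str:
        # Assumption: the "string value" of an entry is its member name.
        try:
            return cls(value).name
        except ValueError:
            # No member has this value, so fall back to the plain number.
            return str(value)


print(ExitReasonSketch.to_string(2))
print(ExitReasonSketch.to_string(42))
```

Falling back to a stringified int keeps log messages usable even when a batch system reports a value the enum does not know about.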
class toil.batchSystems.abstractBatchSystem.UpdatedBatchJobInfo
Bases: NamedTuple
jobID: int
exitStatus: int
The exit status (integer value) of the job. 0 implies successful.
EXIT_STATUS_UNAVAILABLE_VALUE is used when the exit status is not available (e.g. job is lost, or otherwise died but actual exit code was not reported).
exitReason: BatchJobExitReason | None
wallTime: float | int | None
class toil.batchSystems.abstractBatchSystem.WorkerCleanupInfo
Bases: NamedTuple
work_dir: str | None
Work directory path (where the cache would go) if specified by user
coordination_dir: str | None
Coordination directory path (where lock files would go) if specified by user
workflow_id: str
Used to identify files specific to this workflow
clean_work_dir: str
When to clean up the work and coordination directories for a job ('always', 'onSuccess', 'onError', 'never')
class toil.batchSystems.abstractBatchSystem.AbstractBatchSystem
Bases: abc.ABC
An abstract base class to represent the interface the batch system must provide to Toil.
classmethod supportsAutoDeployment()
Abstractmethod
Return type
bool
Whether this batch system supports auto-deployment of the user script itself.
If it does, the setUserScript() can be invoked to set the resource object representing the user script.
Note to implementors: If your implementation returns True here, it should also override
classmethod supportsWorkerCleanup()
Abstractmethod
Return type
bool
Whether this batch system supports worker cleanup.
Indicates whether this batch system invokes BatchSystemSupport.workerCleanup() after the last job for a particular workflow invocation finishes. Note that the term worker refers to an entire node, not just a worker process. A worker process may run more than one job sequentially, and more than one concurrent worker process may exist on a worker node, for the same workflow. The batch system is said to shut down after the last worker process terminates.
abstract setUserScript(userScript)
Set the user script for this workflow.
This method must be called before the first job is issued to this batch system, and only if supportsAutoDeployment() returns True, otherwise it will raise an exception.
Parameters
userScript ( toil.resource.Resource ) -- the resource object representing the user script or module and the modules it depends on.
Return type
None
set_message_bus(message_bus)
Give the batch system an opportunity to connect directly to the message bus, so that it can send informational messages about the jobs it is running to other Toil components.
Parameters
message_bus ( toil.bus.MessageBus )
Return type
None
abstract issueBatchJob(command, job_desc, job_environment=None)
Issues a job with the specified command to the batch system and returns a unique job ID number.
Parameters
• command ( str ) -- the command to execute somewhere to run the Toil worker process
• job_desc ( toil.job.JobDescription ) -- the JobDescription for the job being run
• job_environment ( Optional[dict[str, str]] ) -- a collection of job-specific environment variables to be set on the worker.
Returns
a unique job ID number that can be used to reference the newly issued job
Return type
int
abstract killBatchJobs(jobIDs)
Kills the given job IDs. After returning, the killed jobs will not appear in the results of getRunningBatchJobIDs. The killed job will not be returned from getUpdatedBatchJob.
Parameters
jobIDs ( list[int] ) -- list of IDs of jobs to kill
Return type
None
abstract getIssuedBatchJobIDs()
Gets all currently issued jobs
Returns
A list of jobs (as job ID numbers) currently issued (may be running, or may be waiting to be run). Despite the result being a list, the ordering should not be depended upon.
Return type
list [ int ]
abstract getRunningBatchJobIDs()
Gets a map of jobs as job ID numbers that are currently running (not just waiting) and how long they have been running, in seconds.
Returns
dictionary with currently running job ID number keys and how many seconds they have been running as the value
Return type
dict [ int , float ]
abstract getUpdatedBatchJob(maxWait)
Returns information about a job that has updated its status (i.e. ceased running, either successfully or with an error). Each such job will be returned exactly once.
Does not return info for jobs killed by killBatchJobs, although they may cause None to be returned earlier than maxWait.
Parameters
maxWait ( int ) -- the number of seconds to block, waiting for a result
Returns
If a result is available, returns UpdatedBatchJobInfo. Otherwise it returns None. wallTime is the number of seconds (a strictly positive float) in wall-clock time the job ran for, or None if this batch system does not support tracking wall time.
Return type
Optional[ UpdatedBatchJobInfo ]
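The return contract above can be exercised with a small polling loop. The sketch below does not use Toil: UpdateSketch mirrors the documented UpdatedBatchJobInfo fields, and ToyBatchSystem and drain are hypothetical names that model only "return one finished job per call, or None when nothing is available".

```python
from collections import deque
from typing import NamedTuple, Optional

EXIT_STATUS_UNAVAILABLE_VALUE = 255  # value documented above


class UpdateSketch(NamedTuple):
    """Stand-in mirroring the documented UpdatedBatchJobInfo fields."""

    jobID: int
    exitStatus: int
    exitReason: Optional[int]
    wallTime: Optional[float]


class ToyBatchSystem:
    """Hypothetical in-memory stand-in; only the return contract is modeled."""

    def __init__(self, finished: list[UpdateSketch]) -> None:
        self._finished = deque(finished)

    def getUpdatedBatchJob(self, maxWait: int) -> Optional[UpdateSketch]:
        # A real batch system would block for up to maxWait seconds here.
        return self._finished.popleft() if self._finished else None


def drain(batch_system: ToyBatchSystem) -> list[str]:
    """Collect one report line per finished job until no update is left."""
    reports = []
    while (update := batch_system.getUpdatedBatchJob(maxWait=1)) is not None:
        if update.exitStatus == 0:
            outcome = "succeeded"
        elif update.exitStatus == EXIT_STATUS_UNAVAILABLE_VALUE:
            outcome = "ended without a reported exit code"
        else:
            outcome = f"failed with exit status {update.exitStatus}"
        reports.append(f"job {update.jobID} {outcome}")
    return reports


toy = ToyBatchSystem([
    UpdateSketch(1, 0, None, 2.5),
    UpdateSketch(2, EXIT_STATUS_UNAVAILABLE_VALUE, None, None),
])
print(drain(toy))
```

Because each finished job is returned exactly once, the caller has to record the result when it sees it; a second call for the same job yields nothing.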
getSchedulingStatusMessage()
Get a log message fragment for the user about anything that might be going wrong in the batch system, if available.
If no useful message is available, return None.
This can be used to report what resource is the limiting factor when scheduling jobs, for example. If the leader thinks the workflow is stuck, the message can be displayed to the user to help them diagnose why it might be stuck.
Returns
User-directed message about scheduling state.
Return type
Optional[ str ]
abstract shutdown()
Called at the completion of a Toil invocation. Should cleanly terminate all worker threads.
Return type
None
abstract setEnv(name, value=None)
Set an environment variable for the worker process before it is launched.
The worker process will typically inherit the environment of the machine it is running on but this method makes it possible to override specific variables in that inherited environment before the worker is launched. Note that this mechanism is different to the one used by the worker internally to set up the environment of a job. A call to this method affects all jobs issued after this method returns. Note to implementors: This means that you would typically need to copy the variables before enqueuing a job.
If no value is provided it will be looked up from the current environment.
Parameters
• name ( str )
• value ( Optional[str] )
Return type
None
classmethod add_options(parser)
If this batch system provides any command line options, add them to the given parser.
Parameters
parser ( Union[argparse.ArgumentParser, argparse._ArgumentGroup] )
Return type
None
classmethod setOptions(setOption)
Process command line or configuration options relevant to this batch system.
Parameters
setOption ( toil.batchSystems.options.OptionSetter ) -- A function with signature setOption(option_name, parsing_function=None, check_function=None, default=None, env=None) returning nothing, used to update run configuration as a side effect.
Return type
None
getWorkerContexts()
Get a list of picklable context manager objects to wrap worker work in, in order.
Can be used to ask the Toil worker to do things in-process (such as configuring environment variables, hot-deploying user scripts, or cleaning up a node) that would otherwise require a wrapping "executor" process.
Return type
list [ContextManager[Any]]
class toil.batchSystems.abstractBatchSystem.BatchSystemSupport(config, maxCores, maxMemory, maxDisk)
Bases: AbstractBatchSystem
Partial implementation of AbstractBatchSystem, support methods.
Parameters
• config ( toil.common.Config )
• maxCores ( float )
• maxMemory ( int )
• maxDisk ( int )
config
maxCores
maxMemory
maxDisk
environment: dict[str, str]
check_resource_request(requirer)
Check resource request is not greater than that available or allowed.
Parameters
• requirer ( toil.job.Requirer ) -- Object whose requirements are being checked
• job_name ( str ) -- Name of the job being checked, for generating a useful error report.
• detail ( str ) -- Batch-system-specific message to include in the error.
Raises
InsufficientSystemResources -- raised when a resource is requested in an amount greater than allowed
Return type
None
setEnv(name, value=None)
Set an environment variable for the worker process before it is launched. The worker process will typically inherit the environment of the machine it is running on but this method makes it possible to override specific variables in that inherited environment before the worker is launched. Note that this mechanism is different to the one used by the worker internally to set up the environment of a job. A call to this method affects all jobs issued after this method returns. Note to implementors: This means that you would typically need to copy the variables before enqueuing a job.
If no value is provided it will be looked up from the current environment.
Parameters
• name ( str ) -- the environment variable to be set on the worker.
• value ( Optional[str] ) -- if given, the environment variable given by name will be set to this value. If None, the variable's current value will be used as the value on the worker
Raises
RuntimeError -- if value is None and the name cannot be found in the environment
Return type
None
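The documented setEnv contract can be sketched as follows. EnvHolderSketch and environment_for_new_job are hypothetical names used for illustration; the sketch models only the lookup from the current environment, the RuntimeError, and the advice to copy the variables before enqueuing a job.

```python
import os
from typing import Optional


class EnvHolderSketch:
    """Hypothetical stand-in that models only the documented setEnv contract."""

    def __init__(self) -> None:
        self.environment: dict[str, str] = {}

    def setEnv(self, name: str, value: Optional[str] = None) -> None:
        if value is None:
            try:
                # Fall back to the value in the current environment.
                value = os.environ[name]
            except KeyError:
                raise RuntimeError(f"{name} does not exist in current environment")
        self.environment[name] = value

    def environment_for_new_job(self) -> dict[str, str]:
        # Copy, so that later setEnv calls only affect jobs issued afterwards.
        return dict(self.environment)


holder = EnvHolderSketch()
holder.setEnv("TOIL_SKETCH_VAR", "1")
first_job_env = holder.environment_for_new_job()
holder.setEnv("TOIL_SKETCH_VAR", "2")
print(first_job_env["TOIL_SKETCH_VAR"], holder.environment["TOIL_SKETCH_VAR"])
```

Without the copy, a job that is already queued would see variables set after it was issued, which contradicts "affects all jobs issued after this method returns".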
set_message_bus(message_bus)
Give the batch system an opportunity to connect directly to the message bus, so that it can send informational messages about the jobs it is running to other Toil components.
Parameters
message_bus ( toil.bus.MessageBus )
Return type
None
get_batch_logs_dir()
Get the directory where the backing batch system should save its logs.
Only really makes sense if the backing batch system actually saves logs to a filesystem; Kubernetes for example does not. Ought to be a directory shared between the leader and the workers, if the backing batch system writes logs onto the worker's view of the filesystem, like many HPC schedulers do.
Return type
str
format_std_out_err_path(toil_job_id, cluster_job_id, std)
Format path for batch system standard output/error and other files generated by the batch system itself.
Files will be written to the batch logs directory (--batchLogsDir, defaulting to the Toil work directory) with names containing both the Toil and batch system job IDs, for ease of debugging job failures.
Parameters
• toil_job_id ( int ) -- The unique id that Toil gives a job.
• cluster_job_id ( str ) -- What the cluster, for example, GridEngine, uses as its internal job id.
• std ( str ) -- The provenance of the stream (for example: 'err' for 'stderr' or 'out' for 'stdout')
Returns
Formatted filename; however if self.config.noStdOutErr is true, returns '/dev/null' or equivalent.
Return type
str
format_std_out_err_glob(toil_job_id)
Get a glob string that will match all file paths generated by format_std_out_err_path for a job.
Parameters
toil_job_id ( int )
Return type
str
static workerCleanup(info)
Cleans up the worker node on batch system shutdown.
Also see supportsWorkerCleanup().
Parameters
info ( WorkerCleanupInfo ) -- A named tuple consisting of all the relevant information for cleaning up the worker.
Return type
None
class toil.batchSystems.abstractBatchSystem.NodeInfo(coresUsed, memoryUsed, coresTotal, memoryTotal, requestedCores, requestedMemory, workers)
The coresUsed attribute is a floating point value between 0 (all cores idle) and 1 (all cores busy), reflecting the CPU load of the node.
The memoryUsed attribute is a floating point value between 0 (no memory used) and 1 (all memory used), reflecting the memory pressure on the node.
The coresTotal and memoryTotal attributes are the node's resources, not just the used resources.
The requestedCores and requestedMemory attributes are all the resources that Toil Jobs have reserved on the node, regardless of whether the resources are actually being used by the Jobs.
The workers attribute is an integer reflecting the number of workers currently active on the node.
Parameters
• coresUsed ( float )
• memoryUsed ( float )
• coresTotal ( float )
• memoryTotal ( int )
• requestedCores ( float )
• requestedMemory ( int )
• workers ( int )
coresUsed
memoryUsed
coresTotal
memoryTotal
requestedCores
requestedMemory
workers
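The difference between used and requested resources can be made concrete with a small sketch. NodeInfoSketch is a hypothetical stand-in with the documented attribute names, and unreserved is an illustrative helper, not part of Toil.

```python
from typing import NamedTuple


class NodeInfoSketch(NamedTuple):
    """Stand-in with the documented NodeInfo attributes; not the Toil class."""

    coresUsed: float        # fraction of cores busy, 0 to 1
    memoryUsed: float       # fraction of memory used, 0 to 1
    coresTotal: float
    memoryTotal: int
    requestedCores: float   # reserved by Toil jobs, whether used or not
    requestedMemory: int
    workers: int


def unreserved(node: NodeInfoSketch) -> tuple[float, int]:
    """Return the cores and memory that no Toil job has reserved yet."""
    return (node.coresTotal - node.requestedCores,
            node.memoryTotal - node.requestedMemory)


# A node with 8 cores and 32 GiB, of which jobs have reserved 6 cores and 24 GiB.
node = NodeInfoSketch(0.25, 0.5, 8, 32 * 2**30, 6, 24 * 2**30, 3)
print(unreserved(node))
```

In this example the node is only 25% busy, yet just 2 cores remain unreserved, which is why a scheduler would look at the requested values rather than the load figures when placing new jobs.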
class toil.batchSystems.abstractBatchSystem.AbstractScalableBatchSystem
Bases: AbstractBatchSystem
A batch system that supports a variable number of worker nodes.
Used by toil.provisioners.clusterScaler.ClusterScaler to scale the number of worker nodes in the cluster up or down depending on overall load.
abstract getNodes(preemptible=None, timeout=600)
Returns a dictionary mapping node identifiers of preemptible or non-preemptible nodes to NodeInfo objects, one for each node.
Parameters
• preemptible ( Optional[bool] ) -- If True (False) only (non-)preemptible nodes will be returned. If None, all nodes will be returned.
• timeout ( int )
Return type
dict [ str , NodeInfo ]
abstract nodeInUse(nodeIP)
Can be used to determine if a worker node is running any tasks. If the node doesn't exist, this function should simply return False.
Parameters
nodeIP ( str ) -- The worker node's private IP address
Returns
True if the worker node has been issued any tasks, else False
Return type
bool
abstract ignoreNode(nodeAddress)
Stop sending jobs to this node.
Used in autoscaling when the autoscaler is ready to terminate a node, but jobs are still running. This allows the node to be terminated after the current jobs have finished.
Parameters
nodeAddress ( str ) -- IP address of node to ignore.
Return type
None
abstract unignoreNode(nodeAddress)
Stop ignoring this address, presumably after a node with this address has been terminated. This allows for the possibility of a new node having the same address as a terminated one.
Parameters
nodeAddress ( str )
Return type
None
exception toil.batchSystems.abstractBatchSystem.InsufficientSystemResources(requirer, resource, available=None, batch_system=None, source=None, details=[])
Bases: Exception
Common base class for all non-exit exceptions.
Parameters
• requirer ( toil.job.Requirer )
• resource ( str )
• available ( Optional[toil.job.ParsedRequirement] )
• batch_system ( Optional[str] )
• source ( Optional[str] )
• details ( list[str] )
job_name: str | None
resource
requested
available
batch_system
source
details
__str__()
Explain the exception.
Return type
str
exception toil.batchSystems.abstractBatchSystem.AcquisitionTimeoutException(resource, requested, available)
Bases: Exception
To be raised when a resource request times out.
Parameters
• resource ( str )
• requested ( Union[int, float, set[int]] )
• available ( Union[int, float, set[int]] )
requested
available
resource
class toil.batchSystems.abstractBatchSystem.ResourcePool(initial_value, resource_type, timeout=5)
Represents an integral amount of a resource (such as memory bytes). Amounts can be acquired immediately or with a timeout, and released. Provides a context manager to do something with an amount of resource acquired.
Parameters
• initial_value ( int )
• resource_type ( str )
• timeout ( float )
condition
value
resource_type
timeout
acquireNow(amount)
Reserve the given amount of the given resource. Returns True if successful and False if this is not possible immediately.
Parameters
amount ( int )
Return type
bool
acquire(amount)
Reserve the given amount of the given resource. Raises AcquisitionTimeoutException if this is not possible in under self.timeout time.
Parameters
amount ( int )
Return type
None
release(amount)
Parameters
amount ( int )
Return type
None
__str__()
Return type
str
__repr__()
Return type
str
acquisitionOf(amount)
Parameters
amount ( int )
Return type
collections.abc.Iterator [None]
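The acquire, release, and context-manager behaviour described for ResourcePool can be sketched with a condition variable. This is an illustrative reimplementation under assumptions, not Toil's code: PoolSketch and PoolTimeout are hypothetical names, and the resource_type argument is omitted for brevity.

```python
import threading
import time
from contextlib import contextmanager
from typing import Iterator


class PoolTimeout(Exception):
    """Hypothetical stand-in for AcquisitionTimeoutException."""


class PoolSketch:
    """Illustrative counter guarded by a condition variable; not Toil's code."""

    def __init__(self, initial_value: int, timeout: float = 5) -> None:
        self.condition = threading.Condition()
        self.value = initial_value
        self.timeout = timeout

    def acquireNow(self, amount: int) -> bool:
        with self.condition:
            if amount > self.value:
                return False
            self.value -= amount
            return True

    def acquire(self, amount: int) -> None:
        deadline = time.monotonic() + self.timeout
        with self.condition:
            while amount > self.value:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise PoolTimeout(f"wanted {amount}, have {self.value}")
                # Sleep until release() notifies us or the deadline passes.
                self.condition.wait(timeout=remaining)
            self.value -= amount

    def release(self, amount: int) -> None:
        with self.condition:
            self.value += amount
            self.condition.notify_all()

    @contextmanager
    def acquisitionOf(self, amount: int) -> Iterator[None]:
        self.acquire(amount)
        try:
            yield
        finally:
            # Always hand the amount back, even if the body raised.
            self.release(amount)


pool = PoolSketch(4, timeout=0.1)
with pool.acquisitionOf(3):
    print(pool.acquireNow(2))  # only 1 unit is left inside the block
print(pool.acquireNow(2))
```

The context manager is the safer entry point, since it guarantees the release even when the wrapped work fails.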
class toil.batchSystems.abstractBatchSystem.ResourceSet(initial_value, resource_type, timeout=5)
Represents a collection of distinct resources (such as accelerators). Subsets can be acquired immediately or with a timeout, and released. Provides a context manager to do something with a set of resources acquired.
Parameters
• initial_value ( set[int] )
• resource_type ( str )
• timeout ( float )
condition
value
resource_type
timeout
acquireNow(subset)
Reserve the given amount of the given resource. Returns True if successful and False if this is not possible immediately.
Parameters
subset ( set[int] )
Return type
bool
acquire(subset)
Reserve the given amount of the given resource. Raises AcquisitionTimeoutException if this is not possible in under self.timeout time.
Parameters
subset ( set[int] )
Return type
None
release(subset)
Parameters
subset ( set[int] )
Return type
None
get_free_snapshot()
Get a snapshot of what items are free right now. May be stale as soon as you get it, but you will need some kind of hint to try and do an acquire.
Return type
set [ int ]
__str__()
Return type
str
__repr__()
Return type
str
acquisitionOf(subset)
Parameters
subset ( set[int] )
Return type
collections.abc.Iterator [None]
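The note that a snapshot "may be stale as soon as you get it" suggests a snapshot-then-try pattern. The sketch below illustrates it with hypothetical names (SetSketch, grab); it is a simplified model without timeouts and is not Toil's implementation.

```python
import threading


class SetSketch:
    """Illustrative pool of distinct item IDs (e.g. accelerator numbers)."""

    def __init__(self, initial_value: set[int]) -> None:
        self._lock = threading.Lock()
        self.value = set(initial_value)

    def get_free_snapshot(self) -> set[int]:
        with self._lock:
            return set(self.value)  # a copy, possibly stale once returned

    def acquireNow(self, subset: set[int]) -> bool:
        with self._lock:
            if not subset <= self.value:
                return False  # someone else took one of the items
            self.value -= subset
            return True

    def release(self, subset: set[int]) -> None:
        with self._lock:
            self.value |= subset


def grab(pool: SetSketch, count: int) -> set[int]:
    """Try to reserve `count` free items, retrying if the snapshot went stale."""
    while True:
        free = sorted(pool.get_free_snapshot())
        if len(free) < count:
            return set()  # not enough free items right now
        wanted = set(free[:count])
        if pool.acquireNow(wanted):
            return wanted


gpus = SetSketch({0, 1, 2, 3})
print(grab(gpus, 2))
```

The snapshot is only a hint for choosing which items to ask for; the reservation itself happens atomically in acquireNow, so a stale hint leads to a retry rather than a double allocation.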
toil.batchSystems.abstractGridEngineBatchSystem
Attributes
Exceptions
Classes
Module Contents
toil.batchSystems.abstractGridEngineBatchSystem.logger
toil.batchSystems.abstractGridEngineBatchSystem.JobTuple
exception toil.batchSystems.abstractGridEngineBatchSystem.ExceededRetryAttempts
Bases: Exception
Common base class for all non-exit exceptions.
class toil.batchSystems.abstractGridEngineBatchSystem.AbstractGridEngineBatchSystem(config, maxCores, maxMemory, maxDisk)
Bases: toil.batchSystems.cleanup_support.BatchSystemCleanupSupport
A partial implementation of BatchSystemSupport for batch systems run on a standard HPC cluster. By default auto-deployment is not implemented.
Parameters
• config ( toil.common.Config )
• maxCores ( float )
• maxMemory ( int )
• maxDisk ( int )
exception GridEngineThreadException
Bases: Exception
Common base class for all non-exit exceptions.
class GridEngineThread(newJobsQueue, updatedJobsQueue, killQueue, killedJobsQueue, boss)
Bases: threading.Thread
A class that represents a thread of control.
This class can be safely subclassed in a limited fashion. There are two ways to specify the activity: by passing a callable object to the constructor, or by overriding the run() method in a subclass.
Parameters
• newJobsQueue ( queue.Queue )
• updatedJobsQueue ( queue.Queue )
• killQueue ( queue.Queue )
• killedJobsQueue ( queue.Queue )
• boss ( AbstractGridEngineBatchSystem )
boss
newJobsQueue
updatedJobsQueue
killQueue
killedJobsQueue
waitingJobs: list[JobTuple]
runningJobs
runningJobsLock
batchJobIDs: dict[int, str]
exception = None
getBatchSystemID(jobID)
Get batch system-specific job ID
Note: for the moment this is the only consistent way to cleanly get the batch system job ID
Parameters
jobID ( int ) -- Toil BatchSystem numerical job ID
Return type
str
forgetJob(jobID)
Remove jobID passed
Parameters
jobID ( int ) -- toil job ID
Return type
None
createJobs(newJob)
Create a new job with the given attributes.
Implementation-specific; called by GridEngineThread.run()
Parameters
newJob ( JobTuple )
Return type
bool
killJobs()
Kill any running jobs within thread
checkOnJobs()
Check and update status of all running jobs.
Respects statePollingWait and will return cached results if not within time period to talk with the scheduler.
run()
Run any new jobs
coalesce_job_exit_codes(batch_job_id_list)
Returns exit codes and possibly exit reasons for a list of jobs, or None if they are running.
Called by GridEngineThread.checkOnJobs().
The default implementation falls back on self.getJobExitCode and polls each job individually
Parameters
batch_job_id_list ( string ) -- List of batch system job IDs
Return type
list [Union[ int , tuple [ int , Optional[ toil.batchSystems.abstractBatchSystem.BatchJobExitReason ]], None]]
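The per-job fallback described above amounts to one lookup per batch job ID. A minimal sketch, assuming a caller-supplied lookup function in place of self.getJobExitCode; coalesce_by_polling and fake_scheduler are hypothetical names, and exit reasons are shown as plain ints rather than BatchJobExitReason members.

```python
from typing import Callable, Optional, Union

# Exit information for one job: a bare exit code, a (code, reason) pair,
# or None while the job is still running.
ExitInfo = Union[int, tuple[int, Optional[int]], None]


def coalesce_by_polling(
    batch_job_id_list: list[str],
    get_job_exit_code: Callable[[str], ExitInfo],
) -> list[ExitInfo]:
    """Poll each job individually, preserving the order of the input list."""
    return [get_job_exit_code(batch_job_id) for batch_job_id in batch_job_id_list]


# Hypothetical scheduler state, keyed by batch system job ID.
fake_scheduler: dict[str, ExitInfo] = {"101": 0, "102": None, "103": (137, 6)}
print(coalesce_by_polling(["101", "102", "103"], fake_scheduler.get))
```

A scheduler that can report many jobs in a single query would override this with one batched call, which is presumably why the method exists as a separate hook.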
abstract prepareSubmission(cpu, memory, jobID, command, jobName, job_environment=None, gpus=None)
Prepare the command-line string to submit to the batch system (via submitJob()).
Parameters
• cpu ( int )
• memory ( int )
• jobID ( int ) -- Toil job ID
• command ( str ) -- the command line string to be called
• jobName ( str ) -- the name of the Toil job, to provide metadata to batch systems if desired
• job_environment ( Optional[dict[str, str]] ) -- the environment variables to be set on the worker
• gpus ( Optional[int] )
Return type
List[ str ]
abstract submitJob(subLine)
Wrapper routine for submitting the actual command-line call, then processing the output to get the batch system job ID.
Parameters
subLine ( string ) -- the literal command line string to be called
Returns
batch system job ID, which will be stored internally
Return type
string
abstract getRunningJobIDs()
Get a list of running job IDs.
Implementation-specific; called by boss AbstractGridEngineBatchSystem implementation via AbstractGridEngineBatchSystem.getRunningBatchJobIDs()
Return type
list
abstract killJob(jobID)
Kill specific job with the Toil job ID. Implementation-specific; called by GridEngineThread.killJobs()
Parameters
jobID ( string ) -- Toil job ID
abstract getJobExitCode(batchJobID)
Returns job exit code and possibly an instance of abstractBatchSystem.BatchJobExitReason.
Returns None if the job is still running.
If the job is not running but the exit code is not available, it will be EXIT_STATUS_UNAVAILABLE_VALUE. Implementation-specific; called by GridEngineThread.checkOnJobs().
The exit code will only be 0 if the job affirmatively succeeded.
Parameters
batchJobID ( string ) -- batch system job ID
Return type
Union[ int , tuple [ int , Optional[ toil.batchSystems.abstractBatchSystem.BatchJobExitReason ]], None]
config
currentJobs
newJobsQueue
updatedJobsQueue
killQueue
killedJobsQueue
background_thread
classmethod supportsAutoDeployment()
Whether this batch system supports auto-deployment of the user script itself.
If it does, the setUserScript() can be invoked to set the resource object representing the user script.
Note to implementors: If your implementation returns True here, it should also override
count_needed_gpus(job_desc)
Count the number of cluster-allocatable GPUs we want to allocate for the given job.
Parameters
job_desc ( toil.job.JobDescription )
issueBatchJob(command, job_desc, job_environment=None)
Issues a job with the specified command to the batch system and returns a unique job ID number.
Parameters
• command ( str ) -- the command to execute somewhere to run the Toil worker process
• job_desc ( toil.job.JobDescription ) -- the JobDescription for the job being run
• job_environment ( Optional[dict[str, str]] ) -- a collection of job-specific environment variables to be set on the worker.
Returns
a unique job ID number that can be used to reference the newly issued job
killBatchJobs(jobIDs)
Kills the given jobs, represented as Job ids, then checks they are dead by checking they are not in the list of issued jobs.
getIssuedBatchJobIDs()
Gets the list of issued jobs
getRunningBatchJobIDs()
Retrieve running job IDs from local and batch scheduler.
Respects statePollingWait and will return cached results if not within time period to talk with the scheduler.
getUpdatedBatchJob(maxWait)
Returns information about a job that has updated its status (i.e. ceased running, either successfully or with an error). Each such job will be returned exactly once.
Does not return info for jobs killed by killBatchJobs, although they may cause None to be returned earlier than maxWait.
Parameters
maxWait -- the number of seconds to block, waiting for a result
Returns
If a result is available, returns UpdatedBatchJobInfo. Otherwise it returns None. wallTime is the number of seconds (a strictly positive float) in wall-clock time the job ran for, or None if this batch system does not support tracking wall time.
shutdown()
Signals the thread to shut down (via sentinel), then cleanly joins the thread
Return type
None
setEnv(name, value=None)
Set an environment variable for the worker process before it is launched. The worker process will typically inherit the environment of the machine it is running on but this method makes it possible to override specific variables in that inherited environment before the worker is launched. Note that this mechanism is different to the one used by the worker internally to set up the environment of a job. A call to this method affects all jobs issued after this method returns. Note to implementors: This means that you would typically need to copy the variables before enqueuing a job.
If no value is provided it will be looked up from the current environment.
Parameters
• name -- the environment variable to be set on the worker.
• value -- if given, the environment variable given by name will be set to this value. If None, the variable's current value will be used as the value on the worker
Raises
RuntimeError -- if value is None and the name cannot be found in the environment
classmethod getWaitDuration()
sleepSeconds(sleeptime=1)
Helper function to drop on all state-querying functions to avoid over-querying.
with_retries(operation, *args, **kwargs)
Call operation with args and kwargs. If one of the calls to a command fails, sleep and try again.
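The sleep-and-retry behaviour can be sketched generically. The following is not Toil's implementation: the attempts and delay keyword arguments, the broad exception handling, and the name RetriesExhausted (standing in for ExceededRetryAttempts) are all assumptions made for illustration.

```python
import time
from typing import Any, Callable


class RetriesExhausted(Exception):
    """Hypothetical stand-in for ExceededRetryAttempts."""


def with_retries_sketch(
    operation: Callable[..., Any],
    *args: Any,
    attempts: int = 3,
    delay: float = 0.0,
    **kwargs: Any,
) -> Any:
    """Call operation(*args, **kwargs), sleeping and retrying on failure."""
    for attempt in range(1, attempts + 1):
        try:
            return operation(*args, **kwargs)
        except Exception:
            # Assumption: any exception counts as a retryable failure.
            if attempt == attempts:
                raise RetriesExhausted(f"gave up after {attempts} attempts")
            time.sleep(delay)


calls = []


def flaky(job_id: str) -> str:
    """Fail on the first two calls, then succeed."""
    calls.append(job_id)
    if len(calls) < 3:
        raise OSError("scheduler busy")
    return f"{job_id}: done"


print(with_retries_sketch(flaky, "101"))
```

A real implementation would more likely retry only on failures of the scheduler command itself, so that programming errors surface immediately instead of being retried.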
toil.batchSystems.awsBatch
Batch system for running Toil workflows on AWS Batch.
Useful with the AWS job store.
AWS Batch has no means for scheduling based on disk usage, so the backing machines need to have "enough" disk and other constraints need to guarantee that disk does not fill.
Assumes that an AWS Batch Queue name or ARN is already provided.
Handles creating and destroying a JobDefinition for the workflow run.
Additional containers should be launched with Singularity, not Docker.
Attributes
Classes
Module Contents
toil.batchSystems.awsBatch.logger
toil.batchSystems.awsBatch.STATE_TO_EXIT_REASON: dict[str, toil.batchSystems.abstractBatchSystem.BatchJobExitReason]
toil.batchSystems.awsBatch.MAX_POLL_COUNT = 100
toil.batchSystems.awsBatch.MIN_REQUESTABLE_MIB = 4
toil.batchSystems.awsBatch.MIN_REQUESTABLE_CORES = 1
class toil.batchSystems.awsBatch.AWSBatchBatchSystem(config, maxCores, maxMemory, maxDisk)
Bases: toil.batchSystems.cleanup_support.BatchSystemCleanupSupport
Adds cleanup support when the last running job leaves a node, for batch systems that can't provide it using the backing scheduler.
Parameters
• config ( toil.common.Config )
• maxCores ( float )
• maxMemory ( int )
• maxDisk ( int )
classmethod supportsAutoDeployment()
Whether this batch system supports auto-deployment of the user script itself.
If it does, the setUserScript() can be invoked to set the resource object representing the user script.
Note to implementors: If your implementation returns True here, it should also override
Return type
bool
region
client
queue
job_role_arn
owner_tag
worker_work_dir
user_script: toil.resource.Resource | None = None
docker_image
job_definition: str | None = None
bs_id_to_aws_id: dict[int, str]
aws_id_to_bs_id: dict[str, int]
killed_job_aws_ids: set[str]
setUserScript(user_script)
Set the user script for this workflow.
This method must be called before the first job is issued to this batch system, and only if supportsAutoDeployment() returns True, otherwise it will raise an exception.
Parameters
user_script ( toil.resource.Resource ) -- the resource object representing the user script or module and the modules it depends on.
Return type
None
issueBatchJob(command, job_desc, job_environment=None)
Issues a job with the specified command to the batch system and returns a unique job ID number.
Parameters
• command ( str ) -- the command to execute somewhere to run the Toil worker process
• job_desc ( toil.job.JobDescription ) -- the JobDescription for the job being run
• job_environment ( Optional[dict[str, str]] ) -- a collection of job-specific environment variables to be set on the worker.
Returns
a unique job ID number that can be used to reference the newly issued job
Return type
int
getUpdatedBatchJob(maxWait)
Returns information about job that has updated its status (i.e. ceased running, either successfully or with an error). Each such job will be returned exactly once.
Does not return
info for jobs killed by killBatchJobs, although they may
cause None to be returned earlier than maxWait.
Parameters
maxWait ( int ) -- the number of seconds to block, waiting for a result
Returns
If a result is available, returns UpdatedBatchJobInfo. Otherwise it returns None. wallTime is the number of seconds (a strictly positive float) in wall-clock time the job ran for, or None if this batch system does not support tracking wall time.
Return type
Optional[ toil.batchSystems.abstractBatchSystem.UpdatedBatchJobInfo ]
shutdown()
Called at the completion of a
toil invocation. Should cleanly terminate all worker
threads.
Return type
None
getIssuedBatchJobIDs()
Gets all currently issued jobs
Returns
A list of jobs (as job ID numbers) currently issued (may be running, or may be waiting to be run). Despite the result being a list, the ordering should not be depended upon.
Return type
list [ int ]
getRunningBatchJobIDs()
Gets a map of jobs as job ID
numbers that are currently running (not just waiting) and
how long they have been running, in seconds.
Returns
dictionary with currently running job ID number keys and how many seconds they have been running as the value
Return type
dict [ int , float ]
killBatchJobs(job_ids)
Kills the given job IDs. After
returning, the killed jobs will not appear in the results of
getRunningBatchJobIDs. The killed job will not be returned
from getUpdatedBatchJob.
Parameters
job_ids ( list[int] ) -- list of IDs of jobs to kill
Return type
None
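The method contract above (issue a job, poll for updates, list running jobs, kill jobs) can be sketched with a toy in-memory stand-in. This is not Toil's implementation: ToyBatchSystem, UpdatedJob, and the finish() hook are hypothetical names used only for illustration, and nothing is actually executed.

```python
import queue
import time
from typing import Optional


class UpdatedJob:
    """Hypothetical stand-in for UpdatedBatchJobInfo."""

    def __init__(self, job_id: int, exit_status: int, wall_time: float) -> None:
        self.job_id = job_id
        self.exit_status = exit_status
        self.wall_time = wall_time


class ToyBatchSystem:
    """In-memory toy that mimics the method contract; it runs nothing."""

    def __init__(self) -> None:
        self._next_id = 0
        self._started = {}            # job ID -> start timestamp
        self._updated = queue.Queue()  # finished jobs awaiting pickup

    def issueBatchJob(self, command: str) -> int:
        job_id = self._next_id
        self._next_id += 1
        self._started[job_id] = time.monotonic()
        return job_id

    def finish(self, job_id: int, exit_status: int) -> None:
        """Test hook (not part of the interface): mark a job as done."""
        started = self._started.pop(job_id)
        wall_time = time.monotonic() - started
        self._updated.put(UpdatedJob(job_id, exit_status, wall_time))

    def getRunningBatchJobIDs(self) -> dict:
        now = time.monotonic()
        return {job_id: now - t for job_id, t in self._started.items()}

    def killBatchJobs(self, job_ids: list) -> None:
        # Killed jobs leave the running map and are never reported as updated.
        for job_id in job_ids:
            self._started.pop(job_id, None)

    def getUpdatedBatchJob(self, maxWait: float) -> Optional[UpdatedJob]:
        try:
            return self._updated.get(timeout=maxWait)
        except queue.Empty:
            return None


bs = ToyBatchSystem()
first = bs.issueBatchJob("echo one")
second = bs.issueBatchJob("echo two")
bs.killBatchJobs([second])
bs.finish(first, 0)
update = bs.getUpdatedBatchJob(maxWait=0.1)
nothing_more = bs.getUpdatedBatchJob(maxWait=0.1)
```

In this sketch each finished job is handed out exactly once, and the killed job never shows up as an update, which mirrors the behaviour documented for getUpdatedBatchJob and killBatchJobs.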
classmethod add_options(parser)
If this batch system provides
any command line options, add them to the given parser.
Parameters
parser ( Union[argparse.ArgumentParser, argparse._ArgumentGroup] )
Return type
None
classmethod setOptions(setOption)
Process command line or
configuration options relevant to this batch system.
Parameters
setOption ( toil.batchSystems.options.OptionSetter ) -- A function with signature setOption(option_name, parsing_function=None, check_function=None, default=None, env=None) returning nothing, used to update run configuration as a side effect.
Return type
None
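As a rough illustration of the setOption signature described above, the following sketch builds a function with the same parameters that copies values from parsed options into a plain dict. The precedence used here (parsed option, then environment variable, then default) and all names such as make_option_setter and example_timeout are assumptions for illustration, not Toil's actual behaviour.

```python
import os
from argparse import Namespace
from typing import Any, Callable, Optional


def make_option_setter(options: Namespace, config: dict) -> Callable[..., None]:
    """Build a toy setOption that copies parsed options into config."""

    def set_option(
        option_name: str,
        parsing_function: Optional[Callable[[Any], Any]] = None,
        check_function: Optional[Callable[[Any], None]] = None,
        default: Any = None,
        env: Optional[list] = None,
    ) -> None:
        value = getattr(options, option_name, None)
        if value is None and env:
            # Fall back to the first listed environment variable that is set.
            value = next((os.environ[n] for n in env if n in os.environ), None)
        if value is None:
            value = default
        if value is not None and parsing_function is not None:
            value = parsing_function(value)
        if value is not None and check_function is not None:
            check_function(value)  # expected to raise on invalid values
        config[option_name] = value  # the side effect; nothing is returned

    return set_option


def require_positive(value: float) -> None:
    if value <= 0:
        raise ValueError("must be positive")


config = {}
set_option = make_option_setter(Namespace(example_timeout="120"), config)
set_option("example_timeout", parsing_function=float, check_function=require_positive)
set_option("example_owner", default="nobody")
```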
toil.batchSystems.cleanup_support
Attributes
Classes
Module Contents
toil.batchSystems.cleanup_support.logger
class
toil.batchSystems.cleanup_support.BatchSystemCleanupSupport(config,
maxCores, maxMemory, maxDisk)
Bases: toil.batchSystems.local_support.BatchSystemLocalSupport
Adds cleanup
support when the last running job leaves a node, for batch
systems that can't provide it using the backing scheduler.
Parameters
• config ( toil.common.Config )
• maxCores ( float )
• maxMemory ( int )
• maxDisk ( int )
classmethod supportsWorkerCleanup()
Whether this batch system supports worker cleanup.
Indicates
whether this batch system invokes
BatchSystemSupport.workerCleanup()
after the last job
for a particular workflow invocation finishes. Note that the
term
worker
refers to an entire node, not just a
worker process. A worker process may run more than one job
sequentially, and more than one concurrent worker process
may exist on a worker node, for the same workflow. The batch
system is said to
shut down
after the last worker
process terminates.
Return type
bool
getWorkerContexts()
Get a list of picklable context manager objects to wrap worker work in, in order.
Can be used to
ask the Toil worker to do things in-process (such as
configuring environment variables, hot-deploying user
scripts, or cleaning up a node) that would otherwise require
a wrapping "executor" process.
Return type
list [ContextManager[Any]]
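One way a worker could apply such a list "in order" is with contextlib.ExitStack, entering each context in list order and leaving them in reverse. The sketch below only illustrates that ordering; RecordingContext and run_wrapped are hypothetical names, and how the Toil worker really consumes the list is not shown here.

```python
from contextlib import ExitStack

log = []


class RecordingContext:
    """Hypothetical context manager that records enter/exit order."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __enter__(self) -> None:
        log.append(f"enter {self.name}")

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        log.append(f"exit {self.name}")


def run_wrapped(contexts: list) -> None:
    with ExitStack() as stack:
        for context in contexts:  # entered in list order
            stack.enter_context(context)
        log.append("work")
    # ExitStack unwinds the contexts in reverse order on the way out.


run_wrapped([RecordingContext("env"), RecordingContext("cleanup")])
```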
class
toil.batchSystems.cleanup_support.WorkerCleanupContext(workerCleanupInfo)
Context manager used by BatchSystemCleanupSupport to implement cleanup on a node after the last worker is done working.
Gets wrapped
around the worker's work.
Parameters
workerCleanupInfo ( toil.batchSystems.abstractBatchSystem.WorkerCleanupInfo )
workerCleanupInfo
__enter__()
Return type
None
__exit__(type, value, traceback)
Parameters
• type ( Optional[type[BaseException]] )
• value ( Optional[BaseException] )
• traceback ( Optional[types.TracebackType] )
Return type
None
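The "last one out cleans up" idea behind this context manager can be sketched with a simple in-process counter. This is only an illustration under strong assumptions: a real node has several worker processes, so a shared counter like this one would need cross-process coordination, and LastOneOutCleanup is a hypothetical name.

```python
class LastOneOutCleanup:
    """Toy context manager: run cleanup when the last participant leaves."""

    active = 0  # number of participants currently inside the context

    def __init__(self, cleanup) -> None:
        self.cleanup = cleanup

    def __enter__(self) -> None:
        LastOneOutCleanup.active += 1

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        LastOneOutCleanup.active -= 1
        if LastOneOutCleanup.active == 0:
            self.cleanup()


events = []


def record_cleanup() -> None:
    events.append("cleanup")


with LastOneOutCleanup(record_cleanup):
    with LastOneOutCleanup(record_cleanup):
        events.append("inner work")
    events.append("outer work")  # the inner exit must not have cleaned up yet
```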
toil.batchSystems.contained_executor
Executor for running inside a container.
Useful for Kubernetes batch system and TES batch system plugin.
Attributes
Functions
Module Contents
toil.batchSystems.contained_executor.logger
toil.batchSystems.contained_executor.pack_job(command,
user_script=None, environment=None)
Create a command that runs the
given command in an environment.
Parameters
• command ( str ) -- Worker command to run the job.
• user_script ( Optional[toil.resource.Resource] ) -- User script that will be loaded before the job is run.
• environment ( Optional[dict[str, str]] ) -- Environment variable dict that will be applied before the job is run.
Returns
Command to run the job, as an argument list that can be run inside the Toil appliance container.
Return type
list [ str ]
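A minimal sketch of how a command and its environment might be packed into an argument list and unpacked again on the other side. The encoding shown (a pickled dict, base64-encoded, passed as a single argument to the _toil_contained_executor entrypoint) is an assumption for illustration; pack_job_sketch and unpack_job_sketch are hypothetical names, and the user script is omitted.

```python
import base64
import pickle
from typing import Optional


def pack_job_sketch(command: str, environment: Optional[dict] = None) -> list:
    """Encode the job as one opaque argument after the entrypoint name."""
    job = {"command": command}
    if environment is not None:
        job["environment"] = environment
    encoded = base64.b64encode(pickle.dumps(job)).decode("utf-8")
    return ["_toil_contained_executor", encoded]


def unpack_job_sketch(argv: list) -> dict:
    """Reverse of pack_job_sketch, as the executor side might do."""
    return pickle.loads(base64.b64decode(argv[1].encode("utf-8")))


argv = pack_job_sketch("echo hello", {"EXAMPLE_VAR": "1"})
job = unpack_job_sketch(argv)
```

Encoding the whole job as one base64 argument avoids shell-quoting problems when the argument list is handed to a container runtime.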
toil.batchSystems.contained_executor.executor()
Main function of the _toil_contained_executor entrypoint.
Runs inside the Toil container.
Responsible for
setting up the user script and running the command for the
job (which may in turn invoke the Toil worker entrypoint).
Return type
None
toil.batchSystems.gridengine
Attributes
Classes
Module Contents
toil.batchSystems.gridengine.logger
class
toil.batchSystems.gridengine.GridEngineBatchSystem(config,
maxCores, maxMemory, maxDisk)
Bases: toil.batchSystems.abstractGridEngineBatchSystem.AbstractGridEngineBatchSystem
A partial
implementation of BatchSystemSupport for batch systems run
on a standard HPC cluster. By default auto-deployment is not
implemented.
Parameters
• config ( toil.common.Config )
• maxCores ( float )
• maxMemory ( int )
• maxDisk ( int )
class
GridEngineThread(newJobsQueue, updatedJobsQueue,
killQueue, killedJobsQueue, boss)
Bases: toil.batchSystems.abstractGridEngineBatchSystem.AbstractGridEngineBatchSystem.GridEngineThread
Grid
Engine-specific AbstractGridEngineWorker methods
Parameters
• newJobsQueue ( queue.Queue )
• updatedJobsQueue ( queue.Queue )
• killQueue ( queue.Queue )
• killedJobsQueue ( queue.Queue )
• boss ( AbstractGridEngineBatchSystem )
getRunningJobIDs()
Get a list of running job IDs.
Implementation-specific; called by boss
AbstractGridEngineBatchSystem implementation via
AbstractGridEngineBatchSystem.getRunningBatchJobIDs()
Return type
list
killJob(jobID)
Kill specific job with the Toil
job ID. Implementation-specific; called by
GridEngineThread.killJobs()
Parameters
jobID ( string ) -- Toil job ID
prepareSubmission(cpu,
memory, jobID, command, jobName,
job_environment=None, gpus=None)
Prepare a command-line string for submission to the batch system (via submitJob()).
Parameters
• cpu ( int )
• memory ( int )
• jobID ( int ) -- Toil job ID
• command ( str ) -- the command line string to be called
• jobName ( str ) -- the name of the Toil job, to provide metadata to batch systems if desired
• job_environment ( Optional[dict[str, str]] ) -- the environment variables to be set on the worker
• gpus ( Optional[int] )
Return type
List[ str ]
submitJob(subLine)
Wrapper routine for submitting the actual command-line call, then processing the output to get the batch system job ID
Parameters
subLine ( string ) -- the literal command line string to be called
Return type
string: batch system job ID, which will be stored internally
getJobExitCode(sgeJobID)
Get the job exit code, checking both qstat and qacct. Returns None if the job is still running. Higher-level code should retry on CalledProcessErrorStderr, for the case where the job has finished and the qacct result is stale.
prepareQsub(cpu, mem, jobID, job_environment=None)
Parameters
• cpu ( int )
• mem ( int )
• jobID ( int )
• job_environment ( Optional[dict[str, str]] )
Return type
list [ str ]
classmethod getWaitDuration()
toil.batchSystems.htcondor
Attributes
Classes
Module Contents
toil.batchSystems.htcondor.logger
toil.batchSystems.htcondor.JobTuple
toil.batchSystems.htcondor.schedd_lock
class toil.batchSystems.htcondor.HTCondorBatchSystem(config,
maxCores,
maxMemory, maxDisk)
Bases: toil.batchSystems.abstractGridEngineBatchSystem.AbstractGridEngineBatchSystem
A partial
implementation of BatchSystemSupport for batch systems run
on a standard HPC cluster. By default auto-deployment is not
implemented.
Parameters
• config ( toil.common.Config )
• maxCores ( float )
• maxMemory ( int )
• maxDisk ( int )
class
GridEngineThread(newJobsQueue, updatedJobsQueue,
killQueue, killedJobsQueue, boss)
Bases: toil.batchSystems.abstractGridEngineBatchSystem.AbstractGridEngineBatchSystem.GridEngineThread
A class that represents a thread of control.
This class can
be safely subclassed in a limited fashion. There are two
ways to specify the activity: by passing a callable object
to the constructor, or by overriding the run() method in a
subclass.
Parameters
• newJobsQueue ( queue.Queue )
• updatedJobsQueue ( queue.Queue )
• killQueue ( queue.Queue )
• killedJobsQueue ( queue.Queue )
• boss ( AbstractGridEngineBatchSystem )
createJobs(newJob)
Create a new job with the given attributes.
Implementation-specific;
called by GridEngineThread.run()
Parameters
newJob ( JobTuple )
Return type
bool
prepareSubmission(cpu,
memory, disk, jobID, jobName,
command, environment)
Prepare a command-line string for submission to the batch system (via submitJob()).
Parameters
• cpu ( int )
• memory ( int )
• disk ( int )
• jobID ( int ) -- Toil job ID
• jobName ( str ) -- the name of the Toil job, to provide metadata to batch systems if desired
• command ( str ) -- the command line string to be called
• environment ( dict[str, str] ) -- the environment variables to be set on the worker
Return type
List[ str ]
submitJob(submitObj)
Wrapper routine for submitting the actual command-line call, then processing the output to get the batch system job ID
Parameters
subLine ( string ) -- the literal command line string to be called
Return type
string: batch system job ID, which will be stored internally
getRunningJobIDs()
Get a list of running job IDs.
Implementation-specific; called by boss
AbstractGridEngineBatchSystem implementation via
AbstractGridEngineBatchSystem.getRunningBatchJobIDs()
Return type
list
killJob(jobID)
Kill specific job with the Toil
job ID. Implementation-specific; called by
GridEngineThread.killJobs()
Parameters
jobID ( string ) -- Toil job ID
getJobExitCode(batchJobID)
Returns job exit code and possibly an instance of abstractBatchSystem.BatchJobExitReason.
Returns None if the job is still running.
If the job is not running but the exit code is not available, it will be EXIT_STATUS_UNAVAILABLE_VALUE. Implementation-specific; called by GridEngineThread.checkOnJobs().
The exit code
will only be 0 if the job affirmatively succeeded.
Parameters
batchJobID ( string ) -- batch system job ID
connectSchedd()
Connect to HTCondor Schedd and yield a Schedd object.
You can only use it inside the context. Handles locking to make sure that only one thread is trying to do this at a time.
duplicate_quotes(value)
Escape a string by doubling up all single and double quotes.
This is used
for arguments we pass to htcondor that need to be inside
both double and single quote enclosures.
Parameters
value ( str )
Return type
str
getEnvString(overrides)
Build an environment string that a HTCondor Submit object can use.
For examples of valid strings, see:
http://research.cs.wisc.edu/htcondor/manual/current/condor_submit.html#man-condor-submit-environment
Parameters
overrides ( dict[str, str] )
Return type
str
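The quoting rules used by duplicate_quotes() and getEnvString() can be sketched as follows. This is a minimal sketch assuming HTCondor's quoted environment syntax (the whole string in double quotes, each value in single quotes, literal quotes doubled); the _sketch function names are hypothetical and the real methods may merge in additional variables.

```python
def duplicate_quotes_sketch(value: str) -> str:
    """Escape a string by doubling every single and double quote."""
    return value.replace("'", "''").replace('"', '""')


def env_string_sketch(overrides: dict) -> str:
    """Join KEY='value' items with spaces and wrap the result in double quotes."""
    items = []
    for key, value in overrides.items():
        items.append(f"{key}='{duplicate_quotes_sketch(value)}'")
    return '"' + " ".join(items) + '"'


escaped = duplicate_quotes_sketch('say "hi"')
env_string = env_string_sketch({"A": "1", "MSG": "it's ok"})
```

Single-quoting every value keeps embedded spaces intact, while doubling the quote characters keeps both the inner single-quoted and outer double-quoted enclosures balanced.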
issueBatchJob(command, jobNode, job_environment=None)
Issues a job with the specified
command to the batch system and returns a unique job ID
number.
Parameters
• command ( str ) -- the command to execute somewhere to run the Toil worker process
• jobNode -- the JobDescription for the job being run
• job_environment ( Optional[dict[str, str]] ) -- a collection of job-specific environment variables to be set on the worker.
Returns
a unique job ID number that can be used to reference the newly issued job
toil.batchSystems.kubernetes
Batch system for running Toil workflows on Kubernetes.
Only useful with network-based job stores, like AWSJobStore.
Within non-privileged Kubernetes containers, additional Docker containers cannot yet be launched. That functionality will need to wait for user-mode Docker.
Attributes
Classes
Functions
Module Contents
toil.batchSystems.kubernetes.logger
toil.batchSystems.kubernetes.retryable_kubernetes_errors: list[type[Exception] | toil.lib.retry.ErrorCondition]
toil.batchSystems.kubernetes.is_retryable_kubernetes_error(e)
A function that determines
whether or not Toil should retry or stop given exceptions
thrown by Kubernetes.
Parameters
e ( Exception )
Return type
bool
toil.batchSystems.kubernetes.KeyValuesList
class
toil.batchSystems.kubernetes.KubernetesBatchSystem(config,
maxCores, maxMemory, maxDisk)
Bases: toil.batchSystems.cleanup_support.BatchSystemCleanupSupport
Adds cleanup
support when the last running job leaves a node, for batch
systems that can't provide it using the backing scheduler.
Parameters
• config ( toil.common.Config )
• maxCores ( int )
• maxMemory ( int )
• maxDisk ( int )
classmethod supportsAutoDeployment()
Whether this batch system supports auto-deployment of the user script itself.
If it does, the setUserScript() can be invoked to set the resource object representing the user script.
Note to
implementors: If your implementation returns True here, it
should also override
Return type
bool
credential_time: datetime.datetime | None = None
namespace: str
host_path: str | None
service_account: str | None
pod_timeout: float
unique_id
job_prefix: str
finished_job_ttl: int = 3600
user_script: toil.resource.Resource | None = None
docker_image: str
worker_work_dir: str
aws_secret_name: str | None
enable_watching: bool
run_id: str
resource_sources: list[toil.batchSystems.abstractBatchSystem.ResourcePool]
schedulingThread: threading.Thread
class DecoratorWrapper(to_wrap, decorator)
Class to wrap an object so all
its methods are decorated.
Parameters
• to_wrap ( Any )
• decorator ( Callable[[Callable[P, Any]], Callable[P, Any]] )
P
__getattr__(name)
Get a member as if we are
actually the wrapped object. If it looks callable, we will
decorate it.
Parameters
name ( str )
Return type
Any
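The wrapping behaviour described for __getattr__ can be sketched like this. It is an illustrative reimplementation under the description above, not the class's actual source; DecoratorWrapperSketch, shout, and Greeter are hypothetical names.

```python
from typing import Any, Callable


class DecoratorWrapperSketch:
    """Forward attribute access to a wrapped object, decorating callables."""

    def __init__(self, to_wrap: Any, decorator: Callable[[Callable[..., Any]], Callable[..., Any]]) -> None:
        self._wrapped = to_wrap
        self._decorator = decorator

    def __getattr__(self, name: str) -> Any:
        # Only invoked for names that are not found on the wrapper itself.
        attr = getattr(self._wrapped, name)
        if callable(attr):
            return self._decorator(attr)
        return attr


def shout(func: Callable[..., Any]) -> Callable[..., Any]:
    """Example decorator: upper-case whatever the wrapped callable returns."""

    def wrapper(*args: Any, **kwargs: Any) -> str:
        return str(func(*args, **kwargs)).upper()

    return wrapper


class Greeter:
    name = "toil"

    def greet(self) -> str:
        return "hello"


wrapped = DecoratorWrapperSketch(Greeter(), shout)
greeting = wrapped.greet()  # method is decorated on access
plain_name = wrapped.name   # non-callable attribute passes through unchanged
```

A wrapper like this is a convenient way to apply one decorator (for example, a retry policy) to every method of a client object without subclassing it.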
ItemT
CovItemT
P
R
setUserScript(userScript)
Set the user script for this workflow.
This method
must be called before the first job is issued to this batch
system, and only if
supportsAutoDeployment()
returns
True, otherwise it will raise an exception.
Parameters
userScript ( toil.resource.Resource ) -- the resource object representing the user script or module and the modules it depends on.
Return type
None
class Placement
Internal format for pod
placement constraints and preferences.
required_labels: KeyValuesList = []
Labels which are required to be present (with these values).
desired_labels: KeyValuesList = []
Labels which are optional, but preferred to be present (with these values).
prohibited_labels: KeyValuesList = []
Labels which are not allowed to be present (with these values).
tolerated_taints: KeyValuesList = []
Taints which are allowed to be present (with these values).
set_preemptible(preemptible)
Add constraints for a job being preemptible or not.
Preemptible jobs will be able to run on preemptible or non-preemptible nodes, and will prefer preemptible nodes if available.
Non-preemptible jobs will not be allowed to run on nodes that are marked as preemptible.
Understands the
labeling scheme used by EKS, and the taint scheme used by
GCE. The Toil-managed Kubernetes setup will mimic at least
one of these.
Parameters
preemptible ( bool )
Return type
None
apply(pod_spec)
Set
affinity
and/or
tolerations
fields on pod_spec, so that it runs on
the right kind of nodes for the constraints we represent.
Parameters
pod_spec ( kubernetes.client.V1PodSpec )
Return type
None
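The preemptibility rules described for set_preemptible() can be sketched with plain lists. The label and taint keys below (an EKS capacity-type label and a GKE preemptible label and taint) are assumptions chosen for illustration, PlacementSketch is a hypothetical name, and the apply() step that writes to a pod spec is omitted because it needs the Kubernetes client.

```python
class PlacementSketch:
    """Toy version of the placement constraints described above."""

    # Assumed markers for preemptible nodes; the keys Toil uses may differ.
    PREEMPTIBLE_LABELS = [
        ("eks.amazonaws.com/capacityType", ["SPOT"]),
        ("cloud.google.com/gke-preemptible", ["true"]),
    ]
    PREEMPTIBLE_TAINTS = [("cloud.google.com/gke-preemptible", ["true"])]

    def __init__(self) -> None:
        self.required_labels = []
        self.desired_labels = []
        self.prohibited_labels = []
        self.tolerated_taints = []

    def set_preemptible(self, preemptible: bool) -> None:
        if preemptible:
            # May run anywhere, prefers preemptible nodes, tolerates their taints.
            self.desired_labels += self.PREEMPTIBLE_LABELS
            self.tolerated_taints += self.PREEMPTIBLE_TAINTS
        else:
            # Must stay off nodes marked as preemptible.
            self.prohibited_labels += self.PREEMPTIBLE_LABELS


spot_ok = PlacementSketch()
spot_ok.set_preemptible(True)
on_demand_only = PlacementSketch()
on_demand_only.set_preemptible(False)
```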
issueBatchJob(command, job_desc, job_environment=None)
Issues a job with the specified
command to the batch system and returns a unique job ID
number.
Parameters
• command ( str ) -- the command to execute somewhere to run the Toil worker process
• job_desc ( toil.job.JobDescription ) -- the JobDescription for the job being run
• job_environment ( Optional[dict[str, str]] ) -- a collection of job-specific environment variables to be set on the worker.
Returns
a unique job ID number that can be used to reference the newly issued job
Return type
int
getUpdatedBatchJob(maxWait)
Returns information about a job that has updated its status (i.e. ceased running, either successfully or with an error). Each such job will be returned exactly once.
Does not return
info for jobs killed by killBatchJobs, although they may
cause None to be returned earlier than maxWait.
Parameters
maxWait ( float ) -- the number of seconds to block, waiting for a result
Returns
If a result is available, returns UpdatedBatchJobInfo. Otherwise it returns None. wallTime is the number of seconds (a strictly positive float) in wall-clock time the job ran for, or None if this batch system does not support tracking wall time.
Return type
Optional[ toil.batchSystems.abstractBatchSystem.UpdatedBatchJobInfo ]
shutdown()
Called at the completion of a
toil invocation. Should cleanly terminate all worker
threads.
Return type
None
getIssuedBatchJobIDs()
Gets all currently issued jobs
Returns
A list of jobs (as job ID numbers) currently issued (may be running, or may be waiting to be run). Despite the result being a list, the ordering should not be depended upon.
Return type
list [ int ]
getRunningBatchJobIDs()
Gets a map of jobs as job ID
numbers that are currently running (not just waiting) and
how long they have been running, in seconds.
Returns
dictionary with currently running job ID number keys and how many seconds they have been running as the value
Return type
dict [ int , float ]
killBatchJobs(jobIDs)
Kills the given job IDs. After
returning, the killed jobs will not appear in the results of
getRunningBatchJobIDs. The killed job will not be returned
from getUpdatedBatchJob.
Parameters
jobIDs ( list[int] ) -- list of IDs of jobs to kill
Return type
None
classmethod get_default_kubernetes_owner()
Get the default
Kubernetes-acceptable username string to tack onto jobs.
Return type
str
class KubernetesConfig
Bases: Protocol
Type-enforcing protocol for Toil configs that have the extra Kubernetes batch system fields.
TODO: Until mypy lets protocols inherit from non-protocols, we will have
to let the fact that this also has to be a Config just be
manually enforced.
kubernetes_host_path: str | None
kubernetes_owner: str
kubernetes_service_account: str | None
kubernetes_pod_timeout: float
classmethod add_options(parser)
If this batch system provides
any command line options, add them to the given parser.
Parameters
parser ( Union[argparse.ArgumentParser, argparse._ArgumentGroup] )
Return type
None
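As a rough sketch of what an add_options implementation can look like, the function below registers two options on either a parser or an argument group. The flag names, help text, and default values are assumptions for illustration; only the destination names echo the KubernetesConfig fields listed above.

```python
import argparse
from typing import Union


def add_options_sketch(parser: Union[argparse.ArgumentParser, argparse._ArgumentGroup]) -> None:
    """Register illustrative batch-system options on a parser or group."""
    parser.add_argument(
        "--kubernetesHostPath",          # assumed flag name
        dest="kubernetes_host_path",
        default=None,
        help="Host path to mount into worker pods (illustrative).",
    )
    parser.add_argument(
        "--kubernetesPodTimeout",        # assumed flag name
        dest="kubernetes_pod_timeout",
        type=float,
        default=120.0,                   # arbitrary default for this sketch
        help="Seconds to wait for a pod to start (illustrative).",
    )


top = argparse.ArgumentParser()
add_options_sketch(top.add_argument_group("Kubernetes (illustrative)"))
options = top.parse_args(["--kubernetesPodTimeout", "30"])
```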
OptionType
classmethod setOptions(setOption)
Process command line or
configuration options relevant to this batch system.
Parameters
setOption ( toil.batchSystems.options.OptionSetter ) -- A function with signature setOption(option_name, parsing_function=None, check_function=None, default=None, env=None) returning nothing, used to update run configuration as a side effect.
Return type
None
toil.batchSystems.local_support
Attributes
Classes
Module Contents
toil.batchSystems.local_support.logger
class
toil.batchSystems.local_support.BatchSystemLocalSupport(config,
maxCores, maxMemory, maxDisk)
Bases: toil.batchSystems.abstractBatchSystem.BatchSystemSupport
Adds a local
queue for helper jobs, useful for CWL & others.
Parameters
• config ( toil.common.Config )
• maxCores ( float )
• maxMemory ( int )
• maxDisk ( int )
localBatch:
toil.batchSystems.singleMachine.SingleMachineBatchSystem
handleLocalJob(command, jobDesc)
To be called by issueBatchJob.
Returns the
jobID if the jobDesc has been submitted to the local queue,
otherwise returns None
Parameters
• command ( str )
• jobDesc ( toil.job.JobDescription )
Return type
Optional[ int ]
killLocalJobs(jobIDs)
Will kill all local jobs that match the provided jobIDs.
To be called by
killBatchJobs.
Parameters
jobIDs ( list[int] )
Return type
None
getIssuedLocalJobIDs()
To be called by
getIssuedBatchJobIDs.
Return type
list [ int ]
getRunningLocalJobIDs()
To be called by
getRunningBatchJobIDs().
Return type
dict [ int , float ]
getUpdatedLocalJob(maxWait)
To be called by
getUpdatedBatchJob().
Parameters
maxWait ( int )
Return type
Optional[ toil.batchSystems.abstractBatchSystem.UpdatedBatchJobInfo ]
getNextJobID()
Must be used to get job IDs so
that the local and batch jobs do not conflict.
Return type
int
shutdownLocal()
To be called from shutdown().
Return type
None
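The division of labour described above (a local queue for helper jobs, with one shared ID counter so local and batch job IDs never collide) can be sketched as follows. LocalSupportSketch and its local flag are hypothetical; the real class decides locality from the job description and delegates to a SingleMachineBatchSystem.

```python
import itertools
from typing import Optional


class LocalSupportSketch:
    """Toy batch system that routes some jobs to a local queue."""

    def __init__(self) -> None:
        self._ids = itertools.count()  # single source of job IDs
        self.local_jobs = {}           # job ID -> command
        self.batch_jobs = {}           # job ID -> command

    def getNextJobID(self) -> int:
        return next(self._ids)

    def handleLocalJob(self, command: str, local: bool) -> Optional[int]:
        """Return a job ID if the job was queued locally, else None."""
        if not local:
            return None
        job_id = self.getNextJobID()
        self.local_jobs[job_id] = command
        return job_id

    def issueBatchJob(self, command: str, local: bool = False) -> int:
        local_id = self.handleLocalJob(command, local)
        if local_id is not None:
            return local_id
        job_id = self.getNextJobID()
        self.batch_jobs[job_id] = command
        return job_id

    def getIssuedBatchJobIDs(self) -> list:
        # Combine both queues, as getIssuedLocalJobIDs would be used.
        return list(self.local_jobs) + list(self.batch_jobs)


system = LocalSupportSketch()
ids = [
    system.issueBatchJob("helper", local=True),
    system.issueBatchJob("heavy work"),
    system.issueBatchJob("another helper", local=True),
]
```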
toil.batchSystems.lsf
Attributes
Classes
Module Contents
toil.batchSystems.lsf.logger
class toil.batchSystems.lsf.LSFBatchSystem(config, maxCores,
maxMemory,
maxDisk)
Bases: toil.batchSystems.abstractGridEngineBatchSystem.AbstractGridEngineBatchSystem
A partial
implementation of BatchSystemSupport for batch systems run
on a standard HPC cluster. By default auto-deployment is not
implemented.
Parameters
• config ( toil.common.Config )
• maxCores ( float )
• maxMemory ( int )
• maxDisk ( int )
class
GridEngineThread(newJobsQueue, updatedJobsQueue,
killQueue, killedJobsQueue, boss)
Bases: toil.batchSystems.abstractGridEngineBatchSystem.AbstractGridEngineBatchSystem.GridEngineThread
LSF specific
GridEngineThread methods.
Parameters
• newJobsQueue ( queue.Queue )
• updatedJobsQueue ( queue.Queue )
• killQueue ( queue.Queue )
• killedJobsQueue ( queue.Queue )
• boss ( AbstractGridEngineBatchSystem )
getRunningJobIDs()
Get a list of running job IDs.
Implementation-specific; called by boss
AbstractGridEngineBatchSystem implementation via
AbstractGridEngineBatchSystem.getRunningBatchJobIDs()
Return type
list
fallbackRunningJobIDs(currentjobs)
killJob(jobID)
Kill specific job with the Toil
job ID. Implementation-specific; called by
GridEngineThread.killJobs()
Parameters
jobID ( string ) -- Toil job ID
prepareSubmission(cpu,
memory, jobID, command, jobName,
job_environment=None, gpus=None)
Prepare a command-line string for submission to the batch system (via submitJob()).
Parameters
• cpu ( int )
• memory ( int )
• jobID ( int ) -- Toil job ID
• command ( str ) -- the command line string to be called
• jobName ( str ) -- the name of the Toil job, to provide metadata to batch systems if desired
• job_environment ( Optional[dict[str, str]] ) -- the environment variables to be set on the worker
• gpus ( Optional[int] )
Return type
List[ str ]
submitJob(subLine)
Wrapper routine for submitting the actual command-line call, then processing the output to get the batch system job ID
Parameters
subLine ( string ) -- the literal command line string to be called
Return type
string: batch system job ID, which will be stored internally
coalesce_job_exit_codes(batch_job_id_list)
Returns exit codes and possibly exit reasons for a list of jobs, or None if they are running.
Called by GridEngineThread.checkOnJobs().
The default
implementation falls back on self.getJobExitCode and polls
each job individually
Parameters
batch_job_id_list ( string ) -- List of batch system job ID
Return type
list
getJobExitCode(lsfJobID)
Returns job exit code and possibly an instance of abstractBatchSystem.BatchJobExitReason.
Returns None if the job is still running.
If the job is not running but the exit code is not available, it will be EXIT_STATUS_UNAVAILABLE_VALUE. Implementation-specific; called by GridEngineThread.checkOnJobs().
The exit code
will only be 0 if the job affirmatively succeeded.
Parameters
lsfJobID ( string ) -- batch system job ID
Return type
Union[ int , tuple [ int , Optional[ toil.batchSystems.abstractBatchSystem.BatchJobExitReason ]], None]
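Callers of getJobExitCode() have to handle three result shapes: None (still running), a bare exit code, or an (exit code, reason) pair. The sketch below normalises those shapes and also shows the per-job polling fallback described for coalesce_job_exit_codes(). The helper names are hypothetical, and a plain string stands in for BatchJobExitReason.

```python
from typing import Callable, Optional, Union

# A string stands in for BatchJobExitReason in this sketch.
ExitResult = Union[int, tuple, None]


def split_exit_result(result: ExitResult) -> Optional[tuple]:
    """Return (exit_code, reason) for a finished job, or None if still running."""
    if result is None:
        return None
    if isinstance(result, tuple):
        return result
    return (result, None)


def poll_each(job_ids: list, get_exit: Callable[[str], ExitResult]) -> list:
    """Fallback strategy: ask about every job individually, in order."""
    return [get_exit(job_id) for job_id in job_ids]


fake_scheduler = {"101": 0, "102": (137, "MEMLIMIT"), "103": None}
results = poll_each(["101", "102", "103"], fake_scheduler.get)
normalised = [split_exit_result(r) for r in results]
```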
parse_bjobs_record(bjobs_record, job)
Helper function for
getJobExitCode that parses the bjobs status record.
Parameters
• bjobs_record ( dict )
• job ( int )
Return type
Union[ int , tuple [ int , Optional[ toil.batchSystems.abstractBatchSystem.BatchJobExitReason ]], None]
getJobExitCodeBACCT(job)
Return type
Union[ int , tuple [ int , Optional[ toil.batchSystems.abstractBatchSystem.BatchJobExitReason ]], None]
fallbackGetJobExitCode(job)
Return type
Union[ int , tuple [ int , Optional[ toil.batchSystems.abstractBatchSystem.BatchJobExitReason ]], None]
prepareBsub(cpu, mem, jobID)
Make a bsub command line to execute.
Parameters
• cpu ( int ) -- number of cores needed
• mem ( int ) -- number of bytes of memory needed
• jobID ( int ) -- ID number of the job
Return type
list [ str ]
parseBjobs(bjobs_output_str)
Parse records from bjobs JSON output.
Parameters
bjobs_output_str -- stdout of bjobs JSON output
parseMaxMem(jobID)
Parse the maximum memory from
job.
Parameters
jobID -- ID number of the job
getWaitDuration()
We give LSF a second to catch its breath (in seconds)
toil.batchSystems.lsfHelper
Attributes
Functions
Module Contents
toil.batchSystems.lsfHelper.LSB_PARAMS_FILENAME
= 'lsb.params'
toil.batchSystems.lsfHelper.LSF_CONF_FILENAME = 'lsf.conf'
toil.batchSystems.lsfHelper.LSF_CONF_ENV = ['LSF_CONFDIR',
'LSF_ENVDIR']
toil.batchSystems.lsfHelper.DEFAULT_LSF_UNITS = 'KB'
toil.batchSystems.lsfHelper.DEFAULT_RESOURCE_UNITS = 'MB'
toil.batchSystems.lsfHelper.LSF_JSON_OUTPUT_MIN_VERSION =
'10.1.0.2'
toil.batchSystems.lsfHelper.logger
toil.batchSystems.lsfHelper.find(basedir, string)
walk basedir and return all files matching string
toil.batchSystems.lsfHelper.find_first_match(basedir, string)
return the first file that matches string starting from basedir
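A minimal sketch of these two helpers using os.walk. It assumes that "matching" means the string occurs in the file name, which may not be the rule the real functions use; the _sketch names are hypothetical.

```python
import os
import tempfile
from typing import Optional


def find_sketch(basedir: str, string: str) -> list:
    """Walk basedir and collect paths of files whose name contains string."""
    matches = []
    for root, _dirs, files in os.walk(basedir):
        for name in files:
            if string in name:
                matches.append(os.path.join(root, name))
    return matches


def find_first_match_sketch(basedir: str, string: str) -> Optional[str]:
    """Return the first match found under basedir, or None if there is none."""
    matches = find_sketch(basedir, string)
    return matches[0] if matches else None


# Build a throwaway directory tree to search.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "conf"))
    open(os.path.join(tmp, "conf", "lsf.conf"), "w").close()
    found = find_first_match_sketch(tmp, "lsf.conf")
    found_relative = os.path.relpath(found, tmp)
    missing = find_first_match_sketch(tmp, "lsb.params")
```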
toil.batchSystems.lsfHelper.get_conf_file(filename,
env)
toil.batchSystems.lsfHelper.apply_conf_file(fn,
conf_filename)
toil.batchSystems.lsfHelper.per_core_reserve_from_stream(stream)
toil.batchSystems.lsfHelper.get_lsf_units_from_stream(stream)
toil.batchSystems.lsfHelper.tokenize_conf_stream(conf_handle)
convert the key=val pairs in a LSF config stream to tuples of tokens
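The tokenizing step can be sketched as a small generator. How the real function treats comments, blank lines, or lines without an equals sign is not stated above, so skipping them here is an assumption, and tokenize_conf_stream_sketch is a hypothetical name.

```python
import io


def tokenize_conf_stream_sketch(conf_handle):
    """Yield (key, value) tuples for each key=val line in a config stream."""
    for line in conf_handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumed: blank lines and comments are ignored
        key, separator, value = line.partition("=")
        if not separator:
            continue  # assumed: lines without '=' are ignored
        yield (key.strip(), value.strip())


sample = io.StringIO(
    "# units used for limits\n"
    "LSF_UNITS_FOR_LIMITS=MB\n"
    "\n"
    "LSF_CONFDIR = /opt/lsf/conf\n"
    "not a pair\n"
)
tokens = list(tokenize_conf_stream_sketch(sample))
```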
toil.batchSystems.lsfHelper.apply_bparams(fn)
apply fn to each line of bparams, returning the result
toil.batchSystems.lsfHelper.apply_lsadmin(fn)
apply fn to each line of lsadmin, returning the result
toil.batchSystems.lsfHelper.get_lsf_units(resource=False)
check if we can find
LSF_UNITS_FOR_LIMITS in lsadmin and lsf.conf files,
preferring the value in bparams, then lsadmin, then the
lsf.conf file
Parameters
resource ( bool )
Return type
str
toil.batchSystems.lsfHelper.parse_mem_and_cmd_from_output(output)
Use regex to find "MAX
MEM" and "Command" inside of an output.
Parameters
output ( str )
toil.batchSystems.lsfHelper.get_lsf_version()
Get current LSF version
toil.batchSystems.lsfHelper.check_lsf_json_output_supported()
Check if the current LSF system supports bjobs json output.
toil.batchSystems.lsfHelper.parse_memory(mem)
Parse memory parameter.
Parameters
mem ( float )
Return type
str
toil.batchSystems.lsfHelper.per_core_reservation()
returns True if the cluster is configured for reservations to be per core, False if it is per job
toil.batchSystems.mesos
Submodules
toil.batchSystems.mesos.batchSystem
Attributes
Classes
Module Contents
toil.batchSystems.mesos.batchSystem.log
class
toil.batchSystems.mesos.batchSystem.MesosBatchSystem(config,
maxCores, maxMemory, maxDisk)
Bases: toil.batchSystems.local_support.BatchSystemLocalSupport , toil.batchSystems.abstractBatchSystem.AbstractScalableBatchSystem , pymesos.Scheduler
A Toil batch
system implementation that uses Apache Mesos to distribute
toil jobs as Mesos tasks over a cluster of agent nodes. A
Mesos framework consists of a scheduler and an executor.
This class acts as the scheduler and is typically run on the
master node that also runs the Mesos master process with
which the scheduler communicates via a driver component. The
executor is implemented in a separate class. It is run on
each agent node and communicates with the Mesos agent
process via another driver object. The scheduler may also be
run on a separate node from the master, which we then call
somewhat ambiguously the driver node.
classmethod supportsAutoDeployment()
Whether this batch system supports auto-deployment of the user script itself.
If it does, the setUserScript() can be invoked to set the resource object representing the user script.
Note to implementors: If your implementation returns True here, it should also override
classmethod supportsWorkerCleanup()
Whether this batch system supports worker cleanup.
Indicates whether this batch system invokes BatchSystemSupport.workerCleanup() after the last job for a particular workflow invocation finishes. Note that the term worker refers to an entire node, not just a worker process. A worker process may run more than one job sequentially, and more than one concurrent worker process may exist on a worker node, for the same workflow. The batch system is said to shut down after the last worker process terminates.
class ExecutorInfo(nodeAddress, agentId, nodeInfo, lastSeen)
nodeAddress
agentId
nodeInfo
lastSeen
userScript = None
Type: toil.resource.Resource
jobQueues
mesos_endpoint
mesos_name
killedJobIds
killJobIds
intendedKill
hostToJobIDs
nodeFilter = []
runningJobMap
taskResources
updatedJobsQueue
driver = None
frameworkId = None
executors
agentsByID
nonPreemptibleNodes
executor
lastTimeOfferLogged = 0
logPeriod = 30
ignoredNodes
setUserScript(userScript)
Set the user script for this workflow.
This method
must be called before the first job is issued to this batch
system, and only if
supportsAutoDeployment()
returns
True, otherwise it will raise an exception.
Parameters
userScript -- the resource object representing the user script or module and the modules it depends on.
ignoreNode(nodeAddress)
Stop sending jobs to this node.
Used in autoscaling when the autoscaler is ready to
terminate a node, but jobs are still running. This allows
the node to be terminated after the current jobs have
finished.
Parameters
nodeAddress -- IP address of node to ignore.
unignoreNode(nodeAddress)
Stop ignoring this address, presumably after a node with this address has been terminated. This allows for the possibility of a new node having the same address as a terminated one.
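The interplay of ignoreNode() and unignoreNode() can be sketched with a set of ignored addresses that is consulted whenever resource offers arrive. OfferFilterSketch, usable_offers, and the dictionary shape of an offer are hypothetical; real Mesos offers are richer objects.

```python
class OfferFilterSketch:
    """Toy scheduler fragment that skips offers from ignored nodes."""

    def __init__(self) -> None:
        self.ignored_nodes = set()

    def ignoreNode(self, node_address: str) -> None:
        self.ignored_nodes.add(node_address)

    def unignoreNode(self, node_address: str) -> None:
        # discard() is a no-op if the address was never ignored.
        self.ignored_nodes.discard(node_address)

    def usable_offers(self, offers: list) -> list:
        return [o for o in offers if o["address"] not in self.ignored_nodes]


scheduler = OfferFilterSketch()
offers = [{"address": "10.0.0.1"}, {"address": "10.0.0.2"}]

scheduler.ignoreNode("10.0.0.2")        # autoscaler wants to drain this node
while_draining = scheduler.usable_offers(offers)

scheduler.unignoreNode("10.0.0.2")      # node terminated; address may be reused
after_termination = scheduler.usable_offers(offers)
```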
issueBatchJob(command, jobNode, job_environment=None)
Issues the given command and returns a unique job ID. The command is the
string to run; memory is an int giving the number of bytes the job needs
to run in; cores is the number of CPUs needed for the job; and error-file
is the path of the file in which to place any stderr/stdout output.
Parameters
• command ( str )
• jobNode ( toil.job.JobDescription )
• job_environment ( Optional[dict[str, str]] )
killBatchJobs(jobIDs)
Kills the given job IDs. After
returning, the killed jobs will not appear in the results of
getRunningBatchJobIDs. The killed job will not be returned
from getUpdatedBatchJob.
Parameters
jobIDs -- list of IDs of jobs to kill
getIssuedBatchJobIDs()
Gets all currently issued jobs
Returns
A list of jobs (as job ID numbers) currently issued (may be running, or may be waiting to be run). Despite the result being a list, the ordering should not be depended upon.
getRunningBatchJobIDs()
Gets a map of jobs as job ID
numbers that are currently running (not just waiting) and
how long they have been running, in seconds.
Returns
dictionary with currently running job ID number keys and how many seconds they have been running as the value
getUpdatedBatchJob(maxWait)
Returns information about a job that has updated its status (i.e. ceased running, either successfully or with an error). Each such job will be returned exactly once.
Does not return
info for jobs killed by killBatchJobs, although they may
cause None to be returned earlier than maxWait.
Parameters
maxWait -- the number of seconds to block, waiting for a result
Returns
If a result is available, returns UpdatedBatchJobInfo. Otherwise it returns None. wallTime is the number of seconds (a strictly positive float) in wall-clock time the job ran for, or None if this batch system does not support tracking wall time.
nodeInUse(nodeIP)
Can be used to determine if a
worker node is running any tasks. If the node doesn't
exist, this function should simply return False.
Parameters
nodeIP ( str ) -- The worker node's private IP address
Returns
True if the worker node has been issued any tasks, else False
Return type
bool
getWaitDuration()
Gets the period of time to wait (floating point, in seconds) between checking for missing/overlong jobs.
shutdown()
Called at the completion of a
toil invocation. Should cleanly terminate all worker
threads.
Return type
None
registered(driver, frameworkId, masterInfo)
Invoked when the scheduler successfully registers with a Mesos master
resourceOffers(driver, offers)
Invoked when resources have been offered to this framework.
statusUpdate(driver, update)
Invoked when the status of a task has changed (e.g., an agent is lost and so the task is lost, a task finishes and an executor sends a status update saying so, etc). Note that returning from this callback _acknowledges_ receipt of this status update! If for whatever reason the scheduler aborts during this callback (or the process exits), another status update will be delivered (note, however, that this is currently not true if the agent sending the status update is lost/fails during that time).
frameworkMessage(driver, executorId, agentId, message)
Invoked when an executor sends a message.
getNodes(preemptible=None, timeout=None)
Return all nodes that match:
• preemptible status (None includes all)
• timeout period (seen within the last # seconds, or None for all)
Parameters
• preemptible ( Optional[bool] )
• timeout ( Optional[int] )
Return type
dict [ str , toil.batchSystems.abstractBatchSystem.NodeInfo ]
reregistered(driver, masterInfo)
Invoked when the scheduler re-registers with a newly elected Mesos master.
executorLost(driver, executorId, agentId, status)
Invoked when an executor has exited/terminated abnormally.
classmethod get_default_mesos_endpoint()
Get the default IP/hostname and
port that we will look for Mesos at.
Return type
str
classmethod add_options(parser)
If this batch system provides
any command line options, add them to the given parser.
Parameters
parser ( Union[argparse.ArgumentParser, argparse._ArgumentGroup] )
Return type
None
classmethod setOptions(setOption)
Process command line or
configuration options relevant to this batch system.
Parameters
setOption ( toil.batchSystems.options.OptionSetter ) -- A function with signature setOption(option_name, parsing_function=None, check_function=None, default=None, env=None) returning nothing, used to update run configuration as a side effect.
toil.batchSystems.mesos.conftest
Attributes
Module Contents
toil.batchSystems.mesos.conftest.collect_ignore = []
toil.batchSystems.mesos.executor
Attributes
Classes
Functions
Module Contents
toil.batchSystems.mesos.executor.log
class toil.batchSystems.mesos.executor.MesosExecutor
Bases: pymesos.Executor
Part of Toil's
Mesos framework, runs on a Mesos agent. A Toil job is passed
to it via the task.data field, and launched via
call(toil.command).
popenLock
runningTasks
workerCleanupInfo = None
address = None
id = None
registered(driver, executorInfo, frameworkInfo,
agentInfo)
Invoked once the executor driver has been able to successfully connect with Mesos.
reregistered(driver, agentInfo)
Invoked when the executor re-registers with a restarted agent.
disconnected(driver)
Invoked when the executor becomes "disconnected" from the agent (e.g., the agent is being restarted due to an upgrade).
killTask(driver, taskId)
Kill parent task process and all its spawned children
shutdown(driver)
error(driver, message)
Invoked when a fatal error has occurred with the executor and/or executor driver.
launchTask(driver, task)
Invoked by SchedulerDriver when a Mesos task should be launched by this executor
frameworkMessage(driver, message)
Invoked when a framework message has arrived for this executor.
toil.batchSystems.mesos.executor.main()
toil.batchSystems.mesos.test
Attributes
Classes
Package Contents
toil.batchSystems.mesos.test.log
class toil.batchSystems.mesos.test.MesosTestSupport
Mixin for test cases that need
a running Mesos master and agent on the local host.
wait_for_master()
class MesosThread(numCores)
Bases: toil.lib.threading.ExceptionalThread
A thread whose join() method re-raises exceptions raised during run(). While join() is idempotent, the exception is only raised during the first invocation of join() that successfully joined the thread. If join() times out, no exception will be re-raised even though an exception might already have occurred in run().
When subclassing this thread, override tryRun() instead of run().
>>> def f():
...     assert 0
>>> t = ExceptionalThread(target=f)
>>> t.start()
>>> t.join()
Traceback (most recent call last):
...
AssertionError
>>> class MyThread(ExceptionalThread):
...     def tryRun( self ):
...         assert 0
>>> t = MyThread()
>>> t.start()
>>> t.join()
Traceback (most recent call last):
...
AssertionError
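The join() behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not Toil's implementation: the class name ReraisingThread and the _error attribute are assumptions made for the example.

```python
import threading


class ReraisingThread(threading.Thread):
    """Hypothetical stand-in for ExceptionalThread; not Toil's code."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._error = None  # exception captured from tryRun(), if any

    def run(self):
        # Capture instead of letting the exception die with the thread.
        try:
            self.tryRun()
        except BaseException as exc:
            self._error = exc

    def tryRun(self):
        # Default behaviour: run the target passed to the constructor.
        super().run()

    def join(self, timeout=None):
        super().join(timeout)
        # Only re-raise once the thread has really finished, and only once.
        if not self.is_alive() and self._error is not None:
            error, self._error = self._error, None
            raise error


class Failing(ReraisingThread):
    def tryRun(self):
        raise ValueError("boom")


def run_and_collect():
    """Join the same failed thread twice and record what each join() did."""
    t = Failing()
    t.start()
    results = []
    for _ in range(2):
        try:
            t.join()
            results.append("no exception")
        except ValueError as exc:
            results.append(f"raised {exc}")
    return results
```

Calling run_and_collect() shows the exception surfacing on the first successful join() and not on the second, which is the idempotence property the description refers to.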
lock
numCores
abstract mesosCommand()
tryRun()
findMesosBinary(names)
class MesosMasterThread(numCores)
Bases: MesosThread
A thread whose join() method re-raises exceptions raised during run(). While join() is idempotent, the exception is only raised during the first invocation of join() that successfully joined the thread. If join() times out, no exception will be re-raised even though an exception might already have occurred in run().
When subclassing this thread, override tryRun() instead of run().
>>> def f():
...     assert 0
>>> t = ExceptionalThread(target=f)
>>> t.start()
>>> t.join()
Traceback (most recent call last):
...
AssertionError
>>> class MyThread(ExceptionalThread):
...     def tryRun( self ):
...         assert 0
>>> t = MyThread()
>>> t.start()
>>> t.join()
Traceback (most recent call last):
...
AssertionError
mesosCommand()
class MesosAgentThread(numCores)
Bases: MesosThread
A thread whose join() method re-raises exceptions raised during run(). While join() is idempotent, the exception is only raised during the first invocation of join() that successfully joined the thread. If join() times out, no exception will be re-raised even though an exception might already have occurred in run().
When subclassing this thread, override tryRun() instead of run().
>>> def f():
...     assert 0
>>> t = ExceptionalThread(target=f)
>>> t.start()
>>> t.join()
Traceback (most recent call last):
...
AssertionError
>>> class MyThread(ExceptionalThread):
...     def tryRun( self ):
...         assert 0
>>> t = MyThread()
>>> t.start()
>>> t.join()
Traceback (most recent call last):
...
AssertionError
mesosCommand()
Attributes
Classes
Package Contents
toil.batchSystems.mesos.TaskData
class toil.batchSystems.mesos.JobQueue
queues
sortedTypes = []
jobLock
insertJob(job, jobType)
jobIDs()
nextJobOfType(jobType)
typeEmpty(jobType)
class toil.batchSystems.mesos.MesosShape(wallTime, memory, cores, disk, preemptible)
Bases: toil.provisioners.abstractProvisioner.Shape
Represents a job or a node's "shape", in terms of the dimensions of memory, cores, disk and wall-time allocation.
The wallTime attribute stores the number of seconds of a node allocation, e.g. 3600 for AWS. FIXME: and for jobs?
The memory and
disk attributes store the number of bytes required by a job
(or provided by a node) in RAM or on disk (SSD or HDD),
respectively.
Parameters
• wallTime ( Union[int, float] )
• memory ( int )
• cores ( Union[int, float] )
• disk ( int )
• preemptible ( bool )
__gt__(other)
Inverted. Returns True if self is less than other, else returns False.
This is because jobTypes are sorted in decreasing order, and this was done to give expensive jobs priority.
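To see why an inverted __gt__ puts expensive job types first, consider the sketch below. ToyShape and its (memory, cores) ordering are assumptions made for illustration; the real comparison lives in Shape and MesosShape. Python's sort uses the < operator, and when __lt__ is not defined it falls back to the reflected __gt__ of the right-hand operand, so inverting __gt__ reverses the resulting order.

```python
class ToyShape:
    """Hypothetical shape with only two dimensions, for illustration."""

    def __init__(self, memory, cores):
        self.memory = memory
        self.cores = cores

    def _key(self):
        return (self.memory, self.cores)

    def __gt__(self, other):
        # Inverted on purpose: "greater" means "needs fewer resources".
        return self._key() < other._key()


def order_by_priority(shapes):
    # sorted() evaluates a < b, which Python resolves as b.__gt__(a)
    # because ToyShape defines no __lt__. With the inverted __gt__
    # the biggest shape therefore sorts to the front.
    return sorted(shapes)


shapes = [ToyShape(1, 1), ToyShape(8, 4), ToyShape(4, 2)]
ordered_memory = [s.memory for s in order_by_priority(shapes)]
```

Here ordered_memory lists the largest shape first, so a scheduler that walks the sorted job types from the front considers the most expensive jobs before the cheap ones.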
toil.batchSystems.mesos.ToilJob
toil.batchSystems.options
Attributes
Classes
Functions
Module Contents
toil.batchSystems.options.logger
class toil.batchSystems.options.OptionSetter
Bases: Protocol
Protocol for the setOption function we get to let us set up CLI options for each batch system.
Actual
functionality is defined in the Config class.
OptionType
__call__(option_name, parsing_function=None,
check_function=None, default=None, env=None,
old_names=None)
Parameters
• option_name ( str )
• parsing_function ( Optional[Callable[[Any], OptionType]] )
• check_function ( Optional[Callable[[OptionType], Union[None, bool]]] )
• default ( Optional[OptionType] )
• env ( Optional[list[str]] )
• old_names ( Optional[list[str]] )
Return type
bool
toil.batchSystems.options.set_batchsystem_options(batch_system,
set_option)
Call set_option for all the
options for the given named batch system, or all batch
systems if no name is provided.
Parameters
• batch_system ( Optional[str] )
• set_option ( OptionSetter )
Return type
None
toil.batchSystems.options.add_all_batchsystem_options(parser)
Parameters
parser ( Union[argparse.ArgumentParser, argparse._ArgumentGroup] )
Return type
None
toil.batchSystems.registry
Attributes
Functions
Module Contents
toil.batchSystems.registry.logger
toil.batchSystems.registry.add_batch_system_factory(key,
class_factory)
Adds a batch system to the
registry for workflow or plugin-supplied batch systems.
Parameters
• class_factory ( Callable[[], type[- toil.batchSystems.abstractBatchSystem.AbstractBatchSystem]] ) -- A function that returns a batch system class (NOT an instance), which implements toil.batchSystems.abstractBatchSystem.AbstractBatchSystem .
• key ( str )
toil.batchSystems.registry.get_batch_systems()
Get the names of all the
available batch systems.
Return type
collections.abc.Sequence [ str ]
toil.batchSystems.registry.get_batch_system(key)
Get a batch system class by name.
Raises
KeyError if the key is not the name of a batch system, and ImportError if the batch system's class cannot be loaded.
Parameters
key ( str )
Return type
type [- toil.batchSystems.abstractBatchSystem.AbstractBatchSystem ]
toil.batchSystems.registry.DEFAULT_BATCH_SYSTEM = 'single_machine'
toil.batchSystems.registry.aws_batch_batch_system_factory()
toil.batchSystems.registry.gridengine_batch_system_factory()
toil.batchSystems.registry.lsf_batch_system_factory()
toil.batchSystems.registry.single_machine_batch_system_factory()
toil.batchSystems.registry.mesos_batch_system_factory()
toil.batchSystems.registry.slurm_batch_system_factory()
toil.batchSystems.registry.torque_batch_system_factory()
toil.batchSystems.registry.htcondor_batch_system_factory()
toil.batchSystems.registry.kubernetes_batch_system_factory()
toil.batchSystems.registry.__getattr__(name)
Implement a fallback attribute getter to handle deprecated constants.
See < https://stackoverflow.com/a/48242860 >.
toil.batchSystems.registry.addBatchSystemFactory(key,
batchSystemFactory)
Deprecated method to add a
batch system.
Parameters
• key ( str )
• batchSystemFactory ( Callable[[], type[- toil.batchSystems.abstractBatchSystem.AbstractBatchSystem]] )
toil.batchSystems.registry.save_batch_system_plugin_state()
Return a snapshot of the plugin
registry that can be restored to remove added plugins.
Useful for testing the plugin system in-process with other
tests.
Return type
tuple [ list [ str ], dict [ str , Callable[[], type [- toil.batchSystems.abstractBatchSystem.AbstractBatchSystem ]]]]
toil.batchSystems.registry.restore_batch_system_plugin_state(snapshot)
Restore the batch system
registry state to a snapshot from
save_batch_system_plugin_state().
Parameters
snapshot ( tuple[list[str], dict[str, Callable[[], type[- toil.batchSystems.abstractBatchSystem.AbstractBatchSystem]]]] )
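The registry functions above revolve around two ideas: batch systems are registered as factories that return a class (so nothing is imported until it is requested), and the registry can be snapshotted and restored around tests. The following self-contained sketch imitates that pattern without importing Toil. All names in it (add_factory, get_class, save_state, restore_state, ToyBatchSystem) are hypothetical stand-ins, and the internal layout is an assumption based only on the documented snapshot type.

```python
from typing import Callable

# Hypothetical stand-ins for the registry's internal state.
_names: list[str] = []
_factories: dict[str, Callable[[], type]] = {}


class ToyBatchSystem:
    """Placeholder for a class implementing AbstractBatchSystem."""


def add_factory(key: str, class_factory: Callable[[], type]) -> None:
    # Store the factory, not the class, so loading is deferred.
    if key not in _factories:
        _names.append(key)
    _factories[key] = class_factory


def get_class(key: str) -> type:
    # Raises KeyError for an unknown name; the factory runs only now.
    return _factories[key]()


def save_state() -> tuple[list[str], dict[str, Callable[[], type]]]:
    # Copy both containers so later registrations stay out of the snapshot.
    return list(_names), dict(_factories)


def restore_state(snapshot) -> None:
    names, factories = snapshot
    _names[:] = names
    _factories.clear()
    _factories.update(factories)


def demo() -> tuple[bool, list[str]]:
    """Register a plugin, load it, then roll the registry back."""
    snapshot = save_state()
    add_factory("toy", lambda: ToyBatchSystem)
    loaded = get_class("toy") is ToyBatchSystem
    restore_state(snapshot)
    return loaded, list(_names)
```

With the real module, the equivalent calls would be add_batch_system_factory(), get_batch_system(), save_batch_system_plugin_state() and restore_batch_system_plugin_state(), with the signatures listed above.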
toil.batchSystems.singleMachine
Attributes
Classes
Module Contents
toil.batchSystems.singleMachine.logger
class toil.batchSystems.singleMachine.SingleMachineBatchSystem(config, maxCores, maxMemory, maxDisk, max_jobs=None)
Bases: toil.batchSystems.abstractBatchSystem.BatchSystemSupport
The interface for running jobs on a single machine, runs all the jobs you give it as they come in, but in parallel.
Uses a single "daddy" thread to manage a fleet of child processes.
Communication with the daddy thread happens via two queues: one queue of jobs waiting to be run (the input queue), and one queue of jobs that are finished/stopped and need to be returned by getUpdatedBatchJob (the output queue).
When the batch system is shut down, the daddy thread is stopped.
If running in
debug-worker mode, jobs are run immediately as they are sent
to the batch system, in the sending thread, and the daddy
thread is not run. But the queues are still used.
Parameters
• config ( toil.common.Config )
• maxCores ( float )
• maxMemory ( int )
• maxDisk ( int )
• max_jobs ( Optional[int] )
classmethod supportsAutoDeployment()
Whether this batch system supports auto-deployment of the user script itself.
If it does, the setUserScript() can be invoked to set the resource object representing the user script.
Note to implementors: If your implementation returns True here, it should also override
classmethod supportsWorkerCleanup()
Whether this batch system supports worker cleanup.
Indicates whether this batch system invokes BatchSystemSupport.workerCleanup() after the last job for a particular workflow invocation finishes. Note that the term worker refers to an entire node, not just a worker process. A worker process may run more than one job sequentially, and more than one concurrent worker process may exist on a worker node, for the same workflow. The batch system is said to shut down after the last worker process terminates.
numCores
minCores = 0.1
The minimal fractional CPU. Tasks with a smaller core requirement will be rounded up to this value.
physicalMemory
config
physicalDisk
scale
debugWorker
jobIndex = 1
jobIndexLock
jobs: dict[int, toil.job.JobDescription]
inputQueue
outputQueue
runningJobs: dict[int, Info]
children: dict[int, subprocess.Popen]
childToJob: dict[int, str]
accelerator_identities
resource_sources
schedulingStatusMessage = None
shuttingDown
daddyThread = None
daddyException: Exception | None = None
daddy()
Be the "daddy" thread.
Our job is to look at jobs from the input queue.
If a job fits in the available resources, we allocate resources for it and kick off a child process.
We also check on our children.
When a child finishes, we reap it, release its resources, and put its information in the output queue.
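The loop that daddy() describes can be sketched as a single manager thread fed by an input queue and reporting through an output queue. The sketch below is a simplified illustration under stated assumptions, not the SingleMachineBatchSystem code: it tracks only cores, jobs are (job_id, argv, cores) tuples, and the function names are hypothetical.

```python
import queue
import subprocess
import sys
import threading
import time


def daddy(input_queue, output_queue, max_cores, stop):
    """Manager loop: start jobs that fit, reap children that finished."""
    free_cores = max_cores
    waiting = []   # jobs taken off the input queue that do not fit yet
    children = {}  # job_id -> (Popen, cores held)
    while True:
        # Drain newly issued jobs from the input queue.
        try:
            while True:
                waiting.append(input_queue.get_nowait())
        except queue.Empty:
            pass
        # Start every waiting job that fits in the free resources.
        # (A job asking for more than max_cores would wait forever here.)
        still_waiting = []
        for job_id, argv, cores in waiting:
            if cores <= free_cores:
                free_cores -= cores
                children[job_id] = (subprocess.Popen(argv), cores)
            else:
                still_waiting.append((job_id, argv, cores))
        waiting = still_waiting
        # Reap finished children, release resources, report the result.
        for job_id, (child, cores) in list(children.items()):
            exit_code = child.poll()
            if exit_code is not None:
                del children[job_id]
                free_cores += cores
                output_queue.put((job_id, exit_code))
        if (stop.is_set() and not waiting and not children
                and input_queue.empty()):
            return
        time.sleep(0.01)


def run_demo():
    """Issue three one-core jobs to a two-core manager and collect results."""
    inq, outq = queue.Queue(), queue.Queue()
    stop = threading.Event()
    manager = threading.Thread(target=daddy, args=(inq, outq, 2, stop))
    manager.start()
    for job_id, code in enumerate([0, 3, 0]):
        argv = [sys.executable, "-c", f"raise SystemExit({code})"]
        inq.put((job_id, argv, 1))
    stop.set()  # ask the manager to exit once everything has drained
    manager.join()
    return dict(outq.get() for _ in range(3))
```

run_demo() returns a mapping from job ID to exit code. With two cores available, the third job only starts after one of the first two has been reaped.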
getSchedulingStatusMessage()
Get a log message fragment for the user about anything that might be going wrong in the batch system, if available.
If no useful message is available, return None.
This can be
used to report what resource is the limiting factor when
scheduling jobs, for example. If the leader thinks the
workflow is stuck, the message can be displayed to the user
to help them diagnose why it might be stuck.
Returns
User-directed message about scheduling state.
check_resource_request(requirer)
Check resource request is not
greater than that available or allowed.
Parameters
• requirer ( toil.job.Requirer ) -- Object whose requirements are being checked
• job_name ( str ) -- Name of the job being checked, for generating a useful error report.
• detail ( str ) -- Batch-system-specific message to include in the error.
Raises
InsufficientSystemResources -- raised when a resource is requested in an amount greater than allowed
Return type
None
issueBatchJob(command, job_desc, job_environment=None)
Adds the command and resources
to a queue to be run.
Parameters
• command ( str )
• job_desc ( toil.job.JobDescription )
• job_environment ( Optional[dict[str, str]] )
Return type
int
killBatchJobs(jobIDs)
Kills jobs by ID.
Parameters
jobIDs ( list[int] )
Return type
None
getIssuedBatchJobIDs()
Just returns all the jobs that
have been run, but not yet returned as updated.
Return type
list [ int ]
getRunningBatchJobIDs()
Gets a map of jobs as job ID
numbers that are currently running (not just waiting) and
how long they have been running, in seconds.
Returns
dictionary with currently running job ID number keys and how many seconds they have been running as the value
Return type
dict [ int , float ]
shutdown()
Terminate cleanly and join
daddy thread.
Return type
None
getUpdatedBatchJob(maxWait)
Returns a tuple of a
no-longer-running job, the return value of its process, and
its runtime, or None.
Parameters
maxWait ( int )
Return type
Optional[- toil.batchSystems.abstractBatchSystem.UpdatedBatchJobInfo ]
classmethod add_options(parser)
If this batch system provides
any command line options, add them to the given parser.
Parameters
parser ( Union[argparse.ArgumentParser, argparse._ArgumentGroup] )
Return type
None
classmethod setOptions(setOption)
Process command line or
configuration options relevant to this batch system.
Parameters
setOption ( toil.batchSystems.options.OptionSetter ) -- A function with signature setOption(option_name, parsing_function=None, check_function=None, default=None, env=None) returning nothing, used to update run configuration as a side effect.
class toil.batchSystems.singleMachine.Info(startTime, popen, resources, killIntended)
Record for a running job.
Stores the start time of the job, the Popen object representing its child (or None), the tuple of (coreFractions, memory, disk) it is using (or None), and whether the job is supposed to be being killed.
time
popen
resources
killIntended
toil.batchSystems.slurm
Attributes
Classes
Functions
Module Contents
toil.batchSystems.slurm.logger
toil.batchSystems.slurm.TERMINAL_STATES: dict[str, toil.batchSystems.abstractBatchSystem.BatchJobExitReason]
toil.batchSystems.slurm.NONTERMINAL_STATES: set[str]
toil.batchSystems.slurm.parse_slurm_time(slurm_time)
Parse a Slurm-style time duration like 7-00:00:00 to a number of seconds.
Raises
ValueError if not parseable.
Parameters
slurm_time ( str )
Return type
int
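As an illustration of what such a parser has to handle, here is a minimal sketch. It is not Toil's implementation; it assumes the duration formats accepted by sbatch --time ("minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds"), and the function name parse_duration is hypothetical.

```python
def parse_duration(text: str) -> int:
    """Convert a Slurm-style duration to seconds; ValueError if malformed."""
    days = hours = minutes = seconds = 0
    if "-" in text:
        # Forms with a day count: D-HH, D-HH:MM, D-HH:MM:SS
        day_part, rest = text.split("-", 1)
        days = int(day_part)
        fields = [int(f) for f in rest.split(":")]
        if not 1 <= len(fields) <= 3:
            raise ValueError(f"cannot parse duration: {text!r}")
        fields += [0] * (3 - len(fields))
        hours, minutes, seconds = fields
    else:
        # Forms without a day count: MM, MM:SS, HH:MM:SS
        fields = [int(f) for f in text.split(":")]
        if len(fields) == 1:
            (minutes,) = fields
        elif len(fields) == 2:
            minutes, seconds = fields
        elif len(fields) == 3:
            hours, minutes, seconds = fields
        else:
            raise ValueError(f"cannot parse duration: {text!r}")
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

For the example in the description, parse_duration("7-00:00:00") gives 604800 seconds (seven days). Non-numeric fields fail inside int(), which already raises ValueError.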
class toil.batchSystems.slurm.SlurmBatchSystem(config, maxCores, maxMemory, maxDisk)
Bases: toil.batchSystems.abstractGridEngineBatchSystem.AbstractGridEngineBatchSystem
A partial
implementation of BatchSystemSupport for batch systems run
on a standard HPC cluster. By default auto-deployment is not
implemented.
Parameters
• config ( toil.common.Config )
• maxCores ( float )
• maxMemory ( int )
• maxDisk ( int )
class PartitionInfo
Bases: NamedTuple
partition_name: str
gres: bool
time_limit: float
priority: int
cpus: str
memory: str
class PartitionSet
Set of available partitions
detected on the slurm batch system
default_gpu_partition: SlurmBatchSystem | None
all_partitions: list[SlurmBatchSystem]
gpu_partitions: set[str]
get_partition(time_limit)
Get the partition name to use
for a job with the given time limit.
Parameters
time_limit ( float | None )
Return type
str | None
class GridEngineThread(newJobsQueue, updatedJobsQueue, killQueue, killedJobsQueue, boss)
Bases: toil.batchSystems.abstractGridEngineBatchSystem.AbstractGridEngineBatchSystem.GridEngineThread
A class that represents a thread of control.
This class can
be safely subclassed in a limited fashion. There are two
ways to specify the activity: by passing a callable object
to the constructor, or by overriding the run() method in a
subclass.
Parameters
• newJobsQueue ( queue.Queue )
• updatedJobsQueue ( queue.Queue )
• killQueue ( queue.Queue )
• killedJobsQueue ( queue.Queue )
• boss ( AbstractGridEngineBatchSystem )
boss: SlurmBatchSystem
getRunningJobIDs()
Get a list of running job IDs.
Implementation-specific; called by boss
AbstractGridEngineBatchSystem implementation via
AbstractGridEngineBatchSystem.getRunningBatchJobIDs()
Return type
list
killJob(jobID)
Kill specific job with the Toil
job ID. Implementation-specific; called by
GridEngineThread.killJobs()
Parameters
jobID ( string ) -- Toil job ID
Return type
None
prepareSubmission(cpu,
memory, jobID, command, jobName,
job_environment=None, gpus=None)
Preparation in putting together a command-line string for submitting to batch system (via submitJob().)
Param int cpu
Param int memory
Param int jobID: Toil job ID
Param string subLine: the command line string to be called
Param string jobName: the name of the Toil job, to provide metadata to batch systems if desired
Param dict job_environment: the environment variables to be set on the worker
Return type
List[ str ]
Parameters
• cpu ( int )
• memory ( int )
• jobID ( int )
• command ( str )
• jobName ( str )
• job_environment ( dict[str, str] | None )
• gpus ( int | None )
submitJob(subLine)
Wrapper routine for submitting the actual command-line call, then processing the output to get the batch system job ID
Param string subLine: the literal command line string to be called
Return type
string: batch system job ID, which will be stored internally
Parameters
subLine ( list[str] )
coalesce_job_exit_codes(batch_job_id_list)
Collect all job exit codes in a single call.
Parameters
batch_job_id_list ( list[str] ) -- list of job ID strings, where each string has the form "<job>[.<task>]".
Returns
list of job exit codes or (exit code, exit reason) pairs associated with the list of job IDs.
Return type
list [ int | tuple [ int , toil.batchSystems.abstractBatchSystem.BatchJobExitReason | None] | None]
getJobExitCode(batchJobID)
Get job exit code for given batch job ID.
Parameters
batchJobID ( str ) -- string of the form "<job>[.<task>]".
Returns
integer job exit code.
Return type
int | tuple [ int , toil.batchSystems.abstractBatchSystem.BatchJobExitReason | None] | None
prepareSbatch(cpu, mem,
jobID, jobName, job_environment,
gpus)
Returns the sbatch command line
to run to queue the job.
Parameters
• cpu ( int )
• mem ( int )
• jobID ( int )
• jobName ( str )
• job_environment ( dict[str, str] | None )
• gpus ( int | None )
Return type
list [ str ]
partitions
issueBatchJob(command, job_desc,
job_environment=None)
Issues a job with the specified
command to the batch system and returns a unique job ID
number.
Parameters
• command ( str ) -- the command to execute somewhere to run the Toil worker process
• job_desc ( toil.job.JobDescription ) -- the JobDescription for the job being run
• job_environment ( dict[str, str] | None ) -- a collection of job-specific environment variables to be set on the worker.
Returns
a unique job ID number that can be used to reference the newly issued job
Return type
int
classmethod add_options(parser)
If this batch system provides
any command line options, add them to the given parser.
Parameters
parser ( argparse.ArgumentParser | argparse._ArgumentGroup )
Return type
None
OptionType
classmethod setOptions(setOption)
Process command line or
configuration options relevant to this batch system.
Parameters
setOption ( toil.batchSystems.options.OptionSetter ) -- A function with signature setOption(option_name, parsing_function=None, check_function=None, default=None, env=None) returning nothing, used to update run configuration as a side effect.
Return type
None
toil.batchSystems.torque
Attributes
Classes
Module Contents
toil.batchSystems.torque.logger
class toil.batchSystems.torque.TorqueBatchSystem(config,
maxCores,
maxMemory, maxDisk)
Bases: toil.batchSystems.abstractGridEngineBatchSystem.AbstractGridEngineBatchSystem
A partial
implementation of BatchSystemSupport for batch systems run
on a standard HPC cluster. By default auto-deployment is not
implemented.
Parameters
• config ( toil.common.Config )
• maxCores ( float )
• maxMemory ( int )
• maxDisk ( int )
class GridEngineThread(newJobsQueue, updatedJobsQueue, killQueue, killedJobsQueue, boss)
Bases: toil.batchSystems.abstractGridEngineBatchSystem.AbstractGridEngineBatchSystem.GridEngineThread
A class that represents a thread of control.
This class can
be safely subclassed in a limited fashion. There are two
ways to specify the activity: by passing a callable object
to the constructor, or by overriding the run() method in a
subclass.
getRunningJobIDs()
Get a list of running job IDs.
Implementation-specific; called by boss
AbstractGridEngineBatchSystem implementation via
AbstractGridEngineBatchSystem.getRunningBatchJobIDs()
Return type
list
getUpdatedBatchJob(maxWait)
killJob(jobID)
Kill specific job with the Toil
job ID. Implementation-specific; called by
GridEngineThread.killJobs()
Parameters
jobID ( string ) -- Toil job ID
prepareSubmission(cpu,
memory, jobID, command, jobName,
job_environment=None, gpus=None)
Preparation in putting together a command-line string for submitting to batch system (via submitJob().)
Param int cpu
Param int memory
Param int jobID: Toil job ID
Param string subLine: the command line string to be called
Param string jobName: the name of the Toil job, to provide metadata to batch systems if desired
Param dict job_environment: the environment variables to be set on the worker
Return type
List[ str ]
Parameters
• cpu ( int )
• memory ( int )
• jobID ( int )
• command ( str )
• jobName ( str )
• job_environment ( Optional[dict[str, str]] )
• gpus ( Optional[int] )
submitJob(subLine)
Wrapper routine for submitting the actual command-line call, then processing the output to get the batch system job ID
Param string subLine: the literal command line string to be called
Return type
string: batch system job ID, which will be stored internally
getJobExitCode(torqueJobID)
Returns job exit code and possibly an instance of abstractBatchSystem.BatchJobExitReason.
Returns None if the job is still running.
If the job is not running but the exit code is not available, it will be EXIT_STATUS_UNAVAILABLE_VALUE. Implementation-specific; called by GridEngineThread.checkOnJobs().
The exit code
will only be 0 if the job affirmatively succeeded.
Parameters
batchjobID ( string ) -- batch system job ID
prepareQsub(cpu, mem, jobID, job_environment)
Parameters
• cpu ( int )
• mem ( int )
• jobID ( int )
• job_environment ( Optional[dict[str, str]] )
Return type
list [ str ]
generateTorqueWrapper(command, jobID)
A very simple script generator that just wraps the given command; for now the script is written to the default temporary directory.
Exceptions
Package Contents
exception toil.batchSystems.DeadlockException(msg)
Bases: Exception
Exception thrown by the Leader or BatchSystem when a deadlock is encountered due to insufficient resources to run the workflow
msg
__str__()
Stringify the exception, including the message.
toil.bus
Message types and message bus for leader component coordination.
Historically, the Toil Leader has been organized around functions calling other functions to "handle" different things happening. Over time, it has become very brittle: exactly the right handling functions need to be called in exactly the right order, or it gets confused and does the wrong thing.
The MessageBus is meant to let the leader avoid this by more loosely coupling its components together, by having them communicate by sending messages instead of by calling functions.
When events occur (like a job coming back from the batch system with a failed exit status), this will be translated into a message that will be sent to the bus. Then, all the leader components that need to react to this message in some way (by, say, decrementing the retry count) would listen for the relevant messages on the bus and react to them. If a new component needs to be added, it can be plugged into the message bus and receive and react to messages without interfering with existing components' ability to react to the same messages.
Eventually, the different aspects of the Leader could become separate objects.
By default, messages stay entirely within the Toil leader process, and are not persisted anywhere, not even in the JobStore.
The Message Bus also provides an extension point: its messages can be serialized to a file by the leader (see the --writeMessages option), and they can then be decoded using MessageBus.scan_bus_messages() (as is done in the Toil WES server backend). By replaying the messages and tracking their effects on job state, you can get an up-to-date view of the state of the jobs in a workflow. This includes information, such as whether jobs are issued or running, or what jobs have completely finished, which is not persisted in the JobStore.
The MessageBus instance for the leader process is owned by the Toil leader, but the BatchSystem has an opportunity to connect to it, and can send (or listen for) messages. Right now the BatchSystem does not have to send or receive any messages; the Leader is responsible for polling it via the BatchSystem API and generating the events. But a BatchSystem implementation may send additional events (like JobAnnotationMessage).
Currently, the MessageBus is implemented using pypubsub, and so messages are always handled in a single Thread, the Toil leader's main loop thread. If other components send events, they will be shipped over to that thread inside the MessageBus. Communication between processes is allowed using MessageBus.connect_output_file() and MessageBus.scan_bus_messages().
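The buffering behaviour described above (any thread may publish, but handlers only run in the owning thread) can be sketched without pypubsub. ToyBus and JobDone below are hypothetical names, and the sketch simplifies the real MessageBus: subscribe() here returns nothing, whereas the documented method returns a subscription object.

```python
import queue
import threading
from collections import defaultdict
from typing import NamedTuple


class JobDone(NamedTuple):
    """Hypothetical message type; real ones live in toil.bus."""
    job_id: str
    exit_code: int


class ToyBus:
    def __init__(self):
        self._owner = threading.get_ident()  # thread allowed to deliver
        self._pending = queue.Queue()         # thread-safe message buffer
        self._handlers = defaultdict(list)    # message type -> handlers

    def subscribe(self, message_type, handler):
        self._handlers[message_type].append(handler)

    def publish(self, message):
        # Safe from any thread: only enqueue, never call handlers here.
        self._pending.put(message)

    def check(self):
        # Deliver buffered messages, but only in the owning thread.
        if threading.get_ident() != self._owner:
            return
        while True:
            try:
                message = self._pending.get_nowait()
            except queue.Empty:
                return
            for handler in self._handlers[type(message)]:
                handler(message)


def demo():
    """Publish from a worker thread, then deliver in the owning thread."""
    bus = ToyBus()
    seen = []
    bus.subscribe(JobDone, seen.append)
    worker = threading.Thread(target=bus.publish,
                              args=(JobDone("job-1", 0),))
    worker.start()
    worker.join()
    before_check = list(seen)  # nothing delivered yet
    bus.check()                # delivery happens here, in this thread
    return before_check, seen
```

Because delivery is deferred to check(), handlers never run concurrently with the main loop, which is why components can react to the same message without locking against each other.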
Attributes
Classes
Functions
Module Contents
toil.bus.logger
class toil.bus.Names
Bases: NamedTuple
Stores all the kinds of name a job can have.
job_name: str
unit_name: str
display_name: str
stats_name: str
job_store_id: str
toil.bus.get_job_kind(names)
Return an identifying string for the job.
The result may contain spaces.
Returns
Either the unit name, job name, or display name, which identifies the kind of job it is to Toil. Otherwise "Unknown Job" in case no identifier is available.
Parameters
names ( Names )
Return type
str
class toil.bus.JobIssuedMessage
Bases: NamedTuple
Produced when a job is issued to run on the batch system.
job_type: str
job_id: str
toil_batch_id: int
class toil.bus.JobUpdatedMessage
Bases: NamedTuple
Produced when a job is "updated" and ready to have something happen to it.
job_id: str
result_status: int
class toil.bus.JobCompletedMessage
Bases: NamedTuple
Produced when a job is completed, whether successful or not.
job_type: str
job_id: str
exit_code: int
class toil.bus.JobFailedMessage
Bases: NamedTuple
Produced when a job is completely failed, and will not be retried again.
job_type: str
job_id: str
class toil.bus.JobMissingMessage
Bases: NamedTuple
Produced when a job goes missing and should be in the batch system but isn't.
job_id: str
class toil.bus.JobAnnotationMessage
Bases: NamedTuple
Produced when extra information (such as an AWS Batch job ID from the AWSBatchBatchSystem) is available that goes with a job.
job_id: str
annotation_name: str
annotation_value: str
class toil.bus.ExternalBatchIdMessage
Bases: NamedTuple
Produced when using a batch system. Links the Toil-assigned batch ID to the batch system's own ID (whatever is returned by the local implementation: PID, batch ID, etc.).
toil_batch_id: int
external_batch_id: str
batch_system: str
class toil.bus.QueueSizeMessage
Bases: NamedTuple
Produced to
describe the size of the queue of jobs issued but not yet
completed. Theoretically recoverable from other messages.
queue_size:
int
class toil.bus.ClusterSizeMessage
Bases: NamedTuple
Produced by the Toil-integrated autoscaler to describe the number of instances of a certain type in a cluster.
instance_type:
str
current_size:
int
class toil.bus.ClusterDesiredSizeMessage
Bases: NamedTuple
Produced by the
Toil-integrated autoscaler to describe the number of
instances of a certain type that it thinks will be needed.
instance_type:
str
desired_size:
int
toil.bus.message_to_bytes(message)
Convert a plain-old-data named
tuple into a byte string.
Parameters
message ( NamedTuple )
Return type
bytes
toil.bus.MessageType
toil.bus.bytes_to_message(message_type, data)
Convert bytes from
message_to_bytes back to a message of the given type.
Parameters
• message_type ( type[MessageType] )
• data ( bytes )
Return type
MessageType
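To make the round trip concrete, here is a minimal sketch of one way a plain-old-data named tuple could be serialized and restored. The tab-separated encoding, the QueueSizeDemo type, and both function names are assumptions made for illustration; the actual wire format used by message_to_bytes is not specified in this section.

```python
from typing import NamedTuple, get_type_hints


class QueueSizeDemo(NamedTuple):
    """Hypothetical message type used only for this illustration."""
    queue_size: int
    note: str


def to_bytes_sketch(message: NamedTuple) -> bytes:
    # Assumed encoding: str() of each field, joined by tabs, as UTF-8.
    return "\t".join(str(field) for field in message).encode("utf-8")


def from_bytes_sketch(message_type, data: bytes):
    hints = get_type_hints(message_type)
    parts = data.decode("utf-8").split("\t")
    # Convert each text field back using its annotated type (int, str, ...).
    values = [hints[name](text) for name, text in zip(message_type._fields, parts)]
    return message_type(*values)
```

A scheme like this only works for simple field types whose constructor accepts the text form, and it breaks if a string field contains a tab, which is why the caller has to supply the message type when decoding.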
class toil.bus.MessageBus
Holds messages that should cause jobs to change their scheduling states. Messages are put in and buffered, and can be taken out and handled as batches when convenient.
All messages are NamedTuple objects of various subtypes.
Message order
is guaranteed to be preserved within a type.
publish(message)
Put a message onto the bus. Can
be called from any thread.
Parameters
message ( Any )
Return type
None
check()
If we are in the owning thread,
deliver any messages that are in the queue for us. Must be
called every once in a while in the main thread, possibly
through inbox objects.
Return type
None
subscribe(message_type, handler)
Register the given callable to
be called when messages of the given type are sent. It will
be called with messages sent after the subscription is
created. Returns a subscription object; when the
subscription object is GC'd the subscription will end.
Parameters
• message_type ( type[MessageType] )
• handler ( Callable[[MessageType], Any] )
Return type
pubsub.core.listener.Listener
connect(wanted_types)
Get a connection object that
serves as an inbox for messages of the given types. Messages
of those types will accumulate in the inbox until it is
destroyed. You can check for them at any time.
Parameters
wanted_types ( list[type] )
Return type
MessageBusConnection
outbox()
Get a connection object that
only allows sending messages.
Return type
MessageOutbox
connect_output_file(file_path)
Send copies of all messages to the given output file.
Returns
connection data which must be kept alive for the connection
to persist. That data is opaque: the user is not supposed to
look at it or touch it or do anything with it other than
store it somewhere or delete it.
Parameters
file_path ( str )
Return type
Any
classmethod scan_bus_messages(stream, message_types)
Get an iterator over all
messages in the given log stream of the given types, in
order. Discard any trailing partial messages.
Parameters
• stream ( IO[bytes] )
• message_types ( list[type[NamedTuple]] )
Return type
collections.abc.Iterator [Any]
class toil.bus.MessageBusClient
Base class for clients (inboxes and outboxes) of a message bus. Handles keeping a reference to the message bus.
class toil.bus.MessageInbox
Bases: MessageBusClient
A buffered
connection to a message bus that lets us receive messages.
Buffers incoming messages until you are ready for them. Does
not preserve ordering between messages of different types.
count(message_type)
Get the number of pending
messages of the given type.
Parameters
message_type ( type )
Return type
int
empty()
Return True if no messages are pending, and False otherwise.
Return type
bool
for_each(message_type)
Loop over all messages currently pending of the given type. Each that is handled without raising an exception will be removed.
Messages sent
while this function is running will not be yielded by the
current call.
Parameters
message_type ( type[MessageType] )
Return type
collections.abc.Iterator [MessageType]
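The removal rule for for_each (a message is dropped only if the loop body handles it without raising, and messages arriving mid-loop wait for the next call) can be sketched with a toy inbox. InboxSketch, Ping, and the deliver method are hypothetical stand-ins, not part of toil.bus; this shows the behaviour described above, not the real implementation.

```python
from collections import defaultdict, deque
from typing import Any, Iterator, NamedTuple


class Ping(NamedTuple):
    """Hypothetical message type for this illustration."""
    seq: int


class InboxSketch:
    """Toy stand-in for a buffered inbox; not the real MessageInbox."""

    def __init__(self) -> None:
        self._pending = defaultdict(deque)

    def deliver(self, message: Any) -> None:
        # Buffer per message type, so order is kept within a type only.
        self._pending[type(message)].append(message)

    def count(self, message_type: type) -> int:
        return len(self._pending[message_type])

    def for_each(self, message_type: type) -> Iterator[Any]:
        queue = self._pending[message_type]
        # Snapshot the length: messages delivered during iteration wait
        # for the next call.
        for _ in range(len(queue)):
            message = queue[0]
            yield message
            # Reached only if the caller's loop body did not raise.
            queue.popleft()
```

If the body of the caller's for loop raises, the generator is never resumed, so the message that caused the failure stays at the head of the queue and is offered again on the next call.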
class toil.bus.MessageOutbox
Bases: MessageBusClient
A connection to
a message bus that lets us publish messages.
publish(message)
Publish the given message to the connected message bus.
We have this so
you don't need to store both the bus and your connection.
Parameters
message ( Any )
Return type
None
class toil.bus.MessageBusConnection
Bases: MessageInbox , MessageOutbox
A two-way connection to a message bus. Buffers incoming messages until you are ready for them, and lets you send messages.
class toil.bus.JobStatus
Records the status of a job.
When exit_code
is -1, this means the job is either not observed or
currently running.
job_store_id:
str
name:
str
exit_code:
int
annotations:
dict[str, str]
toil_batch_id:
int
external_batch_id:
str
batch_system:
str
__repr__()
Return type
str
is_running()
Return type
bool
toil.bus.replay_message_bus(path)
Replay all the messages and work out what they mean for jobs.
We track the state and name of jobs here, by ID. We would use a list of two items but MyPy can't understand a list of items of multiple types, so we need to define a new class.
Returns a dictionary from the job_id to a dataclass, JobStatus. A JobStatus contains information about a job which we have gathered from the message bus, including the job store ID, the name of the job, the exit code, any associated annotations, the Toil batch ID, the external batch ID, and the batch system on which the job is running.
Parameters
path ( str )
Return type
dict [ str , JobStatus ]
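As a rough illustration of what "replaying" means, the sketch below folds a list of messages into per-job status records. It works on an in-memory list rather than a file path, and Issued, Completed, Annotation, StatusSketch, and replay_sketch are all hypothetical stand-ins that carry only a subset of the fields documented above.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class Issued(NamedTuple):      # stand-in for JobIssuedMessage
    job_type: str
    job_id: str
    toil_batch_id: int


class Completed(NamedTuple):   # stand-in for JobCompletedMessage
    job_type: str
    job_id: str
    exit_code: int


class Annotation(NamedTuple):  # stand-in for JobAnnotationMessage
    job_id: str
    annotation_name: str
    annotation_value: str


@dataclass
class StatusSketch:
    job_store_id: str
    name: str = ""
    exit_code: int = -1          # -1: not observed or still running
    annotations: dict = field(default_factory=dict)
    toil_batch_id: int = 0


def replay_sketch(messages) -> dict:
    statuses: dict = {}
    for message in messages:
        # Create the record the first time a job ID is seen.
        status = statuses.setdefault(message.job_id, StatusSketch(message.job_id))
        if isinstance(message, Issued):
            status.name = message.job_type
            status.toil_batch_id = message.toil_batch_id
        elif isinstance(message, Completed):
            status.exit_code = message.exit_code
        elif isinstance(message, Annotation):
            status.annotations[message.annotation_name] = message.annotation_value
    return statuses
```

A job that was issued but never completed keeps exit_code at -1, matching the convention stated for JobStatus.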
toil.bus.gen_message_bus_path(tmpdir=None)
Return a file path in tmp to store the message bus at. The calling function is responsible for cleaning up the generated file.
The tmpdir
argument will override the directory that the message bus
will be made in. If not provided, the standard tempfile
order will be used.
Parameters
tmpdir ( Optional[str] )
Return type
str
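A minimal sketch of this contract using only the standard library is shown below. The function name and the "toil-bus-" prefix are assumptions for illustration; the point is that passing dir=None lets tempfile choose its default location and that the caller owns the cleanup.

```python
import os
import tempfile
from typing import Optional


def bus_path_sketch(tmpdir: Optional[str] = None) -> str:
    # dir=None lets tempfile pick its default location
    # (TMPDIR, TEMP, TMP, then platform defaults).
    fd, path = tempfile.mkstemp(prefix="toil-bus-", dir=tmpdir)
    os.close(fd)  # only the path is needed; the caller reopens it later
    return path
```

A caller would typically wrap its use of the path in try/finally and call os.remove(path) when done.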
toil.common
Attributes
Exceptions
Classes
Functions
Module Contents
toil.common.UUID_LENGTH
= 32
toil.common.logger
toil.common.TOIL_HOME_DIR:
str
toil.common.DEFAULT_CONFIG_FILE:
str
class toil.common.Config
Class to represent
configuration operations for a toil workflow run.
logFile:
str | None
logRotating:
bool
cleanWorkDir:
str
max_jobs:
int
max_local_jobs:
int
manualMemArgs:
bool
run_local_jobs_on_workers:
bool
coalesceStatusCalls:
bool
mesos_endpoint:
str | None
mesos_framework_id:
str | None
mesos_role:
str | None
mesos_name:
str
kubernetes_host_path:
str | None
kubernetes_owner:
str | None
kubernetes_service_account:
str | None
kubernetes_pod_timeout:
float
kubernetes_privileged:
bool
tes_endpoint:
str
tes_user:
str
tes_password:
str
tes_bearer_token:
str
aws_batch_region:
str | None
aws_batch_queue:
str | None
aws_batch_job_role_arn:
str | None
scale:
float
batchSystem:
str
batch_logs_dir:
str | None
The backing scheduler will be instructed, if possible, to save logs to this directory, where the leader can read them.
statePollingWait:
int
state_polling_timeout:
int
disableAutoDeployment:
bool
workflowID:
str | None
This attribute uniquely identifies the job store and therefore the workflow. It is necessary in order to distinguish between two consecutive workflows for which self.jobStore is the same, e.g. when a job store name is reused after a previous run has finished successfully and its job store has been cleaned up.
workflowAttemptNumber:
int
jobStore:
str
logLevel:
str
colored_logs:
bool
workDir:
str | None
coordination_dir:
str | None
noStdOutErr:
bool
stats:
bool
clean:
str | None
clusterStats:
str
restart:
bool
caching:
bool | None
symlinkImports:
bool
moveOutputs:
bool
symlink_job_store_reads:
bool
provisioner:
str | None
nodeTypes:
list[tuple[set[str], float | None]]
minNodes:
list[int]
maxNodes:
list[int]
targetTime:
float
betaInertia:
float
scaleInterval:
int
preemptibleCompensation:
float
nodeStorage:
int
nodeStorageOverrides:
list[str]
metrics:
bool
assume_zero_overhead:
bool
maxPreemptibleServiceJobs:
int
maxServiceJobs:
int
deadlockWait:
float | int
deadlockCheckInterval:
float | int
defaultMemory:
int
defaultCores:
float | int
defaultDisk:
int
defaultPreemptible:
bool
defaultAccelerators:
list[toil.job.AcceleratorRequirement]
maxCores:
int
maxMemory:
int
maxDisk:
int
retryCount:
int
enableUnlimitedPreemptibleRetries:
bool
doubleMem:
bool
maxJobDuration:
int
rescueJobsFrequency:
int
job_store_timeout:
float
maxLogFileSize:
int
writeLogs:
str
writeLogsGzip:
str
writeLogsFromAllJobs:
bool
write_messages:
str | None
realTimeLogging:
bool
environment:
dict[str, str]
disableChaining:
bool
disableJobStoreChecksumVerification:
bool
sseKey:
str | None
servicePollingInterval:
int
useAsync:
bool
forceDockerAppliance:
bool
statusWait:
int
disableProgress:
bool
readGlobalFileMutableByDefault:
bool
debugWorker:
bool
disableWorkerOutputCapture:
bool
badWorker:
float
badWorkerFailInterval:
float
kill_polling_interval:
int
cwl:
bool
memory_is_product:
bool
set_from_default_config()
Return type
None
prepare_start()
After options are set, prepare
for initial start of workflow.
Return type
None
prepare_restart()
Before restart options are set,
prepare for a restart of a workflow. Set up any
execution-specific parameters and clear out any stale ones.
Return type
None
setOptions(options)
Creates a config object from
the options object.
Parameters
options ( argparse.Namespace )
Return type
None
check_configuration_consistency()
Old checks that cannot be fit
into an action class for argparse
Return type
None
__eq__(other)
Parameters
other ( object )
Return type
bool
__hash__()
Return type
int
toil.common.check_and_create_toil_home_dir()
Ensure that TOIL_HOME_DIR exists.
Raises an error
if it does not exist and cannot be created. Safe to run
simultaneously in multiple processes.
Return type
None
toil.common.check_and_create_default_config_file()
If the default config file does not exist, create it in the Toil home directory. Create the Toil home directory if needed.
Raises an error if the default config file cannot be created. Safe to run simultaneously in multiple processes. If this process runs this function, it will always see the default config file existing with parseable contents, even if other processes are racing to create it.
No process will
see an empty or partially-written default config file.
Return type
None
toil.common.check_and_create_config_file(filepath)
If the config file at the filepath does not exist, try creating it. The parent directory should be created prior to calling this.
Parameters
filepath ( str ) -- path to config file
Return type
None
toil.common.generate_config(filepath)
Write a Toil config file to the given path.
Safe to run simultaneously in multiple processes. No process will see an empty or partially-written file at the given path.
Set include to "cwl" or "wdl" to include CWL options or WDL options, respectively.
Parameters
filepath ( str )
Return type
None
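The guarantee that no process sees an empty or partially-written file is commonly achieved by writing to a temporary file and renaming it into place. The sketch below illustrates that general technique; it is an assumption about one way to meet the guarantee, not a description of how generate_config is implemented, and write_config_atomically is a hypothetical name.

```python
import os
import tempfile


def write_config_atomically(filepath: str, contents: str) -> None:
    directory = os.path.dirname(os.path.abspath(filepath))
    # Write to a private temporary file in the same directory...
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".config-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(contents)
            handle.flush()
            os.fsync(handle.fileno())
        # ...then rename it into place. os.replace is atomic on POSIX when
        # source and destination are on the same filesystem.
        os.replace(tmp_path, filepath)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this pattern, racing writers may overwrite one another, but every reader sees either no file or one complete file.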
toil.common.parser_with_common_options(provisioner_options=False, jobstore_option=True, prog=None, default_log_level=None)
Parameters
• provisioner_options ( bool )
• jobstore_option ( bool )
• prog ( Optional[str] )
• default_log_level ( Optional[int] )
Return type
configargparse.ArgParser
toil.common.addOptions(parser, jobstore_as_flag=False, cwl=False, wdl=False)
Add all Toil command line options to a parser.
Supports config files if using configargparse. This will also check and set up the default config file.
Parameters
• jobstore_as_flag ( bool ) -- make the job store option a --jobStore flag instead of a required jobStore positional argument.
• cwl ( bool ) -- Whether CWL options are expected. If so, CWL options won't be suppressed.
• wdl ( bool ) -- Whether WDL options are expected. If so, WDL options won't be suppressed.
• parser ( argparse.ArgumentParser )
Return type
None
toil.common.getNodeID()
Return unique ID of the current node (host). The resulting string will be convertible to a uuid.UUID.
Tries several methods until success. The returned ID should be identical across calls from different processes on the same node at least until the next OS reboot.
The last resort method is uuid.getnode(), which in some rare OS configurations may return a random ID each time it is called. However, this method should never be reached on a Linux system, because reading from /proc/sys/kernel/random/boot_id will be tried prior to that. If uuid.getnode() is reached, it will be called twice, and an exception will be raised if the values are not identical.
Return type
str
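The fallback chain can be sketched as follows. Only two of the "several methods" are shown (the Linux boot ID and the uuid.getnode() last resort), and node_id_sketch is a hypothetical name; the real function may try other sources in between.

```python
import uuid


def node_id_sketch() -> str:
    # Preferred source on Linux: stable until the next reboot.
    try:
        with open("/proc/sys/kernel/random/boot_id") as handle:
            candidate = handle.read().strip()
        return uuid.UUID(candidate).hex
    except (OSError, ValueError):
        pass
    # Last resort: uuid.getnode() may be random on some systems, so call it
    # twice and refuse to continue if the answers differ.
    first, second = uuid.getnode(), uuid.getnode()
    if first != second:
        raise RuntimeError("uuid.getnode() is not stable on this host")
    return uuid.UUID(int=first).hex
```

Either branch returns a 32-character hex string, so the result can always be passed back to uuid.UUID().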
class toil.common.Toil(options)
Bases: ContextManager [ Toil ]
A context manager that represents a Toil workflow.
Specifically
the batch system, job store, and its configuration.
Parameters
options ( argparse.Namespace )
config:
Config
options
__enter__()
Derive configuration from the command line options.
Then load the
job store and, on restart, consolidate the derived
configuration with the one from the previous invocation of
the workflow.
Return type
Toil
__exit__(exc_type, exc_val, exc_tb)
Clean up after a workflow invocation.
Depending on
the configuration, delete the job store.
Parameters
• exc_type ( Optional[type[BaseException]] )
• exc_val ( Optional[BaseException] )
• exc_tb ( Optional[types.TracebackType] )
Return type
Literal[False]
start(rootJob)
Invoke a Toil workflow with the given job as the root for an initial run.
This method must be called in the body of a with Toil(...) as toil: statement. This method should not be called more than once for a workflow that has not finished.
Parameters
rootJob ( toil.job.Job ) -- The root job of the workflow
Returns
The root job's return value
Return type
Any
restart()
Restarts a workflow that has
been interrupted.
Returns
The root job's return value
Return type
Any
classmethod getJobStore(locator)
Create an instance of the
concrete job store implementation that matches the given
locator.
Parameters
locator ( str ) -- The location of the job store to be represented by the instance
Returns
an instance of a concrete subclass of AbstractJobStore
Return type
toil.jobStores.abstractJobStore.AbstractJobStore
static parseLocator(locator)
Parameters
locator ( str )
Return type
tuple [ str , str ]
static buildLocator(name, rest)
Parameters
• name ( str )
• rest ( str )
Return type
str
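parseLocator and buildLocator carry no description here, so the following sketch is an assumption based on the name:rest locator syntax used for job stores elsewhere in this manual (for example aws:us-west-2:my-jobstore, or a bare path for a file job store). Both function names are hypothetical, and the exact rules of the real methods may differ.

```python
from typing import Tuple


def parse_locator_sketch(locator: str) -> Tuple[str, str]:
    # Assumption: anything that looks like a path, or has no colon,
    # is treated as a file job store.
    if locator[:1] in ("/", ".") or ":" not in locator:
        return "file", locator
    # Split on the first colon only; the rest may contain more colons.
    name, rest = locator.split(":", 1)
    return name, rest


def build_locator_sketch(name: str, rest: str) -> str:
    if ":" in name:
        raise ValueError(f"job store name may not contain ':': {name!r}")
    return f"{name}:{rest}"
```

Splitting on the first colon only is what lets the two functions round-trip a locator whose remainder itself contains colons.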
classmethod resumeJobStore(locator)
Parameters
locator ( str )
Return type
toil.jobStores.abstractJobStore.AbstractJobStore
static createBatchSystem(config)
Create an instance of the batch
system specified in the given config.
Parameters
config ( Config ) -- the current configuration
Returns
an instance of a concrete subclass of AbstractBatchSystem
Return type
toil.batchSystems.abstractBatchSystem.AbstractBatchSystem
url_exists(src_uri)
Parameters
src_uri ( str )
Return type
bool
importFile(srcUrl: str, sharedFileName: str, symlink: bool = True) -> None
importFile(srcUrl: str, sharedFileName: None = None, symlink: bool = True) -> toil.fileStores.FileID
import_file(src_uri: str, shared_file_name: str, symlink: bool = True, check_existence: bool = True) -> None
import_file(src_uri: str, shared_file_name: None = None, symlink: bool = True, check_existence: Literal[True] = True) -> toil.fileStores.FileID
import_file(src_uri: str, shared_file_name: None = None, symlink: bool = True, check_existence: bool = True) -> toil.fileStores.FileID | None
Import the file at the given URL into the job store.
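With the default check_existence=True, raises FileNotFoundError if the file does not exist; returns None for a missing file only when check_existence is false.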
Parameters
check_existence -- If true, raise FileNotFoundError if the file does not exist. If false, return None when the file does not exist.
See toil.jobStores.abstractJobStore.AbstractJobStore.importFile() for a full description
exportFile(jobStoreFileID, dstUrl)
Parameters
• jobStoreFileID ( toil.fileStores.FileID )
• dstUrl ( str )
Return type
None
export_file(file_id, dst_uri)
Export file to destination pointed at by the destination URL.
See
toil.jobStores.abstractJobStore.AbstractJobStore.exportFile()
for a full description
Parameters
• file_id ( toil.fileStores.FileID )
• dst_uri ( str )
Return type
None
static normalize_uri(uri, check_existence=False)
Given a URI, if it has no
scheme, prepend "file:".
Parameters
• check_existence ( bool ) -- If set, raise FileNotFoundError if a URI points to a local file that does not exist.
• uri ( str )
Return type
str
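A minimal sketch of this normalization, using only urllib.parse, is given below. It assumes that a scheme-less input is a local path that should be made absolute and percent-encoded, which goes slightly beyond the one-line description above; normalize_uri_sketch is a hypothetical name.

```python
import os
from urllib.parse import quote, urlparse


def normalize_uri_sketch(uri: str, check_existence: bool = False) -> str:
    if urlparse(uri).scheme:
        return uri  # already has a scheme such as s3: or https:
    # Assumption: scheme-less input is a local path, made absolute here.
    path = os.path.abspath(uri)
    if check_existence and not os.path.exists(path):
        raise FileNotFoundError(f"Could not find local file: {path}")
    return "file://" + quote(path)
```

For example, a bare /data/reads.fq would come back as file:///data/reads.fq, while s3://bucket/key is returned untouched.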
static getToilWorkDir(configWorkDir=None)
Return a path to a writable directory under which per-workflow directories exist.
This directory
is always required to exist on a machine, even if the Toil
worker has not run yet. If your workers and leader have
different temp directories, you may need to set
TOIL_WORKDIR.
Parameters
configWorkDir ( Optional[str] ) -- Value passed to the program using the --workDir flag
Returns
Path to the Toil work directory, constant across all machines
Return type
str
classmethod get_toil_coordination_dir(config_work_dir, config_coordination_dir)
Return a path to a writable directory, which will be in memory if convenient. Ought to be used for file locking and coordination.
Parameters
• config_work_dir ( Optional[str] ) -- Value passed to the program using the --workDir flag
• config_coordination_dir ( Optional[str] ) -- Value passed to the program using the --coordinationDir flag
• workflow_id -- Used if a tmpdir_prefix exists to create full directory paths unique per workflow
Returns
Path to the Toil coordination directory. Ought to be on a POSIX filesystem that allows directories containing open files to be deleted.
Return type
str
static get_workflow_path_component(workflow_id)
Get a safe filesystem path component for a workflow.
Will be
consistent for all processes on a given machine, and
different for all processes on different machines.
Parameters
workflow_id ( str ) -- The ID of the current Toil workflow.
Return type
str
classmethod getLocalWorkflowDir(workflowID, configWorkDir=None)
Return the directory where
worker directories and the cache will be located for this
workflow on this machine.
Parameters
• configWorkDir ( Optional[str] ) -- Value passed to the program using the --workDir flag
• workflowID ( str )
Returns
Path to the local workflow directory on this machine
Return type
str
classmethod get_local_workflow_coordination_dir(workflow_id, config_work_dir, config_coordination_dir)
Return the directory where coordination files should be located for this workflow on this machine. These include internal Toil databases and lock files for the machine.
If an in-memory filesystem is available, it is used. Otherwise, the local workflow directory, which may be on a shared network filesystem, is used.
Parameters
• workflow_id ( str ) -- Unique ID of the current workflow.
• config_work_dir ( Optional[str] ) -- Value used for the work directory in the current Toil Config.
• config_coordination_dir ( Optional[str] ) -- Value used for the coordination directory in the current Toil Config.
Returns
Path to the local workflow coordination directory on this machine.
Return type
str
exception toil.common.ToilRestartException(message)
Bases: Exception
Common base
class for all non-exit exceptions.
Parameters
message ( str )
exception toil.common.ToilContextManagerException
Bases: Exception
Common base class for all non-exit exceptions.
class toil.common.ToilMetrics(bus, provisioner=None)
Parameters
• bus ( toil.bus.MessageBus )
• provisioner ( Optional[toil.provisioners.abstractProvisioner.AbstractProvisioner] )
mtailImage
grafanaImage
prometheusImage
nodeExporterProc:
subprocess.Popen[bytes] | None = None
startDashboard(clusterName, zone)
Parameters
• clusterName ( str )
• zone ( str )
Return type
None
add_prometheus_data_source()
Return type
None
log(message)
Parameters
message ( str )
Return type
None
logClusterSize(m)
Parameters
m ( toil.bus.ClusterSizeMessage )
Return type
None
logClusterDesiredSize(m)
Parameters
m ( toil.bus.ClusterDesiredSizeMessage )
Return type
None
logQueueSize(m)
Parameters
m ( toil.bus.QueueSizeMessage )
Return type
None
logMissingJob(m)
Parameters
m ( toil.bus.JobMissingMessage )
Return type
None
logIssuedJob(m)
Parameters
m ( toil.bus.JobIssuedMessage )
Return type
None
logFailedJob(m)
Parameters
m ( toil.bus.JobFailedMessage )
Return type
None
logCompletedJob(m)
Parameters
m ( toil.bus.JobCompletedMessage )
Return type
None
shutdown()
Return type
None
toil.common.cacheDirName(workflowID)
Returns
Name of the cache directory.
Parameters
workflowID ( str )
Return type
str
toil.common.getDirSizeRecursively(dirPath)
This method will return the cumulative number of bytes occupied by the files on disk in the directory and its subdirectories.
If the method
is unable to access a file or directory (due to insufficient
permissions, or due to the file or directory having been
removed while this function was attempting to traverse it),
the error will be handled internally, and a (possibly 0)
lower bound on the size of the directory will be returned.
Parameters
dirPath ( str ) -- A valid path to a directory or file.
Returns
Total size, in bytes, of the file or directory at dirPath.
Return type
int
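The error-tolerant traversal can be sketched with os.walk. This is a simplification under stated assumptions: it sums apparent file sizes (st_size) rather than blocks actually occupied on disk, and dir_size_sketch is a hypothetical name, so it illustrates the lower-bound behaviour rather than the exact figure the real function reports.

```python
import os


def dir_size_sketch(path: str) -> int:
    total = 0
    if not os.path.isdir(path):
        # A plain file (or a missing path) is handled without walking.
        try:
            return os.lstat(path).st_size
        except OSError:
            return 0
    # os.walk skips unreadable directories by default (onerror=None).
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # File vanished or is unreadable: keep the lower bound.
                continue
    return total
```

Anything that cannot be read simply contributes nothing, so the result never overstates the size and may be 0 for a path that has disappeared.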
toil.common.getFileSystemSize(dirPath)
Return the free space and total size of the file system hosting dirPath.
Parameters
dirPath ( str ) -- A valid path to a directory.
Returns
free space and total size of file system
Return type
tuple [ int , int ]
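On POSIX systems, both numbers can be derived from os.statvfs. The sketch below is one plausible way to compute them and is not confirmed by this section to be the library's method; file_system_size_sketch is a hypothetical name.

```python
import os
from typing import Tuple


def file_system_size_sketch(dir_path: str) -> Tuple[int, int]:
    stats = os.statvfs(dir_path)
    # f_bavail counts blocks available to unprivileged users.
    free_space = stats.f_frsize * stats.f_bavail
    # f_blocks counts all blocks on the file system.
    total_size = stats.f_frsize * stats.f_blocks
    return free_space, total_size
```

Note the return order: free space first, then total size, matching the description above.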
toil.common.safeUnpickleFromStream(stream)
Parameters
stream ( IO[Any] )
Return type
Any
toil.cwl
Submodules
toil.cwl.conftest
Attributes
Module Contents
toil.cwl.conftest.collect_ignore = []
toil.cwl.cwltoil
Implemented support for Common Workflow Language (CWL) for Toil.
Attributes
Exceptions
Classes
Functions
Module Contents
toil.cwl.cwltoil.logger
toil.cwl.cwltoil.DEFAULT_TMPDIR
toil.cwl.cwltoil.DEFAULT_TMPDIR_PREFIX
toil.cwl.cwltoil.cwltoil_was_removed()
Complain about deprecated
entrypoint.
Return type
None
class toil.cwl.cwltoil.UnresolvedDict
Bases: dict [ Any , Any ]
Tag to indicate a dict contains promises that must be resolved.
class toil.cwl.cwltoil.SkipNull
Internal sentinel object.
Indicates a null value produced by each port of a skipped conditional step. The CWL 1.2 specification calls for treating this exactly the same as a null value.
toil.cwl.cwltoil.filter_skip_null(name, value)
Recursively filter out SkipNull
objects from 'value'.
Parameters
• name ( str ) -- Name of port producing this value. Only used when we find an unhandled null from a conditional step and we print out a warning. The name allows the user to better localize which step/port was responsible for the unhandled null.
• value ( Any ) -- port output value object
Return type
Any
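The recursive filtering can be sketched as below. It assumes that "filtering out" means replacing each sentinel with None (consistent with the note that SkipNull is treated like a null) and that a warning naming the port is logged once; the SkipNull class here is a local stand-in and filter_skip_null_sketch is a hypothetical name.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


class SkipNull:
    """Stand-in for the sentinel class described above."""


def filter_skip_null_sketch(name: str, value: Any) -> Any:
    found = False

    def scrub(item: Any) -> Any:
        nonlocal found
        if isinstance(item, SkipNull):
            found = True
            return None  # treat exactly like a CWL null
        if isinstance(item, list):
            return [scrub(child) for child in item]
        if isinstance(item, dict):
            return {key: scrub(child) for key, child in item.items()}
        return item

    result = scrub(value)
    if found:
        # The port name helps the user locate the responsible step.
        logger.warning("Port %s produced an unhandled null from a skipped step", name)
    return result
```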
toil.cwl.cwltoil.ensure_no_collisions(directory, dir_description=None)
Make sure no items in the given CWL Directory have the same name.
If any do, raise a WorkflowException about a "File staging conflict".
Does not
recurse into subdirectories.
Parameters
• directory ( cwltool.utils.DirectoryType )
• dir_description ( Optional[str] )
Return type
None
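A minimal sketch of the collision check follows. It assumes the CWL Directory is a dict whose "listing" entries each carry a "basename", and it raises ValueError as a stand-in for cwltool's WorkflowException so the example stays free of third-party imports; ensure_no_collisions_sketch is a hypothetical name.

```python
from typing import Any, Dict, Optional


def ensure_no_collisions_sketch(directory: Dict[str, Any],
                                dir_description: Optional[str] = None) -> None:
    description = dir_description or "a directory"
    seen = set()
    # Only the immediate listing is checked; subdirectories are not visited.
    for child in directory.get("listing", []):
        basename = child.get("basename")
        if basename is None:
            continue
        if basename in seen:
            # The real function raises cwltool's WorkflowException.
            raise ValueError(
                f"File staging conflict: duplicate entries for {basename!r} in {description}"
            )
        seen.add(basename)
```

Because the check does not recurse, two files with the same name in different subdirectories are not reported as a conflict.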
toil.cwl.cwltoil.try_prepull(cwl_tool_uri, runtime_context, batchsystem)
Try to prepull all containers in a CWL workflow with Singularity or Docker. This will not prepull the default container specified on the command line.
Parameters
• cwl_tool_uri ( str ) -- CWL workflow URL. Fragments are accepted as well
• runtime_context ( cwltool.context.RuntimeContext ) -- runtime context of cwltool
• batchsystem ( str ) -- type of Toil batchsystem
Return type
None
class toil.cwl.cwltoil.Conditional(expression=None, outputs=None, requirements=None, container_engine='docker')
Object holding conditional expression until we are ready to evaluate it.
Evaluation occurs before the enclosing step's inputs are type-checked.
Parameters
• expression ( Optional[str] )
• outputs ( Union[dict[str, cwltool.utils.CWLOutputType], None] )
• requirements ( Optional[list[cwltool.utils.CWLObjectType]] )
• container_engine ( str )
expression
outputs
requirements
container_engine
is_false(job)
Determine if expression
evaluates to False given completed step inputs.
Parameters
job ( cwltool.utils.CWLObjectType ) -- job output object
Returns
bool
Return type
bool
skipped_outputs()
Generate a dict of SkipNull
objects corresponding to the output structure.
Return type
dict [ str , SkipNull ]
class toil.cwl.cwltoil.ResolveSource(name, input, source_key, promises)
Apply linkMerge and pickValue
operators to values coming into a port.
Parameters
• name ( str )
• input ( dict[str, cwltool.utils.CWLObjectType] )
• source_key ( str )
• promises ( dict[str, toil.job.Job] )
promise_tuples:
list[tuple[str, toil.job.Promise]] | tuple[str, toil.job.Promise]
__repr__()
Allow for debug printing.
Return type
str
resolve()
First apply linkMerge then
pickValue if either present.
Return type
Any
link_merge(values)
Apply linkMerge operator to
values
object.
Parameters
values ( cwltool.utils.CWLObjectType ) -- result of step
Return type
Union[ list [cwltool.utils.CWLOutputType], cwltool.utils.CWLOutputType]
pick_value(values)
Apply pickValue operator to
values
object.
Parameters
values ( Union[list[Union[str, SkipNull]], Any] ) -- Intended to be a list, but other types will be returned without modification.
Returns
Return type
Any
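The pickValue operator is defined by the CWL 1.2 specification with three methods: first_non_null, the_only_non_null, and all_non_null. The sketch below illustrates those semantics on plain Python values. It is a standalone function with a hypothetical name, treats only None as null (the real method also has to account for SkipNull), and raises ValueError in place of a workflow exception.

```python
from typing import Any, Optional


def pick_value_sketch(values: Any, method: Optional[str]) -> Any:
    if method is None or not isinstance(values, list):
        return values  # non-lists pass through unchanged
    non_null = [v for v in values if v is not None]
    if method == "first_non_null":
        if not non_null:
            raise ValueError("first_non_null: all source values are null")
        return non_null[0]
    if method == "the_only_non_null":
        if len(non_null) != 1:
            raise ValueError(
                f"the_only_non_null: expected exactly one value, got {len(non_null)}"
            )
        return non_null[0]
    if method == "all_non_null":
        return non_null
    raise ValueError(f"unsupported pickValue: {method!r}")
```

In a workflow, this runs after linkMerge has combined the sources into a list, which is why resolve() applies linkMerge first.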
class toil.cwl.cwltoil.StepValueFrom(expr, source, req, container_engine)
A workflow step input which has a valueFrom expression attached to it.
The valueFrom expression will be evaluated to produce the actual input object for the step.
Parameters
• expr ( str )
• source ( Any )
• req ( list[cwltool.utils.CWLObjectType] )
• container_engine ( str )
expr
source
context = None
req
container_engine
__repr__()
Allow for debug printing.
Return type
str
eval_prep(step_inputs, file_store)
Resolve the contents of any file in a set of inputs.
The inputs must be associated with the StepValueFrom object's self.source.
Called when
loadContents is specified.
Parameters
• step_inputs ( cwltool.utils.CWLObjectType ) -- Workflow step inputs.
• file_store ( toil.fileStores.abstractFileStore.AbstractFileStore ) -- A toil file store, needed to resolve toilfile:// paths.
Return type
None
resolve()
Resolve the promise in the
valueFrom expression's context.
Returns
object that will serve as expression context
Return type
Any
do_eval(inputs)
Evaluate the valueFrom
expression with the given input object.
Parameters
inputs ( cwltool.utils.CWLObjectType )
Returns
object
Return type
Any
class toil.cwl.cwltoil.DefaultWithSource(default, source)
A workflow step input that has
both a source and a default value.
Parameters
• default ( Any )
• source ( Any )
default
source
__repr__()
Allow for debug printing.
Return type
str
resolve()
Determine the final input value when the time is right (when the source can be resolved).
Returns
dict
Return type
Any
class toil.cwl.cwltoil.JustAValue(val)
A simple value masquerading as
a 'resolve'-able object.
Parameters
val ( Any )
val
__repr__()
Allow for debug printing.
Return type
str
resolve()
Return the value.
Return type
Any
toil.cwl.cwltoil.resolve_dict_w_promises(dict_w_promises, file_store=None)
Resolve a dictionary of promises and evaluate expressions to produce the actual values.
Parameters
• dict_w_promises ( Union[UnresolvedDict, cwltool.utils.CWLObjectType, dict[str, Union[str, StepValueFrom]]] ) -- input dict for these values
• file_store ( Optional[toil.fileStores.abstractFileStore.AbstractFileStore] )
Returns
dictionary of actual values
Return type
cwltool.utils.CWLObjectType
toil.cwl.cwltoil.simplify_list(maybe_list)
Turn a length one list loaded by cwltool into a scalar.
Anything else
is passed as-is, by reference.
Parameters
maybe_list ( Any )
Return type
Any
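The behaviour is small enough to sketch in a few lines. The MutableSequence check is an assumption about what counts as a "list" here, and simplify_list_sketch is a hypothetical name.

```python
from collections.abc import MutableSequence
from typing import Any


def simplify_list_sketch(maybe_list: Any) -> Any:
    # Unwrap only lists with exactly one element.
    if isinstance(maybe_list, MutableSequence) and len(maybe_list) == 1:
        return maybe_list[0]
    return maybe_list  # same object, not a copy
```

Because everything else is passed by reference, mutating the returned list also mutates the caller's original.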
class toil.cwl.cwltoil.ToilPathMapper(referenced_files, basedir, stagedir, separateDirs=True, get_file=None, stage_listing=False, streaming_allowed=True)
Bases: cwltool.pathmapper.PathMapper
Keeps track of files in a Toil way.
Maps between the symbolic identifier of a file (the Toil FileID), its local path on the host (the value returned by readGlobalFile) and the location of the file inside the software container.
Parameters
• referenced_files ( list[cwltool.utils.CWLObjectType] )
• basedir ( str )
• stagedir ( str )
• separateDirs ( bool )
• get_file ( Union[Any, None] )
• stage_listing ( bool )
• streaming_allowed ( bool )
get_file
stage_listing
streaming_allowed
visit(obj, stagedir, basedir, copy=False, staged=False)
Iterate over a CWL object, resolving File and Directory path references.
This is called
on each File or Directory CWL object. The Files and
Directories all have "location" fields. For the
Files, these are from upload_file(), and for the
Directories, these are from upload_directory() or cwltool
internally. With upload_directory(), they and their children
will be assigned locations based on listing the Directories
using ToilFsAccess. With cwltool, locations will be set as
absolute paths.
Parameters
• obj ( cwltool.utils.CWLObjectType ) -- The CWL File or Directory to process
• stagedir ( str ) -- The base path for target paths to be generated under, except when a File or Directory has an overriding parent directory in dirname
• basedir ( str ) -- The directory from which relative paths should be resolved; used as the base directory for the StdFsAccess that generated the listing being processed.
• copy ( bool ) -- If set, use writable types for Files and Directories.
• staged ( bool ) -- Starts as True at the top of the recursion. Set to False when entering a directory that we can actually download, so we don't stage files and subdirectories separately from the directory as a whole. Controls the staged flag on generated mappings, and therefore whether files and directories are actually placed at their mapped-to target locations. If stage_listing is True, we will leave this True throughout and stage everything.
Return type
None
Produces one MapperEnt for every unique location for a File or Directory. These MapperEnt objects are instructions to cwltool's stage_files function: - https://github.com/common-workflow-language/cwltool/blob/a3e3a5720f7b0131fa4f9c0b3f73b62a347278a6/cwltool/process.py#L254
The MapperEnt has fields:
resolved: An absolute local path anywhere on the filesystem where the file/directory can be found, or the contents of a file to populate it with if type is CreateWritableFile or CreateFile. Or, a URI understood by the StdFsAccess in use (for example, toilfile:).
target: An absolute path under stagedir that the file or directory will then be placed at by cwltool. Except if a File or Directory has a dirname field, giving its parent path, that is used instead.
type: One of:
File: cwltool will copy or link the file from resolved to target, if possible.
CreateFile: cwltool will create the file at target, treating resolved as the contents.
WritableFile: cwltool will copy the file from resolved to target, making it writable.
CreateWritableFile: cwltool will create the file at target, treating resolved as the contents, and make it writable.
Directory: cwltool will copy or link the directory from resolved to target, if possible. Otherwise, cwltool will make the directory at target if resolved starts with "_:". Otherwise it will do nothing.
WritableDirectory: cwltool will copy the directory from resolved to target, if possible. Otherwise, cwltool will make the directory at target if resolved starts with "_:". Otherwise it will do nothing.
staged: if set to False, cwltool will not make or copy anything for this entry
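The fields above can be pictured as a small record. The following is a minimal sketch only: MapperEntSketch and entries_to_stage are hypothetical stand-ins written for illustration, not cwltool's own MapperEnt or stage_files, and the paths are made up.

```python
from typing import NamedTuple


class MapperEntSketch(NamedTuple):
    """Hypothetical stand-in whose field names follow the list above."""

    resolved: str  # local path, literal contents, or a URI such as toilfile:...
    target: str    # absolute path under stagedir
    type: str      # "File", "CreateFile", "WritableFile", ...
    staged: bool   # False means nothing is made or copied for this entry


def entries_to_stage(entries):
    """Return only the entries that would actually be placed at their targets."""
    return [entry for entry in entries if entry.staged]


entries = [
    MapperEntSketch("/data/reads.fq", "/stage/reads.fq", "File", True),
    MapperEntSketch("hello\n", "/stage/greeting.txt", "CreateFile", True),
    # A file inside a directory that is staged as a whole is not staged separately.
    MapperEntSketch("/data/dir/inner.txt", "/stage/dir/inner.txt", "File", False),
]
staged_targets = [entry.target for entry in entries_to_stage(entries)]
```

In this sketch the third entry is skipped because its staged flag is False, which mirrors how the staged parameter suppresses separate staging of a directory's contents.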
class toil.cwl.cwltoil.ToilSingleJobExecutor
Bases: cwltool.executors.SingleJobExecutor
A SingleJobExecutor that does not assume it is at the top level of the workflow.
We need this because otherwise every job thinks it is top level and tries to discover secondary files, which may exist when they haven't actually been passed at the top level and thus aren't supposed to be visible.
run_jobs(process, job_order_object, logger, runtime_context)
run_jobs from SingleJobExecutor, but not in a top level runtime context.
Parameters
• process ( cwltool.process.Process )
• job_order_object ( cwltool.utils.CWLObjectType )
• logger ( logging.Logger )
• runtime_context ( cwltool.context.RuntimeContext )
Return type
None
class toil.cwl.cwltoil.ToilTool(*args, **kwargs)
Mixin to hook Toil into a cwltool tool type.
Parameters
• args ( Any )
• kwargs ( Any )
connect_toil_job(job)
Attach the Toil tool to the Toil job that is executing it. This allows it to use the Toil job to stop at certain points if debugging flags are set.
Parameters
job ( toil.job.Job )
Return type
None
make_path_mapper(reffiles, stagedir, runtimeContext, separateDirs)
Create the appropriate PathMapper for the situation.
Parameters
• reffiles ( list[Any] )
• stagedir ( str )
• runtimeContext ( cwltool.context.RuntimeContext )
• separateDirs ( bool )
Return type
cwltool.pathmapper.PathMapper
__str__()
Return string representation of this tool type.
Return type
str
class toil.cwl.cwltoil.ToilCommandLineTool(*args, **kwargs)
Bases: ToilTool , cwltool.command_line_tool.CommandLineTool
Subclass the cwltool command line tool to provide the custom ToilPathMapper.
Parameters
• args ( Any )
• kwargs ( Any )
class toil.cwl.cwltoil.ToilExpressionTool(*args, **kwargs)
Bases: ToilTool , cwltool.command_line_tool.ExpressionTool
Subclass the cwltool expression tool to provide the custom ToilPathMapper.
Parameters
• args ( Any )
• kwargs ( Any )
toil.cwl.cwltoil.toil_make_tool(toolpath_object, loadingContext)
Emit custom ToilCommandLineTools.
This factory function is meant to be passed to cwltool.load_tool().
Parameters
• toolpath_object ( ruamel.yaml.comments.CommentedMap )
• loadingContext ( cwltool.context.LoadingContext )
Return type
cwltool.process.Process
toil.cwl.cwltoil.MISSING_FILE = 'missing://'
toil.cwl.cwltoil.DirectoryContents
toil.cwl.cwltoil.check_directory_dict_invariants(contents)
Make sure a directory structure dict makes sense. Throws an error otherwise.
Currently just checks to make sure no empty-string keys exist.
Parameters
contents ( DirectoryContents )
Return type
None
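The empty-string-key check described above can be sketched as a short recursive walk. This is an illustration only: the name check_no_empty_keys and the choice of RuntimeError are assumptions, not necessarily what Toil does.

```python
def check_no_empty_keys(contents: dict) -> None:
    """Raise if any level of a directory dict has an empty-string name."""
    for name, item in contents.items():
        if name == "":
            # The exception type is an assumption made for this sketch.
            raise RuntimeError(f"Found nameless entry in directory: {contents!r}")
        if isinstance(item, dict):
            # Subdirectories are nested dicts, so check them as well.
            check_no_empty_keys(item)


good = {"a.txt": "toilfile:1", "sub": {"b.txt": "toilfile:2"}}
bad = {"sub": {"": "toilfile:3"}}

check_no_empty_keys(good)  # passes silently
try:
    check_no_empty_keys(bad)
    bad_rejected = False
except RuntimeError:
    bad_rejected = True
```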
toil.cwl.cwltoil.decode_directory(dir_path)
Decode a directory from a "toildir:" path to a directory (or a file in it).
Returns the decoded directory dict, the remaining part of the path (which may be None), and the deduplication key string that uniquely identifies the directory.
Parameters
dir_path ( str )
Return type
tuple [DirectoryContents, Optional[ str ], str ]
toil.cwl.cwltoil.encode_directory(contents)
Encode a directory dict as a "toildir:" path.
Takes the directory dict, which is a dict from name to URI for a file or dict for a subdirectory.
Parameters
contents ( DirectoryContents )
Return type
str
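A round trip between a directory dict and a "toildir:" path might look like the sketch below. The JSON plus URL-safe base64 encoding, the use of the encoded text as the deduplication key, and the function names are assumptions for illustration; only the "toildir:" prefix and the shape of the return values come from the descriptions above.

```python
import base64
import json
from typing import Optional

PREFIX = "toildir:"


def encode_directory_sketch(contents: dict) -> str:
    """Pack a directory dict into a single toildir: string (assumed encoding)."""
    payload = json.dumps(contents, sort_keys=True).encode("utf-8")
    return PREFIX + base64.urlsafe_b64encode(payload).decode("ascii")


def decode_directory_sketch(dir_path: str) -> tuple[dict, Optional[str], str]:
    """Return (directory dict, remaining path or None, deduplication key)."""
    if not dir_path.startswith(PREFIX):
        raise ValueError(f"Not a {PREFIX} path: {dir_path}")
    # URL-safe base64 never contains "/", so the first "/" starts the inner path.
    encoded, _, remainder = dir_path[len(PREFIX):].partition("/")
    contents = json.loads(base64.urlsafe_b64decode(encoded.encode("ascii")))
    return contents, (remainder or None), encoded


contents = {"a.txt": "toilfile:123", "sub": {"b.txt": "toilfile:456"}}
uri = encode_directory_sketch(contents)
decoded, inner_path, key = decode_directory_sketch(uri + "/sub/b.txt")
```

Because the whole directory description travels inside the string, a path like this can be resolved on another machine without a separate lookup.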
class toil.cwl.cwltoil.ToilFsAccess(basedir, file_store=None)
Bases: cwltool.stdfsaccess.StdFsAccess
Custom filesystem access class which handles toil filestore references.
Normal file paths will be resolved relative to basedir, but 'toilfile:' and 'toildir:' URIs will be fulfilled from the Toil file store.
Also supports URLs supported by Toil job store implementations.
Parameters
• basedir ( str )
• file_store ( Optional[toil.fileStores.abstractFileStore.AbstractFileStore] )
file_store
dir_to_download: dict[str, str]
glob(pattern)
Return a possibly empty list of absolute URI paths that match pattern.
Parameters
pattern ( str )
Return type
list [ str ]
open(fn, mode)
Parameters
• fn ( str )
• mode ( str )
Return type
IO[Any]
exists(path)
Test for file existence.
Parameters
path ( str )
Return type
bool
size(path)
Parameters
path ( str )
Return type
int
isfile(fn)
Parameters
fn ( str )
Return type
bool
isdir(fn)
Parameters
fn ( str )
Return type
bool
listdir(fn)
Return a list containing the absolute path URLs of the entries in the directory given by fn.
Parameters
fn ( str )
Return type
list [ str ]
join(path, *paths)
Parameters
• path ( str )
• paths ( str )
Return type
str
realpath(fn)
Parameters
fn ( str )
Return type
str
toil.cwl.cwltoil.toil_get_file(file_store, index, existing, uri, streamable=False, streaming_allowed=True, pipe_threads=None)
Set up the given file or directory from the Toil jobstore at a file URI where it can be accessed locally.
Run as part of the tool setup, inside jobs on the workers. Also used as part of reorganizing files to get them uploaded at the end of a tool.
Parameters
• file_store ( toil.fileStores.abstractFileStore.AbstractFileStore ) -- The Toil file store to download from.
• index ( dict[str, str] ) -- Maps from downloaded file path back to input Toil URI.
• existing ( dict[str, str] ) -- Maps from URI to downloaded file path.
• uri ( str ) -- The URI for the file to download.
• streamable ( bool ) -- Whether the file has the 'streamable' flag set.
• streaming_allowed ( bool ) -- Whether streaming is allowed.
• pipe_threads ( Optional[list[tuple[threading.Thread, int]]] ) -- List of threads responsible for streaming the data, together with the open file descriptors corresponding to those files. The caller is responsible for closing the file descriptors (to break the pipes) and joining the threads.
Return type
str
toil.cwl.cwltoil.convert_file_uri_to_toil_uri(applyFunc, index, existing, file_uri)
Given a file URI, convert it to a toil file URI. Uses applyFunc to handle the conversion.
Runs once on every unique file URI.
'existing' is a set of files retrieved as inputs from toil_get_file. This ensures they are mapped back as the same name if passed through.
Returns a toil URI path to the object.
Parameters
• applyFunc ( Callable[[str], toil.fileStores.FileID] )
• index ( dict[str, str] )
• existing ( dict[str, str] )
• file_uri ( str )
Return type
str
toil.cwl.cwltoil.path_to_loc(obj)
Make a path into a location.
(If a CWL object has a "path" and not a "location")
Parameters
obj ( cwltool.utils.CWLObjectType )
Return type
None
toil.cwl.cwltoil.extract_file_uri_once(fileindex, existing, file_metadata, mark_broken=False, skip_remote=False)
Extract the filename from a CWL file record.
This function matches the predefined function signature in visit_files, which ensures that this function is called on all files inside a CWL object.
Ensures no duplicate files are returned according to fileindex. If a file has not been resolved already (and had file:// prepended) then resolve symlinks.
Parameters
• fileindex ( dict[str, str] ) -- Forward mapping of filename.
• existing ( dict[str, str] ) -- Reverse mapping of filename. This function does not use this.
• file_metadata ( cwltool.utils.CWLObjectType ) -- CWL file record.
• mark_broken ( bool ) -- Whether files should be marked as missing.
• skip_remote ( bool ) -- Whether to skip remote files.
Return type
Optional[ str ]
toil.cwl.cwltoil.V
class toil.cwl.cwltoil.VisitFunc
Bases: Protocol[V]
__call__(fileindex, existing, file_metadata, mark_broken, skip_remote)
Parameters
• fileindex ( dict[str, str] )
• existing ( dict[str, str] )
• file_metadata ( cwltool.utils.CWLObjectType )
• mark_broken ( bool )
• skip_remote ( bool )
Return type
V
toil.cwl.cwltoil.visit_files(func, fs_access, fileindex, existing, cwl_object, mark_broken=False, skip_remote=False, bypass_file_store=False)
Prepare all files and directories.
Will be executed from the leader or worker in the context of the given CWL tool, order, or output object to be used on the workers. Make sure their sizes are set and import all the files.
Recurses inside directories using the fs_access to find files to upload and subdirectory structure to encode, even if their listings are not set or not recursive.
Preserves any listing fields.
If a file cannot be found (like if it is an optional secondary file that doesn't exist), fails, unless mark_broken is set, in which case it applies a sentinel location.
Also does some miscellaneous normalization.
Parameters
• import_function -- The function used to upload a URI and get a Toil FileID for it.
• fs_access ( cwltool.stdfsaccess.StdFsAccess ) -- The CWL FS access object we use to access the filesystem to find files to import. Needs to support the URI schemes used.
• fileindex ( dict[str, str] ) -- Forward map to fill in from file URI to Toil storage location, used by write_file to deduplicate writes.
• existing ( dict[str, str] ) -- Reverse map to fill in from Toil storage location to file URI. Not read from.
• cwl_object ( Optional[cwltool.utils.CWLObjectType] ) -- CWL tool (or workflow order) we are importing files for.
• mark_broken ( bool ) -- If True, when files can't be imported because they e.g. don't exist, set their locations to MISSING_FILE rather than failing with an error.
• skip_remote ( bool ) -- If True, leave remote URIs in place instead of importing files.
• bypass_file_store ( bool ) -- If True, leave file:// URIs in place instead of importing files and directories.
• log_level -- Log imported files at the given level.
• func ( VisitFunc[V] )
Return type
list [V]
toil.cwl.cwltoil.upload_directory(directory_metadata, directory_contents, mark_broken=False)
Upload a Directory object.
Ignores the listing (which may not be recursive and isn't safe or efficient to touch), and instead uses directory_contents, which is a recursive dict structure from filename to file URI or subdirectory contents dict.
Makes sure the directory actually exists, and rewrites its location to be something we can use on another machine.
If mark_broken is set, ignores missing directories and replaces them with directories containing the given (possibly empty) contents.
We can't rely on the directory's listing as visible to the next tool as a complete recursive description of the files we will need to present to the tool, since some tools require it to be cleared or single-level but still expect to see its contents in the filesystem.
Parameters
• directory_metadata ( cwltool.utils.CWLObjectType )
• directory_contents ( DirectoryContents )
• mark_broken ( bool )
Return type
None
toil.cwl.cwltoil.extract_and_convert_file_to_toil_uri(convertfunc, fileindex, existing, file_metadata, mark_broken=False, skip_remote=False)
Extract the file URI out of a file object and convert it to a Toil URI.
Runs convertfunc on the file URI to handle conversion.
Is used to handle importing files into the jobstore.
If a file doesn't exist, fails with an error, unless mark_broken is set, in which case the missing file is given a special sentinel location.
Unless skip_remote is set, also runs on remote files and sets their locations to Toil URIs as well.
Parameters
• convertfunc ( Callable[[str], toil.fileStores.FileID] )
• fileindex ( dict[str, str] )
• existing ( dict[str, str] )
• file_metadata ( cwltool.utils.CWLObjectType )
• mark_broken ( bool )
• skip_remote ( bool )
Return type
None
toil.cwl.cwltoil.writeGlobalFileWrapper(file_store, fileuri)
Wrap writeGlobalFile to accept file:// URIs.
Parameters
• file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
• fileuri ( str )
Return type
toil.fileStores.FileID
toil.cwl.cwltoil.remove_empty_listings(rec)
Parameters
rec ( cwltool.utils.CWLObjectType )
Return type
None
class toil.cwl.cwltoil.CWLNamedJob(cores=1, memory='1GiB', disk='1MiB', accelerators=None, preemptible=None, tool_id=None, parent_name=None, subjob_name=None, local=None)
Bases: toil.job.Job
Base class for all CWL jobs that do user work, to give them useful names.
Parameters
• cores ( Union[float, None] )
• memory ( Union[int, str, None] )
• disk ( Union[int, str, None] )
• accelerators ( Optional[list[toil.job.AcceleratorRequirement]] )
• preemptible ( Optional[bool] )
• tool_id ( Optional[str] )
• parent_name ( Optional[str] )
• subjob_name ( Optional[str] )
• local ( Optional[bool] )
class toil.cwl.cwltoil.ResolveIndirect(cwljob, parent_name=None)
Bases: CWLNamedJob
Helper Job.
Accepts an unresolved dict (containing promises) and produces a dictionary of actual values.
Parameters
• cwljob ( toil.job.Promised[cwltool.utils.CWLObjectType] )
• parent_name ( Optional[str] )
cwljob
run(file_store)
Evaluate the promises and return their values.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
cwltool.utils.CWLObjectType
toil.cwl.cwltoil.toilStageFiles(toil, cwljob, outdir, destBucket=None, log_level=logging.DEBUG)
Copy input files out of the global file store and update location and path.
Parameters
• destBucket ( Union[str, None] ) -- If set, export to this base URL instead of to the local filesystem.
• log_level ( int ) -- Log each file transferred at the given level.
• toil ( toil.common.Toil )
• cwljob ( Union[cwltool.utils.CWLObjectType, list[cwltool.utils.CWLObjectType]] )
• outdir ( str )
Return type
None
class toil.cwl.cwltoil.CWLJobWrapper(tool, cwljob, runtime_context, parent_name, conditional=None)
Bases: CWLNamedJob
Wrap a CWL job that uses dynamic resource requirements.
When executed, this creates a new child job which has the correct resource requirement set.
Parameters
• tool ( cwltool.process.Process )
• cwljob ( cwltool.utils.CWLObjectType )
• runtime_context ( cwltool.context.RuntimeContext )
• parent_name ( Optional[str] )
• conditional ( Union[Conditional, None] )
cwltool
cwljob
runtime_context
conditional
parent_name
run(file_store)
Create a child job with the correct resource requirements set.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
Any
class toil.cwl.cwltoil.CWLJob(tool, cwljob, runtime_context, parent_name=None, conditional=None)
Bases: CWLNamedJob
Execute a CWL tool using cwltool.executors.SingleJobExecutor.
Parameters
• tool ( cwltool.process.Process )
• cwljob ( cwltool.utils.CWLObjectType )
• runtime_context ( cwltool.context.RuntimeContext )
• parent_name ( Optional[str] )
• conditional ( Union[Conditional, None] )
cwltool
conditional
cwljob
runtime_context
step_inputs
workdir: str
required_env_vars(cwljob)
Yield environment variables
from EnvVarRequirement.
Parameters
cwljob ( Any )
Return type
Iterator[ tuple [ str , str ]]
populate_env_vars(cwljob)
Prepare environment variables necessary at runtime for the job.
Env vars specified in the CWL "requirements" section should already be loaded in self.cwltool.requirements, however those specified with "EnvVarRequirement" take precedence and are only populated here. Therefore, this not only returns a dictionary with all evaluated "EnvVarRequirement" env vars, but checks self.cwltool.requirements for any env vars with the same name and replaces their value with that found in the "EnvVarRequirement" env var if it exists.
Parameters
cwljob ( cwltool.utils.CWLObjectType )
Return type
dict [ str , str ]
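The precedence rule described above can be illustrated with plain dictionaries. This is a simplified sketch under stated assumptions: apply_env_var_requirement is a hypothetical helper, and the real method works on cwltool requirement objects rather than flat dicts.

```python
def apply_env_var_requirement(
    existing: dict[str, str], overrides: dict[str, str]
) -> dict[str, str]:
    """Return the override vars and overwrite same-named entries in existing."""
    for name, value in overrides.items():
        if name in existing:
            # A same-named variable loaded earlier loses to the override.
            existing[name] = value
    return dict(overrides)


existing_env = {"PATH": "/usr/bin", "LANG": "C"}
returned = apply_env_var_requirement(
    existing_env, {"LANG": "en_US.UTF-8", "THREADS": "4"}
)
```

Here LANG is replaced in existing_env because both sources define it, while THREADS only appears in the returned dictionary.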
run(file_store)
Execute the CWL document.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
Any
toil.cwl.cwltoil.get_container_engine(runtime_context)
Parameters
runtime_context ( cwltool.context.RuntimeContext )
Return type
str
toil.cwl.cwltoil.makeRootJob(tool, jobobj, runtime_context, initialized_job_order, options, toil)
Create the Toil root Job object for the CWL tool. Is the same as makeJob() except this also handles import logic.
Actually creates what might be a subgraph of two jobs. The second of which may be the follow on of the first. If only one job is created, it is returned twice.
Parameters
• tool ( cwltool.process.Process )
• jobobj ( cwltool.utils.CWLObjectType )
• runtime_context ( cwltool.context.RuntimeContext )
• initialized_job_order ( cwltool.utils.CWLObjectType )
• options ( configargparse.Namespace )
• toil ( toil.common.Toil )
Return type
CWLNamedJob
toil.cwl.cwltoil.makeJob(tool, jobobj, runtime_context, parent_name, conditional)
Create the correct Toil Job object for the CWL tool.
Actually creates what might be a subgraph of two jobs. The second of which may be the follow on of the first. If only one job is created, it is returned twice.
Types: workflow, job, or job wrapper for dynamic resource requirements.
Returns
"wfjob, followOn" if the input tool is a workflow, and "job, job" otherwise
Parameters
• tool ( cwltool.process.Process )
• jobobj ( cwltool.utils.CWLObjectType )
• runtime_context ( cwltool.context.RuntimeContext )
• parent_name ( Optional[str] )
• conditional ( Union[Conditional, None] )
Return type
Union[ tuple [ CWLWorkflow , ResolveIndirect ], tuple [ CWLJob , CWLJob ], tuple [ CWLJobWrapper , CWLJobWrapper ]]
class toil.cwl.cwltoil.CWLScatter(step, cwljob, runtime_context, parent_name, conditional)
Bases: toil.job.Job
Implement workflow scatter step.
When run, this creates a child job for each parameterization of the scatter.
Parameters
• step ( cwltool.workflow.WorkflowStep )
• cwljob ( cwltool.utils.CWLObjectType )
• runtime_context ( cwltool.context.RuntimeContext )
• parent_name ( Optional[str] )
• conditional ( Union[Conditional, None] )
step
cwljob
runtime_context
conditional
parent_name
flat_crossproduct_scatter(joborder, scatter_keys, outputs, postScatterEval)
Cartesian product of the inputs, then flattened.
Parameters
• joborder ( cwltool.utils.CWLObjectType )
• scatter_keys ( list[str] )
• outputs ( list[toil.job.Promised[cwltool.utils.CWLObjectType]] )
• postScatterEval ( Callable[[cwltool.utils.CWLObjectType], cwltool.utils.CWLObjectType] )
Return type
None
nested_crossproduct_scatter(joborder, scatter_keys, postScatterEval)
Cartesian product of the inputs.
Parameters
• joborder ( cwltool.utils.CWLObjectType )
• scatter_keys ( list[str] )
• postScatterEval ( Callable[[cwltool.utils.CWLObjectType], cwltool.utils.CWLObjectType] )
Return type
list [toil.job.Promised[cwltool.utils.CWLObjectType]]
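The difference between the flat and nested cross products can be shown with plain dictionaries standing in for job orders. This is a sketch only: the function names are hypothetical, no Toil jobs or promises are created, and postScatterEval is left out.

```python
from itertools import product


def flat_crossproduct(joborder: dict, scatter_keys: list) -> list:
    """One flat list with a job order for every combination of scattered values."""
    value_lists = [joborder[key] for key in scatter_keys]
    jobs = []
    for combo in product(*value_lists):
        job = dict(joborder)
        job.update(zip(scatter_keys, combo))
        jobs.append(job)
    return jobs


def nested_crossproduct(joborder: dict, scatter_keys: list):
    """The same combinations, nested one list level per scatter key."""
    if not scatter_keys:
        return dict(joborder)
    first, rest = scatter_keys[0], scatter_keys[1:]
    return [
        nested_crossproduct({**joborder, first: value}, rest)
        for value in joborder[first]
    ]


order = {"a": [1, 2], "b": ["x", "y"], "c": 0}
flat = flat_crossproduct(order, ["a", "b"])
nested = nested_crossproduct(order, ["a", "b"])
```

Both functions enumerate the same four combinations; only the shape of the result differs, which is why the gathered outputs are either a flat array or an array of arrays.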
run(file_store)
Generate the follow on scatter jobs.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
list [toil.job.Promised[cwltool.utils.CWLObjectType]]
class toil.cwl.cwltoil.CWLGather(step, outputs)
Bases: toil.job.Job
Follows on to a scatter Job.
This gathers the outputs of each job in the scatter into an array for each output parameter.
Parameters
• step ( cwltool.workflow.WorkflowStep )
• outputs ( toil.job.Promised[Union[cwltool.utils.CWLObjectType, list[cwltool.utils.CWLObjectType]]] )
step
outputs
static extract(obj, k)
Extract the given key from the obj.
If the object is a list, extract it from all members of the list.
Parameters
• obj ( Union[cwltool.utils.CWLObjectType, list[cwltool.utils.CWLObjectType]] )
• k ( str )
Return type
Union[cwltool.utils.CWLOutputType, list [cwltool.utils.CWLObjectType]]
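The extraction rule above (look up the key in a mapping, or apply the same lookup to every member of a list) can be sketched as follows. The name extract_sketch is hypothetical, and returning an empty list for other inputs is an assumption.

```python
from collections.abc import Mapping, MutableSequence


def extract_sketch(obj, key):
    """Pull key out of a mapping, or out of every member of a (nested) list."""
    if isinstance(obj, Mapping):
        return obj.get(key)
    if isinstance(obj, MutableSequence):
        return [extract_sketch(item, key) for item in obj]
    # Anything else has nothing to extract (assumed behaviour).
    return []


single = extract_sketch({"out": "a.txt"}, "out")
scattered = extract_sketch([{"out": 1}, {"out": 2}], "out")
nested = extract_sketch([[{"out": 1}], [{"out": 2}, {"out": 3}]], "out")
```

The nested case is what makes this useful after a nested cross product scatter, since the list structure of the scatter is preserved in the gathered values.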
run(file_store)
Gather all the outputs of the scatter.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
dict [ str , Any]
class toil.cwl.cwltoil.SelfJob(j, v)
Bases: toil.job.Job
Fake job object to facilitate implementation of CWLWorkflow.run().
Parameters
• j ( CWLWorkflow )
• v ( cwltool.utils.CWLObjectType )
j
v
rv(*path)
Return our properties dictionary.
Parameters
path ( Any )
Return type
Any
addChild(c)
Add a child to our workflow.
Parameters
c ( toil.job.Job )
Return type
Any
hasChild(c)
Check if the given child is in
our workflow.
Parameters
c ( toil.job.Job )
Return type
Any
toil.cwl.cwltoil.ProcessType
toil.cwl.cwltoil.remove_pickle_problems(obj)
Doc_loader does not pickle correctly, causing Toil errors, remove from objects.
See github issue: https://github.com/mypyc/mypyc/issues/804
Parameters
obj ( ProcessType )
Return type
ProcessType
class toil.cwl.cwltoil.CWLWorkflow(cwlwf, cwljob, runtime_context, parent_name=None, conditional=None)
Bases: CWLNamedJob
Toil Job to convert a CWL workflow graph into a Toil job graph.
The Toil job graph will include the appropriate dependencies.
Parameters
• cwlwf ( cwltool.workflow.Workflow )
• cwljob ( cwltool.utils.CWLObjectType )
• runtime_context ( cwltool.context.RuntimeContext )
• parent_name ( Optional[str] )
• conditional ( Union[Conditional, None] )
cwlwf
cwljob
runtime_context
conditional
run(file_store)
Convert a CWL Workflow graph into a Toil job graph.
Always runs on the leader, because the batch system knows to schedule it as a local job.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
Union[ UnresolvedDict , dict [ str , SkipNull ]]
class toil.cwl.cwltoil.CWLInstallImportsJob(initialized_job_order, tool, basedir, skip_remote, bypass_file_store, import_data, **kwargs)
Bases: toil.job.Job
Class represents a unit of work in toil.
Parameters
• initialized_job_order ( toil.job.Promised[cwltool.utils.CWLObjectType] )
• tool ( toil.job.Promised[cwltool.process.Process] )
• basedir ( str )
• skip_remote ( bool )
• bypass_file_store ( bool )
• import_data ( toil.job.Promised[dict[str, toil.fileStores.FileID]] )
• kwargs ( Any )
initialized_job_order
tool
basedir
skip_remote
bypass_file_store
import_data
static fill_in_files(initialized_job_order, tool, candidate_to_fileid, basedir, skip_remote, bypass_file_store)
Given a mapping of filenames to Toil file IDs, replace the filename with the file IDs throughout the CWL object.
Parameters
• initialized_job_order ( cwltool.utils.CWLObjectType )
• tool ( cwltool.process.Process )
• candidate_to_fileid ( dict[str, toil.fileStores.FileID] )
• basedir ( str )
• skip_remote ( bool )
• bypass_file_store ( bool )
Return type
tuple [ cwltool.process.Process , cwltool.utils.CWLObjectType]
run(file_store)
Convert the filenames in the workflow inputs into the URIs.
Returns
Promise of transformed workflow inputs. A tuple of the job order and process.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
Tuple[ cwltool.process.Process , cwltool.utils.CWLObjectType]
class toil.cwl.cwltoil.CWLImportWrapper(initialized_job_order, tool, runtime_context, file_to_data, options)
Bases: CWLNamedJob
Job to organize importing files on workers instead of the leader. Responsible for extracting filenames and metadata, calling ImportsJob, applying imports to the job objects, and scheduling the start workflow job.
This class is only used when runImportsOnWorkers is enabled.
Parameters
• initialized_job_order ( cwltool.utils.CWLObjectType )
• tool ( cwltool.process.Process )
• runtime_context ( cwltool.context.RuntimeContext )
• file_to_data ( dict[str, toil.job.FileMetadata] )
• options ( configargparse.Namespace )
initialized_job_order
tool
options
runtime_context
file_to_data
run(file_store)
Override this function to perform work and dynamically create successor jobs.
Parameters
• fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
• file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
Return type
Any
class toil.cwl.cwltoil.CWLStartJob(tool, initialized_job_order, runtime_context, **kwargs)
Bases: CWLNamedJob
Job responsible for starting the CWL workflow.
Takes in the workflow/tool and inputs after all files are imported and creates jobs to run those workflows.
Parameters
• tool ( toil.job.Promised[cwltool.process.Process] )
• initialized_job_order ( toil.job.Promised[cwltool.utils.CWLObjectType] )
• runtime_context ( cwltool.context.RuntimeContext )
• kwargs ( Any )
tool
initialized_job_order
runtime_context
run(file_store)
Override this function to perform work and dynamically create successor jobs.
Parameters
• fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
• file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
Return type
Any
toil.cwl.cwltoil.extract_workflow_inputs(options, initialized_job_order, tool)
Collect all the workflow input files to import later.
Parameters
• options ( configargparse.Namespace ) -- namespace
• initialized_job_order ( cwltool.utils.CWLObjectType ) -- cwl object
• tool ( cwltool.process.Process ) -- tool object
Return type
list [ str ]
toil.cwl.cwltoil.import_workflow_inputs(jobstore, options, initialized_job_order, tool, log_level=logging.DEBUG)
Import all workflow inputs on the leader.
Run when not importing on workers.
Parameters
• jobstore ( toil.jobStores.abstractJobStore.AbstractJobStore ) -- Toil jobstore
• options ( configargparse.Namespace ) -- Namespace
• initialized_job_order ( cwltool.utils.CWLObjectType ) -- CWL object
• tool ( cwltool.process.Process ) -- CWL tool
• log_level ( int ) -- log level
Return type
None
toil.cwl.cwltoil.T
toil.cwl.cwltoil.visitSteps(cmdline_tool, op)
Iterate over a CWL Process object, running the op on each tool description CWL object.
Parameters
• cmdline_tool ( cwltool.process.Process )
• op ( Callable[[ruamel.yaml.comments.CommentedMap], list[T]] )
Return type
list [T]
toil.cwl.cwltoil.rm_unprocessed_secondary_files(job_params)
Parameters
job_params ( Any )
Return type
None
toil.cwl.cwltoil.filtered_secondary_files(unfiltered_secondary_files)
Remove unprocessed secondary files.
Interpolated strings and optional inputs in secondary files were added to CWL in version 1.1.
The CWL libraries we call do successfully resolve the interpolated strings, but add the resolved fields to the list of unresolved fields so we remove them here after the fact.
We keep secondary files with anything other than MISSING_FILE as their location. The 'required' logic seems to be handled deeper in cwltool.builder.Builder(), and correctly determines which files should be imported. Therefore we remove the files here and if this file is SUPPOSED to exist, it will still give the appropriate file does not exist error, but just a bit further down the track.
Parameters
unfiltered_secondary_files ( cwltool.utils.CWLObjectType )
Return type
list [cwltool.utils.CWLObjectType]
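The MISSING_FILE rule described above can be sketched as a simple filter. This covers only that one rule and is not the full function; the helper name and the exact prefix test are assumptions, while the 'missing://' value comes from MISSING_FILE as documented earlier.

```python
MISSING_FILE = "missing://"


def keep_present_secondary_files(file_record: dict) -> list:
    """Drop secondary files whose location is the MISSING_FILE sentinel."""
    kept = []
    for secondary in file_record.get("secondaryFiles", []):
        location = secondary.get("location", "")
        # Assumed test: a sentinel location starts with the MISSING_FILE prefix.
        if not location.startswith(MISSING_FILE):
            kept.append(secondary)
    return kept


record = {
    "class": "File",
    "location": "toilfile:1",
    "secondaryFiles": [
        {"class": "File", "location": "toilfile:2", "basename": "reads.bai"},
        {"class": "File", "location": "missing://", "basename": "reads.crai"},
    ],
}
kept_files = keep_present_secondary_files(record)
```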
toil.cwl.cwltoil.scan_for_unsupported_requirements(tool, bypass_file_store=False)
Scan the given CWL tool for any unsupported optional features.
If it has them, raise an informative UnsupportedRequirement.
Parameters
• tool ( cwltool.process.Process ) -- The CWL tool to check for unsupported requirements.
• bypass_file_store ( bool ) -- True if the Toil file store is not being used to transport files between nodes, and raw origin node file:// URIs are exposed to tools instead.
Return type
None
toil.cwl.cwltoil.determine_load_listing(tool)
Determine the directory.listing feature in CWL.
In CWL, any input directory can have a DIRECTORY_NAME.listing (where DIRECTORY_NAME is any variable name) set to one of the following three options:
1. no_listing: DIRECTORY_NAME.listing will be undefined. e.g.
inputs.DIRECTORY_NAME.listing == unspecified
2. shallow_listing: DIRECTORY_NAME.listing will return a list one level deep of DIRECTORY_NAME's contents. e.g.
inputs.DIRECTORY_NAME.listing == [items in directory]
inputs.DIRECTORY_NAME.listing[0].listing == undefined
inputs.DIRECTORY_NAME.listing.length == # of items in directory
3. deep_listing: DIRECTORY_NAME.listing will return a list of the entire contents of DIRECTORY_NAME. e.g.
inputs.DIRECTORY_NAME.listing == [items in directory]
inputs.DIRECTORY_NAME.listing[0].listing == [items in subdirectory if it exists and is the first item listed]
inputs.DIRECTORY_NAME.listing.length == # of items in directory
See https://www.commonwl.org/v1.1/CommandLineTool.html#LoadListingRequirement and https://www.commonwl.org/v1.1/CommandLineTool.html#LoadListingEnum
DIRECTORY_NAME.listing should be determined first from loadListing. If that's not specified, from LoadListingRequirement. Else, default to "no_listing" if unspecified.
Parameters
tool ( cwltool.process.Process ) -- ToilCommandLineTool
Return str
One of 'no_listing', 'shallow_listing', or 'deep_listing'.
Return type
Literal['no_listing', 'shallow_listing', 'deep_listing']
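The precedence described above (loadListing first, then LoadListingRequirement, then the "no_listing" default) can be sketched against a plain dictionary. This is an assumption-laden illustration: the real function takes a cwltool Process, and both pick_load_listing and the layout of tool_doc are hypothetical.

```python
from typing import Any

VALID_LISTINGS = ("no_listing", "shallow_listing", "deep_listing")


def pick_load_listing(tool_doc: dict[str, Any]) -> str:
    """Resolve the listing mode using the documented order of precedence."""
    # 1. An explicit loadListing field wins.
    value = tool_doc.get("loadListing")
    # 2. Otherwise fall back to a LoadListingRequirement, if one is present.
    if value is None:
        for requirement in tool_doc.get("requirements", []):
            if requirement.get("class") == "LoadListingRequirement":
                value = requirement.get("loadListing")
                break
    # 3. Otherwise use the default.
    if value is None:
        value = "no_listing"
    if value not in VALID_LISTINGS:
        raise ValueError(f"Unknown loadListing value: {value!r}")
    return value


requirement = {"class": "LoadListingRequirement", "loadListing": "shallow_listing"}
from_field = pick_load_listing(
    {"loadListing": "deep_listing", "requirements": [requirement]}
)
from_requirement = pick_load_listing({"requirements": [requirement]})
from_default = pick_load_listing({})
```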
exception toil.cwl.cwltoil.NoAvailableJobStoreException
Bases: Exception
Indicates that no job store name is available.
toil.cwl.cwltoil.generate_default_job_store(batch_system_name, provisioner_name, local_directory)
Choose a default job store appropriate to the requested batch system and provisioner, and installed modules. Raises an error if no good default is available and the user must choose manually.
Parameters
• batch_system_name ( Optional[str] ) -- Registry name of the batch system the user has requested, if any. If no name has been requested, should be None.
• provisioner_name ( Optional[str] ) -- Name of the provisioner the user has requested, if any. Recognized provisioners include 'aws' and 'gce'. None indicates that no provisioner is in use.
• local_directory ( str ) -- Path to a nonexistent local directory suitable for use as a file job store.
Return str
Job store specifier for a usable job store.
Return type
str
toil.cwl.cwltoil.usage_message
toil.cwl.cwltoil.get_options(args)
Parse given args and properly add non-Toil arguments into the cwljob of the Namespace.
Parameters
args ( list[str] ) -- List of args from command line
Returns
options namespace
Return type
configargparse.Namespace
toil.cwl.cwltoil.main(args=None, stdout=sys.stdout)
Run the main loop for toil-cwl-runner.
Parameters
• args ( Optional[list[str]] )
• stdout ( TextIO )
Return type
int
toil.cwl.cwltoil.find_default_container(args, builder)
Find the default container by consulting a Toil.options object.
Parameters
• args ( configargparse.Namespace )
• builder ( cwltool.builder.Builder )
Return type
Optional[ str ]
toil.cwl.utils
Utility functions used for Toil's CWL interpreter.
Attributes
Exceptions
Functions
Module Contents
toil.cwl.utils.logger
toil.cwl.utils.CWL_UNSUPPORTED_REQUIREMENT_EXIT_CODE = 33
exception toil.cwl.utils.CWLUnsupportedException
Bases: Exception
Fallback exception.
toil.cwl.utils.CWL_UNSUPPORTED_REQUIREMENT_EXCEPTION: type[cwltool.errors.UnsupportedRequirement] | type[CWLUnsupportedException]
toil.cwl.utils.visit_top_cwl_class(rec, classes, op)
Apply the given operation to all top-level CWL objects with the given named CWL class.
Like cwltool's
visit_class but doesn't look inside any object visited.
Parameters
• rec ( Any )
• classes ( collections.abc.Iterable[str] )
• op ( Callable[[Any], Any] )
Return type
None
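The "top-level only" behaviour described above can be illustrated with a small stand-alone sketch. The function visit_top_level and the sample document are hypothetical and written only for this illustration; this is not Toil's implementation, and it assumes CWL objects are plain dicts with a string "class" key.

```python
from typing import Any, Callable, Iterable


def visit_top_level(rec: Any, classes: Iterable[str], op: Callable[[Any], Any]) -> None:
    """Apply op to the outermost dicts whose "class" is one of classes."""
    wanted = set(classes)
    if isinstance(rec, dict):
        if rec.get("class") in wanted:
            op(rec)
            return  # a visited object is never searched for nested matches
        for value in rec.values():
            visit_top_level(value, wanted, op)
    elif isinstance(rec, list):
        for item in rec:
            visit_top_level(item, wanted, op)


# Hypothetical CWL-like input: a Directory containing a File, plus a File.
doc = {
    "inputs": [
        {"class": "Directory", "location": "dir1",
         "listing": [{"class": "File", "location": "inner.txt"}]},
        {"class": "File", "location": "outer.txt"},
    ]
}
seen = []
visit_top_level(doc, ["File", "Directory"], lambda obj: seen.append(obj["location"]))
```

In this sketch inner.txt is skipped because it sits inside a Directory that was already visited.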
toil.cwl.utils.DownReturnType
toil.cwl.utils.UpReturnType
toil.cwl.utils.visit_cwl_class_and_reduce(rec, classes,
op_down, op_up)
Apply the given operations to all CWL objects with the given named CWL class.
Applies the
down operation top-down, and the up operation bottom-up, and
passes the down operation's result and a list of the up
operation results for all child keys (flattening across
lists and collapsing nodes of non-matching classes) to the
up operation.
Returns
The flattened list of up operation results from all calls.
Parameters
• rec ( Any )
• classes ( collections.abc.Iterable[str] )
• op_down ( Callable[[Any], DownReturnType] )
• op_up ( Callable[[Any, DownReturnType, list[UpReturnType]], UpReturnType] )
Return type
list [UpReturnType]
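One way to read the down/up description is the following minimal sketch. The name visit_and_reduce and the sample tree are hypothetical, and the traversal details (plain dicts and lists, a string "class" key) are assumptions rather than Toil's actual code.

```python
from typing import Any, Callable, Iterable


def visit_and_reduce(rec: Any, classes: Iterable[str],
                     op_down: Callable, op_up: Callable) -> list:
    """Run op_down top-down and op_up bottom-up over matching dicts."""
    wanted = set(classes)
    results = []
    if isinstance(rec, dict):
        if rec.get("class") in wanted:
            down = op_down(rec)  # top-down step happens before the children
            child_results = []
            for value in rec.values():
                child_results.extend(visit_and_reduce(value, wanted, op_down, op_up))
            results.append(op_up(rec, down, child_results))  # bottom-up step
        else:
            # Non-matching nodes collapse: their children's results pass through.
            for value in rec.values():
                results.extend(visit_and_reduce(value, wanted, op_down, op_up))
    elif isinstance(rec, list):
        for item in rec:  # lists are flattened
            results.extend(visit_and_reduce(item, wanted, op_down, op_up))
    return results


tree = {
    "class": "Directory", "location": "root",
    "listing": [
        {"class": "File", "location": "a"},
        {"class": "File", "location": "b"},
        {"class": "Directory", "location": "sub",
         "listing": [{"class": "File", "location": "c"}]},
    ],
}
order = []


def record_visit(obj):
    order.append(obj["location"])
    return obj["location"]


# Count every File and Directory in the tree, including the root.
totals = visit_and_reduce(tree, ["File", "Directory"], record_visit,
                          lambda obj, down, kids: 1 + sum(kids))
```

Here order shows the top-down visiting sequence, and totals holds one up result for the single top-most matching object.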
toil.cwl.utils.DirectoryStructure
toil.cwl.utils.get_from_structure(dir_dict, path)
Given a relative path, follow it in the given directory structure.
Return the
string URI for files, the directory dict for subdirectories,
or None for nonexistent things.
Parameters
• dir_dict ( DirectoryStructure )
• path ( str )
Return type
Union[ str , DirectoryStructure, None]
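As an illustration of the lookup rule, here is a small sketch over a nested dict. The function follow_path and the example URIs are hypothetical; it assumes POSIX-style relative paths and is not the library's code.

```python
from pathlib import PurePosixPath
from typing import Optional, Union


def follow_path(dir_dict: dict, path: str) -> Optional[Union[str, dict]]:
    """Return the URI string, the sub-dict, or None for a relative path."""
    current: Union[str, dict] = dir_dict
    for part in PurePosixPath(path).parts:
        # Descending into a file, or into a missing name, finds nothing.
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


structure = {"a": {"b.txt": "toilfile:123"}, "c.txt": "file:///c"}
file_uri = follow_path(structure, "a/b.txt")
subdir = follow_path(structure, "a")
missing = follow_path(structure, "a/missing")
through_file = follow_path(structure, "c.txt/x")
```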
toil.cwl.utils.download_structure(file_store,
index, existing,
dir_dict, into_dir)
Download nested dictionary from the Toil file store to a local path.
Guaranteed to
fill the structure with real files, and not symlinks out of
it to elsewhere. File URIs may be toilfile: URIs or any
other URI that Toil's job store system can read.
Parameters
• file_store ( toil.fileStores.abstractFileStore.AbstractFileStore ) -- The Toil file store to download from.
• index ( dict[str, str] ) -- Maps from downloaded file path back to input URI.
• existing ( dict[str, str] ) -- Maps from file_store_id URI to downloaded file path.
• dir_dict ( DirectoryStructure ) -- a dict from string to string (for files) or dict (for subdirectories) describing a directory structure.
• into_dir ( str ) -- The directory to download the top-level dict's files into.
Return type
None
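The roles of index and existing can be sketched with a stand-alone example. The function materialize, the fetch callable, and the fake store below are hypothetical stand-ins for the Toil file store; the sketch writes real files (never symlinks) and records both mappings.

```python
import os
import tempfile


def materialize(dir_dict, into_dir, fetch, index, existing):
    """Write a nested dict of URIs to disk as real files and directories."""
    os.makedirs(into_dir, exist_ok=True)
    for name, value in dir_dict.items():
        dest = os.path.join(into_dir, name)
        if isinstance(value, dict):
            materialize(value, dest, fetch, index, existing)
        else:
            with open(dest, "wb") as out:
                out.write(fetch(value))  # fetch is a stand-in for a real download
            index[dest] = value      # downloaded path -> input URI
            existing[value] = dest   # input URI -> downloaded path


# Hypothetical in-memory replacement for the job store contents.
fake_store = {"toilfile:1": b"alpha", "toilfile:2": b"beta"}
index, existing = {}, {}
with tempfile.TemporaryDirectory() as top:
    materialize({"a.txt": "toilfile:1", "sub": {"b.txt": "toilfile:2"}},
                top, fake_store.__getitem__, index, existing)
    nested = os.path.join(top, "sub", "b.txt")
    with open(nested, "rb") as handle:
        nested_content = handle.read()
    nested_is_real_file = os.path.isfile(nested) and not os.path.islink(nested)
    nested_uri = index[nested]
```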
Functions
Package Contents
toil.cwl.check_cwltool_version()
Check if the installed cwltool version matches Toil's expected version.
A warning is
printed to standard error if the versions differ. We do not
assume that logging is set up already. Safe to call
repeatedly; only one warning will be printed.
Return type
None
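A warn-once check of this kind could look roughly like the sketch below. The function name, the version strings, and the message text are all hypothetical; the sketch only shows writing to standard error directly (no logging setup required) and suppressing repeat warnings.

```python
import io
import sys

_already_warned = False  # module-level flag so repeated calls stay quiet


def warn_if_versions_differ(installed, expected, stream=None):
    """Print one warning, at most, when the two version strings differ."""
    global _already_warned
    if installed == expected or _already_warned:
        return
    target = stream if stream is not None else sys.stderr
    print(f"WARNING: expected version {expected} but found {installed}", file=target)
    _already_warned = True


# Hypothetical version numbers, captured in memory instead of stderr.
captured = io.StringIO()
for _ in range(3):
    warn_if_versions_differ("3.1.0", "3.1.1", stream=captured)
warning_lines = captured.getvalue().count("\n")
```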
toil.deferred
Attributes
Classes
Module Contents
toil.deferred.logger
class toil.deferred.DeferredFunction
Bases: namedtuple ( 'DeferredFunction' , 'function args kwargs name module' )
>>> from collections import defaultdict
>>> df = DeferredFunction.create(defaultdict, None, {'x':1}, y=2)
>>> df
DeferredFunction(defaultdict, ...)
>>> df.invoke() == defaultdict(None, x=1, y=2)
True
classmethod create(function, *args, **kwargs)
Capture the given callable and
arguments as an instance of this class.
Parameters
• function ( callable ) -- The deferred action to take in the form of a function
• args ( tuple ) -- Non-keyword arguments to the function
• kwargs ( dict ) -- Keyword arguments to the function
invoke()
Invoke the captured function with the captured arguments.
__str__()
Return str(self).
__repr__
Return repr(self).
class toil.deferred.DeferredFunctionManager(stateDirBase)
Implements a deferred function system. Each Toil worker will have an instance of this class. When a job is executed, it will happen inside a context manager from this class. If the job registers any "deferred" functions, they will be executed when the context manager is exited.
If the Python process terminates before properly exiting the context manager and running the deferred functions, and some other worker process enters or exits the per-job context manager of this class at a later time, or when the DeferredFunctionManager is shut down on the worker, the earlier job's deferred functions will be picked up and run.
Note that deferred function cleanup is on a best-effort basis, and deferred functions may end up getting executed multiple times.
Internally, deferred functions are serialized into files in the given directory, which are locked by the owning process.
If that process
dies, other processes can detect that the files are able to
be locked, and will take them over.
Parameters
stateDirBase ( str )
STATE_DIR_STEM = 'deferred'
PREFIX = 'func'
WIP_SUFFIX = '.tmp'
stateDir
stateFileName
stateFileOut
stateFileIn
__del__()
Clean up our state on disk. We assume that the deferred functions we manage have all been executed, and none are currently recorded.
open()
Yields a single-argument function that allows for deferred functions of type toil.DeferredFunction to be registered. We use this design so deferred functions can be registered only inside this context manager.
Not thread safe.
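The register-inside-a-context-manager design can be sketched in a few lines. Deferred and deferred_scope are hypothetical names for this illustration; the real class additionally serializes functions to locked state files so that other workers can pick them up, which this in-memory sketch does not attempt.

```python
from collections import namedtuple
from contextlib import contextmanager

# Hypothetical stand-in for a captured call.
Deferred = namedtuple("Deferred", "function args kwargs")


@contextmanager
def deferred_scope():
    """Yield a single-argument register function; run everything on exit."""
    pending = []
    try:
        yield pending.append
    finally:
        for item in pending:
            try:
                item.function(*item.args, **item.kwargs)
            except Exception:
                pass  # cleanup is best-effort, so one failure does not stop the rest


log = []
with deferred_scope() as defer:
    defer(Deferred(log.append, ("cleanup ran",), {}))
    log.append("job body ran")
```

Because the register function only exists inside the with block, nothing can be deferred outside of it.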
classmethod cleanupWorker(stateDirBase)
Called by the batch system when
it shuts down the node, after all workers are done, if the
batch system supports worker cleanup. Checks once more for
orphaned deferred functions and runs them.
Parameters
stateDirBase ( str )
Return type
None
toil.exceptions
Neutral place for exceptions, to break import cycles.
Attributes
Exceptions
Module Contents
toil.exceptions.logger
exception toil.exceptions.FailedJobsException(job_store,
failed_jobs,
exit_code=1)
Bases: Exception
Common base
class for all non-exit exceptions.
Parameters
• job_store ( toil.jobStores.abstractJobStore.AbstractJobStore )
• failed_jobs ( list[toil.job.JobDescription] )
• exit_code ( int )
msg
exit_code
jobStoreLocator
numberOfFailedJobs
__str__()
Stringify the exception,
including the message.
Return type
str
toil.fileStores
Submodules
toil.fileStores.abstractFileStore
Attributes
Classes
Module Contents
toil.fileStores.abstractFileStore.logger
class
toil.fileStores.abstractFileStore.AbstractFileStore(jobStore,
jobDesc, file_store_dir, waitForPreviousCommit)
Bases: abc.ABC
Interface used to allow user code run by Toil to read and write files.
Also provides the interface to other Toil facilities used by user code, including:
• normal (non-real-time) logging
• finding the correct temporary directory for scratch work
• importing and exporting files into and out of the workflow
Stores user files in the jobStore, but keeps them separate from actual jobs.
May implement caching.
Passed as argument to the toil.job.Job.run() method.
Access to files is only permitted inside the context manager provided by toil.fileStores.abstractFileStore.AbstractFileStore.open() .
Also
responsible for committing completed jobs back to the job
store with an update operation, and allowing that commit
operation to be waited for.
Parameters
• jobStore ( toil.jobStores.abstractJobStore.AbstractJobStore )
• jobDesc ( toil.job.JobDescription )
• file_store_dir ( str )
• waitForPreviousCommit ( Callable[[], Any] )
jobStore
jobDesc
localTempDir: str
workflow_dir: str
coordination_dir: str
jobName: str
waitForPreviousCommit
logging_messages: list[dict[str, int | str]] = []
logging_user_streams: list[dict[str, str]] = []
filesToDelete: set[str]
static createFileStore(jobStore, jobDesc, file_store_dir,
waitForPreviousCommit, caching)
Create a concrete FileStore.
Parameters
• jobStore ( toil.jobStores.abstractJobStore.AbstractJobStore )
• jobDesc ( toil.job.JobDescription )
• file_store_dir ( str )
• waitForPreviousCommit ( Callable[[], Any] )
• caching ( Optional[bool] )
Return type
Union[ toil.fileStores.nonCachingFileStore.NonCachingFileStore , toil.fileStores.cachingFileStore.CachingFileStore ]
static shutdownFileStore(workflowID, config_work_dir,
config_coordination_dir)
Carry out any necessary filestore-specific cleanup.
This is a destructive operation and it is important to ensure that there are no other running processes on the system that are modifying or using the file store for this workflow.
This is
intended to be the last call to the file store in a Toil
run, called by the batch system cleanup function upon batch
system shutdown.
Parameters
• workflowID ( str ) -- The workflow ID for this invocation of the workflow
• config_work_dir ( Optional[str] ) -- The path to the work directory in the Toil Config.
• config_coordination_dir ( Optional[str] ) -- The path to the coordination directory in the Toil Config.
Return type
None
open(job)
Create the context manager that wraps the tasks performed before and after a job is run.
File operations are only permitted inside the context manager.
Implementations must only yield from within a with super().open(job): block.
Parameters
job ( toil.job.Job ) -- The job instance of the toil job to run.
Return type
collections.abc.Generator [None, None, None]
get_disk_usage()
Get the number of bytes of disk used by the last job run under open().
Disk usage is
measured at the end of the job. TODO: Sample periodically
and record peak usage.
Return type
Optional[ int ]
getLocalTempDir()
Get a new local temporary directory in which to write files.
The directory
will only persist for the duration of the job.
Returns
The absolute path to a new local temporary directory. This directory will exist for the duration of the job only, and is guaranteed to be deleted once the job terminates, removing all files it contains recursively.
Return type
str
getLocalTempFile(suffix=None, prefix=None)
Get a new local temporary file
that will persist for the duration of the job.
Parameters
• suffix ( Optional[str] ) -- If not None, the file name will end with this string. Otherwise, default value ".tmp" will be used
• prefix ( Optional[str] ) -- If not None, the file name will start with this string. Otherwise, default value "tmp" will be used
Returns
The absolute path to a local temporary file. This file will exist for the duration of the job only, and is guaranteed to be deleted once the job terminates.
Return type
str
getLocalTempFileName(suffix=None, prefix=None)
Get a valid name for a new
local file. Don't actually create a file at the path.
Parameters
• suffix ( Optional[str] ) -- If not None, the file name will end with this string. Otherwise, default value ".tmp" will be used
• prefix ( Optional[str] ) -- If not None, the file name will start with this string. Otherwise, default value "tmp" will be used
Returns
Path to valid file
Return type
str
abstract writeGlobalFile(localFileName, cleanup=False)
Upload a file (as a path) to the job store.
If the file is in a FileStore-managed temporary directory (i.e. from toil.fileStores.abstractFileStore.AbstractFileStore.getLocalTempDir() ), it will become a local copy of the file, eligible for deletion by toil.fileStores.abstractFileStore.AbstractFileStore.deleteLocalFile() .
If an
executable file on the local filesystem is uploaded, its
executability will be preserved when it is downloaded again.
Parameters
• localFileName ( str ) -- The path to the local file to upload. The last path component (basename of the file) will remain associated with the file in the file store, if supported by the backing JobStore, so that the file can be searched for by name or name glob.
• cleanup ( bool ) -- if True then the copy of the global file will be deleted once the job and all its successors have completed running. If not the global file must be deleted manually.
Returns
an ID that can be used to retrieve the file.
Return type
toil.fileStores.FileID
writeGlobalFileStream(cleanup=False,
basename=None,
encoding=None, errors=None)
Similar to writeGlobalFile, but
allows the writing of a stream to the job store. The yielded
file handle does not need to and should not be closed
explicitly.
Parameters
• encoding ( Optional[str] ) -- The name of the encoding used to decode the file. Encodings are the same as for decode(). Defaults to None which represents binary mode.
• errors ( Optional[str] ) -- Specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
• cleanup ( bool ) -- is as in toil.fileStores.abstractFileStore.AbstractFileStore.writeGlobalFile() .
• basename ( Optional[str] ) -- If supported by the backing JobStore, use the given file basename so that when searching the job store with a query matching that basename, the file will be detected.
Returns
A context manager yielding a tuple of 1) a file handle which can be written to and 2) the toil.fileStores.FileID of the resulting file in the job store.
Return type
collections.abc.Iterator [ tuple [ toil.lib.io.WriteWatchingStream , toil.fileStores.FileID ]]
logAccess(fileStoreID, destination=None)
Record that the given file was read by the job (to be announced if the job fails).
If destination is not None, it gives the path that the file was downloaded to. Otherwise, assumes that the file was streamed.
Must be called by readGlobalFile() and readGlobalFileStream() implementations.
Parameters
• fileStoreID ( Union[toil.fileStores.FileID, str] )
• destination ( Union[str, None] )
Return type
None
abstract readGlobalFile(fileStoreID, userPath=None, cache=True,
mutable=False, symlink=False)
Make the file associated with fileStoreID available locally.
If mutable is True, then a copy of the file will be created locally so that the original is not modified and does not change the file for other jobs. If mutable is False, then a link can be created to the file, saving disk resources. The file that is downloaded will be executable if and only if it was originally uploaded from an executable file on the local filesystem.
If a user path is specified, it is used as the destination. If a user path isn't specified, the file is stored in the local temp directory with an encoded name.
The destination file must not be deleted by the user; it can only be deleted through deleteLocalFile.
Implementations must call logAccess() to report the download.
Parameters
• fileStoreID ( str ) -- job store id for the file
• userPath ( Optional[str] ) -- a path to the name of file to which the global file will be copied or hard-linked (see below).
• cache ( bool ) -- Described in toil.fileStores.CachingFileStore.readGlobalFile()
• mutable ( bool ) -- Described in toil.fileStores.CachingFileStore.readGlobalFile()
• symlink ( bool ) -- True if caller can accept symlink, False if caller can only accept a normal file or hardlink
Returns
An absolute path to a local, temporary copy of the file keyed by fileStoreID.
Return type
str
readGlobalFileStream(fileStoreID: str, encoding: Literal[None] = None, errors: str | None = None) -> ContextManager[IO[bytes]]
readGlobalFileStream(fileStoreID: str, encoding: str, errors: str | None = None) -> ContextManager[IO[str]]
Read a stream from the job store; similar to readGlobalFile.
The yielded
file handle does not need to and should not be closed
explicitly.
Parameters
• encoding -- the name of the encoding used to decode the file. Encodings are the same as for decode(). Defaults to None which represents binary mode.
• errors -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Implementations must call logAccess() to report the download.
Returns
a context manager yielding a file handle which can be read from.
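The binary-versus-text behaviour of the encoding parameter can be illustrated without a running workflow. In the sketch below, read_stream and FAKE_JOB_STORE are hypothetical stand-ins backed by an in-memory dict; only the standard io module is used, and nothing here calls Toil.

```python
import io
from contextlib import contextmanager

# Hypothetical in-memory stand-in for files held by a job store.
FAKE_JOB_STORE = {"file-1": "héllo\n".encode("utf-8")}


@contextmanager
def read_stream(file_id, encoding=None, errors=None):
    """Yield a bytes handle when encoding is None, else a text handle."""
    raw = io.BytesIO(FAKE_JOB_STORE[file_id])
    if encoding is None:
        handle = raw
    else:
        # errors=None makes TextIOWrapper fall back to 'strict'.
        handle = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    try:
        yield handle
    finally:
        handle.close()  # the caller never closes the handle itself


with read_stream("file-1") as stream:
    as_bytes = stream.read()
with read_stream("file-1", encoding="utf-8") as stream:
    as_text = stream.read()
```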
getGlobalFileSize(fileStoreID)
Get the size of the file pointed to by the given ID, in bytes.
If a FileID or something else with a non-None 'size' field, gets that.
Otherwise, asks the job store to poll the file's size.
Note that the
job store may overestimate the file's size, for example if
it is encrypted and had to be augmented with an IV or other
encryption framing.
Parameters
fileStoreID ( Union[toil.fileStores.FileID, str] ) -- File ID for the file
Returns
File's size in bytes, as stored in the job store
Return type
int
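The "use the recorded size if present, otherwise ask the job store" rule can be sketched as follows. SizedID, file_size, and fake_poll are hypothetical names invented for this illustration.

```python
class SizedID(str):
    """Hypothetical ID type: a string that may also carry a size in bytes."""

    def __new__(cls, value, size=None):
        obj = super().__new__(cls, value)
        obj.size = size
        return obj


def file_size(file_id, poll_job_store):
    """Prefer a non-None size attribute; otherwise poll the job store."""
    known = getattr(file_id, "size", None)
    if known is not None:
        return known
    return poll_job_store(str(file_id))


polled = []


def fake_poll(file_id):
    # Stand-in for a job store lookup; records which IDs needed polling.
    polled.append(file_id)
    return 2048


size_a = file_size(SizedID("a", size=100), fake_poll)
size_b = file_size("b", fake_poll)
```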
abstract deleteLocalFile(fileStoreID)
Delete local copies of files associated with the provided job store ID.
Raises an OSError with an errno of errno.ENOENT if no such local copies exist. Thus, cannot be called multiple times in succession.
The files
deleted are all those previously read from this file ID via
readGlobalFile by the current job into the job's
file-store-provided temp directory, plus the file that was
written to create the given file ID, if it was written by
the current job from the job's file-store-provided temp
directory.
Parameters
fileStoreID ( Union[toil.fileStores.FileID, str] ) -- File Store ID of the file to be deleted.
Return type
None
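Because a second deletion raises an OSError with errno.ENOENT, callers that may delete twice need to handle that case. The sketch below models this with a hypothetical LocalCopies class and delete_if_present helper; neither is part of Toil.

```python
import errno


class LocalCopies:
    """Hypothetical tracker of local copies, keyed by file ID."""

    def __init__(self):
        self._paths = {}

    def record(self, file_id, path):
        self._paths.setdefault(file_id, []).append(path)

    def delete_local(self, file_id):
        if not self._paths.get(file_id):
            raise OSError(errno.ENOENT, "No local copies of file", file_id)
        del self._paths[file_id]


def delete_if_present(copies, file_id):
    """Return True if copies were deleted, False if there were none."""
    try:
        copies.delete_local(file_id)
        return True
    except OSError as err:
        if err.errno != errno.ENOENT:
            raise  # only the "no such local copy" case is expected here
        return False


copies = LocalCopies()
copies.record("id1", "/tmp/example")
first = delete_if_present(copies, "id1")
second = delete_if_present(copies, "id1")
```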
abstract deleteGlobalFile(fileStoreID)
Delete local files and then permanently delete them from the job store.
To ensure that
the job can be restarted if necessary, the delete will not
happen until after the job's run method has completed.
Parameters
fileStoreID ( Union[toil.fileStores.FileID, str] ) -- the File Store ID of the file to be deleted.
Return type
None
importFile(srcUrl, sharedFileName=None)
Parameters
• srcUrl ( str )
• sharedFileName ( Optional[str] )
Return type
Optional[ toil.fileStores.FileID ]
import_file(src_uri, shared_file_name=None)
Parameters
• src_uri ( str )
• shared_file_name ( Optional[str] )
Return type
Optional[ toil.fileStores.FileID ]
exportFile(jobStoreFileID, dstUrl)
Parameters
• jobStoreFileID ( toil.fileStores.FileID )
• dstUrl ( str )
Return type
None
abstract export_file(file_id, dst_uri)
Parameters
• file_id ( toil.fileStores.FileID )
• dst_uri ( str )
Return type
None
log_to_leader(text, level=logging.INFO)
Send a logging message to the
leader. The message will also be logged by the worker at the
same level.
Parameters
• text ( str ) -- The string to log.
• level ( int ) -- The logging level.
Return type
None
logToMaster(text, level=logging.INFO)
Parameters
• text ( str )
• level ( int )
Return type
None
log_user_stream(name, stream)
Send a stream of UTF-8 text to the leader as a named log stream.
Useful for
things like the error logs of Docker containers. The leader
will show it to the user or organize it appropriately for
user-level log information.
Parameters
• name ( str ) -- A hierarchical, .-delimited string.
• stream ( IO[bytes] ) -- A stream of encoded text. Encoding errors will be tolerated.
Return type
None
abstract startCommit(jobState=False)
Update the status of the job on the disk.
May bump the version number of the job.
May start an
asynchronous process. Call waitForCommit() to wait on that
process. You must waitForCommit() before committing any
further updates to the job. During the asynchronous process,
it is safe to modify the job; modifications after this call
will not be committed until the next call.
Parameters
jobState ( bool ) -- If True, commit the state of the FileStore's job, and file deletes. Otherwise, commit only file creates/updates.
Return type
None
abstract waitForCommit()
Blocks while startCommit is running.
This function is called by this job's successor to ensure that it does not begin modifying the job store until after this job has finished doing so.
Might be called
when startCommit is never called on a particular instance,
in which case it does not block.
Returns
Always returns True
Return type
bool
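The start/wait contract can be modelled with a background thread. CommitSketch below is a hypothetical class written for this illustration, assuming the asynchronous process is a single thread; it is not how any particular file store implements commits.

```python
import threading


class CommitSketch:
    """Hypothetical model of an asynchronous commit with a wait operation."""

    def __init__(self):
        self._thread = None
        self.committed = []

    def start_commit(self, payload):
        # A previous commit must finish before a further update is committed.
        self.wait_for_commit()
        self._thread = threading.Thread(target=self.committed.append, args=(payload,))
        self._thread.start()

    def wait_for_commit(self):
        # If no commit was ever started there is nothing to wait for.
        if self._thread is not None:
            self._thread.join()
        return True


store = CommitSketch()
never_started = store.wait_for_commit()
store.start_commit("first")
store.start_commit("second")
finished = store.wait_for_commit()
```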
abstract classmethod shutdown(shutdown_info)
Shut down the filestore on this node.
This is
intended to be called on batch system shutdown.
Parameters
shutdown_info ( Any ) -- The implementation-specific shutdown information, for shutting down the file store and removing all its state and all job local temp directories from the node.
Return type
None
toil.fileStores.cachingFileStore
Attributes
Exceptions
Classes
Module Contents
toil.fileStores.cachingFileStore.logger
toil.fileStores.cachingFileStore.SQLITE_TIMEOUT_SECS = 60.0
exception
toil.fileStores.cachingFileStore.CacheError(message)
Bases: Exception
Error raised if the user attempts to add a non-local file to cache
exception toil.fileStores.cachingFileStore.CacheUnbalancedError
Bases: CacheError
Raised if file
store can't free enough space for caching
message = 'Unable unable to free enough space for caching.
This
error frequently arises due to jobs using...
exception
toil.fileStores.cachingFileStore.IllegalDeletionCacheError(deletedFile)
Bases: CacheError
Error raised if the caching code discovers a file that represents a reference to a cached file to have gone missing.
This can be a big problem if a hard link is moved, because then the cache will be unable to evict the file it links to.
Remember that files read with readGlobalFile may not be deleted by the user and need to be deleted with deleteLocalFile.
exception
toil.fileStores.cachingFileStore.InvalidSourceCacheError(message)
Bases: CacheError
Error raised if the user attempts to add a non-local file to cache
class
toil.fileStores.cachingFileStore.CachingFileStore(jobStore,
jobDesc, file_store_dir, waitForPreviousCommit)
Bases: toil.fileStores.abstractFileStore.AbstractFileStore
A cache-enabled file store.
Provides files that are read out as symlinks or hard links into a cache directory for the node, if permitted by the workflow.
Also attempts to write files back to the backing JobStore asynchronously, after quickly taking them into the cache. Writes are only required to finish when the job's actual state after running is committed back to the job store.
Internally, manages caching using a database. Each node has its own database, shared between all the workers on the node. The database contains several tables:
files contains one entry for each file in the cache. Each entry knows the path to its data on disk. It also knows its global file ID, its state, and its owning worker PID. If the owning worker dies, another worker will pick it up. It also knows its size.
File states are:
• "cached": happily stored in the cache. Reads can happen immediately. Owner is null. May be adopted and moved to state "deleting" by anyone, if it has no outstanding immutable references.
• "downloading": in the process of being saved to the cache by a non-null owner. Reads must wait for the state to become "cached". If the worker dies, goes to state "deleting", because we don't know if it was fully downloaded or if anyone still needs it. No references can be created to a "downloading" file except by the worker responsible for downloading it.
• "uploadable": stored in the cache and ready to be written to the job store by a non-null owner. Transitions to "uploading" when a (thread of) the owning worker process picks it up and begins uploading it, to free cache space or to commit a completed job. If the worker dies, goes to state "cached", because it may have outstanding immutable references from the dead-but-not-cleaned-up job that was going to write it.
• "uploading": stored in the cache and being written to the job store by a non-null owner. Transitions to "cached" when successfully uploaded. If the worker dies, goes to state "cached", because it may have outstanding immutable references from the dead-but-not-cleaned-up job that was writing it.
• "deleting": in the process of being removed from the cache by a non-null owner. Will eventually be removed from the database.
refs contains one entry for each outstanding reference to a cached file (hard link, symlink, or full copy). The table name is refs instead of references because references is an SQL reserved word. It remembers what job ID has the reference, and the path the reference is at. References have three states:
• "immutable": represents a hardlink or symlink to a file in the cache. Dedicates the file's size in bytes of the job's disk requirement to the cache, to be used to cache this file or to keep around other files without references. May be upgraded to "copying" if the link can't actually be created.
• "copying": records that a file in the cache is in the process of being copied to a path. Will be upgraded to a mutable reference eventually.
• "mutable": records that a file from the cache was copied to a certain path. Exists only to support deleteLocalFile's API. Only files with only mutable references (or no references) are eligible for eviction.
jobs contains one entry for each job currently running. It keeps track of the job's ID, the worker that is supposed to be running the job, the job's disk requirement, and the job's local temp dir path that will need to be cleaned up. When workers check for jobs whose workers have died, they null out the old worker, and grab ownership of and clean up jobs and their references until the null-worker jobs are gone.
properties
contains key, value pairs for tracking total space
available, and whether caching is free for this run.
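To make the table layout and state rules more concrete, here is a minimal sqlite3 sketch. The column names, the helper functions recover_dead_owner and evictable, and the sample rows are all assumptions made for illustration; the real schema may differ. Clearing the owner when a "downloading" file moves to "deleting" is also an assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical columns chosen to match the prose description of each table.
con.executescript("""
    CREATE TABLE files (id TEXT PRIMARY KEY, path TEXT, size INTEGER, state TEXT, owner TEXT);
    CREATE TABLE refs (path TEXT PRIMARY KEY, file_id TEXT, job_id TEXT, state TEXT);
    CREATE TABLE jobs (id TEXT PRIMARY KEY, tempdir TEXT, disk INTEGER, worker TEXT);
    CREATE TABLE properties (name TEXT PRIMARY KEY, value INTEGER);
""")
con.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)", [
    ("f1", "/cache/f1", 10, "cached", None),
    ("f2", "/cache/f2", 20, "cached", None),
    ("f3", "/cache/f3", 30, "downloading", "worker-9"),
    ("f4", "/cache/f4", 40, "uploading", "worker-9"),
])
con.executemany("INSERT INTO refs VALUES (?, ?, ?, ?)", [
    ("/job1/f1", "f1", "job1", "immutable"),
    ("/job1/f2", "f2", "job1", "mutable"),
])


def recover_dead_owner(con, owner):
    """Apply the documented transitions for files whose owning worker died."""
    con.execute("UPDATE files SET state = 'deleting', owner = NULL "
                "WHERE owner = ? AND state = 'downloading'", (owner,))
    con.execute("UPDATE files SET state = 'cached', owner = NULL "
                "WHERE owner = ? AND state IN ('uploadable', 'uploading')", (owner,))


def evictable(con):
    """Cached files with only mutable references, or no references at all."""
    rows = con.execute("""
        SELECT id FROM files
        WHERE state = 'cached'
          AND NOT EXISTS (SELECT 1 FROM refs
                          WHERE refs.file_id = files.id AND refs.state != 'mutable')
        ORDER BY id
    """)
    return [row[0] for row in rows]


recover_dead_owner(con, "worker-9")
states = dict(con.execute("SELECT id, state FROM files"))
candidates = evictable(con)
```

In this sketch f1 is protected by its immutable reference, while f2 (mutable reference only) and f4 (no references after recovery) become eviction candidates.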
Parameters
• jobStore ( toil.jobStores.abstractJobStore.AbstractJobStore )
• jobDesc ( toil.job.JobDescription )
• file_store_dir ( str )
• waitForPreviousCommit ( Callable[[], Any] )
forceNonFreeCaching = False
forceDownloadDelay = None
contentionBackoff = 15
localCacheDir
jobName: str
jobID
jobDiskBytes: float | None = None
workflowAttemptNumber
dbPath
process_identity_lock
commitThread = None
as_process()
Assume the process's identity to act on the caching database.
Yields the
process's name in the caching database, and holds onto a
lock while your thread has it.
Return type
collections.abc.Generator [ str , None, None]
property con: sqlite3.Connection
Get the database connection to
be used for the current thread.
Return type
sqlite3.Connection
property cur: sqlite3.Cursor
Get the main cursor to be used
for the current thread.
Return type
sqlite3.Cursor
getCacheLimit()
Return the total number of bytes to which the cache is limited.
If no limit is available, raises an error.
getCacheUsed()
Return the total number of bytes used in the cache.
If no value is available, raises an error.
getCacheExtraJobSpace()
Return the total number of bytes of disk space requested by jobs running against this cache but not yet used.
We can get into a situation where the jobs on the node take up all its space, but then they want to write to or read from the cache. So when that happens, we need to debit space from them somehow...
If no value is available, raises an error.
getCacheAvailable()
Return the total number of free bytes available for caching, or, if negative, the total number of bytes of cached files that need to be evicted to free up enough space for all the currently scheduled jobs.
If no value is available, raises an error.
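The sign convention for the available-space figure can be shown with simple arithmetic. The formula below (limit minus used minus space still owed to running jobs) is an assumption inferred from the descriptions of getCacheLimit(), getCacheUsed(), and getCacheExtraJobSpace(); the function names are hypothetical.

```python
def cache_available(limit_bytes, used_bytes, extra_job_space_bytes):
    """Assumed formula: free bytes for caching; negative means over-committed."""
    return limit_bytes - used_bytes - extra_job_space_bytes


def bytes_to_evict(available):
    """How many bytes of cached files would need to go, if any."""
    return max(0, -available)


# Example numbers only: a 100-byte cache with 70 bytes used and 50 bytes
# still promised to running jobs is over-committed by 20 bytes.
available = cache_available(100, 70, 50)
must_evict = bytes_to_evict(available)
roomy = bytes_to_evict(cache_available(100, 10, 20))
```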
getSpaceUsableForJobs()
Return the total number of bytes that are not taken up by job requirements, ignoring files and file usage. We can't ever run more jobs than we actually have room for, even with caching.
If not retrievable, raises an error.
getCacheUnusedJobRequirement()
Return the total number of bytes of disk space requested by the current job and not used by files the job is using in the cache.
Mutable references don't count, but immutable/uploading ones do.
If no value is available, raises an error.
adjustCacheLimit(newTotalBytes)
Adjust the total cache size limit to the given number of bytes.
fileIsCached(fileID)
Return true if the given file is currently cached, and false otherwise.
Note that this can't really be relied upon because a file may go cached -> deleting after you look at it. If you need to do something with the file you need to do it in a transaction.
getFileReaderCount(fileID)
Return the number of current outstanding reads of the given file.
Counts mutable references too.
cachingIsFree()
Return true if files can be cached for free, without taking up space. Return false otherwise.
This will be true when working with certain job stores in certain configurations, most notably the FileJobStore.
open(job)
This context manager decorated
method allows cache-specific operations to be conducted
before and after the execution of a job in worker.py
Parameters
job ( toil.job.Job )
Return type
collections.abc.Generator [None, None, None]
writeGlobalFile(localFileName, cleanup=False, executable=False)
Creates a file in the jobstore and returns a FileID reference.
readGlobalFile(fileStoreID,
userPath=None, cache=True,
mutable=False, symlink=False)
Make the file associated with fileStoreID available locally.
If mutable is True, then a copy of the file will be created locally so that the original is not modified and does not change the file for other jobs. If mutable is False, then a link can be created to the file, saving disk resources. The file that is downloaded will be executable if and only if it was originally uploaded from an executable file on the local filesystem.
If a user path is specified, it is used as the destination. If a user path isn't specified, the file is stored in the local temp directory with an encoded name.
The destination file must not be deleted by the user; it can only be deleted through deleteLocalFile.
Implementations must call logAccess() to report the download.
Parameters
• fileStoreID -- job store id for the file
• userPath -- a path to the name of file to which the global file will be copied or hard-linked (see below).
• cache -- Described in toil.fileStores.CachingFileStore.readGlobalFile()
• mutable -- Described in toil.fileStores.CachingFileStore.readGlobalFile()
• symlink -- True if caller can accept symlink, False if caller can only accept a normal file or hardlink
Returns
An absolute path to a local, temporary copy of the file keyed by fileStoreID.
readGlobalFileStream(fileStoreID, encoding=None, errors=None)
Read a stream from the job store; similar to readGlobalFile.
The yielded
file handle does not need to and should not be closed
explicitly.
Parameters
• encoding -- the name of the encoding used to decode the file. Encodings are the same as for decode(). Defaults to None which represents binary mode.
• errors -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Implementations must call logAccess() to report the download.
Returns
a context manager yielding a file handle which can be read from.
deleteLocalFile(fileStoreID)
Delete local copies of files associated with the provided job store ID.
Raises an OSError with an errno of errno.ENOENT if no such local copies exist. Thus, cannot be called multiple times in succession.
The files
deleted are all those previously read from this file ID via
readGlobalFile by the current job into the job's
file-store-provided temp directory, plus the file that was
written to create the given file ID, if it was written by
the current job from the job's file-store-provided temp
directory.
Parameters
fileStoreID -- File Store ID of the file to be deleted.
deleteGlobalFile(fileStoreID)
Delete local files and then permanently delete them from the job store.
To ensure that
the job can be restarted if necessary, the delete will not
happen until after the job's run method has completed.
Parameters
fileStoreID -- the File Store ID of the file to be deleted.
exportFile(jobStoreFileID, dstUrl)
Parameters
• jobStoreFileID ( toil.fileStores.FileID )
• dstUrl ( str )
Return type
None
export_file(file_id, dst_uri)
Parameters
• file_id ( toil.fileStores.FileID )
• dst_uri ( str )
Return type
None
waitForCommit()
Blocks while startCommit is running.
This function is called by this job's successor to ensure that it does not begin modifying the job store until after this job has finished doing so.
Might be called when startCommit is never called on a particular instance, in which case it does not block.
Returns
Always returns True
Return type
bool
startCommit(jobState=False)
Update the status of the job on the disk.
May bump the version number of the job.
May start an asynchronous process. Call waitForCommit() to wait on that process. You must call waitForCommit() before committing any further updates to the job. During the asynchronous process, it is safe to modify the job; modifications after this call will not be committed until the next call.
Parameters
jobState -- If True, commit the state of the FileStore's job, and file deletes. Otherwise, commit only file creates/updates.
startCommitThread(state_to_commit)
Run in a thread to actually commit the current job.
Parameters
state_to_commit ( Optional[toil.job.JobDescription] )
classmethod shutdown(shutdown_info)
Parameters
shutdown_info ( tuple[str, str] ) -- Tuple of the coordination directory (where the cache database is) and the cache directory (where the cached data is).
Return type
None
Job local temp directories will be removed due to their appearance in the database.
__del__()
Cleanup function that is run when destroying the class instance that ensures that all the file writing threads exit.
toil.fileStores.nonCachingFileStore
Attributes
Classes
Module Contents
toil.fileStores.nonCachingFileStore.logger: logging.Logger
class toil.fileStores.nonCachingFileStore.NonCachingFileStore(jobStore, jobDesc, file_store_dir, waitForPreviousCommit)
Bases: toil.fileStores.abstractFileStore.AbstractFileStore
Interface used to allow user code run by Toil to read and write files.
Also provides the interface to other Toil facilities used by user code, including:
• normal (non-real-time) logging
• finding the correct temporary directory for scratch work
• importing and exporting files into and out of the workflow
Stores user files in the jobStore, but keeps them separate from actual jobs.
May implement caching.
Passed as argument to the toil.job.Job.run() method.
Access to files is only permitted inside the context manager provided by toil.fileStores.abstractFileStore.AbstractFileStore.open() .
Also responsible for committing completed jobs back to the job store with an update operation, and allowing that commit operation to be waited for.
Parameters
• jobStore ( toil.jobStores.abstractJobStore.AbstractJobStore )
• jobDesc ( toil.job.JobDescription )
• file_store_dir ( str )
• waitForPreviousCommit ( Callable[[], Any] )
jobStateFile: str | None = None
localFileMap: DefaultDict[str, list[str]]
static check_for_coordination_corruption(coordination_dir)
Make sure the coordination directory hasn't been deleted unexpectedly.
Slurm has been known to delete XDG_RUNTIME_DIR out from under processes it was promised to, so it is possible that in certain misconfigured environments the coordination directory and everything in it could go away unexpectedly. We are going to regularly make sure that the things we think should exist actually exist, and we are going to abort if they do not.
Parameters
coordination_dir ( Optional[str] )
Return type
None
check_for_state_corruption()
Make sure state tracking information hasn't been deleted unexpectedly.
Return type
None
open(job)
Create the context manager that wraps the tasks performed before and after a job is run.
File operations are only permitted inside the context manager.
Implementations must only yield from within a "with super().open(job):" block.
Parameters
job ( toil.job.Job ) -- The job instance of the toil job to run.
Return type
collections.abc.Generator [None, None, None]
writeGlobalFile(localFileName, cleanup=False)
Upload a file (as a path) to the job store.
If the file is in a FileStore-managed temporary directory (i.e. from toil.fileStores.abstractFileStore.AbstractFileStore.getLocalTempDir() ), it will become a local copy of the file, eligible for deletion by toil.fileStores.abstractFileStore.AbstractFileStore.deleteLocalFile() .
If an executable file on the local filesystem is uploaded, its executability will be preserved when it is downloaded again.
Parameters
• localFileName ( str ) -- The path to the local file to upload. The last path component (basename of the file) will remain associated with the file in the file store, if supported by the backing JobStore, so that the file can be searched for by name or name glob.
• cleanup ( bool ) -- if True, the copy of the global file will be deleted once the job and all its successors have completed running. If not, the global file must be deleted manually.
Returns
an ID that can be used to retrieve the file.
Return type
toil.fileStores.FileID
readGlobalFile(fileStoreID, userPath=None, cache=True, mutable=False, symlink=False)
Make the file associated with fileStoreID available locally.
If mutable is True, then a copy of the file will be created locally so that the original is not modified and does not change the file for other jobs. If mutable is False, then a link can be created to the file, saving disk resources. The file that is downloaded will be executable if and only if it was originally uploaded from an executable file on the local filesystem.
If a user path is specified, it is used as the destination. If a user path isn't specified, the file is stored in the local temp directory with an encoded name.
The destination file must not be deleted by the user; it can only be deleted through deleteLocalFile.
Implementations must call logAccess() to report the download.
Parameters
• fileStoreID ( str ) -- job store id for the file
• userPath ( Optional[str] ) -- the path of the file to which the global file will be copied or hard-linked (see below).
• cache ( bool ) -- Described in toil.fileStores.CachingFileStore.readGlobalFile()
• mutable ( bool ) -- Described in toil.fileStores.CachingFileStore.readGlobalFile()
• symlink ( bool ) -- True if caller can accept symlink, False if caller can only accept a normal file or hardlink
Returns
An absolute path to a local, temporary copy of the file keyed by fileStoreID.
Return type
str
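The difference between mutable=True and mutable=False described above can be pictured with a small standalone sketch. It does not use Toil: read_local_sketch is a hypothetical helper, and the fallback to a copy when hard-linking fails is an assumption made only so the example runs on any filesystem.

```python
import os
import shutil
import tempfile


def read_local_sketch(stored_path, dest_path, mutable=False):
    """Hypothetical stand-in: copy when mutable, otherwise try to hard-link."""
    if mutable:
        # A private copy, so edits never reach the stored original.
        shutil.copyfile(stored_path, dest_path)
    else:
        try:
            # A link saves disk space but shares content with the original.
            os.link(stored_path, dest_path)
        except OSError:
            # Assumption: fall back to a copy where hard links are unavailable.
            shutil.copyfile(stored_path, dest_path)
    return os.path.abspath(dest_path)


with tempfile.TemporaryDirectory() as tmp:
    stored = os.path.join(tmp, "stored.txt")
    with open(stored, "w") as f:
        f.write("original")

    copy_path = read_local_sketch(stored, os.path.join(tmp, "copy.txt"), mutable=True)
    with open(copy_path, "a") as f:
        f.write(" + edits")

    link_path = read_local_sketch(stored, os.path.join(tmp, "link.txt"))

    with open(stored) as f:
        stored_after_edit = f.read()
    with open(copy_path) as f:
        copy_after_edit = f.read()
    with open(link_path) as f:
        link_content = f.read()
```

Editing the mutable copy leaves the stored file untouched, which is the property the mutable flag is meant to guarantee.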
readGlobalFileStream(fileStoreID: str, encoding: Literal[None] = None, errors: str | None = None) -> ContextManager[IO[bytes]]
readGlobalFileStream(fileStoreID: str, encoding: str, errors: str | None = None) -> ContextManager[IO[str]]
Read a stream from the job store; similar to readGlobalFile.
The yielded file handle does not need to and should not be closed explicitly.
Parameters
• encoding -- the name of the encoding used to decode the file. Encodings are the same as for decode(). Defaults to None which represents binary mode.
• errors -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Implementations must call logAccess() to report the download.
Returns
a context manager yielding a file handle which can be read from.
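As a rough illustration of the two overloads (bytes when encoding is None, text otherwise), the following standalone sketch imitates the shape of the call without using Toil. The name read_stream_sketch and the in-memory FAKE_JOB_STORE dict are hypothetical stand-ins, not part of Toil's API.

```python
import io
from contextlib import contextmanager

# Hypothetical stand-in for a job store: file ID -> raw bytes.
FAKE_JOB_STORE = {"file-1": "héllo\n".encode("utf-8")}


@contextmanager
def read_stream_sketch(file_id, encoding=None, errors=None):
    """Yield a readable handle: binary if encoding is None, else text."""
    raw = io.BytesIO(FAKE_JOB_STORE[file_id])
    if encoding is None:
        handle = raw
    else:
        handle = io.TextIOWrapper(raw, encoding=encoding, errors=errors or "strict")
    try:
        yield handle
    finally:
        # The context manager closes the handle, so callers never need to.
        handle.close()


with read_stream_sketch("file-1") as f:
    binary_data = f.read()

with read_stream_sketch("file-1", encoding="utf-8") as f:
    text_data = f.read()
```

The caller only reads inside the with block; closing is left to the context manager, matching the note that the handle should not be closed explicitly.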
exportFile(jobStoreFileID, dstUrl)
Parameters
• jobStoreFileID ( toil.fileStores.FileID )
• dstUrl ( str )
Return type
None
export_file(file_id, dst_uri)
Parameters
• file_id ( toil.fileStores.FileID )
• dst_uri ( str )
Return type
None
deleteLocalFile(fileStoreID)
Delete local copies of files associated with the provided job store ID.
Raises an OSError with an errno of errno.ENOENT if no such local copies exist. Thus, it cannot be called multiple times in succession.
The files deleted are all those previously read from this file ID via readGlobalFile by the current job into the job's file-store-provided temp directory, plus the file that was written to create the given file ID, if it was written by the current job from the job's file-store-provided temp directory.
Parameters
fileStoreID ( str ) -- File Store ID of the file to be deleted.
Return type
None
deleteGlobalFile(fileStoreID)
Delete local files and then permanently delete them from the job store.
To ensure that the job can be restarted if necessary, the delete will not happen until after the job's run method has completed.
Parameters
fileStoreID ( str ) -- the File Store ID of the file to be deleted.
Return type
None
waitForCommit()
Blocks while startCommit is running.
This function is called by this job's successor to ensure that it does not begin modifying the job store until after this job has finished doing so.
Might be called when startCommit is never called on a particular instance, in which case it does not block.
Returns
Always returns True
Return type
bool
startCommit(jobState=False)
Update the status of the job on the disk.
May bump the version number of the job.
May start an asynchronous process. Call waitForCommit() to wait on that process. You must call waitForCommit() before committing any further updates to the job. During the asynchronous process, it is safe to modify the job; modifications after this call will not be committed until the next call.
Parameters
jobState ( bool ) -- If True, commit the state of the FileStore's job, and file deletes. Otherwise, commit only file creates/updates.
Return type
None
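The start-then-wait contract of startCommit() and waitForCommit() can be sketched with a plain thread. This is a toy model under stated assumptions, not Toil's implementation: CommitSketch, start_commit and wait_for_commit are hypothetical names, and "committing" here just appends a snapshot to a list.

```python
import threading


class CommitSketch:
    """Toy object with the same start/wait shape as described above."""

    def __init__(self):
        self._thread = None
        self.committed = []

    def start_commit(self, state):
        # Snapshot the state now; later modifications wait for the next commit.
        snapshot = dict(state)
        self._thread = threading.Thread(
            target=self.committed.append, args=(snapshot,)
        )
        self._thread.start()

    def wait_for_commit(self):
        # Does not block if start_commit was never called.
        if self._thread is not None:
            self._thread.join()
        return True


store = CommitSketch()
never_started = store.wait_for_commit()

state = {"version": 1}
store.start_commit(state)
state["version"] = 2  # safe: not part of the commit already in flight
finished = store.wait_for_commit()
```

Taking the snapshot before the thread starts is what makes it safe to keep modifying the state while the commit is running.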
__del__()
Cleanup function that is run when destroying the class instance. Nothing to do since there are no async write events.
Return type
None
classmethod shutdown(shutdown_info)
Parameters
shutdown_info ( str ) -- The coordination directory.
Return type
None
Classes
Package Contents
class toil.fileStores.FileID(fileStoreID, size, executable=False)
Bases: str
A small wrapper around Python's builtin string class.
It is used to represent a file's ID in the file store, and has a size attribute that is the file's size in bytes. This object is returned by importFile and writeGlobalFile.
Calls into the file store can use bare strings; size will be queried from the job store if unavailable in the ID.
Parameters
• fileStoreID ( str )
• size ( int )
• executable ( bool )
size
executable
pack()
Pack the FileID into a string so it can be passed through external code.
Return type
str
classmethod forPath(fileStoreID, filePath)
Parameters
• fileStoreID ( str )
• filePath ( str )
Return type
FileID
classmethod unpack(packedFileStoreID)
Unpack the result of pack() into a FileID object.
Parameters
packedFileStoreID ( str )
Return type
FileID
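To make the "string with extra attributes" idea concrete, here is a minimal standalone sketch of a str subclass that carries size and executable and round-trips through a packed string. FileIDSketch is a hypothetical class, and the "size:flag:id" layout is an assumption chosen for illustration; it is not claimed to be the format Toil uses.

```python
class FileIDSketch(str):
    """Hypothetical str subclass carrying size and executable attributes."""

    def __new__(cls, file_store_id, size, executable=False):
        # The string value itself is just the file store ID.
        return super().__new__(cls, file_store_id)

    def __init__(self, file_store_id, size, executable=False):
        self.size = size
        self.executable = executable

    def pack(self):
        # Assumed layout: "<size>:<0 or 1>:<id>".
        return f"{self.size}:{int(self.executable)}:{self}"

    @classmethod
    def unpack(cls, packed):
        # Split only twice so that colons inside the ID survive.
        size, executable, file_store_id = packed.split(":", 2)
        return cls(file_store_id, int(size), executable == "1")


fid = FileIDSketch("files/abc:123", 2048, executable=True)
packed = fid.pack()
restored = FileIDSketch.unpack(packed)
```

Because the object is still a str, it can be passed anywhere a bare ID string is accepted, while code that knows about the extra attributes can read them directly.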
toil.job
Attributes
Exceptions
Classes
Functions
Module Contents
toil.job.logger
exception toil.job.JobPromiseConstraintError(promisingJob, recipientJob=None)
Bases: RuntimeError
Error for a job being asked to promise its return value when that value is not available.
(This happens when the return value has not yet been reached in the topological order of the job graph.)
Parameters
• promisingJob ( Job )
• recipientJob ( Optional[Job] )
promisingJob
recipientJob
exception toil.job.ConflictingPredecessorError(predecessor, successor)
Bases: Exception
Common base class for all non-exit exceptions.
Parameters
• predecessor ( Job )
• successor ( Job )
exception toil.job.DebugStoppingPointReached
Bases: BaseException
Raised when a job reaches a point at which it has been instructed to stop for debugging.
exception toil.job.FilesDownloadedStoppingPointReached(message, host_and_job_paths=None)
Bases: DebugStoppingPointReached
Raised when a job stops because it was asked to download its files, and the files are downloaded.
Parameters
host_and_job_paths ( Optional[list[tuple[str, str]]] )
host_and_job_paths
class toil.job.TemporaryID
Placeholder for an unregistered job ID used by a JobDescription.
Needs to be held:
• By JobDescription objects to record normal relationships.
• By Jobs to key their connected-component registries and to record predecessor relationships to facilitate EncapsulatedJob adding itself as a child.
• By Services to tie back to their hosting jobs, so the service tree can be built up from Service objects.
__str__()
Return type
str
__repr__()
Return type
str
__hash__()
Return type
int
__eq__(other)
Parameters
other ( Any )
Return type
bool
__ne__(other)
Parameters
other ( Any )
Return type
bool
class toil.job.AcceleratorRequirement
Bases: TypedDict
Requirement for one or more computational accelerators, like a GPU or FPGA.
count: int
How many of the accelerator are needed to run the job.
kind: str
What kind of accelerator is required. Can be "gpu". Other kinds defined in the future might be "fpga", etc.
model: typing_extensions.NotRequired[ str ]
What model of accelerator is needed. The exact set of values available depends on what the backing scheduler calls its accelerators; strings like "nvidia-tesla-k80" might be expected to work. If a specific model of accelerator is not required, this should be absent.
brand: typing_extensions.NotRequired[ str ]
What brand or manufacturer of accelerator is required. The exact set of values available depends on what the backing scheduler calls the brands of its accelerators; strings like "nvidia" or "amd" might be expected to work. If a specific brand of accelerator is not required (for example, because the job can use multiple brands of accelerator that support a given API) this should be absent.
api: typing_extensions.NotRequired[ str ]
What API is to be used to communicate with the accelerator. This can be "cuda". Other APIs supported in the future might be "rocm", "opencl", "metal", etc. If the job does not need a particular API to talk to the accelerator, this should be absent.
toil.job.parse_accelerator(spec)
Parse an AcceleratorRequirement specified by user code.
Supports formats like:
>>> parse_accelerator(8)
{'count': 8, 'kind': 'gpu'}
>>> parse_accelerator("1")
{'count': 1, 'kind': 'gpu'}
>>> parse_accelerator("nvidia-tesla-k80")
{'count': 1, 'kind': 'gpu', 'brand': 'nvidia', 'model': 'nvidia-tesla-k80'}
>>> parse_accelerator("nvidia-tesla-k80:2")
{'count': 2, 'kind': 'gpu', 'brand': 'nvidia', 'model': 'nvidia-tesla-k80'}
>>> parse_accelerator("gpu")
{'count': 1, 'kind': 'gpu'}
>>> parse_accelerator("cuda:1")
{'count': 1, 'kind': 'gpu', 'brand': 'nvidia', 'api': 'cuda'}
>>> parse_accelerator({"kind": "gpu"})
{'count': 1, 'kind': 'gpu'}
>>> parse_accelerator({"brand": "nvidia", "count": 5})
{'count': 5, 'kind': 'gpu', 'brand': 'nvidia'}
Assumes that if not specified, we are talking about GPUs, and about one of them. Knows that "gpu" is a kind, and "cuda" is an API, and "nvidia" is a brand.
Raises
• ValueError -- if it gets something it can't parse
• TypeError -- if it gets something it can't parse because it's the wrong type.
Parameters
spec ( Union[int, str, dict[str, Union[str, int]]] )
Return type
AcceleratorRequirement
toil.job.accelerator_satisfies(candidate, requirement, ignore=[])
Test if candidate partially satisfies the given requirement.
Returns
True if the given candidate at least partially satisfies the given requirement (i.e. check all fields other than count).
Parameters
• candidate ( AcceleratorRequirement )
• requirement ( AcceleratorRequirement )
• ignore ( list[str] )
Return type
bool
toil.job.accelerators_fully_satisfy(candidates, requirement, ignore=[])
Determine if a set of accelerators satisfy a requirement.
Ignores fields specified in ignore.
Returns
True if the requirement AcceleratorRequirement is fully satisfied by the ones in the list, taken together (i.e. check all fields including count).
Parameters
• candidates ( Optional[list[AcceleratorRequirement]] )
• requirement ( AcceleratorRequirement )
• ignore ( list[str] )
Return type
bool
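The distinction between partial satisfaction (every field except count) and full satisfaction (count included, summed over several candidates) can be sketched without Toil as follows. satisfies_sketch and fully_satisfy_sketch are hypothetical re-implementations; in particular, the rule that a field named in the requirement must be present and equal in the candidate is an assumption of this sketch.

```python
def satisfies_sketch(candidate, requirement, ignore=()):
    """Partial check: compare every descriptive field, but not count."""
    for key in ("kind", "brand", "api", "model"):
        if key in ignore or key not in requirement:
            continue
        # Assumption: a required field must be present and equal.
        if candidate.get(key) != requirement[key]:
            return False
    return True


def fully_satisfy_sketch(candidates, requirement, ignore=()):
    """Full check: matching candidates must together cover the count."""
    remaining = requirement["count"]
    for candidate in candidates or []:
        if satisfies_sketch(candidate, requirement, ignore):
            remaining -= candidate["count"]
    return remaining <= 0


available = [
    {"count": 2, "kind": "gpu", "brand": "nvidia"},
    {"count": 1, "kind": "gpu", "brand": "amd"},
]
need_any = {"count": 3, "kind": "gpu"}
need_nvidia = {"count": 3, "kind": "gpu", "brand": "nvidia"}

partial = satisfies_sketch(available[0], need_nvidia)
full_any = fully_satisfy_sketch(available, need_any)
full_nvidia = fully_satisfy_sketch(available, need_nvidia)
full_ignoring_brand = fully_satisfy_sketch(available, need_nvidia, ignore=["brand"])
```

A single two-GPU NVIDIA candidate partially satisfies a three-GPU NVIDIA requirement, but the pool only fully satisfies it once the brand field is ignored.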
class toil.job.RequirementsDict
Bases: TypedDict
Typed storage for requirements for a job.
Where requirement values are of different types depending on the requirement.
cores: typing_extensions.NotRequired[int | float]
memory: typing_extensions.NotRequired[int]
disk: typing_extensions.NotRequired[int]
accelerators: typing_extensions.NotRequired[list[AcceleratorRequirement]]
preemptible: typing_extensions.NotRequired[bool]
toil.job.REQUIREMENT_NAMES = ['disk', 'memory', 'cores', 'accelerators', 'preemptible']
toil.job.ParsedRequirement
toil.job.ParseableIndivisibleResource
toil.job.ParseableDivisibleResource
toil.job.ParseableFlag
toil.job.ParseableAcceleratorRequirement
toil.job.ParseableRequirement
class toil.job.Requirer(requirements)
Base class implementing the storage and presentation of requirements.
Has cores, memory, disk, and preemptibility as properties.
Parameters
requirements ( Mapping[str, ParseableRequirement] )
assignConfig(config)
Assign the given config object to be used to provide default values.
Must be called exactly once on a loaded JobDescription before any requirements are queried.
Parameters
config ( toil.common.Config ) -- Config object to query
Return type
None
__getstate__()
Return the dict to use as the instance's __dict__ when pickling.
Return type
dict [ str , Any]
__copy__()
Return a semantically-shallow copy of the object, for copy.copy().
Return type
Requirer
__deepcopy__(memo)
Return a semantically-deep copy of the object, for copy.deepcopy().
Parameters
memo ( Any )
Return type
Requirer
property requirements: RequirementsDict
Get dict containing all non-None, non-defaulted requirements.
Return type
RequirementsDict
property disk: int
Get the maximum number of bytes of disk required.
Return type
int
property memory: int
Get the maximum number of bytes of memory required.
Return type
int
property cores: int | float
Get the number of CPU cores required.
Return type
Union[ int , float ]
property preemptible: bool
Whether a preemptible node is permitted, or a nonpreemptible one is required.
Return type
bool
preemptable(val)
Parameters
val ( ParseableFlag )
Return type
None
property accelerators: list [ AcceleratorRequirement ]
Any accelerators, such as GPUs, that are needed.
Return type
list [ AcceleratorRequirement ]
scale(requirement, factor)
Return a copy of this object with the given requirement scaled up or down.
Only works on requirements where that makes sense.
Parameters
• requirement ( str )
• factor ( float )
Return type
Requirer
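A copy-and-scale operation of this kind might look like the standalone sketch below. RequirerSketch is a hypothetical class, and both the set of scalable requirements (cores, memory, disk) and the rounding of byte counts are assumptions made for the example rather than documented behaviour.

```python
import copy


class RequirerSketch:
    """Hypothetical holder of requirements supporting a scaled copy."""

    # Assumption: only numeric resource amounts can sensibly be scaled.
    SCALABLE = ("cores", "memory", "disk")

    def __init__(self, **requirements):
        self.requirements = dict(requirements)

    def scale(self, requirement, factor):
        if requirement not in self.SCALABLE:
            raise ValueError(f"Cannot scale requirement {requirement!r}")
        scaled = copy.copy(self)
        # Give the copy its own dict so the original is left untouched.
        scaled.requirements = dict(self.requirements)
        value = self.requirements[requirement] * factor
        # Assumption: cores may be fractional; byte counts are rounded.
        scaled.requirements[requirement] = (
            value if requirement == "cores" else round(value)
        )
        return scaled


base = RequirerSketch(cores=2, memory=4096, disk=1000, preemptible=True)
bigger = base.scale("memory", 1.5)
```

Returning a new object rather than mutating in place lets a caller retry a job with more memory while keeping the original requirements for reference.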
requirements_string()
Get a nice human-readable string of our requirements.
Return type
str
class toil.job.JobBodyReference
Bases: NamedTuple
Reference from a job description to its body.
file_store_id: str
File ID (or special shared file name for the root job) of the job's body.
module_string: str
Stringified description of the module needed to load the body.
class toil.job.JobDescription(requirements, jobName, unitName='', displayName='', local=None, files=None)
Bases: Requirer
Stores all the information that the Toil Leader ever needs to know about a Job.
This includes:
• Resource requirements.
• Which jobs are children or follow-ons or predecessors of this job.
• A reference to the Job object in the job store.
Can be obtained from an actual (i.e. executable) Job object, and can be used to obtain the Job object from the JobStore.
Never contains other Jobs or JobDescriptions: all reference is by ID.
Subclassed into variants for checkpoint jobs and service jobs that have their specific parameters.
Parameters
• requirements ( Mapping[str, Union[int, str, bool]] )
• jobName ( str )
• unitName ( Optional[str] )
• displayName ( Optional[str] )
• local ( Optional[bool] )
• files ( Optional[set[toil.fileStores.FileID]] )
local: bool
jobName
unitName
displayName
jobStoreID: str | TemporaryID
filesToDelete = []
predecessorNumber = 0
predecessorsFinished
childIDs: set[str]
followOnIDs: set[str]
successor_phases: list[set[str]]
serviceTree
logJobStoreFileID = None
files_to_use
get_names()
Get the names and ID of this job as a named tuple.
Return type
toil.bus.Names
get_chain()
Get all the jobs that executed in this job's chain, in order.
For each job, produces a named tuple with its various names and its original job store ID. The jobs in the chain are in execution order.
If the job hasn't run yet or it didn't chain, produces a one-item list.
Return type
list [ toil.bus.Names ]
serviceHostIDsInBatches()
Find all batches of service host job IDs that can be started at the same time.
Batches are produced in the order they need to start in.
Return type
Iterator[ list [ str ]]
successorsAndServiceHosts()
Get an iterator over all child, follow-on, and service job IDs.
Return type
Iterator[ str ]
allSuccessors()
Get an iterator over all child, follow-on, and chained, inherited successor job IDs.
Follow-ons will come before children.
Return type
Iterator[ str ]
successors_by_phase()
Get an iterator over all child/follow-on/chained inherited successor job IDs, along with their phase number on the stack.
Phases execute from higher numbers to lower numbers.
Return type
Iterator[ tuple [ int , str ]]
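The relationship between phase numbers and execution order can be illustrated with a standalone sketch over a list of sets. pairs_by_phase and execution_order are hypothetical helpers, and the layout used in the example (follow-ons in phase 0, children in phase 1) is an assumption chosen to match the statement that higher-numbered phases execute first.

```python
def pairs_by_phase(successor_phases):
    """Yield (phase_number, job_id) pairs, lowest phase number first."""
    for number, phase in enumerate(successor_phases):
        # Sort within a phase only to make the output deterministic.
        for job_id in sorted(phase):
            yield number, job_id


def execution_order(successor_phases):
    """Order job IDs so that higher-numbered phases come first."""
    pairs = sorted(pairs_by_phase(successor_phases), key=lambda p: (-p[0], p[1]))
    return [job_id for _, job_id in pairs]


# Assumed layout: phase 0 holds follow-ons, phase 1 holds children.
phases = [{"followon-1"}, {"child-1", "child-2"}]
pairs = list(pairs_by_phase(phases))
order = execution_order(phases)
```

Within one phase the sketch imposes alphabetical order purely for reproducibility; jobs in the same phase have no ordering constraint between them here.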
property services
Get a collection of the IDs of service host jobs for this job, in arbitrary order.
Will be empty if the job has no unfinished services.
has_body()
Returns True if we have a job body associated, and False otherwise.
Return type
bool
attach_body(file_store_id, user_script)
Attach a job body to this JobDescription.
Takes the file store ID that the body is stored at, and the required user script module.
The file store ID can also be "firstJob" for the root job, stored as a shared file instead.
Parameters
• file_store_id ( str )
• user_script ( toil.resource.ModuleDescriptor )
Return type
None
detach_body()
Drop the body reference from a JobDescription.
Return type
None
get_body()
Get the information needed to load the job body.
Returns
a file store ID (or magic shared file name "firstJob") and a user script module.
Return type
tuple [ str , toil.resource.ModuleDescriptor ]
Fails if no body is attached; check has_body() first.
nextSuccessors()
Return the collection of job IDs for the successors of this job that are ready to run.
If those jobs have multiple predecessor relationships, they may still be blocked on other jobs.
Returns None when at the final phase (all successors done), and an empty collection if there are more phases but they can't be entered yet (e.g. because we are waiting for the job itself to run).
Return type
Optional[ set [ str ]]
filterSuccessors(predicate)
Keep only successor jobs for which the given predicate function approves.
The predicate function is called with the job's ID.
Treats all other successors as complete and forgets them.
Parameters
predicate ( Callable[[str], bool] )
Return type
None
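Predicate-based filtering over phases of successor IDs can be sketched as below. filter_successors_sketch is a hypothetical standalone function that returns a new list instead of mutating a JobDescription, and dropping phases that become empty is an assumption of this sketch.

```python
def filter_successors_sketch(successor_phases, predicate):
    """Keep only job IDs the predicate approves; forget everything else."""
    kept = [
        {job_id for job_id in phase if predicate(job_id)}
        for phase in successor_phases
    ]
    # Assumption: a phase with nothing left in it is dropped entirely.
    return [phase for phase in kept if phase]


phases = [{"followon-1"}, {"child-1", "child-2"}]
still_existing = {"child-2"}

# The predicate receives a job ID and answers whether to keep it.
filtered = filter_successors_sketch(phases, lambda job_id: job_id in still_existing)
```

A typical predicate answers "does this job still exist?", so that successors which have already finished and been removed are forgotten.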
filterServiceHosts(predicate)
Keep only services for which the given predicate approves.
The predicate function is called with the service host job's ID.
Treats all other services as complete and forgets them.
Parameters
predicate ( Callable[[str], bool] )
Return type
None
clear_nonexistent_dependents(job_store)
Remove all references to child, follow-on, and associated service jobs that do not exist.
That is to say, all those that have been completed and removed.
Parameters
job_store ( toil.jobStores.abstractJobStore.AbstractJobStore )
Return type
None
clear_dependents()
Remove all references to successor and service jobs.
Return type
None
is_subtree_done()
Check if the subtree is done.
Returns
True if the job appears to be done, and all related child, follow-on, and service jobs appear to be finished and removed.
Return type
bool
replace(other)
Take on the ID of another JobDescription, retaining our own state and type.
When updated in the JobStore, we will save over the other JobDescription.
Useful for chaining jobs: the chained-to job can replace the parent job.
Merges cleanup state and successors other than this job from the job being replaced into this one.
Parameters
other ( JobDescription ) -- Job description to replace.
Return type
None
assert_is_not_newer_than(other)
Make sure this JobDescription is not newer than a prospective new version of the JobDescription.
Parameters
other ( JobDescription )
Return type
None
is_updated_by(other)
Return True if the passed JobDescription is a distinct, newer version of this one.
Parameters
other ( JobDescription )
Return type
bool
addChild(childID)
Make the job with the given ID a child of the described job.
Parameters
childID ( str )
Return type
None
addFollowOn(followOnID)
Make the job with the given ID a follow-on of the described job.
Parameters
followOnID ( str )
Return type
None
addServiceHostJob(serviceID, parentServiceID=None)
Make the ServiceHostJob with the given ID a service of the described job.
If a parent ServiceHostJob ID is given, that parent service will be started first, and must have already been added.
hasChild(childID)
Return True if the job with the given ID is a child of the described job.
Parameters
childID ( str )
Return type
bool
hasFollowOn(followOnID)
Test if the job with the given ID is a follow-on of the described job.
Parameters
followOnID ( str )
Return type
bool
hasServiceHostJob(serviceID)
Test if the ServiceHostJob is a service of the described job.
Return type
bool
renameReferences(renames)
Apply the given dict of ID renames to all references to jobs.
Does not modify our own ID or those of finished predecessors. IDs not present in the renames dict are left as-is.
Parameters
renames ( dict[TemporaryID, str] ) -- Rename operations to apply.
Return type
None
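The core of such a rename pass is a dictionary lookup with a fall-through default. The sketch below is standalone and hypothetical: rename_references_sketch works on a plain set of strings, whereas the real method operates on the JobDescription's own reference fields and uses TemporaryID keys.

```python
def rename_references_sketch(ids, renames):
    """Return a new set with each ID replaced by its rename, if it has one."""
    # IDs missing from the renames dict are left as-is.
    return {renames.get(job_id, job_id) for job_id in ids}


child_ids = {"tmp-1", "tmp-2", "job-already-registered"}
renames = {"tmp-1": "job-101", "tmp-2": "job-102"}

renamed = rename_references_sketch(child_ids, renames)
```

This is the step that swaps placeholder IDs for real job store IDs once the referenced jobs have been registered.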
addPredecessor()
Notify the JobDescription that a predecessor has been added to its Job.
Return type
None
onRegistration(jobStore)
Perform setup work that requires the JobStore.
Called by the Job saving logic when this JobDescription meets the JobStore and has its ID assigned.
Overridden to perform setup work (like hooking up flag files for service jobs) that requires the JobStore.
Parameters
jobStore ( toil.jobStores.abstractJobStore.AbstractJobStore ) -- The job store we are being placed into
Return type
None
setupJobAfterFailure(exit_status=None, exit_reason=None)
Configure job after a failure.
Reduce the remainingTryCount if greater than zero and set the memory to be at least as big as the default memory (in case of exhaustion of memory, which is common).
Requires a configuration to have been assigned (see toil.job.Requirer.assignConfig()).
Parameters
• exit_status ( Optional[int] ) -- The exit code from the job.
• exit_reason ( Optional[toil.batchSystems.abstractBatchSystem.BatchJobExitReason] ) -- The reason the job stopped, if available from the batch system.
Return type
None
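The two adjustments named in the description (spend one retry, and raise memory to at least the default) can be sketched in isolation as follows. JobStateSketch and after_failure_sketch are hypothetical names, and the sketch covers only what the description states; the real method may take the exit status and exit reason into account in ways not shown here.

```python
from dataclasses import dataclass


@dataclass
class JobStateSketch:
    """Hypothetical minimal state: retries left and requested memory in bytes."""

    remaining_try_count: int
    memory: int


def after_failure_sketch(job, default_memory):
    # Spend one retry, but never go below zero.
    if job.remaining_try_count > 0:
        job.remaining_try_count -= 1
    # Raise memory to at least the default, in case the job ran out.
    job.memory = max(job.memory, default_memory)
    return job


small = after_failure_sketch(JobStateSketch(2, 1_000), default_memory=2_000)
large = after_failure_sketch(JobStateSketch(0, 8_000), default_memory=2_000)
```

A job that asked for less than the default is bumped up to it, while a job that already asked for more keeps its own value.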
getLogFileHandle(jobStore)
Create a context manager that yields a file handle to the log file.
Assumes logJobStoreFileID is set.
property remainingTryCount
Get the number of tries remaining.
The try count set on the JobDescription, or the default based on the retry count from the config if none is set.
clearRemainingTryCount()
Clear remainingTryCount and set it back to its default value.
Returns
True if a modification to the JobDescription was made, and False otherwise.
Return type
bool
__str__()
Produce a useful logging string identifying this job.
Return type
str
__repr__()
reserve_versions(count)
Reserve a job version number for later, for journaling asynchronously.
Parameters
count ( int )
Return type
None
pre_update_hook()
Run before pickling and saving a created or updated version of this job.
Called by the job store.
Return type
None
class toil.job.ServiceJobDescription(*args, **kwargs)
Bases: JobDescription
A description of a job that hosts a service.
terminateJobStoreID: str | None = None
startJobStoreID: str | None = None
errorJobStoreID: str | None = None
onRegistration(jobStore)
Set up flag files.
When a ServiceJobDescription first meets the JobStore, it needs to set up its flag files.
class toil.job.CheckpointJobDescription(*args, **kwargs)
Bases: JobDescription
A description of a job that is a checkpoint.
checkpoint: JobBodyReference | None = None
checkpointFilesToDelete = []
set_checkpoint()
Save a body checkpoint into self.checkpoint
Return type
str
restore_checkpoint()
Restore the body checkpoint from self.checkpoint
Return type
None
restartCheckpoint(jobStore)
Restart a checkpoint after the total failure of jobs in its subtree.
Writes the changes to the jobStore immediately. All the checkpoint's successors will be deleted, but its try count will not be decreased.
Returns a list with the IDs of any successors deleted.
Parameters
jobStore ( toil.jobStores.abstractJobStore.AbstractJobStore )
Return type
list [ str ]
class toil.job.Job(memory=None, cores=None, disk=None, accelerators=None, preemptible=None, preemptable=None, unitName='', checkpoint=False, displayName='', descriptionClass=None, local=None, files=None)
Represents a unit of work in Toil.
Parameters
• memory ( Optional[ParseableIndivisibleResource] )
• cores ( Optional[ParseableDivisibleResource] )
• disk ( Optional[ParseableIndivisibleResource] )
• accelerators ( Optional[ParseableAcceleratorRequirement] )
• preemptible ( Optional[ParseableFlag] )
• preemptable ( Optional[ParseableFlag] )
• unitName ( Optional[str] )
• checkpoint ( Optional[bool] )
• displayName ( Optional[str] )
• descriptionClass ( Optional[type] )
• local ( Optional[bool] )
• files ( Optional[set[toil.fileStores.FileID]] )
userModule: toil.resource.ModuleDescriptor
__str__()
Produce a useful logging string to identify this Job and distinguish it from its JobDescription.
check_initialized()
Ensure that Job.__init__() has been called by any subclass __init__().
This uses the fact that the self._description instance variable should always be set after __init__().
If __init__() has not been called, raise an error.
Return type
None
property jobStoreID: str | TemporaryID
Get the ID of this Job.
Return type
Union[ str , TemporaryID ]
property description: JobDescription
Expose the JobDescription that describes this job.
Return type
JobDescription
property disk: int
The maximum number of bytes of disk the job will require to run.
Return type
int
property memory
The maximum number of bytes of memory the job will require to run.
property cores: int | float
The number of CPU cores required.
Return type
Union[ int , float ]
property accelerators: list [ AcceleratorRequirement ]
Any accelerators, such as GPUs, that are needed.
Return type
list [ AcceleratorRequirement ]
property preemptible: bool
Whether the job can be run on a preemptible node.
Return type
bool
preemptable()
property checkpoint: bool
Determine if the job is a checkpoint job or not.
Return type
bool
property files_to_use: set [ toil.fileStores.FileID ]
Return type
set [ toil.fileStores.FileID ]
add_to_files_to_use(val)
Parameters
val ( toil.fileStores.FileID )
remove_from_files_to_use(val)
Parameters
val ( toil.fileStores.FileID )
assignConfig(config)
Assign the given config object.
It will be used by various actions implemented inside the Job class.
Parameters
config ( toil.common.Config ) -- Config object to query
Return type
None
run(fileStore)
Override this function to perform work and dynamically create successor jobs.
Parameters
fileStore ( toil.fileStores.abstractFileStore.AbstractFileStore ) -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
Return type
Any
addChild(childJob)
Add a childJob to be run as child of this job.
Child jobs will be run directly after this job's toil.job.Job.run() method has completed.
Returns
childJob: for call chaining
Parameters
childJob ( Job )
Return type
Job
hasChild(childJob)
Check if childJob is already a child of this job.
Returns
True if childJob is a child of the job, else False.
Parameters
childJob ( Job )
Return type
bool
addFollowOn(followOnJob)
Add a follow-on job.
Follow-on jobs
will be run after the child jobs and their successors have
been run.
Returns
followOnJob for call chaining
Parameters
followOnJob ( Job )
Return type
Job
hasPredecessor(job)
Check if a given job is already
a predecessor of this job.
Parameters
job ( Job )
Return type
bool
hasFollowOn(followOnJob)
Check if given job is already a
follow-on of this job.
Returns
True if the followOnJob is a follow-on of this job, else False.
Parameters
followOnJob ( Job )
Return type
bool
addService(service, parentService=None)
Add a service.
The toil.job.Job.Service.start() method of the service will be called after the run method has completed but before any successors are run. The service's toil.job.Job.Service.stop() method will be called once the successors of the job have been run.
Services allow things like databases and servers to be started and accessed by jobs in a workflow.
Raises
toil.job.JobException -- If service has already been made the child of a job or another service.
Parameters
• service ( Job ) -- Service to add.
• parentService ( Optional[Job] ) -- Service that will be started before 'service' is started. Allows trees of services to be established. parentService must be a service of this job.
Returns
a promise that will be replaced with the return value from toil.job.Job.Service.start() of service in any successor of the job.
Return type
Promise
hasService(service)
Return True if the given
Service is a service of this job, and False otherwise.
Parameters
service ( Job )
Return type
bool
addChildFn(fn, *args, **kwargs)
Add a function as a child job.
Parameters
fn ( Callable ) -- Function to be run as a child job with *args and **kwargs as arguments to this function. See toil.job.FunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Returns
The new child job that wraps fn.
Return type
FunctionWrappingJob
addFollowOnFn(fn, *args, **kwargs)
Add a function as a follow-on
job.
Parameters
fn ( Callable ) -- Function to be run as a follow-on job with *args and **kwargs as arguments to this function. See toil.job.FunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Returns
The new follow-on job that wraps fn.
Return type
FunctionWrappingJob
addChildJobFn(fn, *args, **kwargs)
Add a job function as a child job.
See
toil.job.JobFunctionWrappingJob
for a definition of a
job function.
Parameters
fn ( Callable ) -- Job function to be run as a child job with *args and **kwargs as arguments to this function. See toil.job.JobFunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Returns
The new child job that wraps fn.
Return type
FunctionWrappingJob
addFollowOnJobFn(fn, *args, **kwargs)
Add a follow-on job function.
See
toil.job.JobFunctionWrappingJob
for a definition of a
job function.
Parameters
fn ( Callable ) -- Job function to be run as a follow-on job with *args and **kwargs as arguments to this function. See toil.job.JobFunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Returns
The new follow-on job that wraps fn.
Return type
FunctionWrappingJob
property tempDir: str
Shortcut to calling job.fileStore.getLocalTempDir() .
The temp dir is created on the first call, and the same path is returned on that call and all later calls.
Returns
Path to tempDir. See job.fileStore.getLocalTempDir
Return type
str
log(text, level=logging.INFO)
Log using
fileStore.log_to_leader()
.
Parameters
text ( str )
Return type
None
static wrapFn(fn, *args, **kwargs)
Makes a Job out of a function.
Convenience
function for constructor of
toil.job.FunctionWrappingJob
.
Parameters
fn -- Function to be run with *args and **kwargs as arguments. See toil.job.JobFunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Returns
The new function that wraps fn.
Return type
FunctionWrappingJob
static wrapJobFn(fn, *args, **kwargs)
Makes a Job out of a job function.
Convenience
function for constructor of
toil.job.JobFunctionWrappingJob
.
Parameters
fn -- Job function to be run with *args and **kwargs as arguments. See toil.job.JobFunctionWrappingJob for reserved keyword arguments used to specify resource requirements.
Returns
The new job function that wraps fn.
Return type
JobFunctionWrappingJob
encapsulate(name=None)
Encapsulates the job, see
toil.job.EncapsulatedJob
. Convenience function for
constructor of
toil.job.EncapsulatedJob
.
Parameters
name ( Optional[str] ) -- Human-readable name for the encapsulated job.
Returns
an encapsulated version of this job.
Return type
EncapsulatedJob
rv(*path)
Create a promise ( toil.job.Promise ).
The
"promise" representing a return value of the job's
run method, or, in case of a function-wrapping job, the
wrapped function's return value.
Parameters
path ( (Any) ) -- Optional path for selecting a component of the promised return value. If absent or empty, the entire return value will be used. Otherwise, the first element of the path is used to select an individual item of the return value. For that to work, the return value must be a list, dictionary or of any other type implementing the __getitem__() magic method. If the selected item is yet another composite value, the second element of the path can be used to select an item from it, and so on. For example, if the return value is [6,{'a':42}] , .rv(0) would select 6 , rv(1) would select {'a':42} while rv(1,'a') would select 42 . To select a slice from a return value that is sliceable, e.g. tuple or list, the path element should be a slice object. For example, assuming that the return value is [6, 7, 8, 9] then .rv(slice(1, 3)) would select [7, 8] . Note that slicing really only makes sense at the end of path.
Returns
A promise representing the return value of this job's toil.job.Job.run() method.
Return type
Promise
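The path semantics of rv() amount to repeated indexing with __getitem__. The following is a minimal stand-alone sketch of that selection rule only; select is a hypothetical helper written for illustration and is not part of Toil, which applies the path when the promise is fulfilled.

```python
from typing import Any


def select(value: Any, *path: Any) -> Any:
    """Apply each path element in turn via indexing (hypothetical helper)."""
    for key in path:
        value = value[key]
    return value


return_value = [6, {"a": 42}]

whole = select(return_value)            # empty path: the entire return value
first = select(return_value, 0)         # first element of the list
nested = select(return_value, 1, "a")   # item "a" of the dictionary at index 1
middle = select([6, 7, 8, 9], slice(1, 3))  # a slice object as the last element
```

A slice is placed last here because the result of slicing is a new sequence, so any later path element would index into that new sequence rather than the original.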
registerPromise(path)
prepareForPromiseRegistration(jobStore)
Set up to allow this job's promises to register themselves.
Prepare this job (the promisor) so that its promises can register themselves with it, when the jobs they are promised to (promisees) are serialized.
The promisee holds the reference to the promise (usually as part of the job arguments), and when it is pickled, so are the promises it refers to. Pickling a promise triggers it to be registered with the promisor.
Parameters
jobStore (- toil.jobStores.abstractJobStore.AbstractJobStore )
Return type
None
checkJobGraphForDeadlocks()
Ensures that a graph of Jobs (that hasn't yet been saved to the JobStore) doesn't contain any pathological relationships between jobs that would result in deadlocks if we tried to run the jobs.
See toil.job.Job.checkJobGraphConnected() , toil.job.Job.checkJobGraphAcyclic() and toil.job.Job.checkNewCheckpointsAreLeafVertices() for more info.
Raises
toil.job.JobGraphDeadlockException -- if the job graph is cyclic, contains multiple roots or contains checkpoint jobs that are not leaf vertices when defined (see toil.job.Job.checkNewCheckpointsAreLeafVertices() ).
getRootJobs()
Return the set of root job objects that contain this job.
A root job is a job with no predecessors (i.e. which are not children, follow-ons, or services).
Only deals with
jobs created here, rather than loaded from the job store.
Return type
set [ Job ]
checkJobGraphConnected()
Raises
toil.job.JobGraphDeadlockException -- if toil.job.Job.getRootJobs() does not contain exactly one root job.
As execution always starts from one root job, having multiple root jobs will cause a deadlock to occur.
Only deals with jobs created here, rather than loaded from the job store.
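The root-job rule can be illustrated on a plain adjacency mapping. This is a sketch under stated assumptions, not Toil's implementation: the graph is a hypothetical dictionary from job name to successor names, and root_jobs is an illustrative helper.

```python
from typing import Dict, List, Set


def root_jobs(successors: Dict[str, List[str]]) -> Set[str]:
    """Return jobs that are nobody's child, follow-on, or service."""
    all_jobs = set(successors)
    has_predecessor: Set[str] = set()
    for targets in successors.values():
        all_jobs.update(targets)
        has_predecessor.update(targets)
    return all_jobs - has_predecessor


# Hypothetical graphs: one with a single root, one with two roots.
single_root = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
two_roots = {"A": ["C"], "B": ["C"]}

single_root_ok = len(root_jobs(single_root)) == 1
two_roots_ok = len(root_jobs(two_roots)) == 1
```

In the second graph neither A nor B is reachable from the other, so a run that starts from one of them could never execute the other, and C would wait forever.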
checkJobGraphAcylic()
Raises
toil.job.JobGraphDeadlockException -- if the connected component of jobs containing this job contains any cycles of child/followOn dependencies in the augmented job graph (see below). Such cycles are not allowed in valid job graphs.
A follow-on edge (A, B) between two jobs A and B is equivalent to adding a child edge to B from (1) A, (2) from each child of A, and (3) from the successors of each child of A. We call each such edge an "implied" edge. The augmented job graph is a job graph including all the implied edges.
For a job graph G = (V, E) the algorithm is O(|V|^2) . It is O(|V| + |E|) for a graph with no follow-ons. The former follow-on case could be improved!
Only deals with jobs created here, rather than loaded from the job store.
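The augmented job graph can be sketched with plain dictionaries. The code below is an illustration of the definition, not Toil's algorithm: Edges, successors_of, augment and has_cycle are hypothetical names, and jobs are represented by strings.

```python
from typing import Dict, List, Set

Edges = Dict[str, List[str]]


def successors_of(job: str, children: Edges, follow_ons: Edges) -> Set[str]:
    """Every job reachable from `job` through child or follow-on edges."""
    seen: Set[str] = set()
    stack = children.get(job, []) + follow_ons.get(job, [])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(children.get(current, []))
            stack.extend(follow_ons.get(current, []))
    return seen


def augment(children: Edges, follow_ons: Edges) -> Dict[str, Set[str]]:
    """Child edges plus the implied edges of every follow-on edge (A, B)."""
    graph: Dict[str, Set[str]] = {job: set(kids) for job, kids in children.items()}
    for a, targets in follow_ons.items():
        sources = {a}                       # (1) A itself
        for child in children.get(a, []):
            sources.add(child)              # (2) each child of A
            sources |= successors_of(child, children, follow_ons)  # (3)
        for b in targets:
            for source in sources:
                graph.setdefault(source, set()).add(b)
    return graph


def has_cycle(graph: Dict[str, Set[str]]) -> bool:
    """Depth-first search; meeting a job that is still in progress is a cycle."""
    in_progress, done = set(), set()

    def visit(job: str) -> bool:
        in_progress.add(job)
        for nxt in graph.get(job, ()):
            if nxt in in_progress:
                return True
            if nxt not in done and visit(nxt):
                return True
        in_progress.discard(job)
        done.add(job)
        return False

    return any(job not in done and visit(job) for job in list(graph))


# Valid: A has child C and follow-on B.
valid = augment({"A": ["C"]}, {"A": ["B"]})
# Invalid: B is a follow-on of A and also a successor of A's child C.
invalid = augment({"A": ["C"], "C": ["B"]}, {"A": ["B"]})
```

In the invalid case B would have to wait for all successors of C, which include B itself, so the augmented graph contains a self-loop on B.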
checkNewCheckpointsAreLeafVertices()
A checkpoint job is a job that is restarted if either it fails, or if any of its successors completely fails, exhausting their retries.
A job is a leaf if it has no successors.
A checkpoint job must be a leaf when initially added to the job graph. When its run method is invoked it can then create direct successors. This restriction is made to simplify implementation.
Only works on connected components of jobs not yet added to the JobStore.
Raises
toil.job.JobGraphDeadlockException -- if there exists a job being added to the graph for which checkpoint=True and which is not a leaf.
Return type
None
defer(function, *args, **kwargs)
Register a deferred function, i.e. a callable that will be invoked after the current attempt at running this job concludes. A job attempt is said to conclude when the job function (or the toil.job.Job.run() method for class-based jobs) returns, raises an exception or after the process running it terminates abnormally. A deferred function will be called on the node that attempted to run the job, even if a subsequent attempt is made on another node. A deferred function should be idempotent because it may be called multiple times on the same node or even in the same process. More than one deferred function may be registered per job attempt by calling this method repeatedly with different arguments. If the same function is registered twice with the same or different arguments, it will be called twice per job attempt.
Examples for
deferred functions are ones that handle cleanup of resources
external to Toil, like Docker containers, files outside the
work directory, etc.
Parameters
• function ( callable ) -- The function to be called after this job concludes.
• args ( list ) -- The arguments to the function
• kwargs ( dict ) -- The keyword arguments to the function
Return type
None
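The calling contract for deferred functions (one call per registration, possibly repeated, hence the need for idempotence) can be sketched without Toil. AttemptSketch and remove_container are hypothetical names used only for this illustration; in a real workflow the registration would go through the job's defer method.

```python
from typing import Any, Callable, Dict, List, Set, Tuple


class AttemptSketch:
    """Hypothetical stand-in for a single job attempt; not a Toil class."""

    def __init__(self) -> None:
        self._deferred: List[
            Tuple[Callable[..., Any], Tuple[Any, ...], Dict[str, Any]]
        ] = []

    def defer(self, function: Callable[..., Any], *args: Any, **kwargs: Any) -> None:
        # Every registration is kept, so registering twice means two calls.
        self._deferred.append((function, args, kwargs))

    def conclude(self) -> None:
        # Stands in for the point at which the attempt has concluded.
        for function, args, kwargs in self._deferred:
            function(*args, **kwargs)


calls: List[str] = []
removed: Set[str] = set()


def remove_container(name: str) -> None:
    """Idempotent cleanup: removing an already removed name is harmless."""
    calls.append(name)
    removed.add(name)  # stands in for deleting a resource external to Toil


attempt = AttemptSketch()
attempt.defer(remove_container, "scratch-1")
attempt.defer(remove_container, "scratch-1")  # same function registered twice
attempt.conclude()
attempt.conclude()  # a repeated invocation must not change the outcome
```

The cleanup runs four times in this sketch, yet the end state is the same as after a single run, which is the property a deferred function should have.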
class Runner
Used to set up and run a Toil workflow.
static
getDefaultArgumentParser(jobstore_as_flag=False)
Get argument parser with added
toil workflow options.
Parameters
jobstore_as_flag ( bool ) -- make the job store option a --jobStore flag instead of a required jobStore positional argument.
Returns
The argument parser used by a toil workflow with added Toil options.
Return type
argparse.ArgumentParser
static
getDefaultOptions(jobStore=None,
jobstore_as_flag=False)
Get default options for a toil
workflow.
Parameters
• jobStore ( Optional[str] ) -- A string describing the jobStore for the workflow.
• jobstore_as_flag ( bool ) -- make the job store option a --jobStore flag instead of a required jobStore positional argument.
Returns
The options used by a toil workflow.
Return type
argparse.Namespace
static addToilOptions(parser, jobstore_as_flag=False)
Adds the default toil options
to an
optparse
or
argparse
parser object.
Parameters
• parser ( Union[optparse.OptionParser, argparse.ArgumentParser] ) -- Options object to add toil options to.
• jobstore_as_flag ( bool ) -- make the job store option a --jobStore flag instead of a required jobStore positional argument.
Return type
None
static startToil(job, options)
Run the Toil workflow using the given options, starting with this job.
Deprecated by toil.common.Toil.start.
For the options, see Job.Runner.getDefaultOptions and Job.Runner.addToilOptions.
Raises
toil.exceptions.FailedJobsException -- if failed jobs remain when the function finishes.
Returns
The return value of the root job's run function.
Parameters
job ( Job ) -- root job of the workflow
Return type
Any
class Service(memory=None,
cores=None, disk=None,
accelerators=None, preemptible=None, unitName=None)
Bases: Requirer
Abstract class used to define the interface to a service.
Should be subclassed by the user to define services.
Is not executed
as a job; runs within a ServiceHostJob.
unitName
jobName
hostID = None
abstract start(job)
Start the service.
Parameters
job ( Job ) -- The underlying host job that the service is being run in. Can be used to register deferred functions, or to access the fileStore for creating temporary files.
Returns
An object describing how to access the service. The object must be pickleable and will be used by jobs to access the service (see toil.job.Job.addService() ).
Return type
Any
abstract stop(job)
Stops the service. Function can
block until complete.
Parameters
job ( Job ) -- The underlying host job that the service is being run in. Can be used to register deferred functions, or to access the fileStore for creating temporary files.
Return type
None
check()
Checks the service is still running.
Raises
exceptions.RuntimeError -- If the service failed, this will cause the service job to be labeled failed.
Returns
True if the service is still running, else False. If False then the service job will be terminated, and considered a success. Important point: if the service job exits due to a failure, it should raise a RuntimeError, not return False!
Return type
bool
getUserScript()
Return type
toil.resource.ModuleDescriptor
getTopologicalOrderingOfJobs()
Returns
a list of jobs such that for all pairs of indices i, j for which i < j, the job at index i can be run before the job at index j.
Return type
list [ Job ]
Only considers jobs in this job's subgraph that are newly added, not loaded from the job store.
Ignores service jobs.
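The ordering property stated above (a job at index i can run before the job at index j whenever i < j) is that of a topological sort. A minimal sketch using the standard library's graphlib module follows; the job names and the predecessor mapping are hypothetical, and this is not how Toil computes its ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each job maps to the jobs that must run before it.
predecessors = {
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}

# static_order() yields every job after all of its predecessors.
order = list(TopologicalSorter(predecessors).static_order())
position = {job: index for index, job in enumerate(order)}
```

More than one valid ordering usually exists (B and C may appear in either order here), so code consuming such a list should rely only on the relative positions of dependent jobs.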
saveBody(jobStore)
Save the execution data for just this job to the JobStore, and fill in the JobDescription with the information needed to retrieve it.
The Job's JobDescription must have already had a real jobStoreID assigned to it.
Does not save
the JobDescription.
Parameters
jobStore (- toil.jobStores.abstractJobStore.AbstractJobStore ) -- The job store to save the job body into.
Return type
None
saveAsRootJob(jobStore)
Save this job to the given
jobStore as the root job of the workflow.
Returns
the JobDescription describing this job.
Parameters
jobStore (- toil.jobStores.abstractJobStore.AbstractJobStore )
Return type
JobDescription
classmethod loadJob(job_store, job_description)
Retrieves a
toil.job.Job
instance from a JobStore
Parameters
• job_store (- toil.jobStores.abstractJobStore.AbstractJobStore ) -- The job store.
• job_description ( JobDescription ) -- the JobDescription of the job to retrieve.
Returns
The job referenced by the JobDescription.
Return type
Job
set_debug_flag(flag)
Enable the given debug option
on the job.
Parameters
flag ( str )
Return type
None
has_debug_flag(flag)
Return true if the given debug
flag is set.
Parameters
flag ( str )
Return type
bool
files_downloaded_hook(host_and_job_paths=None)
Function that subclasses can call when they have downloaded their input files.
Will abort the job if the "download_only" debug flag is set.
Can be hinted a
list of file path pairs outside and inside the job
container, in which case the container environment can be
reconstructed.
Parameters
host_and_job_paths ( Optional[list[tuple[str, str]]] )
Return type
None
exception toil.job.JobException(message)
Bases: Exception
General job
exception.
Parameters
message ( str )
exception toil.job.JobGraphDeadlockException(string)
Bases: JobException
An exception raised in the event that a workflow contains an unresolvable dependency, such as a cycle. See toil.job.Job.checkJobGraphForDeadlocks() .
class toil.job.FunctionWrappingJob(userFunction, *args, **kwargs)
Bases: Job
Job used to
wrap a function. In its
run
method the wrapped
function is called.
userFunctionModule
userFunctionName
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
getUserScript()
class toil.job.JobFunctionWrappingJob(userFunction, *args, **kwargs)
Bases: FunctionWrappingJob
A job function is a function whose first argument is a Job instance that is the wrapping job for the function. This can be used to add successor jobs for the function and perform all the functions the Job class provides.
To enable the job function to get access to the toil.fileStores.abstractFileStore.AbstractFileStore instance (see toil.job.Job.run() ), it is made a variable of the wrapping job called fileStore.
To specify a job's resource requirements the following default keyword arguments can be specified:
• memory
• disk
• cores
• accelerators
• preemptible
For example to wrap a function into a job we would call:
Job.wrapJobFn(myJob, memory='100k', disk='1M', cores=0.1)
property fileStore
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
class
toil.job.PromisedRequirementFunctionWrappingJob(userFunction,
*args, **kwargs)
Bases: FunctionWrappingJob
Handles dynamic
resource allocation using
toil.job.Promise
instances.
Spawns child function using parent function parameters and
fulfilled promised resource requirements.
classmethod create(userFunction, *args, **kwargs)
Creates an encapsulated Toil job function with unfulfilled promised resource requirements. After the promises are fulfilled, a child job function is created using updated resource values. The subgraph is encapsulated to ensure that this child job function is run before other children in the workflow. Otherwise, a different child may try to use an unresolved promise return value from the parent.
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
evaluatePromisedRequirements()
class
toil.job.PromisedRequirementJobFunctionWrappingJob(userFunction,
*args, **kwargs)
Bases: PromisedRequirementFunctionWrappingJob
Handles dynamic
resource allocation for job functions. See
toil.job.JobFunctionWrappingJob
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
class toil.job.EncapsulatedJob(job, unitName=None)
Bases: Job
A convenience Job class used to make a job subgraph appear to be a single job.
Let A be the root job of a job subgraph and B be another job we'd like to run after A and all its successors have completed. For this, use encapsulate:
# Job A and subgraph, Job B
A, B = A(), B()
Aprime = A.encapsulate()
Aprime.addChild(B)
# B will run after A and all its successors have completed, A and its subgraph of
# successors in effect appear to be just one job.
If the job being encapsulated has predecessors (i.e. is not the root job), then the encapsulated job will inherit these predecessors. If predecessors are added to the job being encapsulated after the encapsulated job is created, then the encapsulating job will NOT inherit these predecessors automatically. Care should be exercised to ensure the encapsulated job has the proper set of predecessors.
The return
value of an encapsulated job (as accessed by the
toil.job.Job.rv()
function) is the return value of
the root job, e.g. A().encapsulate().rv() and A().rv() will
resolve to the same value after A or A.encapsulate() has
been run.
addChild(childJob)
Add a childJob to be run as child of this job.
Child jobs will
be run directly after this job's
toil.job.Job.run()
method has completed.
Returns
childJob: for call chaining
addService(service, parentService=None)
Add a service.
The toil.job.Job.Service.start() method of the service will be called after the run method has completed but before any successors are run. The service's toil.job.Job.Service.stop() method will be called once the successors of the job have been run.
Services allow things like databases and servers to be started and accessed by jobs in a workflow.
Raises
toil.job.JobException -- If service has already been made the child of a job or another service.
Parameters
• service -- Service to add.
• parentService -- Service that will be started before 'service' is started. Allows trees of services to be established. parentService must be a service of this job.
Returns
a promise that will be replaced with the return value from toil.job.Job.Service.start() of service in any successor of the job.
addFollowOn(followOnJob)
Add a follow-on job.
Follow-on jobs
will be run after the child jobs and their successors have
been run.
Returns
followOnJob for call chaining
rv(*path)
Create a promise ( toil.job.Promise ).
The
"promise" representing a return value of the job's
run method, or, in case of a function-wrapping job, the
wrapped function's return value.
Parameters
path ( (Any) ) -- Optional path for selecting a component of the promised return value. If absent or empty, the entire return value will be used. Otherwise, the first element of the path is used to select an individual item of the return value. For that to work, the return value must be a list, dictionary or of any other type implementing the __getitem__() magic method. If the selected item is yet another composite value, the second element of the path can be used to select an item from it, and so on. For example, if the return value is [6,{'a':42}] , .rv(0) would select 6 , rv(1) would select {'a':42} while rv(1,'a') would select 42 . To select a slice from a return value that is sliceable, e.g. tuple or list, the path element should be a slice object. For example, assuming that the return value is [6, 7, 8, 9] then .rv(slice(1, 3)) would select [7, 8] . Note that slicing really only makes sense at the end of path.
Returns
A promise representing the return value of this job's toil.job.Job.run() method.
Return type
Promise
prepareForPromiseRegistration(jobStore)
Set up to allow this job's promises to register themselves.
Prepare this job (the promisor) so that its promises can register themselves with it, when the jobs they are promised to (promisees) are serialized.
The promisee holds the reference to the promise (usually as part of the job arguments), and when it is pickled, so are the promises it refers to. Pickling a promise triggers it to be registered with the promisor.
__reduce__()
Called during pickling to define the pickled representation of the job.
We don't want to pickle our internal references to the job we encapsulate, so we elide them here. When actually run, we're just a no-op job that can maybe chain.
getUserScript()
class toil.job.ServiceHostJob(service)
Bases: Job
Job that runs a
service. Used internally by Toil. Users should subclass
Service instead of using this.
serviceModule
service
pickledService = None
property fileStore
Return the file store, which the Service may need.
addChild(child)
Add a childJob to be run as child of this job.
Child jobs will
be run directly after this job's
toil.job.Job.run()
method has completed.
Returns
childJob: for call chaining
addFollowOn(followOn)
Add a follow-on job.
Follow-on jobs
will be run after the child jobs and their successors have
been run.
Returns
followOnJob for call chaining
addService(service, parentService=None)
Add a service.
The toil.job.Job.Service.start() method of the service will be called after the run method has completed but before any successors are run. The service's toil.job.Job.Service.stop() method will be called once the successors of the job have been run.
Services allow things like databases and servers to be started and accessed by jobs in a workflow.
Raises
toil.job.JobException -- If service has already been made the child of a job or another service.
Parameters
• service -- Service to add.
• parentService -- Service that will be started before 'service' is started. Allows trees of services to be established. parentService must be a service of this job.
Returns
a promise that will be replaced with the return value from toil.job.Job.Service.start() of service in any successor of the job.
saveBody(jobStore)
Serialize the service itself before saving the host job's body.
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
getUserScript()
class toil.job.FileMetadata
Bases: NamedTuple
Metadata for a file.
source is the URL to grab the file from.
parent_dir is the parent directory of the source.
size is the size of the file. It is None if the file size cannot be retrieved.
source: str
parent_dir: str
size: int | None
toil.job.potential_absolute_uris(uri,
path, importer=None,
execution_dir=None)
Get potential absolute URIs to check for an imported file.
Given a URI or
bare path, yield in turn all the URIs, with schemes, where
we should actually try to find it, given that we want to
search under/against the given paths or URIs, the current
directory, and the given importing WDL document if any.
Parameters
• uri ( str )
• path ( list[str] )
• importer ( Optional[str] )
• execution_dir ( Optional[str] )
Return type
Iterator[ str ]
toil.job.get_file_sizes(filenames,
file_source, search_paths=None,
include_remote_files=True, execution_dir=None)
Resolve relative-URI files in
the given environment and turn them into absolute normalized
URIs. Returns a dictionary of the
string values
from
the WDL file values to a tuple of the normalized URI, parent
directory ID, and size of the file. The size of the file may
be None, which means unknown size.
Parameters
• filenames ( List[str] ) -- list of filenames to evaluate on
• file_source (- toil.jobStores.abstractJobStore.AbstractJobStore ) -- Context to search for files with
• task_path -- Dotted WDL name of the user-level code doing the importing (probably the workflow name).
• search_paths ( Optional[List[str]] ) -- If set, try resolving input location relative to the URLs or directories in this list.
• include_remote_files ( bool ) -- If set, import files from remote locations. Else leave them as URI references.
• execution_dir ( Optional[str] )
Return type
Dict[ str , FileMetadata ]
class toil.job.CombineImportsJob(d, **kwargs)
Bases: Job
Combine the outputs of multiple WorkerImportJob instances into one promise.
Parameters
d ( Sequence[Promised[Dict[str, toil.fileStores.FileID]]] )
run(file_store)
Merge the dicts
Parameters
file_store (- toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
Promised[Dict[ str , toil.fileStores.FileID ]]
class toil.job.WorkerImportJob(filenames, local=False, **kwargs)
Bases: Job
Job to do file imports on a worker instead of a leader. Assumes all local and cloud files are accessible.
For the CWL/WDL
runners, this class is only used when runImportsOnWorkers is
enabled.
Parameters
• filenames ( List[str] )
• local ( bool )
• kwargs ( Any )
filenames
static import_files(files, file_source)
Import a list of files into the jobstore. Returns a mapping of the filename to the associated FileIDs.
When stream is true but the import is not streamable, the worker will run out of disk space and run a new import job with enough disk space instead.
Parameters
• files ( List[str] ) -- list of files to import
• file_source (- toil.jobStores.abstractJobStore.AbstractJobStore ) -- AbstractJobStore
Returns
Dictionary mapping filenames to associated jobstore FileID
Return type
Dict[ str , toil.fileStores.FileID ]
run(file_store)
Import the workflow inputs and then create and run the workflow.
Returns
Promise of workflow outputs
Parameters
file_store (- toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
Promised[Dict[ str , toil.fileStores.FileID ]]
class
toil.job.ImportsJob(file_to_data, max_batch_size,
import_worker_disk, **kwargs)
Bases: Job
Job to organize and delegate files to individual WorkerImportJobs.
For the CWL/WDL
runners, this is only used when runImportsOnWorkers is
enabled
Parameters
• file_to_data ( Dict[str, FileMetadata] )
• max_batch_size ( ParseableIndivisibleResource )
• import_worker_disk ( ParseableIndivisibleResource )
• kwargs ( Any )
run(file_store)
Import the workflow inputs and then create and run the workflow.
Returns
Tuple of a mapping from the candidate URI to the file ID and a mapping of the source filenames to their metadata. The candidate URI is a field in the file metadata.
Parameters
file_store (- toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
Tuple[Promised[Dict[ str , toil.fileStores.FileID ]], Dict[ str , FileMetadata ]]
class toil.job.Promise(job, path)
References a return value from a method as a promise before the method itself is run.
References a return value from a toil.job.Job.run() or toil.job.Job.Service.start() method as a promise before the method itself is run.
Let T be a job. Instances of Promise (termed a promise) are returned by T.rv(), which is used to reference the return value of T's run function. When the promise is passed to the constructor (or as an argument to a wrapped function) of a different, successor job, the promise will be replaced by the actual referenced return value. This mechanism allows a return value from one job's run method to be an input argument to another job before the former job's run function has been executed.
Parameters
• job ( Job )
• path ( Any )
filesToDelete
A set of IDs of files containing promised values when we know we won't need them anymore
job
path
__reduce__()
Return the Promise class and construction arguments.
Called during pickling when a promise (an instance of this class) is about to be pickled. Returns the Promise class and construction arguments that will be evaluated during unpickling, namely the job store coordinates of a file that will hold the promised return value. By the time the promise is about to be unpickled, that file should be populated.
toil.job.T
toil.job.Promised
toil.job.unwrap(p)
Function for ensuring you actually have a promised value, and not just a promise. Mostly useful for satisfying type-checking.
The
"unwrap" terminology is borrowed from Rust.
Parameters
p ( Promised[T] )
Return type
T
toil.job.unwrap_all(p)
Function for ensuring you actually have a collection of promised values, and not any remaining promises. Mostly useful for satisfying type-checking.
The
"unwrap" terminology is borrowed from Rust.
Parameters
p ( Sequence[Promised[T]] )
Return type
Sequence[T]
class toil.job.PromisedRequirement(valueOrCallable, *args)
Class for dynamically allocating job function resource requirements.
(involving toil.job.Promise instances.)
Use when resource requirements depend on the return value of a parent function. PromisedRequirements can be modified by passing a function that takes the Promise as input.
For example, let f, g, and h be functions. Then a Toil workflow can be defined as follows:
A = Job.wrapFn(f)
B = A.addChildFn(g, cores=PromisedRequirement(A.rv()))
C = B.addChildFn(h, cores=PromisedRequirement(lambda x: 2*x, B.rv()))
getValue()
Return PromisedRequirement value.
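The idea of a requirement that is computed late can be sketched without Toil. ToyRequirement and get_value are hypothetical names, and plain integers stand in for values that would come from fulfilled promises.

```python
class ToyRequirement:
    """Hypothetical deferred resource requirement."""

    def __init__(self, value_or_callable, *args):
        if callable(value_or_callable):
            # A function plus the values it will be applied to later.
            self._func = value_or_callable
            self._args = args
        else:
            # A bare value is returned unchanged.
            self._func = lambda value: value
            self._args = (value_or_callable,)

    def get_value(self):
        # Evaluated only when the scheduler needs the actual number.
        return self._func(*self._args)


# Stand-ins for the return values of parent jobs.
cores_for_b = ToyRequirement(4)
cores_for_c = ToyRequirement(lambda x: 2 * x, 4)
```

Here cores_for_b.get_value() passes the parent's value through, while cores_for_c.get_value() doubles it.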
static convertPromises(kwargs)
Return True if reserved resource keyword is a Promise or PromisedRequirement instance.
Converts Promise instance to PromisedRequirement.
Parameters
kwargs ( dict[str, Any] ) -- function keyword arguments
Return type
bool
class toil.job.UnfulfilledPromiseSentinel(fulfillingJobName, file_id, unpickled)
This should be overwritten by a proper promised value.
Throws an exception when unpickled.
Parameters
• fulfillingJobName ( str )
• file_id ( str )
• unpickled ( Any )
fulfillingJobName
file_id
static __setstate__(stateDict)
Only called when unpickling.
This won't be unpickled unless the promise wasn't resolved, so we throw an exception.
Parameters
stateDict ( dict[str, Any] )
Return type
None
toil.jobStores
Submodules
toil.jobStores.abstractJobStore
Attributes
Exceptions
Classes
Module Contents
toil.jobStores.abstractJobStore.logger
exception toil.jobStores.abstractJobStore.ProxyConnectionError
Bases: BaseException
Dummy class.
exception toil.jobStores.abstractJobStore.LocatorException(error_msg, locator, prefix=None)
Bases: Exception
Base exception class for all locator exceptions, for example job store or AWS bucket exceptions raised when the resource already exists.
Parameters
• error_msg ( str )
• locator ( str )
• prefix ( Optional[str] )
exception toil.jobStores.abstractJobStore.InvalidImportExportUrlException(url)
Bases: Exception
Common base class for all non-exit exceptions.
Parameters
url ( urllib.parse.ParseResult )
exception toil.jobStores.abstractJobStore.NoSuchJobException(jobStoreID)
Bases: Exception
Indicates that the specified job does not exist.
Parameters
jobStoreID ( toil.fileStores.FileID )
exception toil.jobStores.abstractJobStore.ConcurrentFileModificationException(jobStoreFileID)
Bases: Exception
Indicates that multiple processes attempted to modify the file at once.
Parameters
jobStoreFileID ( toil.fileStores.FileID )
exception toil.jobStores.abstractJobStore.NoSuchFileException(jobStoreFileID, customName=None, *extra)
Bases: Exception
Indicates that the specified file does not exist.
Parameters
• jobStoreFileID ( toil.fileStores.FileID )
• customName ( Optional[str] )
• extra ( Any )
exception toil.jobStores.abstractJobStore.NoSuchJobStoreException(locator, prefix)
Bases: LocatorException
Indicates that the specified job store does not exist.
Parameters
• locator ( str )
• prefix ( str )
exception toil.jobStores.abstractJobStore.JobStoreExistsException(locator, prefix)
Bases: LocatorException
Indicates that the specified job store already exists.
Parameters
• locator ( str )
• prefix ( str )
class toil.jobStores.abstractJobStore.AbstractJobStore(locator)
Bases: abc.ABC
Represents the physical storage for the jobs and files in a Toil workflow.
JobStores are responsible for storing toil.job.JobDescription objects (which relate jobs to each other) and files.
Actual toil.job.Job objects are stored in files, referenced by JobDescriptions. All the non-file CRUD methods the JobStore provides deal in JobDescriptions and not full, executable Jobs.
To actually get hold of a toil.job.Job, use toil.job.Job.loadJob() with a JobStore and the relevant JobDescription.
Parameters
locator ( str )
initialize(config)
Initialize this job store.
Create the physical storage for this job store, allocate a workflow ID, and persist the given Toil configuration to the store.
Parameters
config ( toil.common.Config ) -- the Toil configuration to initialize this job store with. The given configuration will be updated with the newly allocated workflow ID.
Raises
JobStoreExistsException -- if the physical storage for this job store already exists
Return type
None
writeConfig()
Return type
None
write_config()
Persists the value of the AbstractJobStore.config attribute to the job store, so that it can be retrieved later by other instances of this class.
Return type
None
resume()
Connect this instance to the physical storage it represents and load the Toil configuration into the AbstractJobStore.config attribute.
Raises
NoSuchJobStoreException -- if the physical storage for this job store doesn't exist
Return type
None
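The initialize/resume life cycle can be sketched with a directory-backed toy. Everything here is hypothetical (class names, the config.json file, a plain dict in place of toil.common.Config); it only mirrors the contract described above.

```python
import json
import os
import tempfile
import uuid


class ToyStoreExists(Exception):
    """Hypothetical analogue of JobStoreExistsException."""


class ToyNoSuchStore(Exception):
    """Hypothetical analogue of NoSuchJobStoreException."""


class ToyFileJobStore:
    def __init__(self, locator):
        self.locator = locator
        self.config = None

    def initialize(self, config):
        # Refuse to reuse storage that already exists.
        if os.path.exists(self.locator):
            raise ToyStoreExists(self.locator)
        os.makedirs(self.locator)
        # Allocate a workflow ID and record it in the given config.
        config["workflow_id"] = str(uuid.uuid4())
        self.config = config
        self.write_config()

    def write_config(self):
        with open(os.path.join(self.locator, "config.json"), "w") as handle:
            json.dump(self.config, handle)

    def resume(self):
        # Connect to existing storage and load the persisted config.
        if not os.path.isdir(self.locator):
            raise ToyNoSuchStore(self.locator)
        with open(os.path.join(self.locator, "config.json")) as handle:
            self.config = json.load(handle)


locator = os.path.join(tempfile.mkdtemp(), "store")
first = ToyFileJobStore(locator)
first.initialize({"retry_count": 1})

second = ToyFileJobStore(locator)
second.resume()
```

The second instance sees the same configuration, including the workflow ID allocated by the first.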
property config: toil.common.Config
Return the Toil configuration
associated with this job store.
Return type
toil.common.Config
property locator: str
Get the locator that defines
the job store, which can be used to connect to it.
Return type
str
rootJobStoreIDFileName =
'rootJobStoreID'
setRootJob(rootJobStoreID)
Set the root job of the
workflow backed by this job store.
Parameters
rootJobStoreID ( toil.fileStores.FileID )
Return type
None
set_root_job(job_id)
Set the root job of the
workflow backed by this job store.
Parameters
job_id ( toil.fileStores.FileID ) -- The ID of the job to set as root
Return type
None
loadRootJob()
Return type
toil.job.JobDescription
load_root_job()
Loads the JobDescription for the root job in the current job store.
Raises
toil.job.JobException -- If no root job is set or if the root job doesn't exist in this job store
Returns
The root job.
Return type
toil.job.JobDescription
createRootJob(desc)
Parameters
desc ( toil.job.JobDescription )
Return type
toil.job.JobDescription
create_root_job(job_description)
Create the given JobDescription
and set it as the root job in this job store.
Parameters
job_description ( toil.job.JobDescription ) -- JobDescription to save and make the root job.
Return type
toil.job.JobDescription
getRootJobReturnValue()
Return type
Any
get_root_job_return_value()
Parse the return value from the root job.
Raises an
exception if the root job hasn't fulfilled its promise yet.
Return type
Any
importFile(srcUrl: str, sharedFileName: str, hardlink: bool = False, symlink: bool = True) -> None
importFile(srcUrl: str, sharedFileName: None = None, hardlink: bool = False, symlink: bool = True) -> toil.fileStores.FileID
import_file(src_uri: str, shared_file_name: str, hardlink: bool = False, symlink: bool = True) -> None
import_file(src_uri: str, shared_file_name: None = None, hardlink: bool = False, symlink: bool = True) -> toil.fileStores.FileID
Imports the file at the given URL into the job store. The ID of the newly imported file is returned. If a shared file name is provided, the file will be imported as such and None is returned. If an executable file on the local filesystem is uploaded, its executability will be preserved when it is downloaded.
Currently supported schemes are:
• 's3' for objects in Amazon S3, e.g. s3://bucket/key
• 'file' for local files, e.g. file:///local/file/path
• 'http', e.g. http://someurl.com/path
• 'gs', e.g. gs://bucket/file
Raises FileNotFoundError if the file does not exist.
Parameters
• src_uri ( str ) -- URL that points to a file or object in the storage mechanism of a supported URL scheme e.g. a blob in an AWS s3 bucket. It must be a file, not a directory or prefix.
• shared_file_name ( str ) -- Optional name to assign to the imported file within the job store
Returns
The jobStoreFileID of the imported file or None if shared_file_name was given
Return type
toil.fileStores.FileID or None
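How a URL might be checked against the schemes listed above can be sketched with urllib.parse. The function name and the ValueError are hypothetical, and the scheme set simply repeats the list in this section rather than reflecting any particular Toil release.

```python
from urllib.parse import urlparse

# Schemes copied from the list above (an assumption about completeness).
SUPPORTED_SCHEMES = {"s3", "file", "http", "gs"}


def check_import_url(src_uri):
    """Split a URL and reject schemes that are not in the list."""
    parsed = urlparse(src_uri)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    # For s3 and gs the network location is the bucket name.
    return parsed.scheme, parsed.netloc, parsed.path


s3_parts = check_import_url("s3://bucket/key")
file_parts = check_import_url("file:///local/file/path")
```

A real job store would dispatch on the scheme to pick the code that actually fetches the bytes; this sketch stops at validation.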
exportFile(jobStoreFileID, dstUrl)
Parameters
• jobStoreFileID ( toil.fileStores.FileID )
• dstUrl ( str )
Return type
None
export_file(file_id, dst_uri)
Exports the file to the destination given by the destination URL. The exported file will be executable if and only if it was originally uploaded from an executable file on the local filesystem.
Refer to AbstractJobStore.import_file() documentation for currently supported URL schemes.
Note that the helper method _exportFile is used to read from the source and write to destination. To implement any optimizations that circumvent this, the _exportFile method should be overridden by subclasses of AbstractJobStore.
Parameters
• file_id ( str ) -- The id of the file in the job store that should be exported.
• dst_uri ( str ) -- URL that points to a file or object in the storage mechanism of a supported URL scheme e.g. a blob in an AWS s3 bucket. May also be a local path.
Return type
None
classmethod url_exists(src_uri)
Return True if the file at the given URI exists, and False otherwise.
May raise an
error if file existence cannot be determined.
Parameters
src_uri ( str ) -- URL that points to a file or object in the storage mechanism of a supported URL scheme e.g. a blob in an AWS s3 bucket.
Return type
bool
classmethod get_size(src_uri)
Get the size in bytes of the
file at the given URL, or None if it cannot be obtained.
Parameters
src_uri ( str ) -- URL that points to a file or object in the storage mechanism of a supported URL scheme e.g. a blob in an AWS s3 bucket.
Return type
Optional[ int ]
classmethod get_is_directory(src_uri)
Return True if the thing at the
given URL is a directory, and False if it is a file. The URL
may or may not end in '/'.
Parameters
src_uri ( str )
Return type
bool
classmethod list_url(src_uri)
List the directory at the given URL. Returned path components can be joined with '/' onto the passed URL to form new URLs. Those that end in '/' correspond to directories. The provided URL may or may not end with '/'.
Currently supported schemes are:
• 's3' for objects in Amazon S3, e.g. s3://bucket/prefix/
• 'file' for local files, e.g. file:///local/dir/path/
Parameters
src_uri ( str ) -- URL that points to a directory or prefix in the storage mechanism of a supported URL scheme e.g. a prefix in an AWS s3 bucket.
Returns
A list of URL components in the given directory, already URL-encoded.
Return type
list [ str ]
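The convention for returned components (URL-encoded, directories ending in '/', joinable onto the base URL) can be illustrated for a local directory. toy_list_dir and join_component are hypothetical helpers, not Toil functions.

```python
import os
import pathlib
import tempfile
from urllib.parse import quote


def toy_list_dir(path):
    """Return URL-encoded entry names; directories end in '/'."""
    components = []
    for name in sorted(os.listdir(path)):
        encoded = quote(name)
        if os.path.isdir(os.path.join(path, name)):
            encoded += "/"
        components.append(encoded)
    return components


def join_component(base_url, component):
    """Join a component onto a base URL that may or may not end in '/'."""
    if not base_url.endswith("/"):
        base_url += "/"
    return base_url + component


# Build a small directory to list.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "sub"))
with open(os.path.join(root, "a file.txt"), "w") as handle:
    handle.write("data")

components = toy_list_dir(root)
base_url = pathlib.Path(root).as_uri()
urls = [join_component(base_url, c) for c in components]
```

Because the components are already encoded, the space in "a file.txt" appears as %20 and the joined URLs need no further quoting.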
classmethod read_from_url(src_uri, writable)
Read the given URL and write its content into the given writable stream.
Raises
FileNotFoundError if the URL doesn't exist.
Returns
The size of the file in bytes and whether the executable permission bit is set
Parameters
• src_uri ( str )
• writable ( IO[bytes] )
Return type
tuple [ int , bool ]
classmethod open_url(src_uri)
Read from the given URI.
Raises FileNotFoundError if the URL doesn't exist.
Has a readable stream interface, unlike read_from_url() which takes a writable stream.
Parameters
src_uri ( str )
Return type
IO[ bytes ]
abstract destroy()
The inverse of initialize(), this method deletes the physical storage represented by this instance. While not atomic, this method is at least idempotent, as a means to counteract potential issues with eventual consistency exhibited by the underlying storage mechanisms. This means that if the method fails (raises an exception), it may (and should) be invoked again. If the underlying storage mechanism is eventually consistent, even a successful invocation is not an ironclad guarantee that the physical storage vanished completely and immediately. A successful invocation only guarantees that the deletion will eventually happen. It is therefore recommended not to immediately reuse the same job store location for a new Toil workflow.
Return type
None
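Because destroy() is described as idempotent, a caller can simply retry it after a failure. The sketch below shows that pattern with a hypothetical FlakyStore whose first call fails; the retry helper and attempt count are illustrative assumptions.

```python
import os
import shutil
import tempfile


class FlakyStore:
    """Hypothetical store whose first destroy() call fails."""

    def __init__(self):
        self.path = tempfile.mkdtemp()
        self.calls = 0

    def destroy(self):
        self.calls += 1
        if self.calls == 1:
            raise OSError("transient storage error")
        # ignore_errors makes repeated deletion succeed silently.
        shutil.rmtree(self.path, ignore_errors=True)


def destroy_with_retry(destroy, attempts=3):
    """Call destroy() until it succeeds or attempts run out."""
    last_error = None
    for _ in range(attempts):
        try:
            destroy()
            return
        except Exception as error:
            last_error = error
    raise last_error


store = FlakyStore()
destroy_with_retry(store.destroy)
```

Calling store.destroy() once more afterwards is harmless, which is the property that makes the retry loop safe.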
getEnv()
Return type
dict [ str , str ]
get_env()
Returns a dictionary of
environment variables that this job store requires to be set
in order to function properly on a worker.
Return type
dict [ str , str ]
clean(jobCache=None)
Function to clean up the state of a job store after a restart.
Fixes jobs that might have been partially updated. Resets the try counts and removes jobs that are not successors of the current root job.
Parameters
jobCache ( Optional[dict[Union[str, toil.job.TemporaryID], toil.job.JobDescription]] ) -- if given, it must be a dict from job ID keys to JobDescription object values. Jobs will be loaded from the cache (which can be downloaded from the job store in a batch) instead of piecemeal when recursed into.
Return type
toil.job.JobDescription
assignID(jobDescription)
Parameters
jobDescription ( toil.job.JobDescription )
Return type
None
abstract assign_job_id(job_description)
Get a new jobStoreID to be used by the described job, and assigns it to the JobDescription.
Files
associated with the assigned ID will be accepted even if the
JobDescription has never been created or updated.
Parameters
job_description ( toil.job.JobDescription ) -- The JobDescription to give an ID to
Return type
None
batch()
If supported by the batch
system, calls to create() with this context manager active
will be performed in a batch after the context manager is
released.
Return type
collections.abc.Iterator [None]
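The batching behaviour can be sketched with contextlib. ToyBatchingStore is hypothetical and keeps everything in memory; it only shows that creations made inside the context are deferred until the context exits.

```python
from contextlib import contextmanager


class ToyBatchingStore:
    """Hypothetical in-memory store that can defer job creation."""

    def __init__(self):
        self.saved = []
        self._pending = None

    @contextmanager
    def batch(self):
        self._pending = []
        try:
            yield
            # Flush everything queued while the context was active.
            self.saved.extend(self._pending)
        finally:
            self._pending = None

    def create_job(self, job_description):
        if self._pending is not None:
            self._pending.append(job_description)
        else:
            self.saved.append(job_description)
        return job_description


store = ToyBatchingStore()
with store.batch():
    store.create_job("job-a")
    store.create_job("job-b")
    saved_during = list(store.saved)
saved_after = list(store.saved)
```

In this sketch nothing is saved if the body raises, which is one possible policy and not necessarily what a real job store does.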
create(jobDescription)
Parameters
jobDescription ( toil.job.JobDescription )
Return type
toil.job.JobDescription
abstract create_job(job_description)
Writes the given JobDescription to the job store. The job must have an ID assigned already.
Must call jobDescription.pre_update_hook()
Returns
The JobDescription passed.
Return type
toil.job.JobDescription
Parameters
job_description ( toil.job.JobDescription )
exists(jobStoreID)
Parameters
jobStoreID ( str )
Return type
bool
abstract job_exists(job_id)
Indicates whether a description
of the job with the specified jobStoreID exists in the job
store
Return type
bool
Parameters
job_id ( str )
publicUrlExpiration
getPublicUrl(fileName)
Parameters
fileName ( str )
Return type
str
abstract get_public_url(file_name)
Returns a publicly accessible URL to the given file in the job store. The returned URL may expire as early as 1h after it has been returned. Throws an exception if the file does not exist.
Parameters
file_name ( str ) -- the jobStoreFileID of the file to generate a URL for
Raises
NoSuchFileException -- if the specified file does not exist in this job store
Return type
str
getSharedPublicUrl(sharedFileName)
Parameters
sharedFileName ( str )
Return type
str
abstract get_shared_public_url(shared_file_name)
Differs from getPublicUrl() in that this method is for generating URLs for shared files written by writeSharedFileStream() .
Returns a publicly accessible URL to the given file in the job store. The returned URL starts with 'http:', 'https:' or 'file:'. The returned URL may expire as early as 1h after it has been returned. Throws an exception if the file does not exist.
Parameters
shared_file_name ( str ) -- The name of the shared file to generate a publicly accessible URL for.
Raises
NoSuchFileException -- raised if the specified file does not exist in the store
Return type
str
load(jobStoreID)
Parameters
jobStoreID ( str )
Return type
toil.job.JobDescription
abstract load_job(job_id)
Loads the description of the job referenced by the given ID, assigns it the job store's config, and returns it.
May declare the job to have failed (see toil.job.JobDescription.setupJobAfterFailure()) if there is evidence of a failed update attempt.
Parameters
job_id ( str ) -- the ID of the job to load
Raises
NoSuchJobException -- if there is no job with the given ID
Return type
toil.job.JobDescription
update(jobDescription)
Parameters
jobDescription ( toil.job.JobDescription )
Return type
None
abstract update_job(job_description)
Persists changes to the state of the given JobDescription in this store atomically.
Must call jobDescription.pre_update_hook()
Parameters
• job ( toil.job.JobDescription ) -- the job to write to this job store
• job_description ( toil.job.JobDescription )
Return type
None
delete(jobStoreID)
Parameters
jobStoreID ( str )
Return type
None
abstract delete_job(job_id)
Removes the JobDescription from the store atomically. You may not then subsequently call load(), write(), update(), etc. with the same jobStoreID or any JobDescription bearing it.
This operation
is idempotent, i.e. deleting a job twice or deleting a
non-existent job will succeed silently.
Parameters
job_id ( str ) -- the ID of the job to delete from this job store
Return type
None
abstract jobs()
Best-effort attempt to return an iterator over JobDescriptions for all jobs in the store. The iterator may not return all jobs and may also contain orphaned jobs that have already finished successfully and should not be rerun. To guarantee you get any and all jobs that can be run, construct a more expensive ToilState object instead.
Returns
An iterator over jobs in the store. The iterator may or may not contain all jobs and may contain invalid jobs
Return type
Iterator[toil.job.JobDescription]
writeFile(localFilePath, jobStoreID=None, cleanup=False)
Parameters
• localFilePath ( str )
• jobStoreID ( Optional[str] )
• cleanup ( bool )
Return type
str
abstract write_file(local_path, job_id=None, cleanup=False)
Takes a file (as a path) and places it in this job store. Returns an ID that can be used to retrieve the file at a later time. The file is written in an atomic manner. It will not appear in the jobStore until the write has successfully completed.
Parameters
• local_path ( str ) -- the path to the local file that will be uploaded to the job store. The last path component (basename of the file) will remain associated with the file in the file store, if supported, so that the file can be searched for by name or name glob.
• job_id ( str ) -- the id of a job, or None. If specified, the file may be associated with that job in a job-store-specific way. This may influence the returned ID.
• cleanup ( bool ) -- Whether to attempt to delete the file when the job whose jobStoreID was given as jobStoreID is deleted with jobStore.delete(job). If jobStoreID was not given, does nothing.
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchJobException -- if the job specified via jobStoreID does not exist (FIXME: some implementations may not raise this)
Returns
an ID referencing the newly created file that can be used to read the file in the future.
Return type
str
Parameters
• local_path ( str )
• job_id ( Optional[str] )
• cleanup ( bool )
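One common way to get the atomic behaviour that write_file describes on a local filesystem is to copy into a temporary file and rename it into place. This is a sketch under that assumption; toy_write_file and the ID format are hypothetical and say nothing about how any particular Toil job store does it.

```python
import os
import shutil
import tempfile
import uuid


def toy_write_file(store_dir, local_path):
    """Copy local_path into store_dir and return a new file ID."""
    file_id = f"{uuid.uuid4().hex}-{os.path.basename(local_path)}"
    final_path = os.path.join(store_dir, file_id)
    # Stage the copy under a temporary name in the same directory.
    fd, tmp_path = tempfile.mkstemp(dir=store_dir, suffix=".tmp")
    os.close(fd)
    shutil.copyfile(local_path, tmp_path)
    # os.replace is atomic when source and target share a filesystem,
    # so readers never observe a partially written file under file_id.
    os.replace(tmp_path, final_path)
    return file_id


source_dir = tempfile.mkdtemp()
store_dir = tempfile.mkdtemp()
local_path = os.path.join(source_dir, "reads.txt")
with open(local_path, "w") as handle:
    handle.write("ACGT")

file_id = toy_write_file(store_dir, local_path)
```

Keeping the basename in the ID mirrors the note above that the file name may stay associated with the stored file.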
writeFileStream(jobStoreID=None, cleanup=False, basename=None, encoding=None, errors=None)
Parameters
• jobStoreID ( Optional[str] )
• cleanup ( bool )
• basename ( Optional[str] )
• encoding ( Optional[str] )
• errors ( Optional[str] )
Return type
ContextManager[ tuple [IO[ bytes ], str ]]
abstract write_file_stream(job_id=None, cleanup=False, basename=None, encoding=None, errors=None)
Similar to writeFile, but returns a context manager yielding a tuple of 1) a file handle which can be written to and 2) the ID of the resulting file in the job store. The yielded file handle does not need to and should not be closed explicitly. The file is written in an atomic manner. It will not appear in the jobStore until the write has successfully completed.
Parameters
• job_id ( str ) -- the id of a job, or None. If specified, the file may be associated with that job in a job-store-specific way. This may influence the returned ID.
• cleanup ( bool ) -- Whether to attempt to delete the file when the job whose jobStoreID was given as jobStoreID is deleted with jobStore.delete(job). If jobStoreID was not given, does nothing.
• basename ( str ) -- If supported by the implementation, use the given file basename so that when searching the job store with a query matching that basename, the file will be detected.
• encoding ( str ) -- the name of the encoding used to encode the file. Encodings are the same as for encode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchJobException -- if the job specified via jobStoreID does not exist (FIXME: some implementations may not raise this)
Returns
a context manager yielding a file handle which can be written to and an ID that references the newly created file and can be used to read the file in the future.
Return type
collections.abc.Iterator[tuple[IO[bytes], str]]
Parameters
• job_id ( Optional[str] )
• cleanup ( bool )
• basename ( Optional[str] )
• encoding ( Optional[str] )
• errors ( Optional[str] )
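The stream variant can be sketched as a context manager that yields a handle together with the new ID and publishes the file only on a clean exit. toy_write_file_stream is hypothetical and works on a local directory; binary mode is used when encoding is None, as the parameter description above states.

```python
import os
import tempfile
import uuid
from contextlib import contextmanager


@contextmanager
def toy_write_file_stream(store_dir, basename=None, encoding=None, errors=None):
    """Yield (handle, file_id); the file appears only after the block."""
    file_id = uuid.uuid4().hex + (f"-{basename}" if basename else "")
    final_path = os.path.join(store_dir, file_id)
    tmp_path = final_path + ".tmp"
    mode = "wb" if encoding is None else "w"
    # The inner with-statement closes the handle, so callers must not.
    with open(tmp_path, mode, encoding=encoding, errors=errors) as handle:
        yield handle, file_id
    # Reached only if the caller's block finished without raising.
    os.replace(tmp_path, final_path)


stream_store = tempfile.mkdtemp()
with toy_write_file_stream(
    stream_store, basename="out.txt", encoding="utf-8"
) as (handle, new_id):
    handle.write("hello")
    visible_during_write = os.path.exists(os.path.join(stream_store, new_id))

with open(os.path.join(stream_store, new_id), encoding="utf-8") as handle:
    written = handle.read()
```

The ID is known before any data is written, which lets a caller record it while still streaming content.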
getEmptyFileStoreID(jobStoreID=None, cleanup=False, basename=None)
Parameters
• jobStoreID ( Optional[str] )
• cleanup ( bool )
• basename ( Optional[str] )
Return type
str
abstract get_empty_file_store_id(job_id=None, cleanup=False, basename=None)
Creates an empty file in the job store and returns its ID. A call to fileExists(getEmptyFileStoreID(jobStoreID)) will return True.
Parameters
• job_id ( str ) -- the id of a job, or None. If specified, the file may be associated with that job in a job-store-specific way. This may influence the returned ID.
• cleanup ( bool ) -- Whether to attempt to delete the file when the job whose jobStoreID was given as jobStoreID is deleted with jobStore.delete(job). If jobStoreID was not given, does nothing.
• basename ( str ) -- If supported by the implementation, use the given file basename so that when searching the job store with a query matching that basename, the file will be detected.
Returns
a jobStoreFileID that references the newly created file and can be used to reference the file in the future.
Return type
str
readFile(jobStoreFileID, localFilePath, symlink=False)
Parameters
• jobStoreFileID ( str )
• localFilePath ( str )
• symlink ( bool )
Return type
None
abstract read_file(file_id, local_path, symlink=False)
Copies or hard links the file referenced by jobStoreFileID to the given local file path. The version will be consistent with the last copy of the file written/updated. If the file in the job store is later modified via updateFile or updateFileStream, it is implementation-defined whether those writes will be visible at localFilePath. The file is copied in an atomic manner. It will not appear in the local file system until the copy has completed.
The file at the given local path may not be modified after this method returns!
Note! Implementations of readFile need to respect/provide the executable attribute on FileIDs.
Parameters
• file_id ( str ) -- ID of the file to be copied
• local_path ( str ) -- the local path indicating where to place the contents of the given file in the job store
• symlink ( bool ) -- whether the reader can tolerate a symlink. If set to true, the job store may create a symlink instead of a full copy of the file or a hard link.
Return type
None
readFileStream(jobStoreFileID, encoding=None, errors=None)
Parameters
• jobStoreFileID ( str )
• encoding ( Optional[str] )
• errors ( Optional[str] )
Return type
Union[ContextManager[IO[ bytes ]], ContextManager[IO[ str ]]]
read_file_stream(file_id: toil.fileStores.FileID | str, encoding: Literal[None] = None, errors: str | None = None) -> ContextManager[IO[bytes]]
read_file_stream(file_id: toil.fileStores.FileID | str, encoding: str, errors: str | None = None) -> ContextManager[IO[str]]
Similar to readFile, but
returns a context manager yielding a file handle which can
be read from. The yielded file handle does not need to and
should not be closed explicitly.
Parameters
• file_id ( str ) -- ID of the file to get a readable file handle for
• encoding ( str ) -- the name of the encoding used to decode the file. Encodings are the same as for decode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Returns
a context manager yielding a file handle which can be read from
Return type
Iterator[Union[IO[ bytes ], IO[ str ]]]
deleteFile(jobStoreFileID)
Parameters
jobStoreFileID ( str )
Return type
None
abstract delete_file(file_id)
Deletes the file with the given
ID from this job store. This operation is idempotent, i.e.
deleting a file twice or deleting a non-existent file will
succeed silently.
Parameters
file_id ( str ) -- ID of the file to delete
Return type
None
fileExists(jobStoreFileID)
Determine whether a file exists
in this job store.
Parameters
jobStoreFileID ( str )
Return type
bool
abstract file_exists(file_id)
Determine whether a file exists
in this job store.
Parameters
file_id ( str ) -- an ID referencing the file to be checked
Return type
bool
getFileSize(jobStoreFileID)
Get the size of the given file
in bytes.
Parameters
jobStoreFileID ( str )
Return type
int
abstract get_file_size(file_id)
Get the size of the given file in bytes, or 0 if it does not exist when queried.
Note that job
stores which encrypt files might return overestimates of
file sizes, since the encrypted file may have been padded to
the nearest block, augmented with an initialization vector,
etc.
Parameters
file_id ( str ) -- an ID referencing the file to be checked
Return type
int
updateFile(jobStoreFileID, localFilePath)
Replaces the existing version
of a file in the job store.
Parameters
• jobStoreFileID ( str )
• localFilePath ( str )
Return type
None
abstract update_file(file_id, local_path)
Replaces the existing version of a file in the job store.
Throws an exception if the file does not exist.
Parameters
• file_id ( str ) -- the ID of the file in the job store to be updated
• local_path ( str ) -- the local path to a file that will overwrite the current version in the job store
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchFileException -- if the specified file does not exist
Return type
None
updateFileStream(jobStoreFileID, encoding=None, errors=None)
Parameters
• jobStoreFileID ( str )
• encoding ( Optional[str] )
• errors ( Optional[str] )
Return type
ContextManager[IO[Any]]
abstract update_file_stream(file_id, encoding=None, errors=None)
Replaces the existing version
of a file in the job store. Similar to writeFile, but
returns a context manager yielding a file handle which can
be written to. The yielded file handle does not need to and
should not be closed explicitly.
Parameters
• file_id ( str ) -- the ID of the file in the job store to be updated
• encoding ( str ) -- the name of the encoding used to encode the file. Encodings are the same as for encode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchFileException -- if the specified file does not exist
Return type
collections.abc.Iterator [IO[Any]]
sharedFileNameRegex
writeSharedFileStream(sharedFileName, isProtected=None, encoding=None, errors=None)
Parameters
• sharedFileName ( str )
• isProtected ( Optional[bool] )
• encoding ( Optional[str] )
• errors ( Optional[str] )
Return type
ContextManager[IO[ bytes ]]
abstract write_shared_file_stream(shared_file_name, encrypted=None, encoding=None, errors=None)
Returns a context manager yielding a writable file handle to the global file referenced by the given name. The file will be created in an atomic manner.
Parameters
• shared_file_name ( str ) -- A file name matching AbstractJobStore.fileNameRegex, unique within this job store
• encrypted ( bool ) -- True if the file must be encrypted, None if it may be encrypted or False if it must be stored in the clear.
• encoding ( str ) -- the name of the encoding used to encode the file. Encodings are the same as for encode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Raises
ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
Returns
a context manager yielding a writable file handle
Return type
Iterator[IO[ bytes ]]
readSharedFileStream(sharedFileName, encoding=None, errors=None)
Parameters
• sharedFileName ( str )
• encoding ( Optional[str] )
• errors ( Optional[str] )
Return type
ContextManager[IO[ bytes ]]
abstract read_shared_file_stream(shared_file_name, encoding=None, errors=None)
Returns a context manager yielding a readable file handle to the global file referenced by the given name.
Parameters
• shared_file_name ( str ) -- A file name matching AbstractJobStore.fileNameRegex, unique within this job store
• encoding ( str ) -- the name of the encoding used to decode the file. Encodings are the same as for decode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Returns
a context manager yielding a readable file handle
Return type
Iterator[IO[ bytes ]]
writeStatsAndLogging(statsAndLoggingString)
Parameters
statsAndLoggingString ( str )
Return type
None
abstract write_logs(msg)
Stores a message as a log in
the jobstore.
Parameters
msg ( str ) -- the string to be written
Raises
ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
Return type
None
readStatsAndLogging(callback, readAll=False)
Parameters
• callback ( Callable[Ellipsis, Any] )
• readAll ( bool )
Return type
int
abstract read_logs(callback, read_all=False)
Reads logs accumulated by the write_logs() method. For each log this method calls the given callback function with the message as an argument (rather than returning logs directly, this method must be supplied with a callback which will process log messages).
Only unread logs will be read unless the read_all parameter is set.
Parameters
• callback ( Callable ) -- a function to be applied to each of the stats file handles found
• read_all ( bool ) -- a boolean indicating whether to read the already processed stats files in addition to the unread stats files
Raises
ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
Returns
the number of stats files processed
Return type
int
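The callback style of read_logs, together with the unread/read distinction, can be sketched in memory. ToyLogStore is hypothetical, and passing the message string to the callback is an assumption taken from the first sentence of the description above.

```python
class ToyLogStore:
    """Hypothetical in-memory log store with read tracking."""

    def __init__(self):
        # Each entry is [message, already_read].
        self._logs = []

    def write_logs(self, msg):
        self._logs.append([msg, False])

    def read_logs(self, callback, read_all=False):
        processed = 0
        for entry in self._logs:
            if entry[1] and not read_all:
                continue  # skip logs that were already delivered
            callback(entry[0])
            entry[1] = True
            processed += 1
        return processed


log_store = ToyLogStore()
log_store.write_logs("worker 1 finished")
log_store.write_logs("worker 2 finished")

seen = []
first_count = log_store.read_logs(seen.append)
second_count = log_store.read_logs(seen.append)
replay_count = log_store.read_logs(seen.append, read_all=True)
```

The second call delivers nothing because both messages were already marked as read; read_all=True replays them.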
write_leader_pid()
Write the pid of this process to a file in the job store.
Overwriting the
current contents of pid.log is a feature, not a bug of this
method. Other methods will rely on always having the most
current pid available. So far there is no reason to store
any old pids.
Return type
None
read_leader_pid()
Read the pid of the leader process from a file in the job store.
Raises
NoSuchFileException -- If the PID file doesn't exist.
Return type
int
write_leader_node_id()
Write the leader node id to the
job store. This should only be called by the leader.
Return type
None
read_leader_node_id()
Read the leader node id stored in the job store.
Raises
NoSuchFileException -- If the node ID file doesn't exist.
Return type
str
write_kill_flag(kill=False)
Write a file inside the job store that serves as a kill flag.
The initialized file contains the characters "NO". This should only be changed when the user runs the "toil kill" command.
Changing the contents of this file to "YES" triggers a kill of the leader process. The workers are expected to be cleaned up by the leader.
Parameters
kill ( bool )
Return type
None
read_kill_flag()
Read the kill flag from the job store. Returns True if the leader has been killed, False otherwise.
Return type
bool
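A minimal sketch of the "NO"/"YES" kill-flag protocol, assuming a plain local directory in place of a job store. The file name kill.flag and both helper functions are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def write_kill_flag(store_dir: str, kill: bool = False) -> None:
    # The flag file holds "NO" until a kill is requested, then "YES".
    with open(os.path.join(store_dir, "kill.flag"), "w") as f:
        f.write("YES" if kill else "NO")


def read_kill_flag(store_dir: str) -> bool:
    with open(os.path.join(store_dir, "kill.flag")) as f:
        return f.read().strip() == "YES"


with tempfile.TemporaryDirectory() as store_dir:
    write_kill_flag(store_dir)             # initial state: "NO"
    before = read_kill_flag(store_dir)
    write_kill_flag(store_dir, kill=True)  # what a kill request would do
    after = read_kill_flag(store_dir)
```

A leader that polls the flag would shut itself down, and then clean up its workers, once the read returns True.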
default_caching()
The job store's preference as to whether it likes caching or doesn't care about it. Some job stores benefit from caching; however, on some local configurations it can be flaky.
See https://github.com/DataBiosphere/toil/issues/4218
Return type
bool
class toil.jobStores.abstractJobStore.JobStoreSupport(locator)
Bases: AbstractJobStore
A mostly fake
JobStore to access URLs not really associated with real job
stores.
Parameters
locator ( str )
toil.jobStores.aws
Submodules
toil.jobStores.aws.jobStore
Attributes
Exceptions
Classes
Module Contents
toil.jobStores.aws.jobStore.boto3_session
toil.jobStores.aws.jobStore.s3_boto3_resource
toil.jobStores.aws.jobStore.s3_boto3_client
toil.jobStores.aws.jobStore.logger
toil.jobStores.aws.jobStore.CONSISTENCY_TICKS = 5
toil.jobStores.aws.jobStore.CONSISTENCY_TIME = 1
exception toil.jobStores.aws.jobStore.ChecksumError
Bases: Exception
Raised when a download from AWS does not contain the correct data.
exception toil.jobStores.aws.jobStore.DomainDoesNotExist(domain_name)
Bases: Exception
Raised when a domain that is expected to exist does not exist.
class toil.jobStores.aws.jobStore.AWSJobStore(locator, partSize=50 << 20)
Bases: toil.jobStores.abstractJobStore.AbstractJobStore
A job store that uses Amazon's S3 for file storage and SimpleDB for storing job info and enforcing strong consistency on the S3 file storage. There will be SDB domains for jobs and files and a versioned S3 bucket for file contents. Job objects are pickled, compressed, partitioned into chunks of 1024 bytes and each chunk is stored as an attribute of the SDB item representing the job. UUIDs are used to identify jobs and files.
Parameters
• locator ( str )
• partSize ( int )
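The "pickled, compressed, partitioned into chunks" storage scheme described for this class can be sketched as follows. This is an illustration under stated assumptions, not the class's actual code: to_chunks and from_chunks are hypothetical helpers, bz2 is assumed as the compressor, and any text encoding that SimpleDB attribute values require is left out.

```python
import bz2
import pickle
from typing import Any, Dict

CHUNK_SIZE = 1024  # bytes per attribute value, as stated in the class description


def to_chunks(obj: Any) -> Dict[str, bytes]:
    """Pickle, compress, and split an object into numbered chunks."""
    blob = bz2.compress(pickle.dumps(obj))
    return {
        f"{i:03d}": blob[start:start + CHUNK_SIZE]
        for i, start in enumerate(range(0, len(blob), CHUNK_SIZE))
    }


def from_chunks(chunks: Dict[str, bytes]) -> Any:
    """Reassemble chunks in key order and reverse the encoding."""
    blob = b"".join(chunks[key] for key in sorted(chunks))
    return pickle.loads(bz2.decompress(blob))


job = {"name": "align", "memory": 2 << 30, "args": list(range(5000))}
chunks = to_chunks(job)
restored = from_chunks(chunks)
default_part_size = 50 << 20  # the partSize default: 50 MiB expressed in bytes
```

The zero-padded keys keep the chunks in order when sorted, which is what makes reassembly possible from an unordered attribute dictionary.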
bucketNameRe
minBucketNameLen = 3
maxBucketNameLen = 63
maxNameLen = 10
nameSeparator = '--'
region
name_prefix
part_size
jobs_domain_name: str | None = None
files_domain_name: str | None = None
files_bucket = None
db
s3_resource
s3_client
initialize(config)
Initialize this job store.
Create the
physical storage for this job store, allocate a workflow ID
and persist the given Toil configuration to the store.
Parameters
config ( toil.Config ) -- the Toil configuration to initialize this job store with. The given configuration will be updated with the newly allocated workflow ID.
Raises
JobStoreExistsException -- if the physical storage for this job store already exists
Return type
None
property sseKeyPath: str | None
Return type
Optional[ str ]
resume()
Connect this instance to the physical storage it represents and load the Toil configuration into the AbstractJobStore.config attribute.
Raises
NoSuchJobStoreException -- if the physical storage for this job store doesn't exist
Return type
None
jobsPerBatchInsert = 25
batch()
If supported by the batch
system, calls to create() with this context manager active
will be performed in a batch after the context manager is
released.
Return type
None
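The deferred-write behaviour of batch() can be sketched with a toy store. ToyStore and its create() method are hypothetical; the sketch only shows the general pattern of queueing writes while a context manager is active and flushing them on exit.

```python
from contextlib import contextmanager
from typing import Iterator, List


class ToyStore:
    """Hypothetical store that defers create() calls while batching."""

    def __init__(self) -> None:
        self.persisted: List[str] = []
        self._pending: List[str] = []
        self._batching = False

    def create(self, job_name: str) -> None:
        if self._batching:
            self._pending.append(job_name)   # defer until the batch closes
        else:
            self.persisted.append(job_name)  # write through immediately

    @contextmanager
    def batch(self) -> Iterator[None]:
        self._batching = True
        try:
            yield
        finally:
            self._batching = False
            # Flush everything queued during the batch in one step.
            # This toy flushes even if the body raised an exception.
            self.persisted.extend(self._pending)
            self._pending = []


store = ToyStore()
with store.batch():
    store.create("job-1")
    store.create("job-2")
    inside = list(store.persisted)  # snapshot taken before the flush
after = list(store.persisted)
```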
assign_job_id(job_description)
Gets a new jobStoreID to be used by the described job and assigns it to the JobDescription.
Files
associated with the assigned ID will be accepted even if the
JobDescription has never been created or updated.
Parameters
job_description ( toil.job.JobDescription ) -- The JobDescription to give an ID to
Return type
None
create_job(job_description)
Writes the given JobDescription to the job store. The job must have an ID assigned already.
Must call
jobDescription.pre_update_hook()
Returns
The JobDescription passed.
Return type
toil.job.JobDescription
Parameters
job_description ( toil.job.JobDescription )
job_exists(job_id)
Indicates whether a description
of the job with the specified jobStoreID exists in the job
store
Return type
bool
Parameters
job_id ( Union[bytes, str] )
jobs()
Best-effort attempt to return an iterator over the JobDescriptions for all jobs in the store. The iterator may not return all jobs and may also contain orphaned jobs that have already finished successfully and should not be rerun. To guarantee that you get every job that can be run, construct a more expensive ToilState object instead.
Returns
an iterator over the jobs in the store. The iterator may or may not contain all jobs and may contain invalid jobs
Return type
Iterator[toil.job.jobDescription]
load_job(job_id)
Loads the description of the job referenced by the given ID, assigns it the job store's config, and returns it.
May declare the job to have failed (see toil.job.JobDescription.setupJobAfterFailure()) if there is evidence of a failed update attempt.
Parameters
job_id ( toil.fileStores.FileID ) -- the ID of the job to load
Raises
NoSuchJobException -- if there is no job with the given ID
Return type
toil.job.Job
update_job(job_description)
Persists changes to the state of the given JobDescription in this store atomically.
Must call
jobDescription.pre_update_hook()
Parameters
job_description ( toil.job.JobDescription ) -- the job to write to this job store
itemsPerBatchDelete = 25
delete_job(job_id)
Removes the JobDescription from the store atomically. You may not then subsequently call load(), write(), update(), etc. with the same jobStoreID or any JobDescription bearing it.
This operation
is idempotent, i.e. deleting a job twice or deleting a
non-existent job will succeed silently.
Parameters
job_id ( str ) -- the ID of the job to delete from this job store
get_empty_file_store_id(jobStoreID=None, cleanup=False, basename=None)
Creates an empty file in the job store and returns its ID. A call to fileExists(getEmptyFileStoreID(jobStoreID)) will return True.
Parameters
• job_id ( str ) -- the id of a job, or None. If specified, the file may be associated with that job in a job-store-specific way. This may influence the returned ID.
• cleanup ( bool ) -- Whether to attempt to delete the file when the job whose jobStoreID was given as jobStoreID is deleted with jobStore.delete(job). If jobStoreID was not given, does nothing.
• basename ( str ) -- If supported by the implementation, use the given file basename so that when searching the job store with a query matching that basename, the file will be detected.
Returns
a jobStoreFileID that references the newly created file and can be used to reference the file in the future.
Return type
str
write_file(local_path, job_id=None, cleanup=False)
Takes a file (as a path) and places it in this job store. Returns an ID that can be used to retrieve the file at a later time. The file is written in an atomic manner. It will not appear in the jobStore until the write has successfully completed.
Parameters
• local_path ( str ) -- the path to the local file that will be uploaded to the job store. The last path component (basename of the file) will remain associated with the file in the file store, if supported, so that the file can be searched for by name or name glob.
• job_id ( str ) -- the id of a job, or None. If specified, the file may be associated with that job in a job-store-specific way. This may influence the returned ID.
• cleanup ( bool ) -- Whether to attempt to delete the file when the job whose jobStoreID was given as jobStoreID is deleted with jobStore.delete(job). If jobStoreID was not given, does nothing.
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchJobException -- if the job specified via jobStoreID does not exist (FIXME: some implementations may not raise this)
Return type
toil.fileStores.FileID
Returns
an ID referencing the newly created file that can be used to read the file in the future.
Return type
str
Parameters
• local_path ( toil.fileStores.FileID )
• job_id ( Optional[toil.fileStores.FileID] )
• cleanup ( bool )
write_file_stream(job_id=None, cleanup=False, basename=None, encoding=None, errors=None)
Similar to writeFile, but returns a context manager yielding a tuple of 1) a file handle which can be written to and 2) the ID of the resulting file in the job store. The yielded file handle does not need to and should not be closed explicitly. The file is written in an atomic manner. It will not appear in the jobStore until the write has successfully completed.
Parameters
• job_id ( str ) -- the id of a job, or None. If specified, the file may be associated with that job in a job-store-specific way. This may influence the returned ID.
• cleanup ( bool ) -- Whether to attempt to delete the file when the job whose jobStoreID was given as jobStoreID is deleted with jobStore.delete(job). If jobStoreID was not given, does nothing.
• basename ( str ) -- If supported by the implementation, use the given file basename so that when searching the job store with a query matching that basename, the file will be detected.
• encoding ( str ) -- the name of the encoding used to encode the file. Encodings are the same as for encode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchJobException -- if the job specified via jobStoreID does not exist (FIXME: some implementations may not raise this)
Returns
a context manager yielding a file handle which can be written to and an ID that references the newly created file and can be used to read the file in the future.
Return type
Iterator[Tuple[IO[ bytes ], str ]]
Parameters
• job_id ( Optional[toil.fileStores.FileID] )
• cleanup ( bool )
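The contract of write_file_stream() (a context manager that yields a handle plus an ID, and publishes the file only after a successful write) can be sketched on a local file system. The function below is a hypothetical stand-in that merely reuses the method's name; it assumes a plain directory as the store and relies on os.replace for the atomic publish step.

```python
import os
import tempfile
import uuid
from contextlib import contextmanager
from typing import IO, Iterator, Tuple


@contextmanager
def write_file_stream(store_dir: str) -> Iterator[Tuple[IO[bytes], str]]:
    """Yield (writable handle, file ID); publish the file only on success."""
    file_id = uuid.uuid4().hex
    final_path = os.path.join(store_dir, file_id)
    tmp_path = final_path + ".tmp"
    # The with-statement closes the handle, so callers never close it themselves.
    with open(tmp_path, "wb") as handle:
        yield handle, file_id
    # Reached only if the caller's block did not raise.
    # os.replace is atomic when both paths are on the same file system.
    os.replace(tmp_path, final_path)


with tempfile.TemporaryDirectory() as store_dir:
    with write_file_stream(store_dir) as (handle, file_id):
        handle.write(b"hello")
        visible_during_write = os.path.exists(os.path.join(store_dir, file_id))
    with open(os.path.join(store_dir, file_id), "rb") as f:
        stored = f.read()
```

If the caller's block raises, the final path is never created; this sketch leaves the temporary file behind in that case, which a fuller version would clean up.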
write_shared_file_stream(shared_file_name, encrypted=None, encoding=None, errors=None)
Returns a context manager yielding a writable file handle to the global file referenced by the given name. File will be created in an atomic manner.
Parameters
• shared_file_name ( str ) -- A file name matching AbstractJobStore.fileNameRegex, unique within this job store
• encrypted ( bool ) -- True if the file must be encrypted, None if it may be encrypted or False if it must be stored in the clear.
• encoding ( str ) -- the name of the encoding used to encode the file. Encodings are the same as for encode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Raises
ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
Returns
a context manager yielding a writable file handle
Return type
Iterator[IO[ bytes ]]
update_file(file_id, local_path)
Replaces the existing version of a file in the job store.
Throws an exception if the file does not exist.
Parameters
• file_id -- the ID of the file in the job store to be updated
• local_path -- the local path to a file that will overwrite the current version in the job store
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchFileException -- if the specified file does not exist
update_file_stream(file_id, encoding=None, errors=None)
Replaces the existing version of a file in the job store. Similar to writeFile, but returns a context manager yielding a file handle which can be written to. The yielded file handle does not need to and should not be closed explicitly.
Parameters
• file_id ( str ) -- the ID of the file in the job store to be updated
• encoding ( str ) -- the name of the encoding used to encode the file. Encodings are the same as for encode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchFileException -- if the specified file does not exist
file_exists(file_id)
Determine whether a file exists
in this job store.
Parameters
file_id -- an ID referencing the file to be checked
get_file_size(file_id)
Get the size of the given file in bytes, or 0 if it does not exist when queried.
Note that job
stores which encrypt files might return overestimates of
file sizes, since the encrypted file may have been padded to
the nearest block, augmented with an initialization vector,
etc.
Parameters
file_id ( str ) -- an ID referencing the file to be checked
Return type
int
read_file(file_id, local_path, symlink=False)
Copies or hard links the file referenced by jobStoreFileID to the given local file path. The version will be consistent with the last copy of the file written/updated. If the file in the job store is later modified via updateFile or updateFileStream, it is implementation-defined whether those writes will be visible at localFilePath. The file is copied in an atomic manner. It will not appear in the local file system until the copy has completed.
The file at the given local path may not be modified after this method returns!
Note! Implementations of readFile need to respect/provide the executable attribute on FileIDs.
Parameters
• file_id ( str ) -- ID of the file to be copied
• local_path ( str ) -- the local path indicating where to place the contents of the given file in the job store
• symlink ( bool ) -- whether the reader can tolerate a symlink. If set to true, the job store may create a symlink instead of a full copy of the file or a hard link.
read_file_stream(file_id, encoding=None, errors=None)
Similar to readFile, but returns a context manager yielding a file handle which can be read from. The yielded file handle does not need to and should not be closed explicitly.
Parameters
• file_id ( str ) -- ID of the file to get a readable file handle for
• encoding ( str ) -- the name of the encoding used to decode the file. Encodings are the same as for decode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Returns
a context manager yielding a file handle which can be read from
Return type
Iterator[Union[IO[ bytes ], IO[ str ]]]
read_shared_file_stream(shared_file_name, encoding=None, errors=None)
Returns a context manager yielding a readable file handle to the global file referenced by the given name.
Parameters
• shared_file_name ( str ) -- A file name matching AbstractJobStore.fileNameRegex, unique within this job store
• encoding ( str ) -- the name of the encoding used to decode the file. Encodings are the same as for decode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Returns
a context manager yielding a readable file handle
Return type
Iterator[IO[ bytes ]]
delete_file(file_id)
Deletes the file with the given
ID from this job store. This operation is idempotent, i.e.
deleting a file twice or deleting a non-existent file will
succeed silently.
Parameters
file_id ( str ) -- ID of the file to delete
write_logs(msg)
Stores a message as a log in
the jobstore.
Parameters
msg ( str ) -- the string to be written
Raises
ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
read_logs(callback, read_all=False)
Reads logs accumulated by the write_logs() method. For each log this method calls the given callback function with the message as an argument (rather than returning logs directly, this method must be supplied with a callback which will process log messages).
Only unread logs will be read unless the read_all parameter is set.
Parameters
• callback ( Callable ) -- a function to be applied to each of the stats file handles found
• read_all ( bool ) -- a boolean indicating whether to read the already processed stats files in addition to the unread stats files
Raises
ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
Returns
the number of stats files processed
Return type
int
get_public_url(jobStoreFileID)
Returns a publicly accessible URL to the given file in the job store. The returned URL may expire as early as 1h after it has been returned. Throws an exception if the file does not exist.
Parameters
file_name ( str ) -- the jobStoreFileID of the file to generate a URL for
Raises
NoSuchFileException -- if the specified file does not exist in this job store
Return type
str
get_shared_public_url(shared_file_name)
Differs from getPublicUrl() in that this method is for generating URLs for shared files written by writeSharedFileStream() .
Returns a publicly accessible URL to the given file in the job store. The returned URL starts with 'http:', 'https:' or 'file:'. The returned URL may expire as early as 1h after it has been returned. Throws an exception if the file does not exist.
Parameters
shared_file_name ( str ) -- The name of the shared file to generate a publicly accessible URL for.
Raises
NoSuchFileException -- raised if the specified file does not exist in the store
Return type
str
sharedFileOwnerID
statsFileOwnerID
readStatsFileOwnerID
class FileInfo(fileID, ownerID, encrypted, version=None, content=None, numContentChunks=0, checksum=None)
Bases: toil.jobStores.aws.utils.SDBHelper
Represents a file in this job store.
outer = None
Type
AWSJobStore
encrypted
property fileID
property ownerID
property version
property previousVersion
property content
property checksum
classmethod create(ownerID)
Parameters
ownerID ( str )
classmethod presenceIndicator()
The key that is guaranteed to be present in the return value of binaryToAttributes(). Assuming that binaryToAttributes() is used with SDB's PutAttributes, the return value of this method could be used to detect the presence/absence of an item in SDB.
classmethod exists(jobStoreFileID)
classmethod load(jobStoreFileID)
classmethod loadOrCreate(jobStoreFileID, ownerID, encrypted)
classmethod loadOrFail(jobStoreFileID, customName=None)
Return type
AWSJobStore.FileInfo
Returns
an instance of this class representing the file with the given ID
Raises
NoSuchFileException -- if given file does not exist
classmethod fromItem(item)
Convert an SDB item to an
instance of this class.
Parameters
item ( Item )
toItem()
Convert this instance to a dictionary of attribute names to values.
Returns
the attributes dict and an integer specifying the number of chunk attributes in the dictionary that are used for storing inlined content.
Return type
tuple [ dict [ str , str ], int ]
static maxInlinedSize()
save()
upload(localFilePath, calculateChecksum=True)
uploadStream(multipart=True, allowInlining=True, encoding=None, errors=None)
Context manager that gives out a binary or text mode upload stream to upload data.
copyFrom(srcObj)
Copies contents of source key
into this file.
Parameters
srcObj ( S3.Object ) -- The key (object) that will be copied from
copyTo(dstObj)
Copies contents of this file to
the given key.
Parameters
dstObj ( S3.Object ) -- The key (object) to copy this file's content to
download(localFilePath, verifyChecksum=True)
downloadStream(verifyChecksum=True, encoding=None, errors=None)
Context manager that gives out a download stream to download data.
delete()
getSize()
Return the size of the referenced item in bytes.
__repr__()
versionings
destroy()
The inverse of initialize() , this method deletes the physical storage represented by this instance. While not being atomic, this method is at least idempotent, as a means to counteract potential issues with eventual consistency exhibited by the underlying storage mechanisms. This means that if the method fails (raises an exception), it may (and should be) invoked again. If the underlying storage mechanism is eventually consistent, even a successful invocation is not an ironclad guarantee that the physical storage vanished completely and immediately. A successful invocation only guarantees that the deletion will eventually happen. It is therefore recommended to not immediately reuse the same job store location for a new Toil workflow.
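The idempotency requirement on destroy() can be illustrated with a local directory standing in for the physical storage. This is a sketch only: the destroy function below is a hypothetical helper, and it does not model the eventual-consistency caveats of a cloud back end.

```python
import os
import shutil
import tempfile


def destroy(path: str) -> None:
    """Remove a directory tree; a missing directory is not an error."""
    try:
        shutil.rmtree(path)
    except FileNotFoundError:
        pass  # already gone, so a repeated call is harmless


store_dir = tempfile.mkdtemp()
destroy(store_dir)
destroy(store_dir)  # second call succeeds silently
gone = not os.path.exists(store_dir)
```

Because a repeated call is harmless, a caller can simply invoke destroy() again after a failure, as the description recommends.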
toil.jobStores.aws.jobStore.aRepr
toil.jobStores.aws.jobStore.custom_repr
exception toil.jobStores.aws.jobStore.BucketLocationConflictException(bucketRegion)
Bases: toil.jobStores.abstractJobStore.LocatorException
Base exception class for all locator exceptions. For example, job store/aws bucket exceptions where they already exist
toil.jobStores.aws.utils
Attributes
Exceptions
Classes
Functions
Module Contents
toil.jobStores.aws.utils.logger
toil.jobStores.aws.utils.DIAL_SPECIFIC_REGION_CONFIG
class toil.jobStores.aws.utils.SDBHelper
A mixin with methods for storing limited amounts of binary data in an SDB item
>>> import os
>>> H=SDBHelper
>>> H.presenceIndicator()
u'numChunks'
>>> H.binaryToAttributes(None)['numChunks']
0
>>> H.attributesToBinary({u'numChunks': 0})
(None, 0)
>>> H.binaryToAttributes(b'')
{u'000': b'VQ==', u'numChunks': 1}
>>> H.attributesToBinary({u'numChunks': 1, u'000': b'VQ=='})
(b'', 1)
Good pseudo-random data is very likely smaller than its bzip2ed form. Subtract 1 for the type character, i.e. 'C' or 'U', with which the string is prefixed. We should get one full chunk:
>>> s = os.urandom(H.maxRawValueSize-1)
>>> d = H.binaryToAttributes(s)
>>> len(d), len(d['000'])
(2, 1024)
>>> H.attributesToBinary(d) == (s, 1)
True
One byte more and we should overflow four bytes into the second chunk, two bytes for base64-encoding the additional character and two bytes for base64-padding to the next quartet.
>>> s += s[0:1]
>>> d = H.binaryToAttributes(s)
>>> len(d), len(d['000']), len(d['001'])
(3, 1024, 4)
>>> H.attributesToBinary(d) == (s, 2)
True
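The chunk arithmetic in the doctests above can be checked with the standard base64 module alone. The sketch below does not call SDBHelper; it assumes that the raw capacity of a chunk is three quarters of the 1024-byte encoded size (768 bytes) and that uncompressed payloads are prefixed with the type character 'U'.

```python
import base64

MAX_VALUE_SIZE = 1024                   # encoded bytes per chunk (maxValueSize)
MAX_RAW_SIZE = MAX_VALUE_SIZE * 3 // 4  # assumed raw capacity: 768 bytes

# An empty payload still carries the one-byte type character.
empty_payload = base64.b64encode(b"U")

# Type character plus 767 data bytes fills one chunk exactly.
full_chunk = base64.b64encode(b"U" + b"x" * (MAX_RAW_SIZE - 1))

# One more data byte adds a whole base64 quartet: two characters of data
# and two characters of padding.
one_more = base64.b64encode(b"U" + b"x" * MAX_RAW_SIZE)
overflow = len(one_more) - MAX_VALUE_SIZE
```

Base64 maps every 3 raw bytes to 4 encoded characters, which is why 768 raw bytes land exactly on the 1024-byte limit and a single extra byte costs 4 more.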
maxAttributesPerItem = 256
maxValueSize = 1024
maxRawValueSize
classmethod maxBinarySize(extraReservedChunks=0)
classmethod binaryToAttributes(binary)
Turn a bytestring, or None,
into SimpleDB attributes.
Return type
dict [ str , str ]
classmethod attributeDictToList(attributes)
Convert the attribute dict (e.g. from binaryToAttributes) into a list of attribute typed dicts compatible with boto3 argument syntax.
Parameters
attributes ( dict[str, str] ) -- attribute in object form
Returns
list of attributes in typed dict form
Return type
list [mypy_boto3_sdb.type_defs.AttributeTypeDef]
classmethod attributeListToDict(attributes)
Convert the boto3 representation of attributes, a list of attribute typed dicts, back to a dictionary of name, value pairs.
Parameters
attributes ( list[mypy_boto3_sdb.type_defs.AttributeTypeDef] ) -- attribute in typed dict form
Returns
attribute in dict form
Return type
dict [ str , str ]
classmethod get_attributes_from_item(item, keys)
Parameters
• item ( mypy_boto3_sdb.type_defs.ItemTypeDef )
• keys ( list[str] )
Return type
list [Optional[ str ]]
classmethod presenceIndicator()
The key that is guaranteed to be present in the return value of binaryToAttributes(). Assuming that binaryToAttributes() is used with SDB's PutAttributes, the return value of this method could be used to detect the presence/absence of an item in SDB.
classmethod attributesToBinary(attributes)
Return type
( str |None, int )
Returns
the binary data and the number of chunks it was composed from
Parameters
attributes ( list[mypy_boto3_sdb.type_defs.AttributeTypeDef] )
toil.jobStores.aws.utils.fileSizeAndTime(localFilePath)
toil.jobStores.aws.utils.uploadFromPath(localFilePath, resource, bucketName, fileID, headerArgs=None, partSize=50 << 20)
Uploads a file to s3, using multipart uploading if applicable
Parameters
• localFilePath ( str ) -- Path of the file to upload to s3
• resource ( S3.Resource ) -- boto3 resource
• bucketName ( str ) -- name of the bucket to upload to
• fileID ( str ) -- the name of the file to upload to
• headerArgs ( dict ) -- http headers to use when uploading - generally used for encryption purposes
• partSize ( int ) -- max size of each part in the multipart upload, in bytes
Returns
version of the newly uploaded file
toil.jobStores.aws.utils.uploadFile(readable, resource, bucketName, fileID, headerArgs=None, partSize=50 << 20)
Upload a readable object to s3, using multipart uploading if applicable.
Parameters
• readable -- a readable stream or a file path to upload to s3
• resource ( S3.Resource ) -- boto3 resource
• bucketName ( str ) -- name of the bucket to upload to
• fileID ( str ) -- the name of the file to upload to
• headerArgs ( Optional[dict] ) -- http headers to use when uploading - generally used for encryption purposes
• partSize ( int ) -- max size of each part in the multipart upload, in bytes
Returns
version of the newly uploaded file
exception toil.jobStores.aws.utils.ServerSideCopyProhibitedError
Bases: RuntimeError
Raised when AWS refuses to perform a server-side copy between S3 keys, and insists that you pay to download and upload the data yourself instead.
toil.jobStores.aws.utils.copyKeyMultipart(resource, srcBucketName, srcKeyName, srcKeyVersion, dstBucketName, dstKeyName, sseAlgorithm=None, sseKey=None, copySourceSseAlgorithm=None, copySourceSseKey=None)
Copies a key from a source key to a destination key in multiple parts. Note that if the destination key exists it will be overwritten implicitly, and if it does not exist a new key will be created. If the destination bucket does not exist an error will be raised.
This function will always do a fast, server-side copy, at least until/unless < https://github.com/boto/boto3/issues/3270 > is fixed. In some situations, a fast, server-side copy is not actually possible. For example, when residing in an AWS VPC with an S3 VPC Endpoint configured, copying from a bucket in another region to a bucket in your own region cannot be performed server-side. This is because the VPC Endpoint S3 API servers refuse to perform server-side copies between regions, the source region's API servers refuse to initiate the copy and refer you to the destination bucket's region's API servers, and the VPC routing tables are configured to redirect all access to the current region's S3 API servers to the S3 Endpoint API servers instead.
If a fast server-side copy is not actually possible, a ServerSideCopyProhibitedError will be raised.
Parameters
• resource ( mypy_boto3_s3.S3ServiceResource ) -- boto3 resource
• srcBucketName ( str ) -- The name of the bucket to be copied from.
• srcKeyName ( str ) -- The name of the key to be copied from.
• srcKeyVersion ( str ) -- The version of the key to be copied from.
• dstBucketName ( str ) -- The name of the destination bucket for the copy.
• dstKeyName ( str ) -- The name of the destination key that will be created or overwritten.
• sseAlgorithm ( str ) -- Server-side encryption algorithm for the destination.
• sseKey ( str ) -- Server-side encryption key for the destination.
• copySourceSseAlgorithm ( str ) -- Server-side encryption algorithm for the source.
• copySourceSseKey ( str ) -- Server-side encryption key for the source.
Return type
str
Returns
The version of the copied file (or None if versioning is not enabled for dstBucket).
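Copying a key "in multiple parts" implies dividing the source object into byte ranges, one per part. The helper below is a hypothetical sketch of that arithmetic only; it makes no S3 calls, and the 50 MiB part size is an assumption borrowed from the partSize default of the upload functions rather than something stated for copyKeyMultipart.

```python
from typing import List, Tuple


def part_ranges(total_size: int, part_size: int = 50 << 20) -> List[Tuple[int, int]]:
    """Return inclusive (first_byte, last_byte) ranges covering total_size bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [
        (start, min(start + part_size, total_size) - 1)
        for start in range(0, total_size, part_size)
    ]


ranges = part_ranges(total_size=120 << 20)  # a 120 MiB object
```

Each tuple describes one part of a multipart copy; the last part is simply whatever remains after the full-sized parts.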
toil.jobStores.aws.utils.monkeyPatchSdbConnection(sdb)
toil.jobStores.aws.utils.sdb_unavailable(e)
toil.jobStores.aws.utils.no_such_sdb_domain(e)
toil.jobStores.aws.utils.retryable_ssl_error(e)
toil.jobStores.aws.utils.retryable_sdb_errors(e)
toil.jobStores.aws.utils.retry_sdb(delays=DEFAULT_DELAYS, timeout=DEFAULT_TIMEOUT, predicate=retryable_sdb_errors)
toil.jobStores.conftest
Attributes
Module Contents
toil.jobStores.conftest.collect_ignore = []
toil.jobStores.fileJobStore
Attributes
Classes
Module Contents
toil.jobStores.fileJobStore.logger
class toil.jobStores.fileJobStore.FileJobStore(path, fanOut=1000)
Bases: toil.jobStores.abstractJobStore.AbstractJobStore
A job store that uses a directory on a locally attached file system. To be compatible with distributed batch systems, that file system must be shared by all worker nodes.
Parameters
• path ( str )
• fanOut ( int )
validDirs = 'abcdefghijklmnopqrstuvwxyz0123456789'
validDirsSet
JOB_DIR_PREFIX = 'instance-'
JOB_NAME_DIR_PREFIX = 'kind-'
BUFFER_SIZE = 10485760
LOG_TEMP_SUFFIX = '.new'
LOG_PREFIX = 'stats'
default_caching()
The job store's preference as to whether it likes caching or doesn't care about it. Some job stores benefit from caching; however, on some local configurations it can be flaky.
See https://github.com/DataBiosphere/toil/issues/4218
Return type
bool
jobStoreDir
jobsDir
statsDir
stats_inbox
stats_archive
filesDir
jobFilesDir
sharedFilesDir
fanOut
linkImports = None
moveExports = None
symlink_job_store_reads = None
__repr__()
initialize(config)
Initialize this job store.
Create the
physical storage for this job store, allocate a workflow ID
and persist the given Toil configuration to the store.
Parameters
config -- the Toil configuration to initialize this job store with. The given configuration will be updated with the newly allocated workflow ID.
Raises
JobStoreExistsException -- if the physical storage for this job store already exists
resume()
Connect this instance to the physical storage it represents and load the Toil configuration into the AbstractJobStore.config attribute.
Raises
NoSuchJobStoreException -- if the physical storage for this job store doesn't exist
destroy()
The inverse of initialize() , this method deletes the physical storage represented by this instance. While not being atomic, this method is at least idempotent, as a means to counteract potential issues with eventual consistency exhibited by the underlying storage mechanisms. This means that if the method fails (raises an exception), it may (and should be) invoked again. If the underlying storage mechanism is eventually consistent, even a successful invocation is not an ironclad guarantee that the physical storage vanished completely and immediately. A successful invocation only guarantees that the deletion will eventually happen. It is therefore recommended to not immediately reuse the same job store location for a new Toil workflow.
assign_job_id(job_description)
Gets a new jobStoreID to be used by the described job and assigns it to the JobDescription.
Files
associated with the assigned ID will be accepted even if the
JobDescription has never been created or updated.
Parameters
job_description ( toil.job.JobDescription ) -- The JobDescription to give an ID to
create_job(job_description)
Writes the given JobDescription to the job store. The job must have an ID assigned already.
Must call
jobDescription.pre_update_hook()
Returns
The JobDescription passed.
Return type
toil.job.JobDescription
batch()
If supported by the batch system, calls to create() with this context manager active will be performed in a batch after the context manager is released.
job_exists(job_id)
Indicates whether a description of the job with the specified jobStoreID exists in the job store.
Return type
bool
get_public_url(jobStoreFileID)
Returns a publicly accessible URL to the given file in the job store. The returned URL may expire as early as 1h after it has been returned. Throws an exception if the file does not exist.
Parameters
file_name ( str ) -- the jobStoreFileID of the file to generate a URL for
Raises
NoSuchFileException -- if the specified file does not exist in this job store
Return type
str
get_shared_public_url(sharedFileName)
Differs from getPublicUrl() in that this method is for generating URLs for shared files written by writeSharedFileStream() .
Returns a publicly accessible URL to the given file in the job store. The returned URL starts with 'http:', 'https:' or 'file:'. The returned URL may expire as early as 1h after it has been returned. Throws an exception if the file does not exist.
Parameters
shared_file_name ( str ) -- The name of the shared file to generate a publicly accessible URL for.
Raises
NoSuchFileException -- raised if the specified file does not exist in the store
Return type
str
load_job(job_id)
Loads the description of the job referenced by the given ID, assigns it the job store's config, and returns it.
May declare the job to have failed (see toil.job.JobDescription.setupJobAfterFailure()) if there is evidence of a failed update attempt.
Parameters
job_id -- the ID of the job to load
Raises
NoSuchJobException -- if there is no job with the given ID
update_job(job)
Persists changes to the state of the given JobDescription in this store atomically.
Must call jobDescription.pre_update_hook()
Parameters
job ( toil.job.JobDescription ) -- the job to write to this job store
delete_job(job_id)
Removes the JobDescription from the store atomically. You may not then subsequently call load(), write(), update(), etc. with the same jobStoreID or any JobDescription bearing it.
This operation is idempotent, i.e. deleting a job twice or deleting a non-existent job will succeed silently.
Parameters
job_id ( str ) -- the ID of the job to delete from this job store
jobs()
Best-effort attempt to return an iterator over the JobDescriptions for all jobs in the store. The iterator may not return all jobs and may also contain orphaned jobs that have already finished successfully and should not be rerun. To guarantee you get any and all jobs that can be run, construct a more expensive ToilState object instead.
Returns
An iterator over the jobs in the store. The iterator may or may not contain all jobs and may contain invalid jobs.
Return type
Iterator[toil.job.jobDescription]
write_file(local_path, job_id=None, cleanup=False)
Takes a file (as a path) and places it in this job store. Returns an ID that can be used to retrieve the file at a later time. The file is written in an atomic manner. It will not appear in the jobStore until the write has successfully completed.
Parameters
• local_path ( str ) -- the path to the local file that will be uploaded to the job store. The last path component (basename of the file) will remain associated with the file in the file store, if supported, so that the file can be searched for by name or name glob.
• job_id ( str ) -- the ID of a job, or None. If specified, the file may be associated with that job in a job-store-specific way. This may influence the returned ID.
• cleanup ( bool ) -- Whether to attempt to delete the file when the job whose jobStoreID was given as jobStoreID is deleted with jobStore.delete(job). If jobStoreID was not given, does nothing.
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchJobException -- if the job specified via jobStoreID does not exist. FIXME: some implementations may not raise this
Returns
an ID that references the newly created file and can be used to read the file in the future.
Return type
str
write_file_stream(job_id=None, cleanup=False, basename=None, encoding=None, errors=None)
Similar to writeFile, but returns a context manager yielding a tuple of 1) a file handle which can be written to and 2) the ID of the resulting file in the job store. The yielded file handle does not need to and should not be closed explicitly. The file is written in an atomic manner. It will not appear in the jobStore until the write has successfully completed.
Parameters
• job_id ( str ) -- the ID of a job, or None. If specified, the file may be associated with that job in a job-store-specific way. This may influence the returned ID.
• cleanup ( bool ) -- Whether to attempt to delete the file when the job whose jobStoreID was given as jobStoreID is deleted with jobStore.delete(job). If jobStoreID was not given, does nothing.
• basename ( str ) -- If supported by the implementation, use the given file basename so that when searching the job store with a query matching that basename, the file will be detected.
• encoding ( str ) -- the name of the encoding used to encode the file. Encodings are the same as for encode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchJobException -- if the job specified via jobStoreID does not exist. FIXME: some implementations may not raise this
Returns
a context manager yielding a file handle which can be written to and an ID that references the newly created file and can be used to read the file in the future.
Return type
Iterator[Tuple[IO[ bytes ], str ]]
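The contract described for write_file_stream() (a context manager that yields a handle and an ID, closes the handle itself, and publishes the file only after a successful write) can be illustrated with a local-directory sketch. This is not how any Toil job store is implemented. The function write_file_stream_sketch(), its store_dir argument, and the UUID-based ID scheme are assumptions made for this example.

```python
import os
import tempfile
import uuid
from contextlib import contextmanager


@contextmanager
def write_file_stream_sketch(store_dir):
    """Yield (writable handle, file ID); publish the file only on success."""
    file_id = uuid.uuid4().hex  # hypothetical ID scheme, not Toil's
    fd, tmp_path = tempfile.mkstemp(dir=store_dir, prefix=".tmp-")
    try:
        # The with block closes the handle, so the caller never has to.
        with os.fdopen(fd, "wb") as handle:
            yield handle, file_id
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp_path, os.path.join(store_dir, file_id))
    except BaseException:
        # A failed write leaves nothing behind under the final name.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


store_dir = tempfile.mkdtemp()
with write_file_stream_sketch(store_dir) as (handle, file_id):
    handle.write(b"hello")
    visible_during_write = os.path.exists(os.path.join(store_dir, file_id))

with open(os.path.join(store_dir, file_id), "rb") as stored:
    stored_bytes = stored.read()
```

Writing to a temporary name and renaming at the end is one common way to get the "does not appear until complete" behavior on a local file system. Object stores typically achieve it differently.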
get_empty_file_store_id(jobStoreID=None, cleanup=False, basename=None)
Creates an empty file in the job store and returns its ID. A call to fileExists(getEmptyFileStoreID(jobStoreID)) will return True.
Parameters
• job_id ( str ) -- the ID of a job, or None. If specified, the file may be associated with that job in a job-store-specific way. This may influence the returned ID.
• cleanup ( bool ) -- Whether to attempt to delete the file when the job whose jobStoreID was given as jobStoreID is deleted with jobStore.delete(job). If jobStoreID was not given, does nothing.
• basename ( str ) -- If supported by the implementation, use the given file basename so that when searching the job store with a query matching that basename, the file will be detected.
Returns
a jobStoreFileID that references the newly created file and can be used to reference the file in the future.
Return type
str
update_file(file_id, local_path)
Replaces the existing version of a file in the job store.
Throws an exception if the file does not exist.
Parameters
• file_id -- the ID of the file in the job store to be updated
• local_path -- the local path to a file that will overwrite the current version in the job store
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchFileException -- if the specified file does not exist
read_file(file_id, local_path, symlink=False)
Copies or hard links the file referenced by jobStoreFileID to the given local file path. The version will be consistent with the last copy of the file written/updated. If the file in the job store is later modified via updateFile or updateFileStream, it is implementation-defined whether those writes will be visible at localFilePath. The file is copied in an atomic manner. It will not appear in the local file system until the copy has completed.
The file at the given local path may not be modified after this method returns!
Note! Implementations of readFile need to respect/provide the executable attribute on FileIDs.
Parameters
• file_id ( str ) -- ID of the file to be copied
• local_path ( str ) -- the local path indicating where to place the contents of the given file in the job store
• symlink ( bool ) -- whether the reader can tolerate a symlink. If set to true, the job store may create a symlink instead of a full copy of the file or a hard link.
Return type
None
delete_file(file_id)
Deletes the file with the given ID from this job store. This operation is idempotent, i.e. deleting a file twice or deleting a non-existent file will succeed silently.
Parameters
file_id ( str ) -- ID of the file to delete
file_exists(file_id)
Determine whether a file exists in this job store.
Parameters
file_id -- an ID referencing the file to be checked
get_file_size(file_id)
Get the size of the given file in bytes, or 0 if it does not exist when queried.
Note that job stores which encrypt files might return overestimates of file sizes, since the encrypted file may have been padded to the nearest block, augmented with an initialization vector, etc.
Parameters
file_id ( str ) -- an ID referencing the file to be checked
Return type
int
update_file_stream(file_id, encoding=None, errors=None)
Replaces the existing version of a file in the job store. Similar to writeFile, but returns a context manager yielding a file handle which can be written to. The yielded file handle does not need to and should not be closed explicitly.
Parameters
• file_id ( str ) -- the ID of the file in the job store to be updated
• encoding ( str ) -- the name of the encoding used to encode the file. Encodings are the same as for encode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchFileException -- if the specified file does not exist
read_file_stream(file_id: str | toil.fileStores.FileID, encoding: Literal[None] = None, errors: str | None = None) -> collections.abc.Iterator[IO[bytes]]
read_file_stream(file_id: str | toil.fileStores.FileID, encoding: str, errors: str | None = None) -> collections.abc.Iterator[IO[str]]
read_file_stream(file_id: str | toil.fileStores.FileID, encoding: str | None = None, errors: str | None = None) -> collections.abc.Iterator[IO[bytes]] | collections.abc.Iterator[IO[str]]
Similar to readFile, but returns a context manager yielding a file handle which can be read from. The yielded file handle does not need to and should not be closed explicitly.
Parameters
• file_id ( str ) -- ID of the file to get a readable file handle for
• encoding ( str ) -- the name of the encoding used to decode the file. Encodings are the same as for decode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Returns
a context manager yielding a file handle which can be read from
Return type
Iterator[Union[IO[ bytes ], IO[ str ]]]
write_shared_file_stream(shared_file_name, encrypted=None, encoding=None, errors=None)
Returns a context manager yielding a writable file handle to the global file referenced by the given name. The file will be created in an atomic manner.
Parameters
• shared_file_name ( str ) -- A file name matching AbstractJobStore.fileNameRegex, unique within this job store
• encrypted ( bool ) -- True if the file must be encrypted, None if it may be encrypted or False if it must be stored in the clear.
• encoding ( str ) -- the name of the encoding used to encode the file. Encodings are the same as for encode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Raises
ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
Returns
a context manager yielding a writable file handle
Return type
Iterator[IO[ bytes ]]
read_shared_file_stream(shared_file_name, encoding=None, errors=None)
Returns a context manager yielding a readable file handle to the global file referenced by the given name.
Parameters
• shared_file_name ( str ) -- A file name matching AbstractJobStore.fileNameRegex, unique within this job store
• encoding ( str ) -- the name of the encoding used to decode the file. Encodings are the same as for decode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Returns
a context manager yielding a readable file handle
Return type
Iterator[IO[ bytes ]]
list_all_file_names(for_job=None)
Get all the file names (not file IDs) of files stored in the job store.
Used for debugging.
Parameters
for_job ( Optional[str] ) -- If set, restrict the list to files for a particular job.
Return type
collections.abc.Iterable [ str ]
write_logs(msg)
Stores a message as a log in the jobstore.
Parameters
msg ( str ) -- the string to be written
Raises
ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
read_logs(callback, read_all=False)
Reads logs accumulated by the write_logs() method. For each log this method calls the given callback function with the message as an argument (rather than returning logs directly, this method must be supplied with a callback which will process log messages).
Only unread logs will be read unless the read_all parameter is set.
Parameters
• callback ( Callable ) -- a function to be applied to each of the stats file handles found
• read_all ( bool ) -- a boolean indicating whether to read the already processed stats files in addition to the unread stats files
Raises
ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
Returns
the number of stats files processed
Return type
int
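The callback-driven reading described for read_logs() can be sketched with an in-memory stand-in. The LogStoreSketch class below is hypothetical and only mirrors the documented behavior: unread entries are delivered once, read_all replays already processed entries as well, and the return value counts what was delivered. It assumes the callback receives the message itself, as the description above states.

```python
class LogStoreSketch:
    """Hypothetical in-memory stand-in for write_logs()/read_logs()."""

    def __init__(self):
        self._unread = []
        self._read = []

    def write_logs(self, msg):
        self._unread.append(msg)

    def read_logs(self, callback, read_all=False):
        # With read_all, replay processed entries before the unread ones.
        pending = (self._read + self._unread) if read_all else list(self._unread)
        for message in pending:
            callback(message)
        # Everything that was unread is now marked as processed.
        self._read.extend(self._unread)
        self._unread.clear()
        return len(pending)


store = LogStoreSketch()
store.write_logs("first")
store.write_logs("second")

seen = []
first_pass = store.read_logs(seen.append)
second_pass = store.read_logs(seen.append)
replay = store.read_logs(seen.append, read_all=True)
```

Passing a callback instead of returning a list lets the store stream entries one at a time without holding them all in memory.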
toil.jobStores.googleJobStore
Attributes
Classes
Functions
Module Contents
toil.jobStores.googleJobStore.log
toil.jobStores.googleJobStore.GOOGLE_STORAGE = 'gs'
toil.jobStores.googleJobStore.MAX_BATCH_SIZE = 1000
toil.jobStores.googleJobStore.google_retry_predicate(e)
Necessary because, under heavy load, Google may throw
"TooManyRequests: 429 The project exceeded the rate limit for creating and deleting buckets."
or numerous other server errors which need to be retried.
toil.jobStores.googleJobStore.google_retry(f)
This decorator retries the wrapped function if Google throws any retryable service errors.
It should wrap any function that makes use of the Google Client API
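The pairing of a predicate with a retry decorator can be sketched generically. The names retry_when(), is_transient() and TooManyRequestsSketch below are hypothetical and do not reproduce the actual google_retry implementation or any Google client exception class. The sketch only shows the general shape: a predicate decides whether an error is worth retrying, and the decorator re-invokes the function while it is.

```python
import functools
import time


def retry_when(predicate, attempts=3, delay=0.0):
    """Build a decorator that retries while predicate(error) is true."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as error:
                    # Give up on the last attempt or on non-retryable errors.
                    if attempt == attempts or not predicate(error):
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator


class TooManyRequestsSketch(Exception):
    """Hypothetical stand-in for a 429 rate-limit error."""


def is_transient(error):
    return isinstance(error, TooManyRequestsSketch)


calls = {"count": 0}


@retry_when(is_transient)
def create_bucket():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TooManyRequestsSketch("429 rate limit exceeded")
    return "created"


result = create_bucket()
```

A production version would normally add exponential backoff between attempts, which is omitted here for brevity.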
class toil.jobStores.googleJobStore.GoogleJobStore(locator)
Bases: toil.jobStores.abstractJobStore.AbstractJobStore
Represents the physical storage for the jobs and files in a Toil workflow.
JobStores are responsible for storing toil.job.JobDescription (which relate jobs to each other) and files.
Actual toil.job.Job objects are stored in files, referenced by JobDescriptions. All the non-file CRUD methods the JobStore provides deal in JobDescriptions and not full, executable Jobs.
To actually get hold of a toil.job.Job, use toil.job.Job.loadJob() with a JobStore and the relevant JobDescription.
Parameters
locator ( str )
nodeServiceAccountJson = '/root/service_account.json'
projectID
bucketName
bucket = None
statsBaseID = 'f16eef0c-b597-4b8b-9b0c-4d605b4f506c'
statsReadPrefix = '_'
readStatsBaseID
sseKey = None
storageClient
classmethod create_client()
Produce a client for Google Storage with the highest level of access we can get.
Fall back to anonymous access if no project is available, unlike the Google Storage module's behavior.
Warn if GOOGLE_APPLICATION_CREDENTIALS is set but not actually present.
Return type
google.cloud.storage.Client
initialize(config=None)
Initialize this job store.
Create the physical storage for this job store, allocate a workflow ID and persist the given Toil configuration to the store.
Parameters
config -- the Toil configuration to initialize this job store with. The given configuration will be updated with the newly allocated workflow ID.
Raises
JobStoreExistsException -- if the physical storage for this job store already exists
resume()
Connect this instance to the physical storage it represents and load the Toil configuration into the AbstractJobStore.config attribute.
Raises
NoSuchJobStoreException -- if the physical storage for this job store doesn't exist
destroy()
The inverse of initialize() , this method deletes the physical storage represented by this instance. While not being atomic, this method is at least idempotent, as a means to counteract potential issues with eventual consistency exhibited by the underlying storage mechanisms. This means that if the method fails (raises an exception), it may (and should be) invoked again. If the underlying storage mechanism is eventually consistent, even a successful invocation is not an ironclad guarantee that the physical storage vanished completely and immediately. A successful invocation only guarantees that the deletion will eventually happen. It is therefore recommended to not immediately reuse the same job store location for a new Toil workflow.
assign_job_id(job_description)
Gets a new jobStoreID to be used by the described job and assigns it to the JobDescription.
Files associated with the assigned ID will be accepted even if the JobDescription has never been created or updated.
Parameters
job_description ( toil.job.JobDescription ) -- The JobDescription to give an ID to
batch()
If supported by the batch system, calls to create() with this context manager active will be performed in a batch after the context manager is released.
create_job(job_description)
Writes the given JobDescription to the job store. The job must have an ID assigned already.
Must call jobDescription.pre_update_hook()
Returns
The JobDescription passed.
Return type
toil.job.JobDescription
job_exists(job_id)
Indicates whether a description of the job with the specified jobStoreID exists in the job store.
Return type
bool
get_public_url(fileName)
Returns a publicly accessible URL to the given file in the job store. The returned URL may expire as early as 1h after it has been returned. Throws an exception if the file does not exist.
Parameters
file_name ( str ) -- the jobStoreFileID of the file to generate a URL for
Raises
NoSuchFileException -- if the specified file does not exist in this job store
Return type
str
get_shared_public_url(sharedFileName)
Differs from getPublicUrl() in that this method is for generating URLs for shared files written by writeSharedFileStream() .
Returns a publicly accessible URL to the given file in the job store. The returned URL starts with 'http:', 'https:' or 'file:'. The returned URL may expire as early as 1h after it has been returned. Throws an exception if the file does not exist.
Parameters
shared_file_name ( str ) -- The name of the shared file to generate a publicly accessible URL for.
Raises
NoSuchFileException -- raised if the specified file does not exist in the store
Return type
str
load_job(job_id)
Loads the description of the job referenced by the given ID, assigns it the job store's config, and returns it.
May declare the job to have failed (see toil.job.JobDescription.setupJobAfterFailure()) if there is evidence of a failed update attempt.
Parameters
job_id -- the ID of the job to load
Raises
NoSuchJobException -- if there is no job with the given ID
update_job(job)
Persists changes to the state of the given JobDescription in this store atomically.
Must call jobDescription.pre_update_hook()
Parameters
job ( toil.job.JobDescription ) -- the job to write to this job store
delete_job(job_id)
Removes the JobDescription from the store atomically. You may not then subsequently call load(), write(), update(), etc. with the same jobStoreID or any JobDescription bearing it.
This operation is idempotent, i.e. deleting a job twice or deleting a non-existent job will succeed silently.
Parameters
job_id ( str ) -- the ID of the job to delete from this job store
get_env()
Return a dict of environment variables to send out to the workers so they can load the job store.
jobs()
Best-effort attempt to return an iterator over the JobDescriptions for all jobs in the store. The iterator may not return all jobs and may also contain orphaned jobs that have already finished successfully and should not be rerun. To guarantee you get any and all jobs that can be run, construct a more expensive ToilState object instead.
Returns
An iterator over the jobs in the store. The iterator may or may not contain all jobs and may contain invalid jobs.
Return type
Iterator[toil.job.jobDescription]
write_file(local_path, job_id=None, cleanup=False)
Takes a file (as a path) and places it in this job store. Returns an ID that can be used to retrieve the file at a later time. The file is written in an atomic manner. It will not appear in the jobStore until the write has successfully completed.
Parameters
• local_path ( str ) -- the path to the local file that will be uploaded to the job store. The last path component (basename of the file) will remain associated with the file in the file store, if supported, so that the file can be searched for by name or name glob.
• job_id ( str ) -- the ID of a job, or None. If specified, the file may be associated with that job in a job-store-specific way. This may influence the returned ID.
• cleanup ( bool ) -- Whether to attempt to delete the file when the job whose jobStoreID was given as jobStoreID is deleted with jobStore.delete(job). If jobStoreID was not given, does nothing.
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchJobException -- if the job specified via jobStoreID does not exist. FIXME: some implementations may not raise this
Returns
an ID that references the newly created file and can be used to read the file in the future.
Return type
str
write_file_stream(job_id=None, cleanup=False, basename=None, encoding=None, errors=None)
Similar to writeFile, but returns a context manager yielding a tuple of 1) a file handle which can be written to and 2) the ID of the resulting file in the job store. The yielded file handle does not need to and should not be closed explicitly. The file is written in an atomic manner. It will not appear in the jobStore until the write has successfully completed.
Parameters
• job_id ( str ) -- the ID of a job, or None. If specified, the file may be associated with that job in a job-store-specific way. This may influence the returned ID.
• cleanup ( bool ) -- Whether to attempt to delete the file when the job whose jobStoreID was given as jobStoreID is deleted with jobStore.delete(job). If jobStoreID was not given, does nothing.
• basename ( str ) -- If supported by the implementation, use the given file basename so that when searching the job store with a query matching that basename, the file will be detected.
• encoding ( str ) -- the name of the encoding used to encode the file. Encodings are the same as for encode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchJobException -- if the job specified via jobStoreID does not exist. FIXME: some implementations may not raise this
Returns
a context manager yielding a file handle which can be written to and an ID that references the newly created file and can be used to read the file in the future.
Return type
Iterator[Tuple[IO[ bytes ], str ]]
get_empty_file_store_id(jobStoreID=None, cleanup=False, basename=None)
Creates an empty file in the job store and returns its ID. A call to fileExists(getEmptyFileStoreID(jobStoreID)) will return True.
Parameters
• job_id ( str ) -- the ID of a job, or None. If specified, the file may be associated with that job in a job-store-specific way. This may influence the returned ID.
• cleanup ( bool ) -- Whether to attempt to delete the file when the job whose jobStoreID was given as jobStoreID is deleted with jobStore.delete(job). If jobStoreID was not given, does nothing.
• basename ( str ) -- If supported by the implementation, use the given file basename so that when searching the job store with a query matching that basename, the file will be detected.
Returns
a jobStoreFileID that references the newly created file and can be used to reference the file in the future.
Return type
str
read_file(file_id, local_path, symlink=False)
Copies or hard links the file referenced by jobStoreFileID to the given local file path. The version will be consistent with the last copy of the file written/updated. If the file in the job store is later modified via updateFile or updateFileStream, it is implementation-defined whether those writes will be visible at localFilePath. The file is copied in an atomic manner. It will not appear in the local file system until the copy has completed.
The file at the given local path may not be modified after this method returns!
Note! Implementations of readFile need to respect/provide the executable attribute on FileIDs.
Parameters
• file_id ( str ) -- ID of the file to be copied
• local_path ( str ) -- the local path indicating where to place the contents of the given file in the job store
• symlink ( bool ) -- whether the reader can tolerate a symlink. If set to true, the job store may create a symlink instead of a full copy of the file or a hard link.
read_file_stream(file_id, encoding=None, errors=None)
Similar to readFile, but returns a context manager yielding a file handle which can be read from. The yielded file handle does not need to and should not be closed explicitly.
Parameters
• file_id ( str ) -- ID of the file to get a readable file handle for
• encoding ( str ) -- the name of the encoding used to decode the file. Encodings are the same as for decode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Returns
a context manager yielding a file handle which can be read from
Return type
Iterator[Union[IO[ bytes ], IO[ str ]]]
delete_file(file_id)
Deletes the file with the given ID from this job store. This operation is idempotent, i.e. deleting a file twice or deleting a non-existent file will succeed silently.
Parameters
file_id ( str ) -- ID of the file to delete
file_exists(file_id)
Determine whether a file exists in this job store.
Parameters
file_id -- an ID referencing the file to be checked
get_file_size(file_id)
Get the size of the given file in bytes, or 0 if it does not exist when queried.
Note that job stores which encrypt files might return overestimates of file sizes, since the encrypted file may have been padded to the nearest block, augmented with an initialization vector, etc.
Parameters
file_id ( str ) -- an ID referencing the file to be checked
Return type
int
update_file(file_id, local_path)
Replaces the existing version of a file in the job store.
Throws an exception if the file does not exist.
Parameters
• file_id -- the ID of the file in the job store to be updated
• local_path -- the local path to a file that will overwrite the current version in the job store
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchFileException -- if the specified file does not exist
update_file_stream(file_id, encoding=None, errors=None)
Replaces the existing version of a file in the job store. Similar to writeFile, but returns a context manager yielding a file handle which can be written to. The yielded file handle does not need to and should not be closed explicitly.
Parameters
• file_id ( str ) -- the ID of the file in the job store to be updated
• encoding ( str ) -- the name of the encoding used to encode the file. Encodings are the same as for encode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Raises
• ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
• NoSuchFileException -- if the specified file does not exist
write_shared_file_stream(shared_file_name, encrypted=True, encoding=None, errors=None)
Returns a context manager yielding a writable file handle to the global file referenced by the given name. The file will be created in an atomic manner.
Parameters
• shared_file_name ( str ) -- A file name matching AbstractJobStore.fileNameRegex, unique within this job store
• encrypted ( bool ) -- True if the file must be encrypted, None if it may be encrypted or False if it must be stored in the clear.
• encoding ( str ) -- the name of the encoding used to encode the file. Encodings are the same as for encode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Raises
ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
Returns
a context manager yielding a writable file handle
Return type
Iterator[IO[ bytes ]]
read_shared_file_stream(shared_file_name, isProtected=True, encoding=None, errors=None)
Returns a context manager yielding a readable file handle to the global file referenced by the given name.
Parameters
• shared_file_name ( str ) -- A file name matching AbstractJobStore.fileNameRegex, unique within this job store
• encoding ( str ) -- the name of the encoding used to decode the file. Encodings are the same as for decode(). Defaults to None which represents binary mode.
• errors ( str ) -- an optional string that specifies how encoding errors are to be handled. Errors are the same as for open(). Defaults to 'strict' when an encoding is specified.
Returns
a context manager yielding a readable file handle
Return type
Iterator[IO[ bytes ]]
write_logs(msg)
Stores a message as a log in the jobstore.
Parameters
msg ( str ) -- the string to be written
Raises
ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
Return type
None
read_logs(callback, read_all=False)
Reads logs accumulated by the write_logs() method. For each log this method calls the given callback function with the message as an argument (rather than returning logs directly, this method must be supplied with a callback which will process log messages).
Only unread logs will be read unless the read_all parameter is set.
Parameters
• callback ( Callable ) -- a function to be applied to each of the stats file handles found
• read_all ( bool ) -- a boolean indicating whether to read the already processed stats files in addition to the unread stats files
Raises
ConcurrentFileModificationException -- if the file was modified concurrently during an invocation of this method
Returns
the number of stats files processed
Return type
int
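The callback contract of write_logs() and read_logs() can be illustrated with a small in-memory stand-in. This is only a sketch: InMemoryLogStore is a hypothetical class, not part of Toil, and it assumes that "unread" simply means "not yet passed to a callback" and that the callback receives the message string.

```python
from typing import Callable, List


class InMemoryLogStore:
    """Hypothetical stand-in for a job store's log methods (not Toil code)."""

    def __init__(self) -> None:
        self._unread: List[str] = []
        self._read: List[str] = []

    def write_logs(self, msg: str) -> None:
        # Store the message so that a later read_logs() call can deliver it.
        self._unread.append(msg)

    def read_logs(self, callback: Callable[[str], None], read_all: bool = False) -> int:
        # Optionally replay messages that were already delivered once.
        pending = (self._read if read_all else []) + self._unread
        for msg in pending:
            callback(msg)
        # Everything delivered now counts as read.
        self._read.extend(self._unread)
        self._unread = []
        return len(pending)


store = InMemoryLogStore()
store.write_logs("first")
store.write_logs("second")

seen: List[str] = []
first_count = store.read_logs(seen.append)                   # delivers both messages
second_count = store.read_logs(seen.append)                  # nothing new to deliver
replay_count = store.read_logs(seen.append, read_all=True)   # replays both messages
```

The return value mirrors the documented behaviour of reporting how many items were processed, so a caller can tell whether anything new arrived.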
toil.jobStores.utils
Attributes
Exceptions
Classes
Functions
Module Contents
toil.jobStores.utils.log
class toil.jobStores.utils.WritablePipe(encoding=None, errors=None)
Bases: abc.ABC
An object-oriented wrapper for os.pipe. Clients should subclass it, implement readFrom() to consume the readable end of the pipe, then instantiate the class as a context manager to get the writable end. See the example below.
>>> import codecs, sys, shutil
>>> class MyPipe(WritablePipe):
...     def readFrom(self, readable):
...         shutil.copyfileobj(codecs.getreader('utf-8')(readable), sys.stdout)
>>> with MyPipe() as writable:
...     _ = writable.write('Hello, world!\n'.encode('utf-8'))
Hello, world!
Each instance of this class creates a thread and invokes the readFrom method in that thread. The thread will be join()ed upon normal exit from the context manager, i.e. the body of the with statement. If an exception occurs, the thread will not be joined but a well-behaved readFrom() implementation will terminate shortly thereafter due to the pipe having been closed.
Now, exceptions in the reader thread will be reraised in the main thread:
>>> class MyPipe(WritablePipe):
...     def readFrom(self, readable):
...         raise RuntimeError('Hello, world!')
>>> with MyPipe() as writable:
...     pass
Traceback (most recent call last):
...
RuntimeError: Hello, world!
More complicated, less illustrative tests:
Same as above, but proving that handles are closed:
>>> import os
>>> x = os.dup(0); os.close(x)
>>> class MyPipe(WritablePipe):
...     def readFrom(self, readable):
...         raise RuntimeError('Hello, world!')
>>> with MyPipe() as writable:
...     pass
Traceback (most recent call last):
...
RuntimeError: Hello, world!
>>> y = os.dup(0); os.close(y); x == y
True
Exceptions in the body of the with statement aren't masked, and handles are closed:
>>> x = os.dup(0); os.close(x)
>>> class MyPipe(WritablePipe):
...     def readFrom(self, readable):
...         pass
>>> with MyPipe() as writable:
...     raise RuntimeError('Hello, world!')
Traceback (most recent call last):
...
RuntimeError: Hello, world!
>>> y = os.dup(0); os.close(y); x == y
True
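The thread-plus-pipe pattern described above can be sketched without Toil. SketchWritablePipe and Collect below are hypothetical names; this is a simplified illustration of the idea (reader thread, join on normal exit, re-raise of reader errors), not the actual WritablePipe implementation, and it ignores the encoding and errors options.

```python
import os
import threading
from abc import ABC, abstractmethod


class SketchWritablePipe(ABC):
    """Hypothetical, simplified illustration of the pattern (not Toil code)."""

    @abstractmethod
    def readFrom(self, readable):
        """Consume the readable end of the pipe."""

    def _reader(self):
        try:
            # The with block closes the readable end once readFrom() returns.
            with os.fdopen(self._read_fd, "rb") as readable:
                self.readFrom(readable)
        except BaseException as e:
            # Remember the error so the main thread can re-raise it.
            self._error = e

    def __enter__(self):
        self._error = None
        self._read_fd, write_fd = os.pipe()
        self._writable = os.fdopen(write_fd, "wb")
        self._thread = threading.Thread(target=self._reader)
        self._thread.start()
        return self._writable

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Closing the writable end flushes it and signals EOF to the reader.
        self._writable.close()
        if exc_type is None:
            # Only join on normal exit, as described above.
            self._thread.join()
            if self._error is not None:
                raise self._error
        return False  # never mask exceptions from the with body


class Collect(SketchWritablePipe):
    def __init__(self):
        self.chunks = []

    def readFrom(self, readable):
        self.chunks.append(readable.read())


collector = Collect()
with collector as writable:
    writable.write(b"Hello, world!\n")
received = b"".join(collector.chunks)
```

Because the reader thread is joined before the with statement finishes, `received` is complete as soon as the block exits normally.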
abstract readFrom(readable)
Implement this method to read data from the pipe. This method should support both binary and text mode output.
Parameters
readable ( file ) -- the file object representing the readable end of the pipe. Do not explicitly invoke the close() method of the object, that will be done automatically.
encoding
errors
readable_fh = None
writable = None
thread = None
reader_done = False
__enter__()
__exit__(exc_type, exc_val, exc_tb)
class toil.jobStores.utils.ReadablePipe(encoding=None, errors=None)
Bases: abc.ABC
An object-oriented wrapper for os.pipe. Clients should subclass it, implement writeTo() to place data into the writable end of the pipe, then instantiate the class as a context manager to get the readable end. See the example below.
>>> import codecs, sys, shutil
>>> class MyPipe(ReadablePipe):
...     def writeTo(self, writable):
...         writable.write('Hello, world!\n'.encode('utf-8'))
>>> with MyPipe() as readable:
...     shutil.copyfileobj(codecs.getreader('utf-8')(readable), sys.stdout)
Hello, world!
Each instance of this class creates a thread and invokes the writeTo() method in that thread. The thread will be join()ed upon normal exit from the context manager, i.e. the body of the with statement. If an exception occurs, the thread will not be joined but a well-behaved writeTo() implementation will terminate shortly thereafter due to the pipe having been closed.
Now, exceptions in the writer thread will be reraised in the main thread:
>>> class MyPipe(ReadablePipe):
...     def writeTo(self, writable):
...         raise RuntimeError('Hello, world!')
>>> with MyPipe() as readable:
...     pass
Traceback (most recent call last):
...
RuntimeError: Hello, world!
More complicated, less illustrative tests:
Same as above, but proving that handles are closed:
>>> import os
>>> x = os.dup(0); os.close(x)
>>> class MyPipe(ReadablePipe):
...     def writeTo(self, writable):
...         raise RuntimeError('Hello, world!')
>>> with MyPipe() as readable:
...     pass
Traceback (most recent call last):
...
RuntimeError: Hello, world!
>>> y = os.dup(0); os.close(y); x == y
True
Exceptions in the body of the with statement aren't masked, and handles are closed:
>>> x = os.dup(0); os.close(x)
>>> class MyPipe(ReadablePipe):
...     def writeTo(self, writable):
...         pass
>>> with MyPipe() as readable:
...     raise RuntimeError('Hello, world!')
Traceback (most recent call last):
...
RuntimeError: Hello, world!
>>> y = os.dup(0); os.close(y); x == y
True
abstract writeTo(writable)
Implement this method to write data to the pipe. This method should support both binary and text mode input.
Parameters
writable ( file ) -- the file object representing the writable end of the pipe. Do not explicitly invoke the close() method of the object, that will be done automatically.
encoding
errors
writable_fh = None
readable = None
thread = None
__enter__()
__exit__(exc_type, exc_val, exc_tb)
class toil.jobStores.utils.ReadableTransformingPipe(source, encoding=None, errors=None)
Bases: ReadablePipe
A pipe which is constructed around a readable stream, and which provides a context manager that gives a readable stream.
Useful as a base class for pipes which have to transform or otherwise visit bytes that flow through them, instead of just consuming or producing data.
Clients should subclass it and implement transform() , like so:
>>> import codecs, sys, shutil
>>> class MyPipe(ReadableTransformingPipe):
...     def transform(self, readable, writable):
...         writable.write(readable.read().decode('utf-8').upper().encode('utf-8'))
>>> class SourcePipe(ReadablePipe):
...     def writeTo(self, writable):
...         writable.write('Hello, world!\n'.encode('utf-8'))
>>> with SourcePipe() as source:
...     with MyPipe(source) as transformed:
...         shutil.copyfileobj(codecs.getreader('utf-8')(transformed), sys.stdout)
HELLO, WORLD!
The transform() method runs in its own thread, and should move data chunk by chunk instead of all at once. It should finish normally if it encounters either an EOF on the readable, or a BrokenPipeError on the writable. This means that it should make sure to actually catch a BrokenPipeError when writing.
See also: toil.lib.misc.WriteWatchingStream .
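The doctest above reads the whole input at once for brevity. A chunk-by-chunk transform() body that stops on EOF or on a BrokenPipeError, as recommended, might look like the following sketch. It is written as a plain function so that it runs without Toil; CHUNK_SIZE is an assumed value, and the uppercase transformation is only valid per chunk because it works on ASCII bytes.

```python
import io

CHUNK_SIZE = 64 * 1024  # assumed chunk size, chosen only for illustration


def transform(readable, writable):
    """Uppercase ASCII bytes flowing from readable to writable, chunk by chunk."""
    while True:
        chunk = readable.read(CHUNK_SIZE)
        if not chunk:
            # EOF on the readable: finish normally.
            break
        try:
            writable.write(chunk.upper())
        except BrokenPipeError:
            # The consumer went away: also finish normally.
            break


# Exercise the function with in-memory streams standing in for the pipe ends.
source = io.BytesIO(b"Hello, world!\n")
sink = io.BytesIO()
transform(source, sink)
result = sink.getvalue()
```

Transformations that depend on context across chunk boundaries (multi-byte text decoding, for example) would need to carry state between iterations.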
source
abstract transform(readable, writable)
Implement this method to ship data through the pipe.
Parameters
• readable ( file ) -- the input stream file object to transform.
• writable ( file ) -- the file object representing the writable end of the pipe. Do not explicitly invoke the close() method of the object, that will be done automatically.
writeTo(writable)
Implement this method to write data to the pipe. This method should support both binary and text mode input.
Parameters
writable ( file ) -- the file object representing the writable end of the pipe. Do not explicitly invoke the close() method of the object, that will be done automatically.
exception toil.jobStores.utils.JobStoreUnavailableException
Bases: RuntimeError
Raised when a particular type of job store is requested but can't be used.
toil.jobStores.utils.generate_locator(job_store_type, local_suggestion=None, decoration=None)
Generate a random locator for a job store of the given type. Raises a JobStoreUnavailableException if that job store cannot be used.
Parameters
• job_store_type ( str ) -- Registry name of the job store to use.
• local_suggestion ( Optional[str] ) -- Path to a nonexistent local directory suitable for use as a file job store.
• decoration ( Optional[str] ) -- Extra string to add to the job store locator, if convenient.
Returns
Job store locator for a usable job store.
Return type
str
toil.leader
The leader script (of the leader/worker pair) for running jobs.
Attributes
Classes
Module Contents
toil.leader.logger
class toil.leader.Leader(config, batchSystem, provisioner, jobStore, rootJob, jobCache=None)
Represents the Toil leader.
Responsible for determining what jobs are ready to be scheduled, by consulting the job store, and issuing them in the batch system.
Parameters
• config ( toil.common.Config )
• batchSystem ( toil.batchSystems.abstractBatchSystem.AbstractBatchSystem )
• provisioner ( Optional[toil.provisioners.abstractProvisioner.AbstractProvisioner] )
• jobStore ( toil.jobStores.abstractJobStore.AbstractJobStore )
• rootJob ( toil.job.JobDescription )
• jobCache ( Optional[dict[Union[str, toil.job.TemporaryID], toil.job.JobDescription]] )
config
jobStore
jobStoreLocator
toilState
batchSystem
issued_jobs_by_batch_system_id: dict[int, str]
preemptibleJobsIssued = 0
serviceJobsIssued = 0
serviceJobsToBeIssued: list[str] = []
preemptibleServiceJobsIssued = 0
preemptibleServiceJobsToBeIssued: list[str] = []
timeSinceJobsLastRescued = None
reissueMissingJobs_missingHash: dict[int, int]
provisioner
clusterScaler = None
serviceManager
statsAndLogging
potentialDeadlockedJobs: set[str]
potentialDeadlockTime = 0
toilMetrics: toil.common.ToilMetrics | None = None
debugJobNames = ('CWLJob', 'CWLWorkflow', 'CWLScatter', 'CWLGather', 'ResolveIndirect')
deadlockThrottler
statusThrottler
kill_throttler
progress_overall = None
progress_failed = None
GOOD_COLOR = (0, 60, 108)
BAD_COLOR = (253, 199, 0)
PROGRESS_BAR_FORMAT = '{desc}{desc_pad}{percentage:3.0f}%|{bar}| {count:{len_total}d}/{total:d} ({count_1:d} failures)...
recommended_fail_exit_code = 1
run()
Run the leader process to issue and manage jobs.
Raises
toil.exceptions.FailedJobsException -- if failed jobs remain after running.
Returns
The return value of the root job's run function.
Return type
Any
create_status_sentinel_file(fail)
Create a file in the jobstore
indicating failure or success.
Parameters
fail ( bool )
Return type
None
innerLoop()
Process jobs.
This is the leader's main loop.
checkForDeadlocks()
Check if the system is deadlocked running service jobs.
feed_deadlock_watchdog()
Note that progress has been
made and any pending deadlock checks should be reset.
Return type
None
issueJob(jobNode)
Add a job to the queue of jobs
currently trying to run.
Parameters
jobNode ( toil.job.JobDescription )
Return type
None
issueJobs(jobs)
Add a list of jobs, each represented as a jobNode object.
issueServiceJob(service_id)
Issue a service job.
Put it on a
queue if the maximum number of service jobs to be scheduled
has been reached.
Parameters
service_id ( str )
Return type
None
issueQueingServiceJobs()
Issues any queued service jobs, up to the maximum number allowed.
getNumberOfJobsIssued(preemptible=None)
Get number of jobs that have
been added by issueJob(s) and not removed by removeJob.
Parameters
preemptible ( Optional[bool] ) -- If none, return all types of jobs. If true, return just the number of preemptible jobs. If false, return just the number of non-preemptible jobs.
Return type
int
removeJob(jobBatchSystemID)
Remove a job from the system by
batch system ID.
Returns
Job description as it was issued.
Parameters
jobBatchSystemID ( int )
Return type
toil.job.JobDescription
getJobs(preemptible=None)
Get all issued jobs.
Parameters
preemptible ( Optional[bool] ) -- If specified, select only preemptible or only non-preemptible jobs.
Return type
list [ toil.job.JobDescription ]
killJobs(jobsToKill, exit_reason=BatchJobExitReason.KILLED)
Kills the given set of jobs and then sends them for processing.
Returns the jobs that, upon processing, were reissued.
Parameters
exit_reason ( toil.batchSystems.abstractBatchSystem.BatchJobExitReason )
reissueOverLongJobs()
Check each issued job.
If a job has been running for longer than desirable, issue a kill instruction. Wait for the job to die, then pass the job to process_finished_job.
Return type
None
reissueMissingJobs(killAfterNTimesMissing=3)
Check all the current job ids are in the list of currently issued batch system jobs.
If a job is missing, we mark it as such. If it stays missing for a number of runs of this function (killAfterNTimesMissing, 3 by default), we try deleting the job (though it's probably lost), wait, and then pass the job to process_finished_job.
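The bookkeeping behind reissueMissingJobs can be pictured as a per-job miss counter. The sketch below is an illustration only: update_missing, its arguments, and the >= threshold comparison are assumptions, not the Leader's actual code.

```python
from typing import Dict, List, Set


def update_missing(
    issued_ids: Set[int],
    running_ids: Set[int],
    missing_counts: Dict[int, int],
    kill_after: int = 3,
) -> List[int]:
    """Update miss counters and return the IDs that have been missing too often."""
    # Forget jobs that reappeared or are no longer issued at all.
    for job_id in list(missing_counts):
        if job_id in running_ids or job_id not in issued_ids:
            del missing_counts[job_id]

    to_kill = []
    for job_id in sorted(issued_ids):
        if job_id not in running_ids:
            # Issued by the leader but unknown to the batch system: count a miss.
            missing_counts[job_id] = missing_counts.get(job_id, 0) + 1
            if missing_counts[job_id] >= kill_after:
                to_kill.append(job_id)
    return to_kill


counts: Dict[int, int] = {}
issued = {1, 2}
running = {1}  # job 2 has gone missing
first = update_missing(issued, running, counts)
second = update_missing(issued, running, counts)
third = update_missing(issued, running, counts)
```

Requiring several consecutive misses avoids killing jobs that are only briefly invisible to the batch system.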
processRemovedJob(issuedJob, result_status)
process_finished_job(batch_system_id, result_status, wall_time=None, exit_reason=None)
Process finished jobs.
Called when an attempt to run a job finishes, either successfully or otherwise.
Takes the job
out of the issued state, and then works out what to do about
the fact that it succeeded or failed.
Returns
True if the job is going to run again, and False if the job is fully done or completely failed.
Return type
bool
process_finished_job_description(finished_job, result_status, wall_time=None, exit_reason=None, batch_system_id=None)
Process a finished JobDescription based upon its success or failure.
If wall-clock time is available, informs the cluster scaler about the job finishing.
If the job failed and a batch system ID is available, checks for and reports batch system logs.
Checks if it succeeded and was removed, or if it failed and needs to be set up after failure, and dispatches to the appropriate function.
Returns
True if the job is going to run again, and False if the job is fully done or completely failed.
Parameters
• finished_job ( toil.job.JobDescription )
• result_status ( int )
• wall_time ( Optional[float] )
• exit_reason ( Optional[toil.batchSystems.abstractBatchSystem.BatchJobExitReason] )
• batch_system_id ( Optional[int] )
Return type
bool
getSuccessors(job_id, alreadySeenSuccessors)
Get successors of the given job
by walking the job graph recursively.
Parameters
• alreadySeenSuccessors ( set[str] ) -- any successor seen here is ignored and not traversed.
• job_id ( str )
Returns
The set of found successors. This set is added to alreadySeenSuccessors.
Return type
set [ str ]
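The recursive walk with a shared "already seen" set can be sketched over a plain dictionary. The graph representation and the function name get_successors are hypothetical; in Toil the successor relationships come from job descriptions rather than a dict.

```python
from typing import Dict, List, Set


def get_successors(job_id: str, graph: Dict[str, List[str]], already_seen: Set[str]) -> Set[str]:
    """Return successors of job_id not in already_seen, adding them to that set."""
    found: Set[str] = set()
    for successor in graph.get(job_id, []):
        if successor in already_seen:
            # Seen before: ignore it and do not traverse below it.
            continue
        already_seen.add(successor)
        found.add(successor)
        found |= get_successors(successor, graph, already_seen)
    return found


# A diamond-shaped graph: "d" is reachable through both "b" and "c".
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
seen: Set[str] = set()
successors = get_successors("a", graph, seen)
```

Sharing the set across recursive calls means each job is visited once, even when several paths lead to it.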
processTotallyFailedJob(job_id)
Process a totally failed job.
Parameters
job_id ( str )
Return type
None
toil.lib
Submodules
toil.lib.accelerators
Accelerator (i.e. GPU) utilities for Toil
Functions
Module Contents
toil.lib.accelerators.have_working_nvidia_smi()
Return True if the nvidia-smi binary, from nvidia's CUDA userspace utilities, is installed and can be run successfully.
TODO: This
isn't quite the same as the check that cwltool uses to
decide if it can fulfill a CUDARequirement.
Return type
bool
toil.lib.accelerators.get_host_accelerator_numbers()
Work out what accelerator is what.
For each accelerator visible to us, returns the host-side (for example, outside-of-Slurm-job) number for that accelerator. It is often the same as the apparent number.
Can be used
with Docker's --gpus='"device=#,#,#"' option to
forward the right GPUs as seen from a Docker daemon.
Return type
list [ int ]
toil.lib.accelerators.have_working_nvidia_docker_runtime()
Return True if Docker exists
and can handle an "nvidia" runtime and the
"--gpus" option.
Return type
bool
toil.lib.accelerators.count_nvidia_gpus()
Return the number of nvidia
GPUs seen by nvidia-smi, or 0 if it is not working.
Return type
int
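A count of this kind can be approximated by listing devices with nvidia-smi -L and falling back to 0 on any failure. This is a sketch under that assumption; how Toil itself queries and parses nvidia-smi is not stated here, and count_gpus_via_nvidia_smi is a hypothetical name.

```python
import subprocess


def count_gpus_via_nvidia_smi() -> int:
    """Count GPUs listed by `nvidia-smi -L`, or return 0 if it cannot be run."""
    try:
        output = subprocess.run(
            ["nvidia-smi", "-L"],
            check=True,
            capture_output=True,
            text=True,
            timeout=30,
        ).stdout
    except (OSError, subprocess.SubprocessError):
        # Binary missing, not executable, exited non-zero, or timed out.
        return 0
    # Each physical GPU is listed on a line starting with "GPU <index>:".
    return sum(1 for line in output.splitlines() if line.startswith("GPU "))


gpu_count = count_gpus_via_nvidia_smi()
```

On a machine without NVIDIA tooling the function returns 0 rather than raising, matching the "or 0 if it is not working" behaviour described above.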
toil.lib.accelerators.count_amd_gpus()
Return the number of AMD GPUs seen by rocm-smi, or 0 if it is not working.
Return type
int
toil.lib.accelerators.get_individual_local_accelerators()
Determine all the local accelerators available. Report each with count 1, in the order of the number that can be used to assign them.
TODO: How will
numbers work with multiple types of accelerator? We need an
accelerator assignment API.
Return type
list [ toil.job.AcceleratorRequirement ]
toil.lib.accelerators.get_restrictive_environment_for_local_accelerators(accelerator_numbers)
Get environment variables which can be applied to a process to restrict it to using only the given accelerator numbers.
The numbers are
in the space of accelerators returned by
get_individual_local_accelerators().
Parameters
accelerator_numbers ( Union[set[int], list[int]] )
Return type
dict [ str , str ]
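As an illustration of the kind of mapping such a function returns, the sketch below restricts a process to chosen NVIDIA GPUs through the standard CUDA_VISIBLE_DEVICES variable. Which variables Toil actually sets, and how it handles other accelerator types, is not stated here; restrictive_env is a hypothetical name.

```python
from typing import Dict, Iterable


def restrictive_env(accelerator_numbers: Iterable[int]) -> Dict[str, str]:
    """Build environment variables limiting CUDA to the given device numbers."""
    # CUDA accepts a comma-separated list of device indices.
    ids = ",".join(str(n) for n in sorted(accelerator_numbers))
    return {"CUDA_VISIBLE_DEVICES": ids}


env = restrictive_env({2, 0})
```

The result is meant to be merged into the environment of a child process, for example via the env argument of subprocess.run.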
toil.lib.aws
Submodules
toil.lib.aws.ami
Attributes
Exceptions
Functions
Module Contents
toil.lib.aws.ami.logger
exception toil.lib.aws.ami.ReleaseFeedUnavailableError
Bases: RuntimeError
Raised when Flatcar releases can't be located.
toil.lib.aws.ami.get_flatcar_ami(ec2_client, architecture='amd64')
Retrieve the flatcar AMI image to use as the base for all Toil autoscaling instances.
AMI must be
available to the user on AWS (attempting to launch will
return a 403 otherwise).
Priority is:
1. User specified AMI via TOIL_AWS_AMI
2. Official AMI from stable.release.flatcar-linux.net
3. Search the AWS Marketplace
Raises
ReleaseFeedUnavailableError -- if all of these sources fail.
Parameters
• ec2_client ( botocore.client.BaseClient ) -- Boto3 EC2 Client
• architecture ( str ) -- The architecture type for the new AWS machine. Can be either amd64 or arm64
Return type
str
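The priority order listed for get_flatcar_ami amounts to a fallback chain. The sketch below shows the shape of that chain with placeholder lookups; choose_ami, from_feed, from_marketplace, and the AMI IDs are all hypothetical, and the exception class is a local stand-in so that the example runs without Toil.

```python
import os
from typing import Callable, Iterable, Mapping, Optional


class ReleaseFeedUnavailableError(RuntimeError):
    """Local stand-in for the exception of the same name."""


def choose_ami(
    fallbacks: Iterable[Callable[[], Optional[str]]],
    environ: Mapping[str, str] = os.environ,
) -> str:
    # 1. A user-specified AMI always wins.
    user_ami = environ.get("TOIL_AWS_AMI")
    if user_ami:
        return user_ami
    # 2, 3, ... Try each remaining source in priority order.
    for lookup in fallbacks:
        ami = lookup()
        if ami:
            return ami
    raise ReleaseFeedUnavailableError("no source produced an AMI")


# Hypothetical lookups: the release feed fails, the marketplace search succeeds.
def from_feed() -> Optional[str]:
    return None


def from_marketplace() -> Optional[str]:
    return "ami-0123456789abcdef0"


chosen = choose_ami([from_feed, from_marketplace], environ={})
overridden = choose_ami([from_feed, from_marketplace], environ={"TOIL_AWS_AMI": "ami-user"})
```

Lookups that return None instead of raising fit this chain naturally, which matches the "Does not raise exceptions" note on the individual lookup functions.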
toil.lib.aws.ami.flatcar_release_feed_ami(region, architecture='amd64', source='stable')
Yield AMI IDs for the given architecture from the Flatcar release feed.
Parameters
• source ( str ) -- can be set to a Flatcar release channel ('stable', 'beta', or 'alpha'), 'archive' to check the Internet Archive for a feed, and 'toil' to check if the Toil project has put up a feed.
• region ( str )
• architecture ( str )
Return type
Optional[ str ]
Retries if the release feed cannot be fetched. If the release feed has a permanent error, yields nothing. If some entries in the release feed are unparseable, yields the others.
toil.lib.aws.ami.feed_flatcar_ami_release(ec2_client, architecture='amd64', source='stable')
Check a Flatcar release feed for the latest flatcar AMI.
Verify it's on AWS.
Does not raise exceptions.
Parameters
• ec2_client ( botocore.client.BaseClient ) -- Boto3 EC2 Client
• architecture ( str ) -- The architecture type for the new AWS machine. Can be either amd64 or arm64
• source ( str ) -- can be set to a Flatcar release channel ('stable', 'beta', or 'alpha'), 'archive' to check the Internet Archive for a feed, and 'toil' to check if the Toil project has put up a feed.
Return type
Optional[ str ]
toil.lib.aws.ami.aws_marketplace_flatcar_ami_search(ec2_client, architecture='amd64')
Query AWS for all AMI names matching Flatcar-stable-* and return the most recent one.
Does not raise exceptions.
Returns
An AMI name, or None if no matching AMI was found or we could not talk to AWS.
Parameters
• ec2_client ( botocore.client.BaseClient )
• architecture ( str )
Return type
Optional[ str ]
toil.lib.aws.iam
Attributes
Functions
Module Contents
toil.lib.aws.iam.logger
toil.lib.aws.iam.CLUSTER_LAUNCHING_PERMISSIONS = ['iam:CreateRole', 'iam:CreateInstanceProfile', 'iam:TagInstanceProfile', 'iam:DeleteRole',...
toil.lib.aws.iam.AllowedActionCollection
toil.lib.aws.iam.delete_iam_instance_profile(instance_profile_name, region=None, quiet=True)
Parameters
• instance_profile_name ( str )
• region ( Optional[str] )
• quiet ( bool )
Return type
None
toil.lib.aws.iam.delete_iam_role(role_name, region=None, quiet=True)
Deletes an AWS IAM role. Any
separate policies are detached from the role, and any inline
policies are deleted.
Parameters
• role_name ( str ) -- The name of the AWS IAM role.
• region ( Optional[str] ) -- The AWS region that the role_name is in.
• quiet ( bool ) -- Whether or not to print/log information about the deletion to stdout.
Return type
None
toil.lib.aws.iam.create_iam_role(role_name, assume_role_policy_document, policies, region=None)
Creates an AWS IAM role. Any separate policies are detached from the role, and any inline policies are deleted.
Parameters
• role_name ( str ) -- The name of the AWS IAM role.
• region ( Optional[str] ) -- The AWS region that the role_name is in.
• assume_role_policy_document ( str ) -- Policies to create inline with the role.
• policies ( dict[str, Any] ) -- Global policies to attach to the role.
Return type
str
toil.lib.aws.iam.init_action_collection()
Initializes an action collection. An action collection contains the allowed Actions and NotActions by resource; these are patterns containing wildcards. An Action explicitly allows a matched pattern, e.g. ec2:* will explicitly allow all ec2 permissions.
A NotAction explicitly allows all actions that don't match a specific pattern, e.g. iam:* allows all non-iam actions.
Return type
AllowedActionCollection
toil.lib.aws.iam.add_to_action_collection(a, b)
Combines two action collections.
Parameters
• a ( AllowedActionCollection )
• b ( AllowedActionCollection )
Return type
AllowedActionCollection
toil.lib.aws.iam.policy_permissions_allow(given_permissions, required_permissions=[])
Check whether one given set of actions is a subset of another given set of actions. Returns True if it is; otherwise returns False and prints a warning.
Parameters
• required_permissions ( list[str] ) -- Dictionary containing actions required, keyed by resource
• given_permissions ( AllowedActionCollection ) -- Set of actions that are granted to a user or role
Return type
bool
toil.lib.aws.iam.permission_matches_any(perm, list_perms)
Takes a permission and checks whether it's contained within a list of given permissions. Returns True if it is, otherwise False.
Parameters
• perm ( str ) -- Permission to check in string form
• list_perms ( list[str] ) -- Permission list to check against
Return type
bool
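Wildcard permission patterns such as ec2:* can be matched with shell-style globbing. The sketch below uses the standard library's fnmatch for that; matches_any is a hypothetical helper, and the exact matching rules Toil applies (case sensitivity, for instance) may differ.

```python
from fnmatch import fnmatchcase
from typing import List


def matches_any(perm: str, patterns: List[str]) -> bool:
    """Return True if perm matches at least one wildcard pattern."""
    # fnmatchcase does case-sensitive matching, with * matching any run of characters.
    return any(fnmatchcase(perm, pattern) for pattern in patterns)


granted = ["ec2:*", "s3:Get*"]
can_run = matches_any("ec2:RunInstances", granted)
can_get = matches_any("s3:GetObject", granted)
can_delete = matches_any("s3:DeleteObject", granted)
```

A subset check over required permissions then reduces to calling the helper once per required permission.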
toil.lib.aws.iam.get_actions_from_policy_document(policy_doc)
Given a policy document, go
through each statement and create an AllowedActionCollection
representing the permissions granted in the policy document.
Parameters
policy_doc ( mypy_boto3_iam.type_defs.PolicyDocumentDictTypeDef ) -- A policy document to examine
Return type
AllowedActionCollection
toil.lib.aws.iam.allowed_actions_attached(iam, attached_policies)
Go through all attached policy
documents and create an AllowedActionCollection representing
granted permissions.
Parameters
• iam ( mypy_boto3_iam.IAMClient ) -- IAM client to use
• attached_policies ( list[mypy_boto3_iam.type_defs.AttachedPolicyTypeDef] ) -- Attached policies
Return type
AllowedActionCollection
toil.lib.aws.iam.allowed_actions_roles(iam, policy_names, role_name)
Returns a dictionary containing
a list of all aws actions allowed for a given role. This
dictionary is keyed by resource and gives a list of policies
allowed on that resource.
Parameters
• iam ( mypy_boto3_iam.IAMClient ) -- IAM client to use
• policy_names ( list[str] ) -- Name of policy document associated with a role
• role_name ( str ) -- Name of role to get associated policies
Return type
AllowedActionCollection
toil.lib.aws.iam.collect_policy_actions(policy_documents)
Collect all of the actions
allowed by the given policy documents into one
AllowedActionCollection.
Parameters
policy_documents ( list[Union[str, mypy_boto3_iam.type_defs.PolicyDocumentDictTypeDef]] )
Return type
AllowedActionCollection
toil.lib.aws.iam.allowed_actions_user(iam, policy_names, user_name)
Gets all allowed actions for a
user given by user_name, returns a dictionary, keyed by
resource, with a list of permissions allowed for each given
resource.
Parameters
• iam ( mypy_boto3_iam.IAMClient ) -- IAM client to use
• policy_names ( list[str] ) -- Name of policy document associated with a user
• user_name ( str ) -- Name of user to get associated policies
Return type
AllowedActionCollection
toil.lib.aws.iam.allowed_actions_group(iam, policy_names, group_name)
Gets all allowed actions for a
group given by group_name, returns a dictionary, keyed by
resource, with a list of permissions allowed for each given
resource.
Parameters
• iam ( mypy_boto3_iam.IAMClient ) -- IAM client to use
• policy_names ( list[str] ) -- Name of policy document associated with a group
• group_name ( str ) -- Name of group to get associated policies
Return type
AllowedActionCollection
toil.lib.aws.iam.get_policy_permissions(region)
Returns an action collection
containing lists of all permission grant patterns keyed by
resource that they are allowed upon. Requires AWS
credentials to be associated with a user or assumed role.
Parameters
• zone -- AWS zone to connect to
• region ( str )
Return type
AllowedActionCollection
toil.lib.aws.iam.get_aws_account_num()
Returns the AWS account number.
Return type
Optional[ str ]
toil.lib.aws.s3
Attributes
Functions
Module Contents
toil.lib.aws.s3.logger
toil.lib.aws.s3.list_multipart_uploads(bucket, region, prefix, max_uploads=1)
Parameters
• bucket ( str )
• region ( str )
• prefix ( str )
• max_uploads ( int )
Return type
mypy_boto3_s3.type_defs.ListMultipartUploadsOutputTypeDef
toil.lib.aws.session
Attributes
Classes
Functions
Module Contents
toil.lib.aws.session.logger
class toil.lib.aws.session.AWSConnectionManager
Class that represents a connection to AWS. Caches Boto 3 and Boto 2 objects by region.
Access to any kind of item goes through the particular method for the thing you want (session, resource, service, Boto2 Context), and then you pass the region you want to work in, and possibly the type of thing you want, as arguments.
This class is intended to eventually enable multi-region clusters, where connections to multiple regions may need to be managed in the same provisioner.
We also support None for a region, in which case no region will be passed to Boto/Boto3. The caller is responsible for implementing e.g. TOIL_AWS_REGION support.
Since connection objects may not be thread safe (see https://boto3.amazonaws.com/v1/documentation/api/1.14.31/guide/session.html#multithreading-or-multiprocessing-with-sessions), one is created for each thread that calls the relevant lookup method.
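Caching one object per region and per thread can be done by mapping each region to a threading.local. The sketch below shows the idea with a placeholder factory instead of Boto3; PerThreadRegionCache is a hypothetical class and is not how AWSConnectionManager is necessarily written.

```python
import threading
from collections import defaultdict
from typing import Any, Callable, Optional


class PerThreadRegionCache:
    """Hypothetical cache: one object per (region, thread) pair."""

    def __init__(self, factory: Callable[[Optional[str]], Any]) -> None:
        self._factory = factory
        self._lock = threading.Lock()
        # Each region maps to thread-local storage, so threads never share items.
        self._by_region = defaultdict(threading.local)

    def get(self, region: Optional[str]) -> Any:
        with self._lock:
            storage = self._by_region[region]
        if not hasattr(storage, "item"):
            # First lookup from this thread for this region: build a new object.
            storage.item = self._factory(region)
        return storage.item


cache = PerThreadRegionCache(lambda region: object())
main_item = cache.get("us-west-2")
same_thread_item = cache.get("us-west-2")

from_other_thread = []
worker = threading.Thread(target=lambda: from_other_thread.append(cache.get("us-west-2")))
worker.start()
worker.join()
```

Repeated lookups from one thread reuse the same object, while a second thread asking for the same region gets its own.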
sessions_by_region: dict[str | None, threading.local]
resource_cache: dict[tuple[str | None, str, str | None], threading.local]
client_cache: dict[tuple[str | None, str, str | None], threading.local]
boto2_cache: dict[tuple[str | None, str], threading.local]
session(region)
Get the Boto3 Session to use
for the given region.
Parameters
region ( Optional[str] )
Return type
boto3.session.Session
resource(region: str | None, service_name: Literal['s3'], endpoint_url: str | None = None) -> mypy_boto3_s3.S3ServiceResource
resource(region: str | None, service_name: Literal['iam'], endpoint_url: str | None = None) -> mypy_boto3_iam.IAMServiceResource
resource(region: str | None, service_name: Literal['ec2'], endpoint_url: str | None = None) -> mypy_boto3_ec2.EC2ServiceResource
Get the Boto3 Resource to use with the given service (like 'ec2') in the given region.
Parameters
endpoint_url -- AWS endpoint URL to use for the client. If not specified, a default is used.
client(region: str | None, service_name: Literal['ec2'], endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_ec2.EC2Client
client(region: str | None, service_name: Literal['iam'], endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_iam.IAMClient
client(region: str | None, service_name: Literal['s3'], endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_s3.S3Client
client(region: str | None, service_name: Literal['sts'], endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_sts.STSClient
client(region: str | None, service_name: Literal['sdb'], endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_sdb.SimpleDBClient
client(region: str | None, service_name: Literal['autoscaling'], endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_autoscaling.AutoScalingClient
Get the Boto3 Client to use with the given service (like 'ec2') in the given region.
Parameters
• endpoint_url -- AWS endpoint URL to use for the client. If not specified, a default is used.
• config -- Custom configuration to use for the client.
toil.lib.aws.session.establish_boto3_session(region_name=None)
Get a Boto 3 session usable by the current thread.
This function may not always establish a new session; it can be memoized.
Parameters
region_name ( Optional[str] )
Return type
boto3.Session
toil.lib.aws.session.client(service_name: Literal['ec2'], region_name: str | None = None, endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_ec2.EC2Client
toil.lib.aws.session.client(service_name: Literal['iam'], region_name: str | None = None, endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_iam.IAMClient
toil.lib.aws.session.client(service_name: Literal['s3'], region_name: str | None = None, endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_s3.S3Client
toil.lib.aws.session.client(service_name: Literal['sts'], region_name: str | None = None, endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_sts.STSClient
toil.lib.aws.session.client(service_name: Literal['sdb'], region_name: str | None = None, endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_sdb.SimpleDBClient
toil.lib.aws.session.client(service_name: Literal['autoscaling'], region_name: str | None = None, endpoint_url: str | None = None, config: botocore.client.Config | None = None) -> mypy_boto3_autoscaling.AutoScalingClient
Get a Boto 3 client for a particular AWS service, usable by the current thread.
Global alternative to AWSConnectionManager.
toil.lib.aws.session.resource(service_name: Literal['s3'], region_name: str | None = None, endpoint_url: str | None = None) -> mypy_boto3_s3.S3ServiceResource
toil.lib.aws.session.resource(service_name: Literal['iam'], region_name: str | None = None, endpoint_url: str | None = None) -> mypy_boto3_iam.IAMServiceResource
toil.lib.aws.session.resource(service_name: Literal['ec2'], region_name: str | None = None, endpoint_url: str | None = None) -> mypy_boto3_ec2.EC2ServiceResource
Get a Boto 3 resource for a particular AWS service, usable by the current thread.
Global alternative to AWSConnectionManager.
toil.lib.aws.utils
Attributes
Exceptions
Functions
Module Contents
toil.lib.aws.utils.ClientError = None
toil.lib.aws.utils.logger
toil.lib.aws.utils.THROTTLED_ERROR_CODES = ['Throttling', 'ThrottlingException', 'ThrottledException', 'RequestThrottledException',...
toil.lib.aws.utils.delete_sdb_domain(sdb_domain_name, region=None, quiet=True)
Parameters
• sdb_domain_name ( str )
• region ( Optional[str] )
• quiet ( bool )
Return type
None
toil.lib.aws.utils.connection_reset(e)
Return true if an error is a
connection reset error.
Parameters
e ( Exception )
Return type
bool
toil.lib.aws.utils.connection_error(e)
Return True if an error
represents a failure to make a network connection.
Parameters
e ( Exception )
Return type
bool
toil.lib.aws.utils.retryable_s3_errors(e)
Return true if this is an error
from S3 that looks like we ought to retry our request.
Parameters
e ( Exception )
Return type
bool
toil.lib.aws.utils.retry_s3(delays=DEFAULT_DELAYS, timeout=DEFAULT_TIMEOUT, predicate=retryable_s3_errors)
Retry iterator of context managers specifically for S3 operations.
Parameters
• delays ( collections.abc.Iterable[float] )
• timeout ( float )
• predicate ( Callable[[Exception], bool] )
Return type
collections.abc.Iterator [ContextManager[None]]
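A "retry iterator of context managers" is used as a for loop whose body is a with block: each context manager swallows a retryable error and lets the loop go around again. The sketch below shows one way to build that pattern; retry_attempts, its defaults, and flaky are hypothetical, and this is not the retry_s3 implementation.

```python
import time
from contextlib import contextmanager


def retry_attempts(delays=(0.0, 0.01, 0.01), timeout=5.0, predicate=lambda e: True):
    """Yield context managers until one attempt succeeds or retrying is exhausted."""
    deadline = time.monotonic() + timeout
    done = False
    delay_iter = iter(delays)
    delay = next(delay_iter)  # at least one delay is required

    @contextmanager
    def attempt(current_delay):
        nonlocal done
        try:
            yield
        except Exception as e:
            if predicate(e) and time.monotonic() + current_delay < deadline:
                # Retryable and still within the time budget: wait, then loop again.
                time.sleep(current_delay)
            else:
                raise
        else:
            # The body completed without error, so stop iterating.
            done = True

    while not done:
        yield attempt(delay)
        # Move to the next delay, repeating the last one if the list runs out.
        delay = next(delay_iter, delay)


calls = []


def flaky():
    # Hypothetical operation that fails twice before succeeding.
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient")
    return "ok"


for attempt in retry_attempts(predicate=lambda e: isinstance(e, ConnectionError)):
    with attempt:
        result = flaky()
```

Errors rejected by the predicate, or raised after the timeout has passed, propagate out of the with block unchanged.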
toil.lib.aws.utils.delete_s3_bucket(s3_resource, bucket, quiet=True)
Delete the given S3 bucket.
Parameters
• s3_resource ( mypy_boto3_s3.S3ServiceResource )
• bucket ( str )
• quiet ( bool )
Return type
None
toil.lib.aws.utils.create_s3_bucket(s3_resource, bucket_name, region)
Create an AWS S3 bucket, using the given Boto3 S3 session, with the given name, in the given region.
Supports the us-east-1 region, where bucket creation is special.
ALL S3 bucket creation should use this function.
Parameters
• s3_resource ( mypy_boto3_s3.S3ServiceResource )
• bucket_name ( str )
• region ( toil.lib.aws.AWSRegionName )
Return type
mypy_boto3_s3.service_resource.Bucket
toil.lib.aws.utils.enable_public_objects(bucket_name)
Enable a bucket to contain objects which are public.
This adjusts the bucket's Public Access Block setting to not block all public access, and also adjusts the bucket's Object Ownership setting to a setting which enables object ACLs.
Does not touch the account's Public Access Block setting, which can also interfere here. That is probably best left to the account administrator.
This configuration used to be the default, and is what most of Toil's code is written to expect, but it was changed so that new buckets default to the more restrictive setting <https://aws.amazon.com/about-aws/whats-new/2022/12/amazon-s3-automatically-enable-block-public-access-disable-access-control-lists-buckets-april-2023/>, with the expectation that people would write IAM policies for the buckets to allow public access if needed. Toil expects to be able to make arbitrary objects in arbitrary places public, and naming them all in an IAM policy would be a very awkward way to do it. So we restore the old behavior.
Parameters
bucket_name ( str )
Return type
None
exception toil.lib.aws.utils.NoBucketLocationError
Bases: Exception
Error to represent that we could not get a location for a bucket.
toil.lib.aws.utils.get_bucket_region(bucket_name, endpoint_url=None, only_strategies=None)
Get the AWS region name associated with the given S3 bucket, or raise NoBucketLocationError.
Does not log at info level or above when this does not work; failures are expected in some contexts.
Takes an optional S3 API URL override.
Parameters
• only_strategies ( Optional[set[int]] ) -- For testing, use only strategies with 1-based numbers in this set.
• bucket_name ( str )
• endpoint_url ( Optional[str] )
Return type
str
toil.lib.aws.utils.region_to_bucket_location(region)
Parameters
region ( str )
Return type
str
toil.lib.aws.utils.bucket_location_to_region(location)
Parameters
location ( Optional[str] )
Return type
str
toil.lib.aws.utils.get_object_for_url(url, existing=None)
Extracts a key (object) from a given parsed s3:// URL.
If existing is true and the object does not exist, raises FileNotFoundError.
Parameters
• existing ( bool ) -- If True, key is expected to exist. If False, key is expected not to exist and it will be created. If None, the key will be created if it doesn't exist.
• url ( urllib.parse.ParseResult )
Return type
mypy_boto3_s3.service_resource.Object
toil.lib.aws.utils.list_objects_for_url(url)
Extracts a key (object) from a given parsed s3:// URL. The URL will be supplemented with a trailing slash if it is missing.
Parameters
url ( urllib.parse.ParseResult )
Return type
list [ str ]
toil.lib.aws.utils.flatten_tags(tags)
Convert tags from a key to value dict into a list of 'Key': xxx, 'Value': xxx dicts.
Parameters
tags ( dict[str, str] )
Return type
list [ dict [ str , str ]]
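As an illustration of the shape change described for flatten_tags, here is a minimal sketch that performs the same dict-to-list conversion. The function name flatten_tags_sketch and the sample tags are hypothetical; this is not Toil's own code.

```python
def flatten_tags_sketch(tags: dict[str, str]) -> list[dict[str, str]]:
    """Turn {'k': 'v'} into [{'Key': 'k', 'Value': 'v'}], the layout AWS tag APIs expect."""
    return [{"Key": key, "Value": value} for key, value in tags.items()]


flat = flatten_tags_sketch({"Owner": "alice", "Project": "demo"})
```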
toil.lib.aws.utils.boto3_pager(requestor_callable, result_attribute_name, **kwargs)
Yield all the results from calling the given Boto 3 method with the given keyword arguments, paging through the results using the Marker or NextToken, and fetching out and looping over the list in the response with the given attribute name.
Parameters
• requestor_callable ( Callable[Ellipsis, Any] )
• result_attribute_name ( str )
• kwargs ( Any )
Return type
collections.abc.Iterable [Any]
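The paging loop described for boto3_pager can be sketched without any AWS dependency. The sketch below is an assumption about the general shape of such a pager, not Toil's implementation. pager_sketch and fake_list_things are hypothetical, and the fake callable stands in for a Boto 3 client method.

```python
from collections.abc import Iterable
from typing import Any, Callable


def pager_sketch(requestor_callable: Callable[..., Any],
                 result_attribute_name: str, **kwargs: Any) -> Iterable[Any]:
    """Call repeatedly, feeding back NextToken or Marker until neither is returned."""
    params = dict(kwargs)
    while True:
        response = requestor_callable(**params)
        yield from response.get(result_attribute_name, [])
        for token_name in ("NextToken", "Marker"):
            if response.get(token_name):
                params[token_name] = response[token_name]
                break
        else:
            return  # no continuation token: this was the last page


def fake_list_things(NextToken=None):
    """Stand-in for a paginated client method returning two pages."""
    pages = {None: {"Things": [1, 2], "NextToken": "p2"}, "p2": {"Things": [3]}}
    return pages[NextToken]


all_things = list(pager_sketch(fake_list_things, "Things"))
```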
toil.lib.aws.utils.get_item_from_attributes(attributes, name)
Given a list of attributes, find the attribute associated with the name and return its corresponding value.
The attributes list will be a list of TypedDicts (which boto3 SDB functions commonly return), where each TypedDict has a "Name" and "Value" key-value pair. This function grabs the value out of the associated TypedDict.
If the attribute with the name does not exist, the function will return None.
Parameters
• attributes ( list[mypy_boto3_sdb.type_defs.AttributeTypeDef] ) -- list of attributes
• name ( str ) -- name of the attribute
Returns
value of the attribute
Return type
Any
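A minimal sketch of the lookup described for get_item_from_attributes, using plain dicts in place of the SDB TypedDicts. get_item_sketch and the sample data are hypothetical, not Toil's code.

```python
from typing import Any, Optional


def get_item_sketch(attributes: list[dict[str, Any]], name: str) -> Optional[Any]:
    """Return the 'Value' of the first attribute whose 'Name' matches, else None."""
    return next((attr["Value"] for attr in attributes if attr["Name"] == name), None)


sample = [{"Name": "version", "Value": "3"}, {"Name": "owner", "Value": "alice"}]
owner = get_item_sketch(sample, "owner")
missing = get_item_sketch(sample, "absent")
```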
Attributes
Functions
Package Contents
toil.lib.aws.AWSRegionName
toil.lib.aws.AWSServerErrors
toil.lib.aws.logger
toil.lib.aws.get_current_aws_region()
Return the AWS region that the currently configured AWS zone (see get_current_aws_zone()) is in.
Return type
Optional[ str ]
toil.lib.aws.get_aws_zone_from_environment()
Get the AWS zone from TOIL_AWS_ZONE if set.
Return type
Optional[ str ]
toil.lib.aws.get_aws_zone_from_metadata()
Get the AWS zone from instance metadata, if on EC2 and the boto module is available. Otherwise, gets the AWS zone from ECS task metadata, if on ECS.
Return type
Optional[ str ]
toil.lib.aws.get_aws_zone_from_boto()
Get the AWS zone from the Boto3 config file or from AWS_DEFAULT_REGION, if it is configured and the boto3 module is available.
Return type
Optional[ str ]
toil.lib.aws.get_aws_zone_from_environment_region()
Pick an AWS zone in the region defined by TOIL_AWS_REGION, if it is set.
Return type
Optional[ str ]
toil.lib.aws.get_current_aws_zone()
Get the currently configured or occupied AWS zone to use.
Reports the TOIL_AWS_ZONE environment variable if set.
Otherwise, if we have boto and are running on EC2, or if we are on ECS, reports the zone we are running in.
Otherwise, if we have the TOIL_AWS_REGION variable set, chooses a zone in that region.
Finally, if we have boto2, and a default region is configured in Boto 2, chooses a zone in that region.
Returns
None if no method can produce a zone to use.
Return type
Optional[ str ]
toil.lib.aws.zone_to_region(zone)
Get a region (e.g. us-west-2) from a zone (e.g. us-west-2c).
Parameters
zone ( str )
Return type
AWSRegionName
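Standard AWS availability zone names are the region name followed by a single letter, so the conversion can be sketched as below. This is an illustration under that assumption only (it does not handle Local Zone or Wavelength names), and zone_to_region_sketch is a hypothetical name, not Toil's function.

```python
def zone_to_region_sketch(zone: str) -> str:
    """Strip the trailing zone letter: 'us-west-2c' -> 'us-west-2'."""
    if not zone or not zone[-1].isalpha():
        raise ValueError(f"not a standard availability zone name: {zone!r}")
    return zone[:-1]


region = zone_to_region_sketch("us-west-2c")
```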
toil.lib.aws.running_on_ec2()
Return True if we are currently running on EC2, and False otherwise.
Return type
bool
toil.lib.aws.running_on_ecs()
Return True if we are currently running on Amazon ECS, and False otherwise.
Return type
bool
toil.lib.aws.build_tag_dict_from_env(environment=os.environ)
Parameters
environment ( collections.abc.MutableMapping[str, str] )
Return type
dict [ str , str ]
toil.lib.bioio
Functions
Module Contents
toil.lib.bioio.system(command)
A convenience wrapper around subprocess.check_call that logs the command before passing it on. The command can be either a string or a sequence of strings. If it is a string, shell=True will be passed to subprocess.check_call.
Parameters
command ( str | sequence[string] )
toil.lib.bioio.getLogLevelString(logger=None)
toil.lib.bioio.setLoggingFromOptions(options)
toil.lib.bioio.getTempFile(suffix='', rootDir=None)
toil.lib.compatibility
Functions
Module Contents
toil.lib.compatibility.deprecated(new_function_name)
Parameters
new_function_name ( str )
Return type
Callable[Ellipsis, Any]
toil.lib.compatibility.compat_bytes(s)
Parameters
s ( Union[bytes, str] )
Return type
str
toil.lib.compatibility.compat_bytes_recursive(data)
Convert a tree of objects over bytes to objects over strings.
Parameters
data ( Any )
Return type
Any
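To make the "tree of objects" conversion concrete, here is a small sketch that walks nested containers and decodes any bytes it finds. It assumes UTF-8 text and handles only dict, list, tuple, and set. to_str_tree_sketch is a hypothetical name and may differ from what Toil actually does.

```python
from typing import Any


def to_str_tree_sketch(data: Any) -> Any:
    """Recursively replace bytes with str (assumes UTF-8) inside common containers."""
    if isinstance(data, bytes):
        return data.decode("utf-8")
    if isinstance(data, dict):
        return {to_str_tree_sketch(k): to_str_tree_sketch(v) for k, v in data.items()}
    if isinstance(data, (list, tuple, set)):
        return type(data)(to_str_tree_sketch(item) for item in data)
    return data  # leave everything else (ints, str, None, ...) untouched


converted = to_str_tree_sketch({b"a": [b"b", (b"c", 1)]})
```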
toil.lib.conversions
Conversion utilities for mapping memory, disk, and core declarations from strings to numbers and vice versa. Also contains general conversion functions.
Attributes
Functions
Module Contents
toil.lib.conversions.BINARY_PREFIXES = ['ki', 'mi', 'gi', 'ti', 'pi', 'ei', 'kib', 'mib', 'gib', 'tib', 'pib', 'eib']
toil.lib.conversions.DECIMAL_PREFIXES = ['b', 'k', 'm', 'g', 't', 'p', 'e', 'kb', 'mb', 'gb', 'tb', 'pb', 'eb']
toil.lib.conversions.VALID_PREFIXES
toil.lib.conversions.bytes_in_unit(unit='B')
Parameters
unit ( str )
Return type
int
toil.lib.conversions.convert_units(num, src_unit, dst_unit='B')
Returns a float representing the converted input in dst_units.
Parameters
• num ( float )
• src_unit ( str )
• dst_unit ( str )
Return type
float
toil.lib.conversions.parse_memory_string(string)
Given a string representation of some memory (e.g. '1024 Mib'), return the number and unit.
Parameters
string ( str )
Return type
tuple [ float , str ]
toil.lib.conversions.human2bytes(string)
Given a string representation of some memory (e.g. '1024 Mib'), return the integer number of bytes.
Parameters
string ( str )
Return type
int
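The relationship between parse_memory_string, bytes_in_unit, and human2bytes can be sketched as follows. This is an illustration, not Toil's code: it assumes binary prefixes (ki, mi, ...) are powers of 1024 and decimal prefixes (k, m, ...) are powers of 1000, and every name ending in _sketch is hypothetical.

```python
import re

BINARY_EXPONENTS = {"ki": 1, "mi": 2, "gi": 3, "ti": 4, "pi": 5, "ei": 6}
DECIMAL_EXPONENTS = {"k": 1, "m": 2, "g": 3, "t": 4, "p": 5, "e": 6}


def bytes_in_unit_sketch(unit: str = "B") -> int:
    """Number of bytes in one of the given unit (case-insensitive)."""
    key = unit.lower()
    if key in ("", "b"):
        return 1
    if key.endswith("b"):
        key = key[:-1]  # 'kib' -> 'ki', 'kb' -> 'k'
    if key in BINARY_EXPONENTS:
        return 1024 ** BINARY_EXPONENTS[key]
    if key in DECIMAL_EXPONENTS:
        return 1000 ** DECIMAL_EXPONENTS[key]
    raise ValueError(f"unknown unit: {unit!r}")


def parse_memory_string_sketch(string: str) -> tuple[float, str]:
    """Split '1024 Mib' into (1024.0, 'Mib'); a bare number is taken as bytes."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]*)\s*", string)
    if match is None:
        raise ValueError(f"cannot parse memory string: {string!r}")
    return float(match.group(1)), match.group(2) or "B"


def human2bytes_sketch(string: str) -> int:
    """Convert a human-readable size to an integer number of bytes."""
    number, unit = parse_memory_string_sketch(string)
    return int(number * bytes_in_unit_sketch(unit))


one_gib = human2bytes_sketch("1024 Mib")
```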
toil.lib.conversions.bytes2human(n)
Return a binary value as a
human readable string with units.
Parameters
n ( SupportsInt )
Return type
str
toil.lib.conversions.b_to_mib(n)
Convert a number from bytes to mebibytes.
Parameters
n ( Union[int, float] )
Return type
float
toil.lib.conversions.mib_to_b(n)
Convert a number from mebibytes to bytes.
Parameters
n ( Union[int, float] )
Return type
float
toil.lib.conversions.hms_duration_to_seconds(hms)
Parses a given time string in hours:minutes:seconds and returns the equivalent total number of seconds.
Parameters
hms ( str )
Return type
float
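A minimal sketch of the hours:minutes:seconds arithmetic, assuming exactly three colon-separated fields. hms_to_seconds_sketch is a hypothetical name, and Toil's own validation rules may differ.

```python
def hms_to_seconds_sketch(hms: str) -> float:
    """Convert 'H:M:S' to total seconds: H*3600 + M*60 + S."""
    parts = hms.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected hours:minutes:seconds, got {hms!r}")
    hours, minutes, seconds = (float(part) for part in parts)
    return hours * 3600 + minutes * 60 + seconds


total = hms_to_seconds_sketch("01:30:15")
```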
toil.lib.conversions.strtobool(val)
Make a human-readable string into a bool.
Convert a string along the lines of "y", "1", "ON", "TrUe", or "Yes" to True, and the corresponding false-ish values to False.
Parameters
val ( str )
Return type
bool
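The case-insensitive matching described above might look like the sketch below. The exact sets of accepted strings are an assumption modelled on the examples in the description, and strtobool_sketch is a hypothetical name.

```python
TRUE_STRINGS = {"y", "yes", "t", "true", "on", "1"}
FALSE_STRINGS = {"n", "no", "f", "false", "off", "0"}


def strtobool_sketch(val: str) -> bool:
    """Map a human-readable truth string to a bool, ignoring case and whitespace."""
    lowered = val.strip().lower()
    if lowered in TRUE_STRINGS:
        return True
    if lowered in FALSE_STRINGS:
        return False
    raise ValueError(f"cannot interpret {val!r} as a bool")


flag = strtobool_sketch("TrUe")
```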
toil.lib.conversions.opt_strtobool(b)
Convert an optional string
representation of bool to None or bool
Parameters
b ( Optional[str] )
Return type
Optional[ bool ]
toil.lib.docker
Attributes
Functions
Module Contents
toil.lib.docker.logger
toil.lib.docker.FORGO = 0
toil.lib.docker.STOP = 1
toil.lib.docker.RM = 2
toil.lib.docker.dockerCheckOutput(*args, **kwargs)
toil.lib.docker.dockerCall(*args, **kwargs)
toil.lib.docker.subprocessDockerCall(*args, **kwargs)
toil.lib.docker.apiDockerCall(job, image, parameters=None, deferParam=None, volumes=None, working_dir=None, containerName=None, entrypoint=None, detach=False, log_config=None, auto_remove=None, remove=False, user=None, environment=None, stdout=None, stderr=False, stream=False, demux=False, streamfile=None, accelerators=None, timeout=365 * 24 * 60 * 60, **kwargs)
A toil wrapper for the python docker API.
Docker API Docs: https://docker-py.readthedocs.io/en/stable/index.html
Docker API Code: https://github.com/docker/docker-py
This implements docker's python API within toil so that calls are run as jobs, with the intention that failed/orphaned docker jobs be handled appropriately.
Example of using apiDockerCall in Toil to index a FASTA file with SAMtools:
    import os
    from toil.lib.docker import apiDockerCall

    def toil_job(job, ref_id):
        # ref_id is the job store file ID of the reference FASTA.
        working_dir = job.fileStore.getLocalTempDir()
        path = job.fileStore.readGlobalFile(ref_id,
                                            os.path.join(working_dir, 'ref.fasta'))
        parameters = ['faidx', path]
        apiDockerCall(job,
                      image='quay.io/ucsc_cgl/samtools:latest',
                      working_dir=working_dir,
                      parameters=parameters)
Note that when run with detach=False, or with detach=True and stdout=True or stderr=True, this is a blocking call. When run with detach=True and without output capture, the container is started and returned without waiting for it to finish.
Parameters
• job ( toil.Job.job ) -- The Job instance for the calling function.
• image ( str ) -- Name of the Docker image to be used. (e.g. 'quay.io/ucsc_cgl/samtools:latest')
• parameters ( list[str] ) -- A list of string elements. If there are multiple elements, these will be joined with spaces. This handling of multiple elements provides backwards compatibility with previous versions which called docker using subprocess.check_call(). If list of lists: list[list[str]], then treat as successive commands chained with pipe.
• working_dir ( str ) -- The working directory.
• deferParam ( int ) -- Action to take on the container upon job completion. FORGO (0) leaves the container untouched and running. STOP (1) sends SIGTERM, then SIGKILL if necessary, to the container. RM (2) immediately sends SIGKILL to the container. This is the default behavior if deferParam is set to None.
• containerName ( str ) -- The name/ID of the container.
• entrypoint ( str ) -- Prepends commands sent to the container. See: https://docker-py.readthedocs.io/en/stable/containers.html
• detach ( bool ) -- Run the container in detached mode. (equivalent to '-d')
• stdout ( bool ) -- Return logs from STDOUT when detach=False (default: True). Block and capture stdout to a file when detach=True (default: False). Output capture defaults to output.log, and can be specified with the "streamfile" kwarg.
• stderr ( bool ) -- Return logs from STDERR when detach=False (default: False). Block and capture stderr to a file when detach=True (default: False). Output capture defaults to output.log, and can be specified with the "streamfile" kwarg.
• stream ( bool ) -- If True and detach=False, return a log generator instead of a string. Ignored if detach=True. (default: False).
• demux ( bool ) -- Similar to demux in container.exec_run(). If True and detach=False, returns a tuple of (stdout, stderr). If stream=True, returns a log generator with tuples of (stdout, stderr). Ignored if detach=True. (default: False).
• streamfile ( str ) -- Collect container output to this file if detach=True and stderr and/or stdout are True. Defaults to "output.log".
• log_config ( dict ) -- Specify the logs to return from the container. See: https://docker-py.readthedocs.io/en/stable/containers.html
• remove ( bool ) -- Remove the container on exit or not.
• user ( str ) -- The container will be run with the privileges of the user specified. Can be an actual name, such as 'root' or 'lifeisaboutfishtacos', or it can be the uid or gid of the user ('0' is root; '1000' is an example of a less privileged uid or gid), or a complement of the uid:gid (RECOMMENDED), such as '0:0' (root user : root group) or '1000:1000' (some other user : some other user group).
• environment -- Allows one to set environment variables inside of the container, such as:
• timeout ( int ) -- Use the given timeout in seconds for interactions with the Docker daemon. Note that the underlying docker module is not always able to abort ongoing reads and writes in order to respect the timeout. Defaults to 1 year (i.e. wait essentially indefinitely).
• accelerators ( Optional[list[int]] ) -- Toil accelerator numbers (usually GPUs) to forward to the container. These are interpreted in the current Python process's environment. See toil.lib.accelerators.get_individual_local_accelerators() for the menu of available accelerators.
• kwargs -- Additional keyword arguments supplied to the docker API's run command. The list is 75 keywords total, for examples and full documentation see: https://docker-py.readthedocs.io/en/stable/containers.html
Returns
Returns the standard output/standard error text, as requested, when detach=False. Returns the underlying docker.models.containers.Container object from the Docker API when detach=True.
toil.lib.docker.dockerKill(container_name, gentleKill=False, remove=False, timeout=365 * 24 * 60 * 60)
Immediately kills a container.
Equivalent to "docker kill": https://docs.docker.com/engine/reference/commandline/kill/
Parameters
• container_name ( str ) -- Name of the container being killed.
• gentleKill ( bool ) -- If True, trigger a graceful shutdown.
• remove ( bool ) -- If True, remove the container after it exits.
• timeout ( int ) -- Use the given timeout in seconds for interactions with the Docker daemon. Note that the underlying docker module is not always able to abort ongoing reads and writes in order to respect the timeout. Defaults to 1 year (i.e. wait essentially indefinitely).
Return type
None
toil.lib.docker.dockerStop(container_name, remove=False)
Gracefully kills a container.
Equivalent to "docker stop": https://docs.docker.com/engine/reference/commandline/stop/
Parameters
• container_name ( str ) -- Name of the container being stopped.
• remove ( bool ) -- If True, remove the container after it exits.
Return type
None
toil.lib.docker.containerIsRunning(container_name, timeout=365 * 24 * 60 * 60)
Checks whether the container is running or not.
Parameters
• container_name ( str ) -- Name of the container being checked.
• timeout ( int ) -- Use the given timeout in seconds for interactions with the Docker daemon. Note that the underlying docker module is not always able to abort ongoing reads and writes in order to respect the timeout. Defaults to 1 year (i.e. wait essentially indefinitely).
Returns
True if status is 'running', False if status is anything else, and None if the container does not exist.
toil.lib.docker.getContainerName(job)
Create a random string including the job name, and return it. Name will match [a-zA-Z0-9][a-zA-Z0-9_.-] .
toil.lib.ec2
Attributes
Exceptions
Functions
Module Contents
toil.lib.ec2.a_short_time = 5
toil.lib.ec2.a_long_time
toil.lib.ec2.logger
exception toil.lib.ec2.UserError(message=None, cause=None)
Bases: RuntimeError
Unspecified run-time error.
toil.lib.ec2.not_found(e)
toil.lib.ec2.inconsistencies_detected(e)
toil.lib.ec2.INCONSISTENCY_ERRORS
toil.lib.ec2.retry_ec2(t=a_short_time, retry_for=10 * a_short_time, retry_while=not_found)
exception toil.lib.ec2.UnexpectedResourceState(resource, to_state, state)
Bases: Exception
Common base class for all non-exit exceptions.
toil.lib.ec2.wait_transition(boto3_ec2, resource, from_states, to_state, state_getter=lambda x: ...)
Wait until the specified EC2 resource (instance, image, volume, ...) transitions from any of the given 'from' states to the specified 'to' state. If the instance is found in a state other than the 'to' state or any of the 'from' states, an exception will be thrown.
Parameters
• resource ( mypy_boto3_ec2.type_defs.InstanceTypeDef ) -- the resource to monitor
• from_states ( collections.abc.Iterable[str] ) -- a set of states that the resource is expected to be in before the transition occurs
• to_state ( str ) -- the state of the resource when this method returns
• boto3_ec2 ( mypy_boto3_ec2.client.EC2Client )
• state_getter ( Callable[[mypy_boto3_ec2.type_defs.InstanceTypeDef], str] )
toil.lib.ec2.wait_instances_running(boto3_ec2, instances)
Wait until no instance in the given iterable is 'pending'. Yield every instance that entered the running state as soon as it does.
Parameters
• boto3_ec2 ( mypy_boto3_ec2.client.EC2Client ) -- the EC2 connection to use for making requests
• instances ( collections.abc.Iterable[mypy_boto3_ec2.type_defs.InstanceTypeDef] ) -- the instances to wait on
Return type
collections.abc.Generator [mypy_boto3_ec2.type_defs.InstanceTypeDef, None, None]
toil.lib.ec2.wait_spot_requests_active(boto3_ec2, requests, timeout=None, tentative=False)
Wait until no spot request in the given iterator is in the 'open' state or, optionally, a timeout occurs. Yield spot requests as soon as they leave the 'open' state.
Parameters
• boto3_ec2 ( mypy_boto3_ec2.client.EC2Client ) -- ec2 client
• requests ( collections.abc.Iterable[mypy_boto3_ec2.type_defs.SpotInstanceRequestTypeDef] ) -- The requests to wait on.
• timeout ( float ) -- Maximum time in seconds to spend waiting or None to wait forever. If a timeout occurs, the remaining open requests will be cancelled.
• tentative ( bool ) -- if True, give up on a spot request at the earliest indication of it not being fulfilled immediately
Return type
collections.abc.Iterable [ list [mypy_boto3_ec2.type_defs.SpotInstanceRequestTypeDef]]
toil.lib.ec2.create_spot_instances(boto3_ec2, price, image_id, spec, num_instances=1, timeout=None, tentative=False, tags=None)
Create instances on the spot market.
Parameters
boto3_ec2 ( mypy_boto3_ec2.client.EC2Client )
Return type
collections.abc.Generator [mypy_boto3_ec2.type_defs.DescribeInstancesResultTypeDef, None, None]
toil.lib.ec2.create_ondemand_instances(boto3_ec2, image_id, spec, num_instances=1)
Requests the RunInstances EC2 API call but accounts for the race between recently created instance profiles, IAM roles and an instance creation that refers to them.
Parameters
• boto3_ec2 ( mypy_boto3_ec2.client.EC2Client )
• image_id ( str )
• spec ( collections.abc.Mapping[str, Any] )
• num_instances ( int )
Return type
list [mypy_boto3_ec2.type_defs.InstanceTypeDef]
toil.lib.ec2.increase_instance_hop_limit(boto3_ec2, boto_instance_list)
Increase the default HTTP hop limit. Toil and Kubernetes run inside a Docker container, so the default hop limit of 1 is not enough when grabbing metadata information with ec2_metadata.
Must be called after the instances are guaranteed to be running.
Parameters
• boto_instance_list ( list[mypy_boto3_ec2.type_defs.InstanceTypeDef] ) -- List of boto instances to modify
• boto3_ec2 ( mypy_boto3_ec2.client.EC2Client )
Return type
None
toil.lib.ec2.prune(bushy)
Prune entries in the given dict with false-y values. Boto3 may not like None and instead wants no key.
Parameters
bushy ( dict )
Return type
dict
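A minimal sketch of the pruning described above. Note that "false-y" covers more than None: empty strings, empty lists, and 0 are dropped as well. prune_sketch and the sample spec are hypothetical, not Toil's code.

```python
def prune_sketch(bushy: dict) -> dict:
    """Return a copy of the dict without entries whose values are false-y."""
    return {key: value for key, value in bushy.items() if value}


spec = prune_sketch({"ImageId": "ami-123", "KeyName": None, "SecurityGroupIds": []})
```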
toil.lib.ec2.iam_client
toil.lib.ec2.wait_until_instance_profile_arn_exists(instance_profile_arn)
Parameters
instance_profile_arn ( str )
toil.lib.ec2.create_instances(ec2_resource, image_id, key_name, instance_type, num_instances=1, security_group_ids=None, user_data=None, block_device_map=None, instance_profile_arn=None, placement_az=None, subnet_id=None, tags=None)
Replaces create_ondemand_instances. Uses boto3 and returns a list of Boto3 instance dicts.
See "create_instances" (returns a list of ec2.Instance objects): https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.ServiceResource.create_instances
Not to be confused with "run_instances" (same input args; returns a dictionary): https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.Client.run_instances
Tags, if given, are applied to the instances, and all volumes.
Parameters
• ec2_resource ( mypy_boto3_ec2.service_resource.EC2ServiceResource )
• image_id ( str )
• key_name ( str )
• instance_type ( str )
• num_instances ( int )
• security_group_ids ( Optional[list] )
• user_data ( Optional[Union[str, bytes]] )
• block_device_map ( Optional[list[dict]] )
• instance_profile_arn ( Optional[str] )
• placement_az ( Optional[str] )
• subnet_id ( str )
• tags ( Optional[dict[str, str]] )
Return type
list [mypy_boto3_ec2.service_resource.Instance]
toil.lib.ec2.create_launch_template(ec2_client, template_name, image_id, key_name, instance_type, security_group_ids=None, user_data=None, block_device_map=None, instance_profile_arn=None, placement_az=None, subnet_id=None, tags=None)
Creates a launch template with the given name for launching instances with the given parameters.
We only ever use the default version of any launch template.
Internally calls https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html?highlight=create_launch_template#EC2.Client.create_launch_template
Parameters
• tags ( Optional[dict[str, str]] ) -- Tags, if given, are applied to the template itself, all instances, and all volumes.
• user_data ( Optional[Union[str, bytes]] ) -- non-base64-encoded user data to pass to the instances.
• ec2_client ( mypy_boto3_ec2.client.EC2Client )
• template_name ( str )
• image_id ( str )
• key_name ( str )
• instance_type ( str )
• security_group_ids ( Optional[list] )
• block_device_map ( Optional[list[dict]] )
• instance_profile_arn ( Optional[str] )
• placement_az ( Optional[str] )
• subnet_id ( Optional[str] )
Returns
the ID of the launch template.
Return type
str
toil.lib.ec2.create_auto_scaling_group(autoscaling_client, asg_name, launch_template_ids, vpc_subnets, min_size, max_size, instance_types=None, spot_bid=None, spot_cheapest=False, tags=None)
Create a new Auto Scaling Group with the given name (which is also its unique identifier).
Parameters
• autoscaling_client ( mypy_boto3_autoscaling.client.AutoScalingClient ) -- Boto3 client for autoscaling.
• asg_name ( str ) -- Unique name for the autoscaling group.
• launch_template_ids ( dict[str, str] ) -- ID of the launch template to make instances from, for each instance type.
• vpc_subnets ( list[str] ) -- One or more subnet IDs to place instances in the group into. Determine the availability zone(s) instances will launch into.
• min_size ( int ) -- Minimum number of instances to have in the group at all times.
• max_size ( int ) -- Maximum number of instances to allow in the group at any time.
• instance_types ( Optional[collections.abc.Iterable[str]] ) -- Use a pool over the given instance types, instead of the type given in the launch template. For on-demand groups, this is a prioritized list. For spot groups, we let AWS balance according to spot_strategy. Must be 20 types or shorter.
• spot_bid ( Optional[float] ) -- If set, the ASG will be a spot market ASG. Bid is in dollars per instance hour. All instance types in the group are bid on equivalently.
• spot_cheapest ( bool ) -- If true, use the cheapest spot instances available out of instance_types, instead of the spot instances that minimize eviction probability.
• tags ( Optional[dict[str, str]] ) -- Tags to apply to the ASG only. Tags for the instances should be added to the launch template instead.
Return type
None
The default version of the launch template is used.
toil.lib.ec2nodes
Attributes
Classes
Functions
Module Contents
toil.lib.ec2nodes.logger
toil.lib.ec2nodes.manager
toil.lib.ec2nodes.dirname
toil.lib.ec2nodes.region_json_dirname
toil.lib.ec2nodes.EC2Regions
class toil.lib.ec2nodes.InstanceType(name, cores, memory, disks, disk_capacity, architecture)
Parameters
• name ( str )
• cores ( int )
• memory ( float )
• disks ( float )
• disk_capacity ( float )
• architecture ( str )
__slots__ = ('name', 'cores', 'memory', 'disks', 'disk_capacity', 'architecture')
name
cores
memory
disks
disk_capacity
architecture
__str__()
Return type
str
__eq__(other)
Parameters
other ( object )
Return type
bool
toil.lib.ec2nodes.is_number(s)
Determines if a unicode string (that may include commas) is a number.
Parameters
s ( str ) -- Any unicode string.
Returns
True if s represents a number, False otherwise.
Return type
bool
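One way to implement a comma-tolerant numeric check is sketched below. It assumes commas are thousands separators that can simply be dropped before calling float(). is_number_sketch is a hypothetical name, not Toil's function.

```python
def is_number_sketch(s: str) -> bool:
    """Return True if s parses as a number once commas are removed."""
    try:
        float(s.replace(",", ""))
    except ValueError:
        return False
    return True


checks = [is_number_sketch("1,952"), is_number_sketch("1.9"), is_number_sketch("EBS")]
```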
toil.lib.ec2nodes.parse_storage(storage_info)
Parses EC2 JSON storage param
string into a number.
Examples:
"2 x 160 SSD" "3 x 2000 HDD" "EBS only" "1 x 410" "8 x 1.9 NVMe SSD" "900 GB NVMe SSD"
Parameters
storage_info ( str ) -- EC2 JSON storage param string.
Returns
Two floats representing: (# of disks), and (disk_capacity in GiB of each disk).
Return type
Union[ list [ int ], tuple [Union[ int , float ], float ]]
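The example strings listed for parse_storage fall into three shapes, which the sketch below handles with two regular expressions. This is an illustration only: treating "EBS only" as zero disks of zero capacity is an assumption, the numbers are returned exactly as written (no unit conversion), and parse_storage_sketch is a hypothetical name.

```python
import re


def parse_storage_sketch(storage_info: str) -> tuple[float, float]:
    """Return (number of disks, capacity of each disk) from an EC2 storage string."""
    text = storage_info.strip()
    if text == "EBS only":
        return 0.0, 0.0  # assumption: no instance-store disks at all
    # Shape 'N x SIZE ...', e.g. '2 x 160 SSD' or '8 x 1.9 NVMe SSD'
    multi = re.fullmatch(r"([\d,]+)\s*x\s*([\d.,]+)\b.*", text)
    if multi:
        return (float(multi.group(1).replace(",", "")),
                float(multi.group(2).replace(",", "")))
    # Shape 'SIZE GB ...', e.g. '900 GB NVMe SSD' (a single disk)
    single = re.fullmatch(r"([\d.,]+)\s*GB\b.*", text)
    if single:
        return 1.0, float(single.group(1).replace(",", ""))
    raise ValueError(f"unrecognized storage description: {storage_info!r}")


parsed = [parse_storage_sketch(s) for s in
          ("2 x 160 SSD", "EBS only", "1 x 410", "8 x 1.9 NVMe SSD", "900 GB NVMe SSD")]
```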
toil.lib.ec2nodes.parse_memory(mem_info)
Returns EC2 'memory' string as a float.
Format should always be '#' GiB (example: '244 GiB' or '1,952 GiB'). Amazon loves to put commas in their numbers, so we have to accommodate that. If the syntax ever changes, this will raise.
Parameters
mem_info ( str ) -- EC2 JSON memory param string.
Returns
A float representing memory in GiB.
Return type
float
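The comma handling and the "raise on unexpected syntax" behaviour can be sketched as follows. parse_memory_sketch is a hypothetical name, and the choice of ValueError is an assumption; the source only says the real function raises.

```python
def parse_memory_sketch(mem_info: str) -> float:
    """Parse strings such as '1,952 GiB' into a float number of GiB."""
    parts = mem_info.replace(",", "").split()
    if len(parts) != 2 or parts[1] != "GiB":
        raise ValueError(f"unexpected EC2 memory format: {mem_info!r}")
    return float(parts[0])


gib = parse_memory_sketch("1,952 GiB")
```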
toil.lib.ec2nodes.download_region_json(filename, region='us-east-1')
Downloads and writes the AWS Billing JSON to a file using the AWS pricing API.
See:
https://aws.amazon.com/blogs/aws/new-aws-price-list-api/
Returns
A dict of InstanceType objects, where the key is the string: aws instance name (example: 't2.micro'), and the value is an InstanceType object representing that aws instance name.
Parameters
• filename ( str )
• region ( str )
Return type
None
toil.lib.ec2nodes.reduce_region_json_size(filename)
Deletes information in the json file that we don't need, and rewrites it. This makes the file smaller.
The reason being: we used to download the unified AWS Bulk API JSON, which eventually crept up to 5.6 GB, the loading of which could not be done on a 32 GB RAM machine. Now we download each region JSON individually (with AWS's new Query API), but even those may eventually one day grow ridiculously large, so we do what we can to keep the file sizes down (and thus also the amount loaded into memory) to keep this script working for longer.
Parameters
filename ( str )
Return type
list [ dict [ str , Any]]
toil.lib.ec2nodes.updateStaticEC2Instances()
Generates a new python file of fetchable EC2 Instances by region with current prices and specs.
Takes a few (~3+) minutes to run (you'll need decent internet).
Returns
Nothing. Writes a new 'generatedEC2Lists.py' file.
Return type
None
toil.lib.encryption
Submodules
toil.lib.encryption.conftest
Attributes
Module Contents
toil.lib.encryption.conftest.collect_ignore = []
toil.lib.exceptions
Exceptions
Classes
Functions
Module Contents
class toil.lib.exceptions.panic(log=None)
The Python idiom for reraising a primary exception fails when the except block raises a secondary exception, e.g. while trying to cleanup. In that case the original exception is lost and the secondary exception is reraised. The solution seems to be to save the primary exception info as returned from sys.exc_info() and then reraise that.
This is a contextmanager that should be used like this:
    try:
        # do something that can fail
    except:
        with panic( log ):
            # do cleanup that can also fail
If a logging logger is passed to panic(), any secondary Exception raised within the with block will be logged. Otherwise those exceptions are swallowed. At the end of the with block the primary exception will be reraised.
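The behaviour described above (remember the primary exception, tolerate a secondary one during cleanup, then re-raise the primary) can be sketched with a small class. PanicSketch is a hypothetical stand-in written for illustration, not Toil's panic class, and it must only be entered from inside an except block.

```python
import logging
import sys


class PanicSketch:
    """Re-raise the primary exception even if cleanup raises a secondary one."""

    def __init__(self, log=None):
        self.log = log
        self.exc_info = None

    def __enter__(self):
        # Capture the exception currently being handled (the primary one).
        self.exc_info = sys.exc_info()

    def __exit__(self, *exc_info):
        # Log any secondary exception raised inside the with block, if asked to.
        if self.log is not None and exc_info[0] is not None:
            self.log.warning("Exception during panic", exc_info=exc_info)
        _, exc_value, traceback = self.exc_info
        raise exc_value.with_traceback(traceback)


def run():
    try:
        raise ValueError("primary")
    except Exception:
        with PanicSketch(logging.getLogger(__name__)):
            raise RuntimeError("secondary during cleanup")


try:
    run()
except ValueError as error:
    surfaced = str(error)
```

The caller sees the ValueError, while the RuntimeError from the cleanup code is only logged.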
log
exc_info = None
__enter__()
__exit__(*exc_info)
toil.lib.exceptions.raise_(exc_type, exc_value, traceback)
Return type
None
exception toil.lib.exceptions.UnimplementedURLException(url, operation)
Bases: RuntimeError
Unspecified run-time error.
Parameters
• url ( urllib.parse.ParseResult )
• operation ( str )
toil.lib.expando
Classes
Module Contents
class toil.lib.expando.Expando(*args, **kwargs)
Bases: dict
Pass initial attributes to the constructor:
>>> o = Expando(foo=42)
>>> o.foo
42
Dynamically create new attributes:
>>> o.bar = 'hi'
>>> o.bar
'hi'
Expando is a dictionary:
>>> isinstance(o,dict)
True
>>> o['foo']
42
Works great with JSON:
>>> import json
>>> s='{"foo":42}'
>>> o = json.loads(s,object_hook=Expando)
>>> o.foo
42
>>> o.bar = 'hi'
>>> o.bar
'hi'
And since Expando is a dict, it serializes back to JSON just fine:
>>> json.dumps(o, sort_keys=True)
'{"bar": "hi", "foo": 42}'
Attributes can be deleted, too:
>>> o = Expando(foo=42)
>>> o.foo
42
>>> del o.foo
>>> o.foo
Traceback (most recent call last):
...
AttributeError: 'Expando' object has no attribute 'foo'
>>> o['foo']
Traceback (most recent call last):
...
KeyError: 'foo'
>>> del o.foo
Traceback (most recent call last):
...
AttributeError: foo
And copied:
>>> o = Expando(foo=42)
>>> p = o.copy()
>>> isinstance(p,Expando)
True
>>> o == p
True
>>> o is p
False
Same with MagicExpando ...
>>> o = MagicExpando()
>>> o.foo.bar = 42
>>> p = o.copy()
>>> isinstance(p,MagicExpando)
True
>>> o == p
True
>>> o is p
False
... but the copy is shallow:
>>> o.foo is p.foo
True
__slots__ = None
__dict__
copy()
Return a shallow copy of the dict.
class toil.lib.expando.MagicExpando(*args, **kwargs)
Bases: Expando
Use MagicExpando for chained attribute access.
The first time a missing attribute is accessed, it will be set to a new child MagicExpando.
>>> o = MagicExpando()
>>> o.foo = 42
>>> o
{'foo': 42}
>>> o.bar.hello = 'hi'
>>> o.bar
{'hello': 'hi'}
__getattribute__(name)
Return getattr(self, name).
Parameters
name ( str )
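The attribute-style access shown in the doctests above can be approximated in a few lines. The following is only a sketch and not Toil's implementation: AttrDict and AutoAttrDict are hypothetical names, and routing attribute access through __getattr__ is just one of several ways to get this behaviour.

```python
class AttrDict(dict):
    """Hypothetical dict subclass whose keys double as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None

    def copy(self):
        # Shallow copy that keeps the subclass type.
        return type(self)(self)


class AutoAttrDict(AttrDict):
    """Hypothetical variant that creates a missing child on first access."""

    def __getattr__(self, name):
        if name.startswith("__"):
            # Leave dunder lookups (used by copy, pickle, ...) alone.
            raise AttributeError(name)
        return self.setdefault(name, type(self)())
```

As with MagicExpando, chained assignment such as c.a.b = 1 creates the intermediate child, and copy() shares the children between the original and the copy.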
toil.lib.ftp_utils
Attributes
Classes
Module Contents
toil.lib.ftp_utils.logger
class toil.lib.ftp_utils.FtpFsAccess(cache=None)
FTP access with upload.
Taken and modified from https://github.com/ohsu-comp-bio/cwl-tes/blob/03f0096f9fae8acd527687d3460a726e09190c3a/cwl_tes/ftp.py#L37-L251
Parameters
cache ( Optional[dict[Any, ftplib.FTP]] )
cache
netrc = None
exists(fn)
Check if a file/directory exists over an FTP server.
Parameters
fn ( str ) -- FTP url
Returns
True or False depending on whether the object exists on the server
Return type
bool
isfile(fn)
Check if the FTP url points to a file.
Parameters
fn ( str ) -- FTP url
Returns
True if url is a file, else False
Return type
bool
isdir(fn)
Check if the FTP url points to a directory.
Parameters
fn ( str ) -- FTP url
Returns
True if url is a directory, else False
Return type
bool
open(fn, mode)
Open an FTP url.
Only supports reading, no write support.
Parameters
• fn ( str ) -- FTP url
• mode ( str ) -- Mode to open FTP url in
Return type
IO[ bytes ]
size(fn)
Get the size of an FTP object.
Parameters
fn ( str ) -- FTP url
Returns
Size of object
Return type
Optional[ int ]
toil.lib.generatedEC2Lists
Attributes
Module Contents
toil.lib.generatedEC2Lists.E2Instances
toil.lib.generatedEC2Lists.regionDict
toil.lib.generatedEC2Lists.ec2InstancesByRegion
toil.lib.humanize
Attributes
Functions
Module Contents
toil.lib.humanize.logger
toil.lib.humanize.bytes2human(n)
Convert n bytes into a human readable string.
Parameters
n ( SupportsInt )
Return type
str
toil.lib.humanize.human2bytes(s)
Attempts to guess the string format based on the default symbol sets and returns the corresponding number of bytes as an integer.
When unable to recognize the format, ValueError is raised.
Parameters
s ( str )
Return type
int
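The round trip between byte counts and human-readable strings can be sketched as follows. This is an illustration only: to_human, from_human, and the UNITS table are hypothetical, and the exact symbols and formatting used by bytes2human() and human2bytes() may differ.

```python
# Hypothetical unit table (binary multiples); Toil's symbol sets may differ.
UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]


def to_human(n: int) -> str:
    """Format a byte count using the largest unit that keeps the value below 1024."""
    value = float(int(n))
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


def from_human(s: str) -> int:
    """Parse strings such as '1.5 KiB'; raise ValueError on unknown formats."""
    number, _, unit = s.strip().partition(" ")
    if unit not in UNITS:
        raise ValueError(f"Cannot interpret {s!r}")
    return int(float(number) * 1024 ** UNITS.index(unit))
```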
toil.lib.integration
Contains functions for integrating Toil with external services such as Dockstore.
Attributes
Functions
Module Contents
toil.lib.integration.logger
toil.lib.integration.session
toil.lib.integration.is_dockstore_workflow(workflow)
Returns True if a workflow string smells Dockstore-y.
Detects Dockstore page URLs and strings that could be Dockstore TRS IDs.
Parameters
workflow ( str )
Return type
bool
toil.lib.integration.find_trs_spec(workflow)
Parse a Dockstore workflow URL or TRS ID to a string that is definitely a TRS ID.
Parameters
workflow ( str )
Return type
str
toil.lib.integration.parse_trs_spec(trs_spec)
Parse a TRS ID to workflow and optional version.
Parameters
trs_spec ( str )
Return type
tuple [ str , Optional[ str ]]
toil.lib.integration.get_workflow_root_from_dockstore(workflow, supported_languages=None)
Given a Dockstore URL or TRS identifier, get the root WDL or CWL URL for the workflow.
Accepts inputs like:
• https://dockstore.org/workflows/github.com/dockstore-testing/md5sum-checker:master?tab=info
• #workflow/github.com/dockstore-testing/md5sum-checker
Assumes the input is actually one of the supported formats. See is_dockstore_workflow().
TODO: Needs to handle multi-workflow files if Dockstore can.
Parameters
• workflow ( str )
• supported_languages ( Optional[set[str]] )
Return type
str
toil.lib.integration.resolve_workflow(workflow, supported_languages=None)
Find the real workflow URL or filename from a command line argument.
Transform a workflow URL or path that might actually be a Dockstore page URL or TRS specifier to an actual URL or path to a workflow document.
Parameters
• workflow ( str )
• supported_languages ( Optional[set[str]] )
Return type
str
toil.lib.io
Attributes
Classes
Functions
Module Contents
toil.lib.io.logger
toil.lib.io.TOIL_URI_SCHEME = 'toilfile:'
toil.lib.io.STANDARD_SCHEMES = ['http:', 'https:', 's3:', 'gs:', 'ftp:']
toil.lib.io.REMOTE_SCHEMES
toil.lib.io.ALL_SCHEMES
toil.lib.io.is_standard_url(filename)
Parameters
filename ( str )
Return type
bool
toil.lib.io.is_remote_url(filename)
Decide if a filename is a known, non-file kind of URL.
Parameters
filename ( str )
Return type
bool
toil.lib.io.is_any_url(filename)
Decide if a string is a URI like http:// or file://.
Otherwise it might be a bare path.
Parameters
filename ( str )
Return type
bool
toil.lib.io.is_url_with_scheme(filename, schemes)
Return True if filename is a URL with any of the given schemes and False otherwise.
Parameters
• filename ( str )
• schemes ( list[str] )
Return type
bool
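Because the scheme constants above include the trailing colon, a scheme check can be a plain prefix test. The sketch below illustrates the idea; has_scheme is a hypothetical stand-in, and whether is_url_with_scheme() is implemented exactly this way is an assumption.

```python
# Copied from the STANDARD_SCHEMES listing above.
STANDARD_SCHEMES = ["http:", "https:", "s3:", "gs:", "ftp:"]


def has_scheme(filename: str, schemes: list[str]) -> bool:
    """Return True if filename starts with any of the given scheme prefixes."""
    return any(filename.startswith(scheme) for scheme in schemes)
```

For example, has_scheme("s3://bucket/key", STANDARD_SCHEMES) is True, while a bare path such as "/tmp/file.txt" matches none of the schemes.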
toil.lib.io.is_toil_url(filename)
Parameters
filename ( str )
Return type
bool
toil.lib.io.is_file_url(filename)
Parameters
filename ( str )
Return type
bool
toil.lib.io.mkdtemp(suffix=None, prefix=None, dir=None)
Make a temporary directory like tempfile.mkdtemp, but with relaxed permissions.
The permissions on the directory will be 711 instead of 700, allowing the group and all other users to traverse the directory. This is necessary if the directory is on NFS and the Docker daemon would like to mount it or a file inside it into a container, because on NFS even the Docker daemon appears bound by the file permissions.
See <https://github.com/DataBiosphere/toil/issues/4644>, and <https://stackoverflow.com/a/67928880> which talks about a similar problem but in the context of user namespaces.
Parameters
• suffix ( Optional[str] )
• prefix ( Optional[str] )
• dir ( Optional[str] )
Return type
str
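The permission change described above can be sketched with the standard library alone. This is a minimal illustration, not Toil's code; mkdtemp_traversable is a hypothetical name.

```python
import os
import tempfile


def mkdtemp_traversable(suffix=None, prefix=None, dir=None) -> str:
    """Create a temporary directory, then relax its mode from 700 to 711."""
    path = tempfile.mkdtemp(suffix=suffix, prefix=prefix, dir=dir)
    # 0o711: owner has full access; group and others may traverse but not list.
    os.chmod(path, 0o711)
    return path
```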
toil.lib.io.robust_rmtree(path)
Robustly tries to delete paths.
Continues silently if the path to be removed is already gone, or if it goes away while this function is executing.
May raise an error if a path changes between file and directory while the function is executing, or if a permission error is encountered.
Parameters
path ( Union[str, bytes] )
Return type
None
toil.lib.io.atomic_tmp_file(final_path)
Return a tmp file name to use with atomic_install. This will be in the same directory as final_path. The temporary file will have the same extension as final_path. If the final path is in /dev (/dev/null, /dev/stdout), it is returned unchanged and atomic_install will do nothing.
Parameters
final_path ( str )
Return type
str
toil.lib.io.atomic_install(tmp_path, final_path)
Atomic install of tmp_path as final_path.
Return type
None
toil.lib.io.AtomicFileCreate(final_path, keep=False)
Context manager to create a temporary file. Entering returns the path to the temporary file in the same directory as final_path. If the code in the context succeeds, the file is renamed to its actual name. If an error occurs, the file is not installed and is removed unless keep is specified.
Parameters
• final_path ( str )
• keep ( bool )
Return type
collections.abc.Iterator [ str ]
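The write-then-rename pattern behind AtomicFileCreate can be sketched as below. This is an illustration under stated assumptions rather than Toil's implementation: atomic_create is a hypothetical name, and the special handling of /dev paths mentioned for atomic_tmp_file is omitted.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def atomic_create(final_path: str, keep: bool = False):
    """Yield a temporary path next to final_path; rename it into place on success."""
    directory = os.path.dirname(os.path.abspath(final_path))
    extension = os.path.splitext(final_path)[1]
    # Same directory, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=extension, dir=directory)
    os.close(fd)
    try:
        yield tmp_path
    except BaseException:
        # On failure, discard the partial file unless asked to keep it.
        if not keep and os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    else:
        os.replace(tmp_path, final_path)
```

Readers of final_path see either the old content or the complete new content, never a partially written file.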
toil.lib.io.atomic_copy(src_path, dest_path, executable=None)
Copy a file using POSIX atomic creation semantics.
Parameters
• src_path ( str )
• dest_path ( str )
• executable ( Optional[bool] )
Return type
None
toil.lib.io.atomic_copyobj(src_fh, dest_path, length=16384, executable=False)
Copy an open file using POSIX atomic creation semantics.
Parameters
• src_fh ( io.BytesIO )
• dest_path ( str )
• length ( int )
• executable ( bool )
Return type
None
toil.lib.io.make_public_dir(in_directory, suggested_name=None)
Make a publicly-accessible directory in the given directory.
Parameters
• suggested_name ( Optional[str] ) -- Use this directory name first if possible.
• in_directory ( str )
Return type
str
Try to make a random directory name with length 4 that doesn't exist, with the given prefix. Otherwise, try length 5, length 6, etc, up to a max of 32 (len of uuid4 with dashes replaced). This function's purpose is mostly to avoid having long file names when generating directories. If somehow this fails, which should be incredibly unlikely, default to a normal uuid4, which was our old default.
toil.lib.io.try_path(path, min_size=100 * 1024 * 1024)
Try to use the given path. Return it if it exists or can be made, and we can make things within it, or None otherwise.
Parameters
• min_size ( int ) -- Reject paths on filesystems smaller than this many bytes.
• path ( str )
Return type
Optional[ str ]
class toil.lib.io.WriteWatchingStream(backingStream)
A stream wrapping class that calls any functions passed to onWrite() with the number of bytes written for every write.
Not seekable.
Parameters
backingStream ( IO[Any] )
backingStream
writeListeners = []
onWrite(listener)
Call the given listener with
the number of bytes written on every write.
Parameters
listener ( Callable[[int], None] )
Return type
None
write(data)
Write the given data to the file.
writelines(datas)
Write each string from the given iterable, without newlines.
flush()
Flush the backing stream.
close()
Close the backing stream.
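The listener mechanism of WriteWatchingStream can be sketched as a thin wrapper. CountingStream and on_write below are hypothetical names, and reporting len(data) as the number of bytes written is an assumption made for this illustration.

```python
from typing import Any, Callable, IO


class CountingStream:
    """Hypothetical wrapper that reports the size of every write to listeners."""

    def __init__(self, backing: IO[Any]) -> None:
        self.backing = backing
        self.listeners: list[Callable[[int], None]] = []

    def on_write(self, listener: Callable[[int], None]) -> None:
        self.listeners.append(listener)

    def write(self, data):
        written = self.backing.write(data)
        # Notify listeners after the data reached the backing stream.
        for listener in self.listeners:
            listener(len(data))
        return written

    def flush(self) -> None:
        self.backing.flush()

    def close(self) -> None:
        self.backing.close()
```

A typical listener accumulates the counts, for instance to drive a progress display.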
class toil.lib.io.ReadableFileObj
Bases: Protocol
Protocol that is more specific than what file_digest takes as an argument. Also guarantees a read() method. Would extend the protocol from Typeshed for hashlib but those are only declared for 3.11+.
readinto(buf, /)
Parameters
buf ( bytearray )
Return type
int
readable()
Return type
bool
read(number)
Parameters
number ( int )
Return type
bytes
toil.lib.io.file_digest(f, alg_name)
Polyfilled hashlib.file_digest that works on Python <3.11.
Parameters
• f ( ReadableFileObj )
• alg_name ( str )
Return type
hashlib._Hash
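A polyfill of this kind usually reads the file in chunks and feeds them to a hash object. The sketch below shows the idea using only hashlib; file_digest_compat and its chunk_size parameter are hypothetical and not part of Toil's API.

```python
import hashlib


def file_digest_compat(f, alg_name: str, chunk_size: int = 1 << 16):
    """Hash a binary file object chunk by chunk and return the hash object."""
    hasher = hashlib.new(alg_name)
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
    return hasher
```

The returned object exposes hexdigest() and digest() as any other hashlib hash does.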
toil.lib.iterables
Attributes
Classes
Functions
Module Contents
toil.lib.iterables.IT
toil.lib.iterables.flatten(iterables)
Flatten an iterable, except for string elements.
Parameters
iterables ( collections.abc.Iterable[IT] )
Return type
collections.abc.Iterator [IT]
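One-level flattening that leaves strings whole can be sketched as follows. flatten_once is a hypothetical name, and passing non-iterable elements through unchanged is an assumption of this sketch rather than documented behaviour of flatten().

```python
from collections.abc import Iterable, Iterator
from typing import Any


def flatten_once(items: Iterable[Any]) -> Iterator[Any]:
    """Yield the elements of each nested iterable; keep strings intact."""
    for item in items:
        if isinstance(item, str) or not isinstance(item, Iterable):
            # Strings are iterable, but are treated as atoms here.
            yield item
        else:
            yield from item
```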
class toil.lib.iterables.concat(*args)
A literal iterable to combine sequence literals (lists, set) with generators or list comprehensions.
Instead of
>>> [ -1 ] + [ x * 2 for x in range( 3 ) ] + [ -1 ]
[-1, 0, 2, 4, -1]
you can write
>>> list( concat( -1, ( x * 2 for x in range( 3 ) ), -1 ) )
[-1, 0, 2, 4, -1]
This is slightly shorter (not counting the list constructor) and does not involve array construction or concatenation.
Note that concat() flattens (or chains) all iterable arguments into a single result iterable:
>>> list( concat( 1, range( 2, 4 ), 4 ) )
[1, 2, 3, 4]
It only does so one level deep. If you need to recursively flatten a data structure, check out crush().
If you want to prevent that flattening for an iterable argument, wrap it in concat():
>>> list( concat( 1, concat( range( 2, 4 ) ), 4 ) )
[1, range(2, 4), 4]
Some more examples:
>>> list( concat() ) # empty concat
[]
>>> list( concat( 1 ) ) # non-iterable
[1]
>>> list( concat( concat() ) ) # empty iterable
[]
>>> list( concat( concat( 1 ) ) ) # singleton iterable
[1]
>>> list( concat( 1, concat( 2 ), 3 ) ) # flattened iterable
[1, 2, 3]
>>> list( concat( 1, [2], 3 ) ) # flattened iterable
[1, 2, 3]
>>> list( concat( 1, concat( [2] ), 3 ) ) # protecting an iterable from being flattened
[1, [2], 3]
>>> list( concat( 1, concat( [2], 3 ), 4 ) ) # protection only works with a single argument
[1, 2, 3, 4]
>>> list( concat( 1, 2, concat( 3, 4 ), 5, 6 ) )
[1, 2, 3, 4, 5, 6]
>>> list( concat( 1, 2, concat( [ 3, 4 ] ), 5, 6 ) )
[1, 2, [3, 4], 5, 6]
Note that while strings are technically iterable, concat() does not flatten them.
>>> list( concat( 'ab' ) )
['ab']
>>> list( concat( concat( 'ab' ) ) )
['ab']
Parameters
args ( Any )
args
__iter__()
Return type
collections.abc.Iterator [Any]
toil.lib.memoize
Attributes
Functions
Module Contents
toil.lib.memoize.memoize
Memoize a function result based on its parameters using this decorator.
For example, this can be used in place of lazy initialization. If the decorating function is invoked by multiple threads, the decorated function may be called more than once with the same arguments.
toil.lib.memoize.MAT
toil.lib.memoize.MRT
toil.lib.memoize.sync_memoize(f)
Like memoize, but guarantees that the decorated function is only called once, even when multiple threads are calling the decorating function with multiple parameters.
Parameters
f ( Callable[[MAT], MRT] )
Return type
Callable[[MAT], MRT]
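The difference between the two decorators comes down to whether the cache check and the call happen under a lock. The sketch below shows one simple way to get the call-once guarantee; locked_memoize is a hypothetical name, and holding a single lock for the whole call is a simplification that Toil's implementation need not share.

```python
import threading
from functools import wraps


def locked_memoize(f):
    """Cache results by positional arguments; compute each result at most once."""
    cache = {}
    lock = threading.Lock()

    @wraps(f)
    def wrapper(*args):
        # The lock covers both the lookup and the call, so concurrent callers
        # with the same arguments wait instead of recomputing.
        with lock:
            if args not in cache:
                cache[args] = f(*args)
            return cache[args]

    return wrapper
```

Because a plain Lock is used, a decorated function that calls itself recursively would deadlock; an RLock would be needed in that case.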
toil.lib.memoize.parse_iso_utc(s)
Parses an ISO time with a hard-coded Z for zulu-time (UTC) at the end. Other timezones are not supported. Returns a timezone-naive datetime object.
Parameters
s ( str ) -- The ISO-formatted time
Returns
A timezone-naive datetime object
Return type
datetime.datetime
>>> parse_iso_utc('2016-04-27T00:28:04.000Z')
datetime.datetime(2016, 4, 27, 0, 28, 4)
>>> parse_iso_utc('2016-04-27T00:28:04Z')
datetime.datetime(2016, 4, 27, 0, 28, 4)
>>> parse_iso_utc('2016-04-27T00:28:04X')
Traceback (most recent call last):
...
ValueError: Not a valid ISO datetime in UTC: 2016-04-27T00:28:04X
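The behaviour in the doctests above can be reproduced with datetime.strptime alone. The following is a sketch, not Toil's implementation; parse_utc is a hypothetical name, and trying two fixed formats is an assumption made for brevity.

```python
from datetime import datetime


def parse_utc(s: str) -> datetime:
    """Parse an ISO timestamp ending in a literal Z into a naive datetime."""
    for fmt in ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            continue
    raise ValueError(f"Not a valid ISO datetime in UTC: {s}")
```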
toil.lib.memoize.strict_bool(s)
Variant of bool() that only accepts two possible string values.
Parameters
s ( str )
Return type
bool
toil.lib.misc
Attributes
Exceptions
Functions
Module Contents
toil.lib.misc.logger
toil.lib.misc.get_public_ip()
Get the IP that this machine uses to contact the internet.
If behind a NAT, this will still be this computer's IP, and not the router's.
Return type
str
toil.lib.misc.get_user_name()
Get the current user name, or a suitable substitute string if the user name is not available.
Return type
str
toil.lib.misc.utc_now()
Return a datetime in the UTC timezone corresponding to right now.
Return type
datetime.datetime
toil.lib.misc.unix_now_ms()
Return the current time in milliseconds since the Unix epoch.
Return type
float
toil.lib.misc.slow_down(seconds)
Toil jobs that have completed are not allowed to have taken 0 seconds, but Kubernetes timestamps round things to the nearest second. It is possible in some batch systems for a pod to have identical start and end timestamps.
This function takes a possibly 0 job length in seconds and enforces a minimum length to satisfy Toil.
Parameters
seconds ( float ) -- Timestamp difference
Returns
seconds, or a small positive number if seconds is 0
Return type
float
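Enforcing a minimum duration amounts to a single max() call. In the sketch below, at_least_epsilon is a hypothetical name, and using sys.float_info.epsilon as the "small positive number" is an assumption.

```python
import sys


def at_least_epsilon(seconds: float) -> float:
    """Return seconds, or the smallest float step above zero if seconds is 0."""
    return max(seconds, sys.float_info.epsilon)
```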
toil.lib.misc.printq(msg, quiet, log=False)
This is for functions used simultaneously in Toil proper and in the admin scripts.
Our admin scripts "print" to stdout, while Toil proper uses logging. For a script that, for example, cleans up IAM, EC2, etc. cruft left over after failed CI runs, we can call an AWS delete IAM role function, and this prints or logs progress (unless quiet is True), depending on whether the function is called in, say, the jobstore or a script.
Parameters
• msg ( str ) -- The string to print or log to stdout.
• quiet ( bool ) -- Silence output to stdout.
• log ( bool ) -- Use logging (else "print" to the screen).
Return type
None
toil.lib.misc.truncExpBackoff()
Return type
collections.abc.Iterator [ float ]
exception toil.lib.misc.CalledProcessErrorStderr(returncode, cmd, output=None, stderr=None)
Bases: subprocess.CalledProcessError
Version of CalledProcessError that includes stderr in the error message if it is set.
__str__()
Return str(self).
Return type
str
toil.lib.misc.call_command(cmd, *args, input=None, timeout=None, useCLocale=True, env=None, quiet=False)
Simplified calling of external commands.
If the process fails, CalledProcessErrorStderr is raised.
The captured stderr is always printed, regardless of whether an exception occurs, so it can be logged.
Always logs the command at debug log level.
Parameters
• quiet ( Optional[bool] ) -- If True, do not log the command output. If False (the default), do log the command output at debug log level.
• useCLocale ( bool ) -- If True, C locale is forced, to prevent failures that can occur in some batch systems when using UTF-8 locale.
• cmd ( list[str] )
• args ( str )
• input ( Optional[str] )
• timeout ( Optional[float] )
• env ( Optional[dict[str, str]] )
Returns
Command standard output, decoded as utf-8.
Return type
str
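A wrapper with this shape can be built on subprocess.run. The sketch below is an approximation under stated assumptions: run_command is a hypothetical name, forcing the locale through LC_ALL=C is a guess at what "C locale is forced" means, it raises the standard CalledProcessError rather than CalledProcessErrorStderr, and the logging described above is left out.

```python
import os
import subprocess


def run_command(cmd, input=None, timeout=None, use_c_locale=True, env=None) -> str:
    """Run cmd, return its stdout as text, and raise if it exits non-zero."""
    env = dict(os.environ if env is None else env)
    if use_c_locale:
        env["LC_ALL"] = "C"
    proc = subprocess.run(
        cmd,
        input=input,
        timeout=timeout,
        env=env,
        capture_output=True,
        encoding="utf-8",
        errors="replace",
    )
    if proc.returncode != 0:
        # Attach captured stderr so callers can report it.
        raise subprocess.CalledProcessError(
            proc.returncode, cmd, output=proc.stdout, stderr=proc.stderr
        )
    return proc.stdout
```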
toil.lib.objects
Classes
Module Contents
class toil.lib.objects.InnerClass(inner_class)
Note that this is EXPERIMENTAL code.
A nested class (the inner class) decorated with this will have an additional attribute called 'outer' referencing the instance of the nesting class (the outer class) that was used to create the inner class. The outer instance does not need to be passed to the inner class's constructor; it will be set magically. Shamelessly stolen from http://stackoverflow.com/questions/2278426/inner-classes-how-can-i-get-the-outer-class-object-at-construction-time#answer-2278595 with names made more descriptive (I hope) and added caching of the BoundInner classes.
Caveat: Within the inner class, self.__class__ will not be the inner class but a dynamically created subclass thereof. Its name will be the same as that of the inner class, but its __module__ will be different. There will be one such dynamic subclass per inner class and instance of outer class, if that outer class instance created any instances of the inner class.
>>> class Outer(object):
...     def new_inner(self):
...         # self is an instance of the outer class
...         inner = self.Inner()
...         # the inner instance's 'outer' attribute is set to the outer instance
...         assert inner.outer is self
...         return inner
...     @InnerClass
...     class Inner(object):
...         def get_outer(self):
...             return self.outer
...         @classmethod
...         def new_inner(cls):
...             return cls()
>>> o = Outer()
>>> i = o.new_inner()
>>> i
<toil.lib.objects.Inner...> bound to <toil.lib.objects.Outer object at ...>
>>> i.get_outer()
<toil.lib.objects.Outer object at ...>
Now with inheritance for both inner and outer:
>>> class DerivedOuter(Outer):
...     def new_inner(self):
...         return self.DerivedInner()
...     @InnerClass
...     class DerivedInner(Outer.Inner):
...         def get_outer(self):
...             assert super( DerivedOuter.DerivedInner, self ).get_outer() == self.outer
...             return self.outer
>>> derived_outer = DerivedOuter()
>>> derived_inner = derived_outer.new_inner()
>>> derived_inner
<toil.lib.objects...> bound to <toil.lib.objects.DerivedOuter object at ...>
>>> derived_inner.get_outer()
<toil.lib.objects.DerivedOuter object at ...>
Test static references:
>>> Outer.Inner # doctest: +ELLIPSIS
<class 'toil.lib.objects...Inner'>
>>> DerivedOuter.Inner # doctest: +ELLIPSIS
<class 'toil.lib.objects...Inner'>
>>> DerivedOuter.DerivedInner #doctest: +ELLIPSIS
<class 'toil.lib.objects...DerivedInner'>
Can't decorate top-level classes. Unfortunately, this is detected when the instance is created, not when the class is defined.
>>> @InnerClass
... class Foo(object):
...     pass
>>> Foo()
Traceback (most recent call last):
...
RuntimeError: Inner classes must be nested in another class.
All inner instances should refer to a single outer instance:
>>> o = Outer()
>>> o.new_inner().outer == o == o.new_inner().outer
True
All inner instances should be of the same class ...
>>> o.new_inner().__class__ == o.new_inner().__class__
True
... but that class isn't the inner class ...
>>> o.new_inner().__class__ != Outer.Inner
True
... but a subclass of the inner class.
>>> isinstance( o.new_inner(), Outer.Inner )
True
Static and class methods, for example, should work too:
>>> o.Inner.new_inner().outer == o
True
inner_class
__get__(instance, owner)
__call__(**kwargs)
toil.lib.resources
Classes
Functions
Module Contents
class toil.lib.resources.ResourceMonitor
Global resource monitoring widget.
Presents class methods to get the resource usage of this process and child processes, and other class methods to adjust the statistics so they can account for e.g. resources used inside containers, or other resource usage that should be billable to the current process.
classmethod record_extra_memory(peak_ki)
Become responsible for the given peak memory usage, in kibibytes.
The memory will be treated as if it was used by a child process at the time our real child processes were also using their peak memory.
Parameters
peak_ki ( int )
Return type
None
classmethod record_extra_cpu(seconds)
Become responsible for the given CPU time.
The CPU time will be treated as if it had been used by a child process.
Parameters
seconds ( float )
Return type
None
classmethod get_total_cpu_time_and_memory_usage()
Gives the total CPU time of itself and all its children, and the maximum RSS memory usage of itself and its single largest child (in kibibytes).
Return type
tuple [ float , int ]
classmethod get_total_cpu_time()
Gives the total CPU time, including the children.
Return type
float
toil.lib.resources.glob(glob_pattern, directoryname)
Walks through a directory and its subdirectories looking for files matching the glob_pattern and returns them as a list.
Parameters
• directoryname ( str ) -- Any accessible folder name on the filesystem.
• glob_pattern ( str ) -- A string like *.txt , which would find all text files.
Returns
A list of absolute filepaths matching the glob pattern.
Return type
list [ str ]
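A recursive search of this kind can be sketched with os.walk and fnmatch. find_files below is a hypothetical stand-in for illustration; it is not claimed to match the implementation of glob().

```python
import fnmatch
import os


def find_files(glob_pattern: str, directoryname: str) -> list[str]:
    """Collect absolute paths of files under directoryname matching glob_pattern."""
    matches = []
    for root, _dirs, files in os.walk(directoryname):
        for name in fnmatch.filter(files, glob_pattern):
            matches.append(os.path.abspath(os.path.join(root, name)))
    return matches
```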
toil.lib.retry
This file holds the retry() decorator function and ErrorCondition object.
retry() can be used to decorate any function based on the list of errors one wishes to retry on.
This list of errors can contain normal Exception objects, and/or ErrorCondition objects wrapping Exceptions to include additional conditions.
For example, retrying on one Exception (HTTPError):
from requests import get
from requests.exceptions import HTTPError
from toil.lib.retry import retry

@retry(errors=[HTTPError])
def update_my_wallpaper():
    return get('https://www.deviantart.com/')
Or:
from requests import get
from requests.exceptions import HTTPError
from toil.lib.retry import retry

@retry(errors=[HTTPError, ValueError])
def update_my_wallpaper():
    return get('https://www.deviantart.com/')
The examples above will retry for the default interval on any errors specified in the "errors=" arg list.
To retry on specifically 500/502/503/504 errors, you could specify an ErrorCondition object instead, for example:
from requests import get
from requests.exceptions import HTTPError
from toil.lib.retry import ErrorCondition, retry

@retry(errors=[
    ErrorCondition(
        error=HTTPError,
        error_codes=[500, 502, 503, 504]
    )])
def update_my_wallpaper():
    return get('https://www.deviantart.com/')
To retry on specifically errors containing the phrase "NotFound":
from requests import get
from requests.exceptions import HTTPError
from toil.lib.retry import ErrorCondition, retry

@retry(errors=[
    ErrorCondition(
        error=HTTPError,
        error_message_must_include="NotFound"
    )])
def update_my_wallpaper():
    return get('https://www.deviantart.com/')
To retry on all HTTPError errors EXCEPT an HTTPError containing the phrase "NotFound":
from requests import get
from requests.exceptions import HTTPError
from toil.lib.retry import ErrorCondition, retry

@retry(errors=[
    HTTPError,
    ErrorCondition(
        error=HTTPError,
        error_message_must_include="NotFound",
        retry_on_this_condition=False
    )])
def update_my_wallpaper():
    return get('https://www.deviantart.com/')
To retry on boto3's specific status errors, an example of the implementation is:
import boto3
from botocore.exceptions import ClientError
from toil.lib.retry import ErrorCondition, retry

@retry(errors=[
    ErrorCondition(
        error=ClientError,
        boto_error_codes=["BucketNotFound"]
    )])
def boto_bucket(bucket_name):
    boto_session = boto3.session.Session()
    s3_resource = boto_session.resource('s3')
    return s3_resource.Bucket(bucket_name)
Any combination of these will also work, provided the codes are matched to the correct exceptions. A ValueError will not return a 404, for example.
The retry function as a decorator should make retrying functions easier and clearer. It also encourages smaller independent functions, as opposed to lumping many different things that may need to be retried on different conditions in the same function.
The ErrorCondition object tries to take some of the heavy lifting of writing specific retry conditions and boil it down to an API that covers all common use-cases without the user having to write any new bespoke functions.
Use-cases covered currently:
1. Retrying on a normal error, like a KeyError.
2. Retrying on HTTP error codes (use ErrorCondition).
3. Retrying on boto 3's specific status errors, like "BucketNotFound" (use ErrorCondition).
4. Retrying when an error message contains a certain phrase (use ErrorCondition).
5. Explicitly NOT retrying on a condition (use ErrorCondition).
If new functionality is needed, it's currently best practice in Toil to add functionality to the ErrorCondition itself rather than making a new custom retry method.
Attributes
Classes
Functions
Module Contents
toil.lib.retry.SUPPORTED_HTTP_ERRORS
toil.lib.retry.kubernetes = None
toil.lib.retry.botocore = None
toil.lib.retry.logger
class toil.lib.retry.ErrorCondition(error=None, error_codes=None, boto_error_codes=None, error_message_must_include=None, retry_on_this_condition=True)
A wrapper describing an error condition.
ErrorCondition events may be used to define errors in more detail to determine whether to retry.
Parameters
• error ( Optional[Any] )
• error_codes ( list[int] )
• boto_error_codes ( list[str] )
• error_message_must_include ( str )
• retry_on_this_condition ( bool )
error
error_codes
boto_error_codes
error_message_must_include
retry_on_this_condition
toil.lib.retry.RT
toil.lib.retry.retry(intervals=None, infinite_retries=False, errors=None, log_message=None, prepare=None)
Retry a function if it fails with any Exception defined in "errors".
Does so every x seconds, where x is defined by a list of numbers (ints or floats) in "intervals". Also accepts ErrorCondition events for more detailed retry attempts.
Parameters
• intervals ( Optional[list] ) -- A list of times in seconds we keep retrying until returning failure. Defaults to retrying with the following exponential back-off before failing: 1s, 1s, 2s, 4s, 8s, 16s
• infinite_retries ( bool ) -- If this is True, reset the intervals when they run out. Defaults to: False.
• errors ( Optional[collections.abc.Sequence[Union[ErrorCondition, type[Exception]]]] ) -- A list of exceptions OR ErrorCondition objects to catch and retry on. ErrorCondition objects describe more detailed error event conditions than a plain error. An ErrorCondition specifies:
  - Exception (required)
  - Error codes that must match to be retried (optional; defaults to not checking)
  - A string that must be in the error message to be retried (optional; defaults to not checking)
  - A bool that can be set to False to always error on this condition.
  If not specified, this will default to a generic Exception.
• log_message ( Optional[tuple[Callable, str]] ) -- Optional tuple of ("log/print function()", "message string") that will precede each attempt.
• prepare ( Optional[list[Callable]] ) -- Optional list of functions to call, with the function's arguments, between retries, to reset state.
Returns
The result of the wrapped function or raise.
Return type
Callable[[Callable[Ellipsis, RT]], Callable[Ellipsis, RT]]
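The core mechanism of such a decorator, stripped of ErrorCondition handling, logging, and the prepare hooks, can be sketched as below. simple_retry and its sleep parameter are hypothetical and exist only for this illustration; the sketch makes no claim about how retry() itself is implemented.

```python
import time
from functools import wraps


def simple_retry(intervals=(1, 1, 2, 4, 8, 16), errors=(Exception,), sleep=time.sleep):
    """Retry the wrapped function on the given errors, waiting between attempts."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for delay in intervals:
                try:
                    return func(*args, **kwargs)
                except tuple(errors):
                    sleep(delay)
            # Final attempt: any error now propagates to the caller.
            return func(*args, **kwargs)

        return wrapper

    return decorator
```

Errors not listed in errors are never caught, so they surface on the first attempt.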
toil.lib.retry.return_status_code(e)
toil.lib.retry.get_error_code(e)
Get the error code name from a Boto 2 or 3 error, or compatible types.
Returns empty string for other errors.
Parameters
e ( Exception )
Return type
str
toil.lib.retry.get_error_message(e)
Get the error message string from a Boto 2 or 3 error, or compatible types.
Note that error message conditions also check more than this; this function does not fall back to the traceback for incompatible types.
Parameters
e ( Exception )
Return type
str
toil.lib.retry.get_error_status(e)
Get the HTTP status code from a compatible source.
Such as a Boto 2 or 3 error, kubernetes.client.rest.ApiException, http.client.HTTPException, urllib3.exceptions.HTTPError, requests.exceptions.HTTPError, urllib.error.HTTPError, or compatible type.
Returns 0 from other errors.
Parameters
e ( Exception )
Return type
int
toil.lib.retry.get_error_body(e)
Get the body from a Boto 2 or 3 error, or compatible types.
Returns the code and message if the error does not have a body.
Parameters
e ( Exception )
Return type
str
toil.lib.retry.meets_error_message_condition(e, error_message)
Parameters
• e ( Exception )
• error_message ( Optional[str] )
toil.lib.retry.meets_error_code_condition(e, error_codes)
These are expected to be normal HTTP error codes, like 404 or 500.
Parameters
• e ( Exception )
• error_codes ( Optional[list[int]] )
toil.lib.retry.meets_boto_error_code_condition(e, boto_error_codes)
These are expected to be AWS's custom error aliases, like 'BucketNotFound' or 'AccessDenied'.
Parameters
• e ( Exception )
• boto_error_codes ( Optional[list[str]] )
toil.lib.retry.error_meets_conditions(e, error_conditions)
toil.lib.retry.DEFAULT_DELAYS = (0, 1, 1, 4, 16, 64)
toil.lib.retry.DEFAULT_TIMEOUT = 300
toil.lib.retry.E
toil.lib.retry.old_retry(delays=DEFAULT_DELAYS, timeout=DEFAULT_TIMEOUT, predicate=lambda e: ...)
Deprecated.
Retry an operation while the failure matches a given predicate and until a given timeout expires, waiting a given amount of time in between attempts. This function is a generator that yields context managers. See doctests below for example usage.
Parameters
• delays ( Iterable[float] ) -- an iterable yielding the time in seconds to wait before each retried attempt; the last element of the iterable will be repeated.
• timeout ( float ) -- an overall timeout that should not be exceeded for all attempts together. This is a best-effort mechanism only and it won't abort an ongoing attempt, even if the timeout expires during that attempt.
• predicate ( Callable[[Exception],bool] ) -- a unary callable returning True if another attempt should be made to recover from the given exception. The default value for this parameter will prevent any retries!
Returns
a generator yielding context managers, one per attempt
Return type
Iterator
Retry for a limited amount of time:
>>> true = lambda _:True
>>> false = lambda _:False
>>> i = 0
>>> for attempt in old_retry( delays=[0], timeout=.1, predicate=true ):
...     with attempt:
...         i += 1
...         raise RuntimeError('foo')
Traceback (most recent call last):
...
RuntimeError: foo
>>> i > 1
True
If timeout is 0, do exactly one attempt:
>>> i = 0
>>> for attempt in old_retry( timeout=0 ):
...     with attempt:
...         i += 1
...         raise RuntimeError( 'foo' )
Traceback (most recent call last):
...
RuntimeError: foo
>>> i
1
Don't retry on success:
>>> i = 0
>>> for attempt in old_retry( delays=[0], timeout=.1, predicate=true ):
...     with attempt:
...         i += 1
>>> i
1
Don't retry unless predicate returns True:
>>> i = 0
>>> for attempt in old_retry( delays=[0], timeout=.1, predicate=false):
...     with attempt:
...         i += 1
...         raise RuntimeError( 'foo' )
Traceback (most recent call last):
...
RuntimeError: foo
>>> i
1
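The "generator that yields context managers" pattern used in the doctests above can be sketched as follows. This is an illustration, not the implementation of old_retry(): attempts is a hypothetical name, and the defaults are copied from DEFAULT_DELAYS and DEFAULT_TIMEOUT as listed above.

```python
import time
from contextlib import contextmanager


def attempts(delays=(0, 1, 1, 4, 16, 64), timeout=300, predicate=lambda e: False):
    """Yield one context manager per attempt until one succeeds or retrying stops."""
    deadline = time.monotonic() + timeout
    done = False

    @contextmanager
    def attempt(delay):
        nonlocal done
        try:
            yield
        except Exception as e:
            # Swallow the error (allowing another attempt) only if the
            # predicate approves and the deadline has not been reached.
            if predicate(e) and time.monotonic() + delay < deadline:
                time.sleep(delay)
            else:
                raise
        else:
            done = True

    delays = iter(delays)
    delay = next(delays)
    while not done:
        yield attempt(delay)
        # Repeat the last delay once the iterable is exhausted.
        delay = next(delays, delay)
```

Suppressing the exception inside the context manager is what lets the surrounding for loop continue to the next attempt; re-raising it ends the loop.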
toil.lib.retry.retry_flaky_test
toil.lib.threading
Attributes
Classes
Functions
Module Contents
toil.lib.threading.logger
toil.lib.threading.ensure_filesystem_lockable(path, timeout=30, hint=None)
Make sure that the filesystem used at the given path is one where locks are safe to use.
File locks are not safe to use on Ceph. See <https://github.com/DataBiosphere/toil/issues/4972>.
Raises an exception if the filesystem is detected as one where using locks is known to trigger bugs in the filesystem implementation. Also raises an exception if the given path does not exist, or if attempting to determine the filesystem type takes more than the timeout in seconds.
If the filesystem type cannot be determined, does nothing.
Parameters
• hint ( Optional[str] ) -- Extra text to include in an error, if raised, telling the user how to change the offending path.
• path ( str )
• timeout ( float )
Return type
None
toil.lib.threading.safe_lock(fd, block=True, shared=False)
Get an fcntl lock, while retrying on IO errors.
Raises OSError with EACCES or EAGAIN when a nonblocking lock is not immediately available.
Parameters
• fd ( int )
• block ( bool )
• shared ( bool )
Return type
None
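The nonblocking failure mode described above can be sketched with the standard library alone. This is not Toil's implementation: it uses fcntl.flock() on two separate open file descriptions as a stand-in (an assumption made so that the conflict can be shown inside a single process), the helper name try_lock_nonblocking is hypothetical, and it only runs on POSIX systems.

```python
import errno
import fcntl
import os
import tempfile


def try_lock_nonblocking(fd: int) -> bool:
    """Try to take an exclusive lock without blocking; return True on success."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except OSError as e:
        # A lock held elsewhere is reported as EACCES or EAGAIN,
        # depending on the platform.
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return False
        raise


# Hypothetical lock file, created only for this illustration.
handle, lock_path = tempfile.mkstemp()
other = os.open(lock_path, os.O_RDWR)
try:
    first_got_lock = try_lock_nonblocking(handle)
    second_got_lock = try_lock_nonblocking(other)
finally:
    # Closing the descriptors releases any lock that was taken.
    os.close(other)
    os.close(handle)
    os.unlink(lock_path)
```

A caller that passes block=False would typically treat the "not available" case as a signal to back off and try again later.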
toil.lib.threading.safe_unlock_and_close(fd)
Release an fcntl lock and close
the file descriptor, while handling fcntl IO errors.
Parameters
fd ( int )
Return type
None
class
toil.lib.threading.ExceptionalThread(group=None,
target=None,
name=None, args=(), kwargs=None, *, daemon=None)
Bases: threading.Thread
A thread whose join() method re-raises exceptions raised during run(). While join() is idempotent, the exception is only re-raised during the first invocation of join() that successfully joined the thread. If join() times out, no exception will be re-raised even though an exception might already have occurred in run().
When subclassing this thread, override tryRun() instead of run().
>>> def f():
... assert 0
>>> t = ExceptionalThread(target=f)
>>> t.start()
>>> t.join()
Traceback (most recent call last):
...
AssertionError
>>> class MyThread(ExceptionalThread):
... def tryRun( self ):
... assert 0
>>> t = MyThread()
>>> t.start()
>>> t.join()
Traceback (most recent call last):
...
AssertionError
exc_info = None
run()
Method representing the thread's activity.
You may override this method in a subclass. The standard run() method invokes the callable object passed to the object's constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.
Return type
None
tryRun()
Return type
None
join(*args, **kwargs)
Wait until the thread terminates.
This blocks the calling thread until the thread whose join() method is called terminates -- either normally or through an unhandled exception or until the optional timeout occurs.
When the timeout argument is present and not None, it should be a floating-point number specifying a timeout for the operation in seconds (or fractions thereof). As join() always returns None, you must call is_alive() after join() to decide whether a timeout happened -- if the thread is still alive, the join() call timed out.
When the timeout argument is not present or None, the operation will block until the thread terminates.
A thread can be join()ed many times.
join() raises a RuntimeError if an attempt is made to join the current thread, as that would cause a deadlock. It is also an error to join() a thread before it has been started, and attempts to do so raise the same exception.
Parameters
• args ( Optional[float] )
• kwargs ( Optional[float] )
Return type
None
toil.lib.threading.cpu_count()
Get the rounded-up integer number of whole CPUs available.
Counts hyperthreads as CPUs.
Uses the system's actual CPU count, or the current v1 cgroup's quota per period, if the quota is set.
Ignores the cgroup's cpu shares value, because it's extremely difficult to interpret. See https://github.com/kubernetes/kubernetes/issues/81021 .
Caches result
for efficiency.
Returns
Integer count of available CPUs, minimum 1.
Return type
int
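The rounding rule can be illustrated with a small calculation. The sketch below is an assumption about how a quota and a period combine, not Toil's code: the helper name cpus_from_quota is hypothetical, and taking the smaller of the system count and the quota is a choice made for this example.

```python
import math
import os
from typing import Optional


def cpus_from_quota(system_cpus: int, quota: Optional[int], period: Optional[int]) -> int:
    """Combine a system CPU count with an optional cgroup CPU quota."""
    if quota is None or period is None or quota <= 0 or period <= 0:
        # No quota is set (cgroup v1 reports -1), so use the system count.
        return max(1, system_cpus)
    # Round the fractional CPU allowance up to whole CPUs.
    limited = math.ceil(quota / period)
    # Never report more than the machine has, and never less than 1.
    return max(1, min(system_cpus, limited))


system_cpus = os.cpu_count() or 1
print(cpus_from_quota(system_cpus, quota=-1, period=100000))
```

With a quota of 150000 microseconds per 100000 microsecond period, the allowance of 1.5 CPUs rounds up to 2.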
toil.lib.threading.current_process_name_lock
toil.lib.threading.current_process_name_for: dict[str, str]
toil.lib.threading.collect_process_name_garbage()
Delete all the process names that point to files that don't exist anymore (because the work directory was temporary and got cleaned up). This is known to happen during the tests, which get their own temp directories.
Caller must
hold current_process_name_lock.
Return type
None
toil.lib.threading.destroy_all_process_names()
Delete all our process name files because our process is going away.
We let all our FDs get closed by the process death.
We assume there
is nobody else using the system during exit to race with.
Return type
None
toil.lib.threading.get_process_name(base_dir)
Return the name of the current
process. Like a PID but visible between containers on what
to Toil appears to be a node.
Parameters
base_dir ( str ) -- Base directory to work in. Defines the shared namespace.
Returns
Process's assigned name
Return type
str
toil.lib.threading.process_name_exists(base_dir, name)
Return true if the process named by the given name (from process_name) exists, and false otherwise.
Can see across
container boundaries using the given node workflow
directory.
Parameters
• base_dir ( str ) -- Base directory to work in. Defines the shared namespace.
• name ( str ) -- Process's name to poll
Returns
True if the named process is still alive, and False otherwise.
Return type
bool
toil.lib.threading.global_mutex(base_dir, mutex)
Context manager that locks a mutex. The mutex is identified by the given name, and scoped to the given directory. Works across all containers that have access to the given directory. Mutexes held by dead processes are automatically released.
Only works
between processes, NOT between threads.
Parameters
• base_dir ( str ) -- Base directory to work in. Defines the shared namespace.
• mutex ( str ) -- Mutex to lock. Must be a permissible path component.
Return type
collections.abc.Iterator [None]
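A directory-scoped mutex of this kind can be approximated with a lock file. The following is a minimal sketch under stated assumptions, not Toil's implementation: the name directory_mutex and the "toil-mutex-" file prefix are hypothetical, and it relies on POSIX fcntl locks, which are per process (hence no protection between threads) and which the kernel drops when the holding process dies.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def directory_mutex(base_dir: str, mutex: str) -> Iterator[None]:
    """Hold an exclusive lock on a file named after the mutex inside base_dir."""
    lock_path = os.path.join(base_dir, "toil-mutex-" + mutex)
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        # Blocks until no other process holds the lock.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        yield
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)


with tempfile.TemporaryDirectory() as base_dir:
    with directory_mutex(base_dir, "cleanup"):
        held = True
```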
class toil.lib.threading.LastProcessStandingArena(base_dir, name)
Class that lets a bunch of processes detect and elect a last process standing.
Processes enter and leave (sometimes due to sudden existence failure). We guarantee that the last process to leave, if it leaves properly, will get a chance to do some cleanup. If new processes try to enter during the cleanup, they will be delayed until after the cleanup has happened and the previous "last" process has finished leaving.
The user is
responsible for making sure you always leave if you enter!
Consider using a try/finally; this class is not a context
manager.
Parameters
• base_dir ( str )
• name ( str )
base_dir
mutex
lockfileDir
lockfileFD = None
lockfileName = None
enter()
This process is entering the arena. If cleanup is in progress, blocks until it is finished.
You may not
enter the arena again before leaving it.
Return type
None
leave()
This process is leaving the arena. If this process happens to be the last process standing, yields something, with other processes blocked from joining the arena until the loop body completes and the process has finished leaving. Otherwise, does not yield anything.
Should be used in a loop:
for _ in arena.leave():
    # If we get here, we were the last process. Do the cleanup
    pass
Return type
collections.abc.Iterator [ bool ]
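The enter/leave discipline pairs naturally with try/finally. The sketch below uses a hypothetical StubArena with the same enter() and leave() shape so that it runs without Toil; it only counts members in one process and is not how the real arena coordinates between processes.

```python
from typing import Iterator, List


class StubArena:
    """Hypothetical stand-in with the same enter()/leave() shape as the arena."""

    def __init__(self) -> None:
        self.members = 0

    def enter(self) -> None:
        self.members += 1

    def leave(self) -> Iterator[bool]:
        self.members -= 1
        if self.members == 0:
            # Only the last one out gets to run the loop body.
            yield True


events: List[str] = []
arena = StubArena()
arena.enter()
try:
    events.append("working")
finally:
    # Always leave, even if the work above raised.
    for _ in arena.leave():
        events.append("cleanup")
```

Putting the loop in the finally clause means the cleanup pass still runs when the work itself fails.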
toil.lib.throttle
Classes
Module Contents
class toil.lib.throttle.LocalThrottle(min_interval)
A thread-safe rate limiter that throttles each thread independently. Can be used as a function or method decorator or as a simple object, via its .throttle() method.
The use as a
decorator is deprecated in favor of throttle().
Parameters
min_interval ( int )
min_interval
per_thread
throttle(wait=True)
If the wait parameter is True, this method returns True after suspending the current thread as necessary to ensure that no less than the configured minimum interval has passed since the last invocation of this method in the current thread returned True.
If the wait parameter is False, this method immediately returns True (if at least the configured minimum interval has passed since the last time this method returned True in the current thread) or False otherwise.
Parameters
wait ( bool )
Return type
bool
__call__(function)
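The two modes of throttle() can be illustrated with a small stand-in. MiniThrottle below is a hypothetical class written for this example, not the Toil class; it keeps the time of the last successful call in thread-local storage so that each thread is limited independently.

```python
import threading
import time


class MiniThrottle:
    """Hypothetical per-thread rate limiter, written for illustration only."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self.per_thread = threading.local()

    def throttle(self, wait: bool = True) -> bool:
        now = time.monotonic()
        last = getattr(self.per_thread, "last", None)
        remaining = 0.0 if last is None else self.min_interval - (now - last)
        if remaining > 0:
            if not wait:
                # Too soon, and the caller asked not to be suspended.
                return False
            time.sleep(remaining)
        self.per_thread.last = time.monotonic()
        return True


limiter = MiniThrottle(min_interval=60)
first = limiter.throttle(wait=False)   # nothing ran before, so this passes
second = limiter.throttle(wait=False)  # within the interval, so this is refused
```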
class toil.lib.throttle.throttle(min_interval)
A context manager for ensuring that the execution of its body takes at least a given amount of time, sleeping if necessary. It is a simpler version of LocalThrottle if used as a decorator.
Ensures that body takes at least the given amount of time.
>>> start = time.time()
>>> with throttle(1):
... pass
>>> 1 <= time.time() - start <= 1.1
True
Ditto when used as a decorator.
>>> @throttle(1)
... def f():
... pass
>>> start = time.time()
>>> f()
>>> 1 <= time.time() - start <= 1.1
True
If the body takes longer by itself, don't throttle.
>>> start = time.time()
>>> with throttle(1):
... time.sleep(2)
>>> 2 <= time.time() - start <= 2.1
True
Ditto when used as a decorator.
>>> @throttle(1)
... def f():
... time.sleep(2)
>>> start = time.time()
>>> f()
>>> 2 <= time.time() - start <= 2.1
True
If an exception occurs, don't throttle.
>>> start = time.time()
>>> try:
... with throttle(1):
... raise ValueError('foo')
... except ValueError:
... end = time.time()
... raise
Traceback (most recent call last):
...
ValueError: foo
>>> 0 <= end - start <= 0.1
True
Ditto when used as a decorator.
>>> @throttle(1)
... def f():
... raise ValueError('foo')
>>> start = time.time()
>>> try:
... f()
... except ValueError:
... end = time.time()
... raise
Traceback (most recent call last):
...
ValueError: foo
>>> 0 <= end - start <= 0.1
True
Parameters
min_interval ( Union[int, float] )
min_interval
__enter__()
__exit__(exc_type, exc_val, exc_tb)
__call__(function)
toil.options
Submodules
toil.options.common
Attributes
Functions
Module Contents
toil.options.common.logger
toil.options.common.defaultTargetTime = 1800
toil.options.common.SYS_MAX_SIZE = 9223372036854775807
toil.options.common.parse_set_env(l)
Parse a list of strings of the form "NAME=VALUE" or just "NAME" into a dictionary.
Strings of the latter form will result in dictionary entries whose value is None.
>>> parse_set_env([])
{}
>>> parse_set_env(['a'])
{'a': None}
>>> parse_set_env(['a='])
{'a': ''}
>>> parse_set_env(['a=b'])
{'a': 'b'}
>>> parse_set_env(['a=a', 'a=b'])
{'a': 'b'}
>>> parse_set_env(['a=b', 'c=d'])
{'a': 'b', 'c': 'd'}
>>> parse_set_env(['a=b=c'])
{'a': 'b=c'}
>>> parse_set_env([''])
Traceback (most recent call last):
...
ValueError: Empty name
>>> parse_set_env(['=1'])
Traceback (most recent call last):
...
ValueError: Empty name
Parameters
l ( list[str] )
Return type
dict [ str , Optional[ str ]]
toil.options.common.parse_str_list(s)
Parameters
s ( str )
Return type
list [ str ]
toil.options.common.parse_int_list(s)
Parameters
s ( str )
Return type
list [ int ]
toil.options.common.iC(min_value, max_value=None)
Returns a function that checks
if a given int is in the given half-open interval.
Parameters
• min_value ( int )
• max_value ( Optional[int] )
Return type
Callable[[ int ], bool ]
toil.options.common.fC(minValue, maxValue=None)
Returns a function that checks
if a given float is in the given half-open interval.
Parameters
• minValue ( float )
• maxValue ( Optional[float] )
Return type
Callable[[ float ], bool ]
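A half-open interval check of the kind iC and fC return can be sketched as a closure. The name half_open_check is hypothetical, and treating a missing upper bound as "no upper limit" is an assumption of this sketch rather than documented behavior.

```python
from typing import Callable, Optional


def half_open_check(
    min_value: float, max_value: Optional[float] = None
) -> Callable[[float], bool]:
    """Return a predicate that is true for min_value <= x < max_value."""
    if max_value is None:
        # Assumed meaning of a missing maximum: only the lower bound applies.
        return lambda x: min_value <= x
    return lambda x: min_value <= x < max_value


is_valid = half_open_check(1, 10)
print(is_valid(1), is_valid(10))
```

The lower bound is included and the upper bound is excluded, so 1 passes and 10 does not.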
toil.options.common.parse_accelerator_list(specs)
Parse a string description of
one or more accelerator requirements.
Parameters
specs ( Optional[str] )
Return type
list [ toil.job.AcceleratorRequirement ]
toil.options.common.parseBool(val)
Parameters
val ( str )
Return type
bool
toil.options.common.make_open_interval_action(min, max=None)
Returns an argparse action class to check if the input is within the given half-open interval. For example, a value provided to argparse must be within the interval [min, max). The types of min and max must be the same (max may be None).
Parameters
• min ( Union[int, float] ) -- float/int
• max ( Optional[Union[int, float]] ) -- optional float/int
Returns
argparse action class
Return type
type [ argparse.Action ]
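An action factory like this can be sketched with argparse alone. The code below is an illustration under assumptions, not the Toil implementation: make_interval_action and the --retries option are hypothetical names, and rejecting a value by raising argparse.ArgumentError is a choice made for this example.

```python
import argparse
from typing import Optional, Type, Union

Number = Union[int, float]


def make_interval_action(
    min_value: Number, max_value: Optional[Number] = None
) -> Type[argparse.Action]:
    """Build an argparse action that accepts values in [min_value, max_value)."""

    class IntervalAction(argparse.Action):
        def __call__(self, parser, namespace, values, option_string=None):
            too_small = values < min_value
            too_large = max_value is not None and values >= max_value
            if too_small or too_large:
                raise argparse.ArgumentError(
                    self, f"{values} is not in the interval [{min_value}, {max_value})"
                )
            setattr(namespace, self.dest, values)

    return IntervalAction


parser = argparse.ArgumentParser()
# --retries is a hypothetical option used only for this example.
parser.add_argument("--retries", type=int, action=make_interval_action(0, 5))
args = parser.parse_args(["--retries", "3"])
```

Because the class is created per call, each option can carry its own bounds without any shared state.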
toil.options.common.parse_jobstore(jobstore_uri)
Turn the jobstore string into its corresponding URI, e.g. /path/to/jobstore -> file:/path/to/jobstore
If the jobstore string already is a URI, return it unchanged: aws:/path/to/jobstore -> aws:/path/to/jobstore
Parameters
jobstore_uri ( str ) -- string of the jobstore
Returns
URI of the jobstore
Return type
str
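The two documented mappings can be reproduced with a short sketch. The function name to_jobstore_uri is hypothetical, the scheme list is taken from the job store help text, and this version does no path normalization, which the real function may well do.

```python
KNOWN_SCHEMES = ("file", "aws", "google")


def to_jobstore_uri(jobstore: str) -> str:
    """Prefix bare paths with 'file:' and leave scheme-qualified strings alone."""
    scheme, sep, _ = jobstore.partition(":")
    if sep and scheme in KNOWN_SCHEMES:
        return jobstore
    return "file:" + jobstore


print(to_jobstore_uri("/path/to/jobstore"))
print(to_jobstore_uri("aws:/path/to/jobstore"))
```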
toil.options.common.JOBSTORE_HELP = Multiline-String
"""The location of the job store for the workflow. A job store holds persistent information about the jobs, stats, and files in a workflow. If the workflow is run with a distributed batch system, the job store must be accessible by all worker nodes. Depending on the desired job store implementation, the location should be formatted according to one of the following schemes:
file:<path> where <path> points to a directory on the file system
aws:<region>:<prefix> where <region> is the name of an AWS region like us-west-2 and <prefix> will be prepended to the names of any top-level AWS resources in use by job store, e.g. S3 buckets.
google:<project_id>:<prefix> TODO: explain
For backwards compatibility, you may also specify ./foo (equivalent to file:./foo or just file:foo) or /bar (equivalent to file:/bar)."""
toil.options.common.add_base_toil_options(parser,
jobstore_as_flag=False, cwl=False)
Add base Toil command line options to the parser.
Parameters
• parser ( argparse.ArgumentParser ) -- Argument parser to add options to
• jobstore_as_flag ( bool ) -- make the job store option a --jobStore flag instead of a required jobStore positional argument
• cwl ( bool ) -- whether CWL should be included or not
Return type
None
toil.options.cwl
Functions
Module Contents
toil.options.cwl.add_cwl_options(parser, suppress=True)
Add CWL options to the parser.
This only adds nonpositional CWL arguments.
Parameters
• parser ( argparse.ArgumentParser ) -- Parser to add options to
• suppress ( bool ) -- Suppress help output
Returns
None
Return type
None
toil.options.runner
Functions
Module Contents
toil.options.runner.add_runner_options(parser, cwl=False, wdl=False)
Add the options that are shared between the WDL and CWL runners to the parser.
Parameters
• parser ( argparse.ArgumentParser ) -- parser to add arguments to
• cwl ( bool )
• wdl ( bool )
Return type
None
toil.options.wdl
Functions
Module Contents
toil.options.wdl.add_wdl_options(parser, suppress=True)
Add WDL options to a parser.
This only adds nonpositional WDL arguments.
Parameters
• parser ( argparse.ArgumentParser ) -- Parser to add options to
• suppress ( bool ) -- Suppress help output
Return type
None
toil.provisioners
Submodules
toil.provisioners.abstractProvisioner
Attributes
Exceptions
Classes
Module Contents
toil.provisioners.abstractProvisioner.a_short_time = 5
toil.provisioners.abstractProvisioner.logger
exception
toil.provisioners.abstractProvisioner.ManagedNodesNotSupportedException
Bases: RuntimeError
Raised when attempting to add managed nodes (which autoscale up and down by themselves, without the provisioner doing the work) to a provisioner that does not support them.
Polling with this and try/except is the Right Way to check if managed nodes are available from a provisioner.
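The try/except probe recommended above might look like the following. Everything here is a stand-in so the sketch runs without Toil: the exception is redefined locally, StubProvisioner is hypothetical, and request_workers with its string return values is invented for the illustration.

```python
class ManagedNodesNotSupportedException(RuntimeError):
    """Local stand-in for the exception documented above."""


class StubProvisioner:
    """Hypothetical provisioner that cannot manage nodes by itself."""

    def addManagedNodes(self, nodeTypes, minNodes, maxNodes, preemptible, spotBid=None):
        raise ManagedNodesNotSupportedException("managed nodes are unavailable")


def request_workers(provisioner, node_types, min_nodes, max_nodes) -> str:
    """Ask for managed nodes, falling back when they are not supported."""
    try:
        provisioner.addManagedNodes(node_types, min_nodes, max_nodes, preemptible=False)
        return "managed"
    except ManagedNodesNotSupportedException:
        # The caller has to scale nodes itself in this case.
        return "unmanaged"


mode = request_workers(StubProvisioner(), {"t2.medium"}, 0, 4)
```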
class
toil.provisioners.abstractProvisioner.Shape(wallTime,
memory,
cores, disk, preemptible)
Represents a job or a node's "shape", in terms of the dimensions of memory, cores, disk and wall-time allocation.
The wallTime attribute stores the number of seconds of a node allocation, e.g. 3600 for AWS. FIXME: and for jobs?
The memory and
disk attributes store the number of bytes required by a job
(or provided by a node) in RAM or on disk (SSD or HDD),
respectively.
Parameters
• wallTime ( Union[int, float] )
• memory ( int )
• cores ( Union[int, float] )
• disk ( int )
• preemptible ( bool )
wallTime
memory
cores
disk
preemptible
__eq__(other)
Parameters
other ( Any )
Return type
bool
greater_than(other)
Parameters
other ( Any )
Return type
bool
__gt__(other)
Parameters
other ( Any )
Return type
bool
__repr__()
Return type
str
__str__()
Return type
str
__hash__()
Return type
int
class
toil.provisioners.abstractProvisioner.AbstractProvisioner(clusterName=None,
clusterType='mesos', zone=None, nodeStorage=50,
nodeStorageOverrides=None, enable_fuse=False)
Bases: abc.ABC
Interface for
provisioning worker nodes to use in a Toil cluster.
Parameters
• clusterName ( Optional[str] )
• clusterType ( Optional[str] )
• zone ( Optional[str] )
• nodeStorage ( int )
• nodeStorageOverrides ( Optional[list[str]] )
• enable_fuse ( bool )
LEADER_HOME_DIR = '/root/'
cloud: str = None
clusterName
clusterType
enable_fuse
abstract supportedClusterTypes()
Get all the cluster types that
this provisioner implementation supports.
Return type
set [ str ]
abstract createClusterSettings()
Initialize class for a new cluster, to be deployed, when running outside the cloud.
abstract readClusterSettings()
Initialize class from an existing cluster. This method assumes that the instance we are running on is the leader.
Implementations must call _setLeaderWorkerAuthentication().
setAutoscaledNodeTypes(nodeTypes)
Set node types, shapes and spot bids for Toil-managed autoscaling.
Parameters
nodeTypes ( list[tuple[set[str], Optional[float]]] ) -- A list of node types, as parsed with parse_node_types.
hasAutoscaledNodeTypes()
Check if node types have been
configured on the provisioner (via setAutoscaledNodeTypes).
Returns
True if node types are configured for autoscaling, and false otherwise.
Return type
bool
getAutoscaledInstanceShapes()
Get all the node shapes and
their named instance types that the Toil autoscaler should
manage.
Return type
dict [ Shape , str ]
static retryPredicate(e)
Return true if the exception e
should be retried by the cluster scaler. For example, should
return true if the exception was due to exceeding an API
rate limit. The error will be retried with exponential
backoff.
Parameters
e -- exception raised during execution of setNodeCount
Returns
boolean indicating whether the exception e should be retried
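How a predicate like this drives retries with exponential backoff can be sketched with the standard library. The helper retry_with_backoff, its parameters, and the use of TimeoutError as the "rate limit" error are all assumptions for this example; the cluster scaler's actual retry loop is not shown in this manual.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    predicate: Callable[[Exception], bool],
    attempts: int = 5,
    first_delay: float = 0.01,
) -> T:
    """Call operation, retrying with doubling delays while predicate(e) is true."""
    delay = first_delay
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as e:
            # Give up on the last attempt or on errors the predicate rejects.
            if attempt == attempts - 1 or not predicate(e):
                raise
            time.sleep(delay)
            delay *= 2
    raise RuntimeError("attempts must be at least 1")


calls: List[int] = []


def flaky() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise TimeoutError("rate limit exceeded")  # hypothetical transient error
    return "ok"


result = retry_with_backoff(flaky, lambda e: isinstance(e, TimeoutError))
```

Errors for which the predicate returns False propagate immediately, which keeps permanent failures from being retried.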
abstract launchCluster(*args, **kwargs)
Initialize a cluster and create a leader node.
Implementations
must call _setLeaderWorkerAuthentication() with the leader
so that workers can be launched.
Parameters
• leaderNodeType -- The leader instance.
• leaderStorage -- The amount of disk to allocate to the leader in gigabytes.
• owner -- Tag identifying the owner of the instances.
abstract addNodes(nodeTypes,
numNodes, preemptible,
spotBid=None)
Used to add worker nodes to the
cluster
Parameters
• numNodes ( int ) -- The number of nodes to add
• preemptible ( bool ) -- whether or not the nodes will be preemptible
• spotBid ( Optional[float] ) -- The bid for preemptible nodes if applicable (this can be set in config, also).
• nodeTypes ( set[str] )
Returns
number of nodes successfully added
Return type
int
addManagedNodes(nodeTypes,
minNodes, maxNodes, preemptible,
spotBid=None)
Add a group of managed nodes of the given type, up to the given maximum. The nodes will automatically be launched and terminated depending on cluster load.
Raises
ManagedNodesNotSupportedException if the provisioner
implementation or cluster configuration can't have managed
nodes.
Parameters
• minNodes -- The minimum number of nodes to scale to
• maxNodes -- The maximum number of nodes to scale to
• preemptible -- whether or not the nodes will be preemptible
• spotBid -- The bid for preemptible nodes if applicable (this can be set in config, also).
• nodeTypes ( set[str] )
Return type
None
abstract terminateNodes(nodes)
Terminate the nodes represented
by given Node objects
Parameters
nodes ( list[toil.provisioners.node.Node] ) -- list of Node objects
Return type
None
abstract getLeader()
Returns
The leader node.
abstract
getProvisionedWorkers(instance_type=None,
preemptible=None)
Gets all nodes, optionally of
the given instance type or preemptability, from the
provisioner. Includes both static and autoscaled nodes.
Parameters
• preemptible ( Optional[bool] ) -- Boolean value to restrict to preemptible nodes or non-preemptible nodes
• instance_type ( Optional[str] )
Returns
list of Node objects
Return type
list [ toil.provisioners.node.Node ]
abstract getNodeShape(instance_type, preemptible=False)
The shape of a preemptible or
non-preemptible node managed by this provisioner. The node
shape defines key properties of a machine, such as its
number of cores or the time between billing intervals.
Parameters
instance_type ( str ) -- Instance type name to return the shape of.
Return type
Shape
abstract destroyCluster()
Terminates all nodes in the specified cluster and cleans up all resources associated with the cluster.
Parameters
clusterName -- identifier of the cluster to terminate.
Return type
None
class InstanceConfiguration
Allows defining the initial
setup for an instance and then turning it into an Ignition
configuration for instance user data.
files = []
units = []
sshPublicKeys = []
addFile(path, filesystem='root', mode='0755',
contents='', append=False)
Make a file on the instance with the given filesystem, mode, and contents.
See the storage.files section: https://github.com/kinvolk/ignition/blob/flatcar-master/doc/configuration-v2_2.md
Parameters
• path ( str )
• filesystem ( str )
• mode ( Union[str, int] )
• contents ( str )
• append ( bool )
addUnit(name, enabled=True, contents='')
Make a systemd unit on the
instance with the given name (including .service), and
content. Units will be enabled by default.
Unit logs can be investigated with:
systemctl status whatever.service
or:
journalctl -xe
Parameters
• name ( str )
• enabled ( bool )
• contents ( str )
addSSHRSAKey(keyData)
Authorize the given bare,
encoded RSA key (without "ssh-rsa").
Parameters
keyData ( str )
toIgnitionConfig()
Return an Ignition
configuration describing the desired config.
Return type
str
getBaseInstanceConfiguration()
Get the base configuration for
both leader and worker instances for all cluster types.
Return type
InstanceConfiguration
addVolumesService(config)
Add a service to prepare and
mount local scratch volumes.
Parameters
config ( InstanceConfiguration )
addNodeExporterService(config)
Add the node exporter service
for Prometheus to an instance configuration.
Parameters
config ( InstanceConfiguration )
toil_service_env_options()
Return type
str
add_toil_service(config, role, keyPath=None, preemptible=False)
Add the Toil leader or worker service to an instance configuration.
Will run Mesos
master or agent as appropriate in Mesos clusters. For
Kubernetes clusters, will just sleep to provide a place to
shell into on the leader, and shouldn't run on the worker.
Parameters
• role ( str ) -- Should be 'leader' or 'worker'. Will not work for 'worker' until leader credentials have been collected.
• keyPath ( str ) -- path on the node to a server-side encryption key that will be added to the node after it starts. The service will wait until the key is present before starting.
• preemptible ( bool ) -- Whether a worker should identify itself as preemptible or not to the scheduler.
• config ( InstanceConfiguration )
getKubernetesValues(architecture='amd64')
Returns a dict of Kubernetes
component versions and paths for formatting into
Kubernetes-related templates.
Parameters
architecture ( str )
addKubernetesServices(config, architecture='amd64')
Add installing Kubernetes and
Kubeadm and setting up the Kubelet to run when configured to
an instance configuration. The same process applies to
leaders and workers.
Parameters
• config ( InstanceConfiguration )
• architecture ( str )
abstract getKubernetesAutoscalerSetupCommands(values)
Return Bash commands that set up the Kubernetes cluster autoscaler for provisioning from the environment supported by this provisioner.
Should only be
implemented if Kubernetes clusters are supported.
Parameters
values ( dict[str, str] ) -- Contains definitions of cluster variables, like AUTOSCALER_VERSION and CLUSTER_NAME.
Returns
Bash snippet
Return type
str
getKubernetesCloudProvider()
Return the Kubernetes cloud provider (for example, 'aws'), to pass to the kubelets in a Kubernetes cluster provisioned using this provisioner.
Defaults to
None if not overridden, in which case no cloud provider
integration will be used.
Returns
Cloud provider name, or None
Return type
Optional[ str ]
addKubernetesLeader(config)
Add services to configure as a
Kubernetes leader, if Kubernetes is already set to be
installed.
Parameters
config ( InstanceConfiguration )
addKubernetesWorker(config, authVars, preemptible=False)
Add services to configure as a Kubernetes worker, if Kubernetes is already set to be installed.
Authenticate
back to the leader using the JOIN_TOKEN, JOIN_CERT_HASH, and
JOIN_ENDPOINT set in the given authentication data dict.
Parameters
• config ( InstanceConfiguration ) -- The configuration to add services to
• authVars ( dict[str, str] ) -- Dict with authentication info
• preemptible ( bool ) -- Whether the worker should be labeled as preemptible or not
toil.provisioners.aws
Submodules
toil.provisioners.aws.awsProvisioner
Attributes
Exceptions
Classes
Functions
Module Contents
toil.provisioners.aws.awsProvisioner.logger
toil.provisioners.aws.awsProvisioner.awsRetryPredicate(e)
Parameters
e ( Exception )
Return type
bool
toil.provisioners.aws.awsProvisioner.expectedShutdownErrors(e)
Matches errors that we expect to occur during shutdown, and which indicate that we need to wait or try again.
Should not match any errors which indicate that an operation is impossible or unnecessary (such as errors resulting from a thing not existing to be deleted).
Parameters
e ( Exception )
Return type
bool
toil.provisioners.aws.awsProvisioner.F
toil.provisioners.aws.awsProvisioner.awsRetry(f)
This decorator retries the wrapped function if AWS throws unexpected errors.
It should wrap any function that makes use of boto.
Parameters
f ( Callable[Ellipsis, F] )
Return type
Callable[Ellipsis, F]
toil.provisioners.aws.awsProvisioner.awsFilterImpairedNodes(nodes,
boto3_ec2)
Parameters
• nodes ( list[mypy_boto3_ec2.type_defs.InstanceTypeDef] )
• boto3_ec2 ( mypy_boto3_ec2.client.EC2Client )
Return type
list [mypy_boto3_ec2.type_defs.InstanceTypeDef]
exception
toil.provisioners.aws.awsProvisioner.InvalidClusterStateException
Bases: Exception
Common base class for all non-exit exceptions.
toil.provisioners.aws.awsProvisioner.collapse_tags(instance_tags)
Collapse tags from boto3 format to node format.
Parameters
instance_tags ( list[mypy_boto3_ec2.type_defs.TagTypeDef] ) -- tags as a list
Returns
Dict of tags
Return type
dict [ str , str ]
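The conversion is easiest to see on sample data. boto3 reports EC2 tags as a list of dicts with "Key" and "Value" entries; the sketch below flattens that list into a plain dict. The function name collapse and the sample tag values are made up for the example.

```python
from typing import Dict, List


def collapse(instance_tags: List[Dict[str, str]]) -> Dict[str, str]:
    """Turn [{'Key': k, 'Value': v}, ...] into {k: v, ...}."""
    return {tag["Key"]: tag["Value"] for tag in instance_tags}


tags = [
    {"Key": "Name", "Value": "my-cluster"},
    {"Key": "Owner", "Value": "alice"},
]
collapsed = collapse(tags)
print(collapsed)
```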
class
toil.provisioners.aws.awsProvisioner.AWSProvisioner(clusterName,
clusterType, zone, nodeStorage, nodeStorageOverrides,
sseKey,
enable_fuse)
Bases: toil.provisioners.abstractProvisioner.AbstractProvisioner
Interface for
provisioning worker nodes to use in a Toil cluster.
Parameters
• clusterName ( str | None )
• clusterType ( str | None )
• zone ( str | None )
• nodeStorage ( int )
• nodeStorageOverrides ( list[str] | None )
• sseKey ( str | None )
• enable_fuse ( bool )
cloud = 'aws'
aws
s3_bucket_name
supportedClusterTypes()
Get all the cluster types that
this provisioner implementation supports.
Return type
set [ str ]
createClusterSettings()
Create a new set of cluster
settings for a cluster to be deployed into AWS.
Return type
None
readClusterSettings()
Reads the cluster settings from
the instance metadata, which assumes the instance is the
leader.
Return type
None
launchCluster(leaderNodeType,
leaderStorage, owner, keyName,
botoPath, userTags, vpcSubnet, awsEc2ProfileArn,
awsEc2ExtraSecurityGroupIds, **kwargs)
Starts a single leader node and
populates this class with the leader's metadata.
Parameters
• leaderNodeType ( str ) -- An AWS instance type, like "t2.medium", for example.
• leaderStorage ( int ) -- An integer number of gigabytes to provide the leader instance with.
• owner ( str ) -- Resources will be tagged with this owner string.
• keyName ( str ) -- The ssh key to use to access the leader node.
• botoPath ( str ) -- The path to the boto credentials directory.
• userTags ( dict[str, str] | None ) -- Optionally provided user tags to put on the cluster.
• vpcSubnet ( str | None ) -- Optionally specify the VPC subnet for the leader.
• awsEc2ProfileArn ( str | None ) -- Optionally provide the profile ARN.
• awsEc2ExtraSecurityGroupIds ( list[str] | None ) -- Optionally provide additional security group IDs.
• kwargs ( dict[str, Any] )
Returns
None
Return type
None
toil_service_env_options()
Set AWS tags in user docker
container
Return type
str
getKubernetesAutoscalerSetupCommands(values)
Get the Bash commands necessary
to configure the Kubernetes Cluster Autoscaler for AWS.
Parameters
values ( dict[str, str] )
Return type
str
getKubernetesCloudProvider()
Use the "aws"
Kubernetes cloud provider when setting up Kubernetes.
Return type
str | None
getNodeShape(instance_type, preemptible=False)
Get the Shape for the given
instance type (e.g. 't2.medium').
Parameters
• instance_type ( str )
• preemptible ( bool )
Return type
toil.provisioners.abstractProvisioner.Shape
static retryPredicate(e)
Return true if the exception e
should be retried by the cluster scaler. For example, should
return true if the exception was due to exceeding an API
rate limit. The error will be retried with exponential
backoff.
Parameters
e ( Exception ) -- exception raised during execution of setNodeCount
Returns
boolean indicating whether the exception e should be retried
Return type
bool
destroyCluster()
Terminate instances and delete
the profile and security group.
Return type
None
terminateNodes(nodes)
Terminate the nodes represented
by given Node objects
Parameters
nodes ( list[toil.provisioners.node.Node] ) -- list of Node objects
Return type
None
addNodes(nodeTypes, numNodes, preemptible, spotBid=None)
Used to add worker nodes to the
cluster
Parameters
• numNodes ( int ) -- The number of nodes to add
• preemptible ( bool ) -- whether or not the nodes will be preemptible
• spotBid ( float | None ) -- The bid for preemptible nodes if applicable (this can be set in config, also).
• nodeTypes ( set[str] )
Returns
number of nodes successfully added
Return type
int
addManagedNodes(nodeTypes,
minNodes, maxNodes, preemptible,
spotBid=None)
Add a group of managed nodes of the given type, up to the given maximum. The nodes will automatically be launched and terminated depending on cluster load.
Raises
ManagedNodesNotSupportedException if the provisioner
implementation or cluster configuration can't have managed
nodes.
Parameters
• minNodes ( int ) -- The minimum number of nodes to scale to
• maxNodes ( int ) -- The maximum number of nodes to scale to
• preemptible ( bool ) -- whether or not the nodes will be preemptible
• spotBid ( float | None ) -- The bid for preemptible nodes if applicable (this can be set in config, also).
• nodeTypes ( set[str] )
Return type
None
getProvisionedWorkers(instance_type=None, preemptible=None)
Gets all nodes, optionally of
the given instance type or preemptability, from the
provisioner. Includes both static and autoscaled nodes.
Parameters
• preemptible ( bool | None ) -- Boolean value to restrict to preemptible nodes or non-preemptible nodes
• instance_type ( str | None )
Returns
list of Node objects
Return type
list [ toil.provisioners.node.Node ]
getLeader(wait=False)
Get the leader for the cluster
as a Toil Node object.
Parameters
wait ( bool )
Return type
toil.provisioners.node.Node
full_policy(resource)
Produce a dict describing the
JSON form of a full-access-granting AWS IAM policy for the
service with the given name (e.g. 's3').
Parameters
resource ( str )
Return type
dict [ str , Any]
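A full-access policy document for one service has a simple shape in the standard IAM policy grammar. The sketch below builds one as a dict; the function name full_access_policy is hypothetical and the exact statement Toil generates may differ.

```python
import json
from typing import Any, Dict


def full_access_policy(service: str) -> Dict[str, Any]:
    """Build an IAM policy document allowing every action of one service."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": f"{service}:*", "Resource": "*"}
        ],
    }


policy = full_access_policy("s3")
print(json.dumps(policy, indent=2))
```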
kubernetes_policy()
Get the Kubernetes policy grants not provided by the full grants on EC2 and IAM. See <https://github.com/DataBiosphere/toil/wiki/Manual-Autoscaling-Kubernetes-Setup#leader-policy> and <https://github.com/DataBiosphere/toil/wiki/Manual-Autoscaling-Kubernetes-Setup#worker-policy>.
These are mostly needed to support Kubernetes' AWS CloudProvider, and some are for the Kubernetes Cluster Autoscaler's AWS integration.
Some of these
are really only needed on the leader.
Return type
dict [ str , Any]
Attributes
Functions
Package Contents
toil.provisioners.aws.logger
toil.provisioners.aws.ZoneTuple
toil.provisioners.aws.get_aws_zone_from_spot_market(spotBid,
nodeType,
boto3_ec2, zone_options)
If a spot bid, node type, and Boto3 EC2 client are specified, picks a zone where instances are easy to buy from the zones in the region of the Boto3 client. These parameters must always be specified together, or not at all.
In this case, zone_options can be used to restrict to a subset of the zones in the region.
Parameters
|
• |
spotBid ( Optional[float] ) |
|||
|
• |
nodeType ( Optional[str] ) |
|||
|
• |
boto3_ec2 ( Optional[botocore.client.BaseClient] ) |
|||
|
• |
zone_options ( Optional[list[str]] ) |
Return type
Optional[ str ]
toil.provisioners.aws.get_best_aws_zone(spotBid=None,
nodeType=None,
boto3_ec2=None, zone_options=None)
Get the right AWS zone to use.
Reports the TOIL_AWS_ZONE environment variable if set.
Otherwise, if we are running on EC2 or ECS, reports the zone we are running in.
Otherwise, if a spot bid, node type, and Boto3 EC2 client are specified, picks a zone where instances are easy to buy from the zones in the region of the Boto3 client. These parameters must always be specified together, or not at all.
In this case, zone_options can be used to restrict to a subset of the zones in the region.
Otherwise, if we have the TOIL_AWS_REGION variable set, chooses a zone in that region.
Finally, if a default region is configured in Boto 2, chooses a zone in that region.
Returns None if
no method can produce a zone to use.
Parameters
|
• |
spotBid ( Optional[float] ) |
|||
|
• |
nodeType ( Optional[str] ) |
|||
|
• |
boto3_ec2 ( Optional[botocore.client.BaseClient] ) |
|||
|
• |
zone_options ( Optional[list[str]] ) |
Return type
Optional[ str ]
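The lookup order described for get_best_aws_zone can be pictured as a chain of fallbacks. The sketch below is not Toil's implementation: resolve_zone and the callables passed to it are hypothetical stand-ins, and only the TOIL_AWS_ZONE variable name comes from the description above.

```python
import os
from typing import Callable, Optional


def resolve_zone(fallbacks: list[Callable[[], Optional[str]]],
                 environ: Optional[dict] = None) -> Optional[str]:
    """Return the first zone produced by the environment or a fallback.

    Hypothetical helper: each fallback stands in for one later step of the
    documented order (instance metadata, spot market, region default, ...).
    """
    env = os.environ if environ is None else environ
    # 1. An explicit TOIL_AWS_ZONE setting wins over everything else.
    zone = env.get("TOIL_AWS_ZONE")
    if zone:
        return zone
    # 2. Otherwise try each later source in the order given.
    for source in fallbacks:
        zone = source()
        if zone:
            return zone
    # 3. No method could produce a zone.
    return None
```

Passing the environment as an argument keeps the sketch easy to exercise without touching the real process environment.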
toil.provisioners.aws.choose_spot_zone(zones, bid, spot_history)
Returns the zone in which to place the spot request, chosen by the following criteria in order of priority:
|
1. |
zones with prices currently under the bid |
|||
|
2. |
zones with the most stable price |
Returns
the name of the selected zone
Parameters
|
• |
zones ( list[str] ) |
||
|
• |
bid ( float ) |
||
|
• |
spot_history ( list[boto.ec2.spotpricehistory.SpotPriceHistory] ) |
Return type
str
>>> from collections import namedtuple
>>> FauxHistory = namedtuple('FauxHistory', ['price', 'availability_zone'])
>>> zones = ['us-west-2a', 'us-west-2b']
>>> spot_history = [FauxHistory(0.1, 'us-west-2a'),
...                 FauxHistory(0.2, 'us-west-2a'),
...                 FauxHistory(0.3, 'us-west-2b'),
...                 FauxHistory(0.6, 'us-west-2b')]
>>> choose_spot_zone(zones, 0.15, spot_history)
'us-west-2a'
>>> spot_history = [FauxHistory(0.3, 'us-west-2a'),
...                 FauxHistory(0.2, 'us-west-2a'),
...                 FauxHistory(0.1, 'us-west-2b'),
...                 FauxHistory(0.6, 'us-west-2b')]
>>> choose_spot_zone(zones, 0.15, spot_history)
'us-west-2b'
>>> spot_history = [FauxHistory(0.1, 'us-west-2a'),
...                 FauxHistory(0.7, 'us-west-2a'),
...                 FauxHistory(0.1, 'us-west-2b'),
...                 FauxHistory(0.6, 'us-west-2b')]
>>> choose_spot_zone(zones, 0.15, spot_history)
'us-west-2b'
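One way to read the two priorities for choose_spot_zone is sketched below. This is an illustration, not Toil's code: pick_zone_sketch is a hypothetical name, and it assumes that the first history entry for a zone is the most recent price and that "most stable" means the smallest population standard deviation of prices.

```python
from collections import namedtuple
from statistics import pstdev

FauxHistory = namedtuple("FauxHistory", ["price", "availability_zone"])


def pick_zone_sketch(zones, bid, spot_history):
    """Prefer zones whose latest price is under the bid, then the steadiest."""
    under_bid, over_bid = [], []
    for zone in zones:
        prices = [h.price for h in spot_history if h.availability_zone == zone]
        if prices:
            deviation = pstdev(prices)
            recent = prices[0]  # assumption: first entry is the most recent
        else:
            # Assumption: with no history, treat the zone as not under the bid.
            deviation, recent = 0.0, bid
        (under_bid if recent < bid else over_bid).append((deviation, zone))
    # Fall back to over-bid zones only when nothing is under the bid.
    candidates = under_bid or over_bid
    # Smallest deviation wins; ties fall back to the zone name.
    return min(candidates)[1]
```

Under these assumptions the sketch gives the same answers as the three doctest cases above. It raises ValueError if zones is empty.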
toil.provisioners.aws.optimize_spot_bid(boto3_ec2,
instance_type,
spot_bid, zone_options)
Checks whether the bid is in line with history and makes an effort to place the instance in a sensible zone.
Parameters
|
• |
zone_options ( list[str] ) -- The collection of allowed zones to consider, within the region associated with the Boto3 client. |
||
|
• |
boto3_ec2 ( botocore.client.BaseClient ) |
||
|
• |
instance_type ( str ) |
||
|
• |
spot_bid ( float ) |
toil.provisioners.clusterScaler
Attributes
Exceptions
Classes
Functions
Module Contents
toil.provisioners.clusterScaler.logger
toil.provisioners.clusterScaler.EVICTION_THRESHOLD
toil.provisioners.clusterScaler.RESERVE_SMALL_LIMIT
toil.provisioners.clusterScaler.RESERVE_SMALL_AMOUNT
toil.provisioners.clusterScaler.RESERVE_BREAKPOINTS: list[int | float]
toil.provisioners.clusterScaler.RESERVE_FRACTIONS = [0.25,
0.2, 0.1,
0.06, 0.02]
toil.provisioners.clusterScaler.OS_SIZE
toil.provisioners.clusterScaler.FailedConstraint
class
toil.provisioners.clusterScaler.BinPackedFit(nodeShapes,
targetTime=defaultTargetTime)
If jobShapes is a set of tasks with run requirements (mem/disk/cpu), and nodeShapes is a sorted list of available computers to run these jobs on, this class attempts to produce a dictionary representing the minimum set of nodes needed to run the tasks in jobShapes.
Uses a first-fit-decreasing (FFD) style bin-packing algorithm to calculate an approximate minimum number of nodes that will fit the given list of jobs. BinPackedFit assumes that the list nodeShapes has already been ordered by "node preference" by the caller. So when virtually "creating" nodes, the first node shape within nodeShapes that fits the job is the one that's added.
Parameters
|
• |
nodeShapes ( list ) -- The properties of an atomic node allocation, in terms of wall-time, memory, cores, disk, and whether it is preemptible or not. |
||
|
• |
targetTime ( float ) -- The time before which all jobs should at least be started. |
Returns
The minimum number of minimal node allocations estimated to be required to run all the jobs in jobShapes.
nodeReservations: dict[toil.provisioners.abstractProvisioner.Shape, list[NodeReservation]]
nodeShapes
targetTime
binPack(jobShapes)
Pack a list of jobShapes into the fewest nodes reasonable.
Can be run multiple times.
Returns any
distinct Shapes that did not fit, mapping to reasons they
did not fit.
Parameters
jobShapes ( list[toil.provisioners.abstractProvisioner.Shape] )
Return type
dict [ toil.provisioners.abstractProvisioner.Shape , list [FailedConstraint]]
addJobShape(jobShape)
Add the job to the first node reservation in which it will fit. (This is the bin-packing aspect).
Returns the job
shape again, and a list of failed constraints, if it did not
fit.
Parameters
jobShape ( toil.provisioners.abstractProvisioner.Shape )
Return type
Optional[ tuple [ toil.provisioners.abstractProvisioner.Shape , list [FailedConstraint]]]
getRequiredNodes()
Return a dict from node shape
to number of nodes required to run the packed jobs.
Return type
dict [ toil.provisioners.abstractProvisioner.Shape , int ]
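The first-fit-decreasing idea behind BinPackedFit can be shown in one dimension. The sketch below is a simplification under stated assumptions: first_fit_decreasing is a hypothetical function, every node has the same single scalar capacity, and time is ignored, whereas the class above packs multi-resource shapes into time slices.

```python
def first_fit_decreasing(job_sizes, node_capacity):
    """Return the free capacity left on each node after packing all jobs."""
    if any(size > node_capacity for size in job_sizes):
        raise ValueError("a job is larger than a whole node")
    free = []  # remaining capacity of each node opened so far
    # "Decreasing": place the largest jobs first.
    for size in sorted(job_sizes, reverse=True):
        # "First fit": use the first open node with enough room.
        for i, remaining in enumerate(free):
            if size <= remaining:
                free[i] = remaining - size
                break
        else:
            # No open node fits, so virtually "create" a new one.
            free.append(node_capacity - size)
    return free
```

The length of the returned list is the estimated node count. The result is approximate: for jobs of sizes 4, 3, 3, 2, 2, 2 on nodes of capacity 8 this sketch opens three nodes even though two would suffice.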
class toil.provisioners.clusterScaler.NodeReservation(shape)
The amount of resources that we expect to be available on a given node at each point in time.
To represent
the resources available in a reservation, we represent a
reservation as a linked list of NodeReservations, each
giving the resources free within a single timeslice.
Parameters
shape ( toil.provisioners.abstractProvisioner.Shape )
|
shape |
nReservation: NodeReservation | None = None
__str__()
Return type
str
get_failed_constraints(job_shape)
Check if a job shape's resource requirements will fit within this allocation.
If the job does not fit, returns the failing constraints: the resources that can't be accommodated, and the limits that were hit.
If the job does fit, returns an empty list.
Must always
agree with fits()! This codepath is slower and used for
diagnosis.
Parameters
job_shape ( toil.provisioners.abstractProvisioner.Shape )
Return type
list [FailedConstraint]
fits(jobShape)
Check if a job shape's resource
requirements will fit within this allocation.
Parameters
jobShape ( toil.provisioners.abstractProvisioner.Shape )
Return type
bool
shapes()
Get all time-slice shapes, in
order, from this reservation on.
Return type
list [ toil.provisioners.abstractProvisioner.Shape ]
subtract(jobShape)
Subtract the resources
necessary to run a jobShape from the reservation.
Parameters
jobShape ( toil.provisioners.abstractProvisioner.Shape )
Return type
None
attemptToAddJob(jobShape, nodeShape, targetTime)
Attempt to pack a job into this reservation timeslice and/or the reservations after it.
jobShape is the
Shape of the job requirements, nodeShape is the Shape of the
node this is a reservation for, and targetTime is the
maximum time to wait before starting this job.
Parameters
|
• |
jobShape ( toil.provisioners.abstractProvisioner.Shape ) |
||
|
• |
nodeShape ( toil.provisioners.abstractProvisioner.Shape ) |
||
|
• |
targetTime ( float ) |
Return type
bool
toil.provisioners.clusterScaler.adjustEndingReservationForJob(reservation,
jobShape, wallTime)
Add a job to an ending reservation that ends at wallTime, splitting the reservation if the job doesn't fill the entire timeslice.
Parameters
|
• |
reservation ( NodeReservation ) |
|||
|
• |
jobShape ( toil.provisioners.abstractProvisioner.Shape ) |
|||
|
• |
wallTime ( float ) |
Return type
None
toil.provisioners.clusterScaler.split(nodeShape, jobShape, wallTime)
Partition a node allocation into two to fit the job.
Returns the modified shape of the node and a new node reservation for the extra time that the job didn't fill.
Parameters
|
• |
nodeShape ( toil.provisioners.abstractProvisioner.Shape ) |
||
|
• |
jobShape ( toil.provisioners.abstractProvisioner.Shape ) |
||
|
• |
wallTime ( float ) |
Return type
tuple [ toil.provisioners.abstractProvisioner.Shape , NodeReservation ]
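The partitioning performed by split can be sketched with plain tuples. This is an illustration only: DemoShape and split_slice are hypothetical, the field names are assumed from the resource list given for nodeShapes (wall-time, memory, cores, disk), the preemptible flag is left out, and the second result is returned as a bare shape rather than a NodeReservation.

```python
from collections import namedtuple

# Hypothetical stand-in for a node or job shape.
DemoShape = namedtuple("DemoShape", ["wallTime", "memory", "cores", "disk"])


def split_slice(node_shape, job_shape, wall_time):
    """Split a node's timeslice at wall_time to make room for a job."""
    # While the job runs, the node has the job's resources taken out.
    busy = DemoShape(wall_time,
                     node_shape.memory - job_shape.memory,
                     node_shape.cores - job_shape.cores,
                     node_shape.disk - job_shape.disk)
    # After the job ends, the rest of the slice is completely free again.
    rest = DemoShape(node_shape.wallTime - wall_time,
                     node_shape.memory, node_shape.cores, node_shape.disk)
    return busy, rest
```

For example, a 3600-second slice holding a 600-second job becomes a 600-second slice with reduced resources followed by a 3000-second slice with the node's full resources.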
toil.provisioners.clusterScaler.binPacking(nodeShapes,
jobShapes,
goalTime)
Using the given node shape bins, pack the given job shapes into nodes to get them done in the given amount of time.
Returns a dict saying how many of each node will be needed, and a dict from job shapes that could not fit to reasons why.
Parameters
|
• |
nodeShapes ( list[toil.provisioners.abstractProvisioner.Shape] ) |
||
|
• |
jobShapes ( list[toil.provisioners.abstractProvisioner.Shape] ) |
||
|
• |
goalTime ( float ) |
Return type
tuple [ dict [ toil.provisioners.abstractProvisioner.Shape , int ], dict [ toil.provisioners.abstractProvisioner.Shape , list [FailedConstraint]]]
class
toil.provisioners.clusterScaler.ClusterScaler(provisioner,
leader, config)
Parameters
|
• |
provisioner ( toil.provisioners.abstractProvisioner.AbstractProvisioner ) |
||
|
• |
leader ( toil.leader.Leader ) |
||
|
• |
config ( toil.common.Config ) |
provisioner
|
leader |
||
|
config |
static: dict[bool, dict[str, toil.provisioners.node.Node]]
on_too_big: list[Callable[[toil.provisioners.abstractProvisioner.Shape, list[toil.provisioners.abstractProvisioner.Shape]], Any]] = []
jobNameToAvgRuntime: dict[str, float]
jobNameToNumCompleted: dict[str, int]
totalAvgRuntime = 0.0
totalJobsCompleted = 0
targetTime: float
betaInertia
nodeShapeToType
instance_types
nodeShapes
ignoredNodes: set[str]
preemptibleNodeDeficit
previousWeightedEstimate
minNodes
maxNodes
node_shapes_after_overhead
without_overhead
getAverageRuntime(jobName, service=False)
Parameters
|
• |
jobName ( str ) |
|||
|
• |
service ( bool ) |
Return type
float
addCompletedJob(job, wallTime)
Adds the shape of a completed job to the queue, allowing the scaler to use the last N completed jobs in factoring how many nodes are required in the cluster.
Parameters
|
• |
job ( toil.job.JobDescription ) -- The description of the completed job |
|||
|
• |
wallTime ( int ) -- The wall-time taken to complete the job in seconds. |
Return type
None
setStaticNodes(nodes, preemptible)
Used to track statically provisioned nodes. This method must be called before any auto-scaled nodes are provisioned.
These nodes are
treated differently than auto-scaled nodes in that they
should not be automatically terminated.
Parameters
|
• |
nodes ( list[toil.provisioners.node.Node] ) -- list of Node objects |
||
|
• |
preemptible ( bool ) |
Return type
None
getStaticNodes(preemptible)
Returns nodes set in
setStaticNodes().
Parameters
preemptible ( bool )
Returns
Statically provisioned nodes.
Return type
dict [ str , toil.provisioners.node.Node ]
smoothEstimate(nodeShape, estimatedNodeCount)
Smooth out fluctuations in the estimate for this node compared to previous runs.
Returns an
integer.
Parameters
|
• |
nodeShape ( toil.provisioners.abstractProvisioner.Shape ) |
||
|
• |
estimatedNodeCount ( int ) |
Return type
int
getEstimatedNodeCounts(queuedJobShapes, currentNodeCounts)
Given the resource requirements of queued jobs and the current size of the cluster, returns a dict mapping from nodeShape to the number of nodes we want in the cluster right now, and a dict from job shapes that are too big to run on any node to reasons why.
Parameters
|
• |
queuedJobShapes ( list[toil.provisioners.abstractProvisioner.Shape] ) |
||
|
• |
currentNodeCounts ( dict[toil.provisioners.abstractProvisioner.Shape, int] ) |
Return type
tuple [ dict [ toil.provisioners.abstractProvisioner.Shape , int ], dict [ toil.provisioners.abstractProvisioner.Shape , list [FailedConstraint]]]
updateClusterSize(estimatedNodeCounts)
Given the desired and current size of the cluster, attempts to launch/remove instances to get to the desired size.
Also attempts to remove ignored nodes that were marked for graceful removal.
Returns the new
size of the cluster.
Parameters
estimatedNodeCounts ( dict[toil.provisioners.abstractProvisioner.Shape, int] )
Return type
dict [ toil.provisioners.abstractProvisioner.Shape , int ]
setNodeCount(instance_type,
numNodes, preemptible=False,
force=False)
Attempt to grow or shrink the
number of preemptible or non-preemptible worker nodes in the
cluster to the given value, or as close a value as possible,
and, after performing the necessary additions or removals of
worker nodes, return the resulting number of preemptible or
non-preemptible nodes currently in the cluster.
Parameters
|
• |
instance_type ( str ) -- The instance type to add or remove. |
||
|
• |
numNodes ( int ) -- Desired size of the cluster |
||
|
• |
preemptible ( bool ) -- whether the added nodes will be preemptible, i.e. whether they may be removed spontaneously by the underlying platform at any time. |
||
|
• |
force ( bool ) -- If False, the provisioner is allowed to deviate from the given number of nodes. For example, when downsizing a cluster, a provisioner might leave nodes running if they have active jobs running on them. |
Returns
the number of worker nodes in the cluster after making the necessary adjustments. This value should be, but is not guaranteed to be, close or equal to the numNodes argument. It represents the closest possible approximation of the actual cluster size at the time this method returns.
Return type
int
filter_out_static_nodes(nodes, preemptible=False)
Parameters
|
• |
nodes ( dict[toil.provisioners.node.Node, toil.batchSystems.abstractBatchSystem.NodeInfo] ) |
||
|
• |
preemptible ( bool ) |
Return type
list [ tuple [ toil.provisioners.node.Node , toil.batchSystems.abstractBatchSystem.NodeInfo ]]
getNodes(preemptible=None)
Returns a dictionary mapping node identifiers of preemptible or non-preemptible nodes to NodeInfo objects, one for each node.
This method is the definitive source on nodes in the cluster, and is responsible for consolidating cluster state between the provisioner and batch system.
Parameters
preemptible ( bool ) -- If True (False) only (non-)preemptible nodes will be returned. If None, all nodes will be returned.
Return type
dict [ toil.provisioners.node.Node , toil.batchSystems.abstractBatchSystem.NodeInfo ]
shutDown()
Return type
None
exception
toil.provisioners.clusterScaler.JobTooBigError(job=None,
shape=None, constraints=None)
Bases: Exception
Raised in the
scaler thread when a job cannot fit in any available node
type and is likely to lock up the workflow.
Parameters
|
• |
job ( Optional[toil.job.JobDescription] ) |
||
|
• |
shape ( Optional[toil.provisioners.abstractProvisioner.Shape] ) |
||
|
• |
constraints ( Optional[list[FailedConstraint]] ) |
||
|
job |
|||
|
shape |
constraints
|
msg |
__str__()
Stringify the exception,
including the message.
Return type
str
class
toil.provisioners.clusterScaler.ScalerThread(provisioner,
leader,
config, stop_on_exception=False)
Bases: toil.lib.threading.ExceptionalThread
A thread that automatically scales the number of either preemptible or non-preemptible worker nodes according to the resource requirements of the queued jobs.
The scaling calculation is essentially as follows: start with 0 estimated worker nodes. For each queued job, check if we expect it can be scheduled into a worker node before a certain time (currently one hour). Otherwise, attempt to add a single new node of the smallest type that can fit that job.
At each scaling
decision point a comparison between the current, C, and
newly estimated number of nodes is made. If the absolute
difference is less than beta * C then no change is made,
else the size of the cluster is adapted. The beta factor is
an inertia parameter that prevents continual fluctuations in
the number of nodes.
Parameters
|
• |
provisioner ( toil.provisioners.abstractProvisioner.AbstractProvisioner ) |
||
|
• |
leader ( toil.leader.Leader ) |
||
|
• |
config ( toil.common.Config ) |
||
|
• |
stop_on_exception ( bool ) |
||
|
scaler |
stop = False
stop_on_exception
stats = None
check()
Attempt to join any existing scaler threads that may have died or finished.
This ensures any exceptions raised in the threads are propagated in a timely fashion.
Return type
None
shutdown()
Shut down the cluster.
Return type
None
addCompletedJob(job, wallTime)
Parameters
|
• |
job ( toil.job.JobDescription ) |
|||
|
• |
wallTime ( int ) |
Return type
None
tryRun()
Return type
None
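The inertia rule described for ScalerThread (no change while the absolute difference is below beta * C) can be written down directly. The function below is a minimal sketch with a hypothetical name; it shows only the threshold test, not how Toil estimates node counts or smooths them between runs.

```python
def apply_inertia(current_nodes: int, estimated_nodes: int, beta: float) -> int:
    """Return the node count to aim for after applying the inertia test."""
    # Small differences relative to the current size C are ignored ...
    if abs(estimated_nodes - current_nodes) < beta * current_nodes:
        return current_nodes
    # ... larger ones cause the cluster to be resized to the new estimate.
    return estimated_nodes
```

With beta = 0.2 and 10 current nodes, an estimate of 11 leaves the cluster alone, while an estimate of 13 triggers a resize. With zero current nodes the threshold is zero, so any positive estimate is accepted.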
class
toil.provisioners.clusterScaler.ClusterStats(path,
batchSystem,
clusterName)
Parameters
|
• |
path ( str ) |
||
|
• |
batchSystem ( toil.batchSystems.abstractBatchSystem.AbstractBatchSystem ) |
||
|
• |
clusterName ( Optional[str] ) |
stats: dict[str, dict[str, list[dict[str, Any]]]]
statsThreads: list[toil.lib.threading.ExceptionalThread] = []
statsPath
stop = False
clusterName
batchSystem
scaleable
shutDownStats()
Return type
None
startStats(preemptible)
Parameters
preemptible ( bool )
Return type
None
checkStats()
Return type
None
toil.provisioners.gceProvisioner
Attributes
Classes
Module Contents
toil.provisioners.gceProvisioner.logger
class
toil.provisioners.gceProvisioner.GCEProvisioner(clusterName,
clusterType, zone, nodeStorage, nodeStorageOverrides,
sseKey,
enable_fuse)
Bases: toil.provisioners.abstractProvisioner.AbstractProvisioner
Implements a
Google Compute Engine Provisioner using libcloud.
NODE_BOTO_PATH = '/root/.boto'
SOURCE_IMAGE = b'projects/kinvolk-public/global/images/family/flatcar-stable'
cloud = 'gce'
supportedClusterTypes()
Get all the cluster types that this provisioner implementation supports.
createClusterSettings()
Initialize class for a new cluster, to be deployed, when running outside the cloud.
readClusterSettings()
Read the cluster settings from the instance, which should be the leader. See https://cloud.google.com/compute/docs/storing-retrieving-metadata for details about reading the metadata.
launchCluster(leaderNodeType, leaderStorage, owner, **kwargs)
In addition to the parameters inherited from the abstractProvisioner, the Google launchCluster takes the following parameters:
keyName: The key used to communicate with instances.
botoPath: Boto credentials for reading an AWS jobStore (optional).
network: A network (optional).
vpcSubnet: A subnet (optional).
use_private_ip: Even though a public IP exists, ignore it (optional).
getNodeShape(instance_type, preemptible=False)
The shape of a preemptible or
non-preemptible node managed by this provisioner. The node
shape defines key properties of a machine, such as its
number of cores or the time between billing intervals.
Parameters
instance_type ( str ) -- Instance type name to return the shape of.
Return type
toil.provisioners.abstractProvisioner.Shape
static retryPredicate(e)
Not used by GCE
destroyCluster()
Try a few times to terminate
all of the instances in the group.
Return type
None
terminateNodes(nodes)
Terminate the nodes represented
by given Node objects
Parameters
nodes -- list of Node objects
addNodes(nodeTypes, numNodes, preemptible, spotBid=None)
Used to add worker nodes to the
cluster
Parameters
|
• |
numNodes -- The number of nodes to add |
||
|
• |
preemptible -- whether or not the nodes will be preemptible |
||
|
• |
spotBid -- The bid for preemptible nodes if applicable (this can be set in config, also). |
||
|
• |
nodeTypes ( set[str] ) |
Returns
number of nodes successfully added
Return type
int
getProvisionedWorkers(instance_type=None, preemptible=None)
Gets all nodes, optionally of
the given instance type or preemptability, from the
provisioner. Includes both static and autoscaled nodes.
Parameters
|
• |
preemptible ( Optional[bool] ) -- Boolean value to restrict to preemptible nodes or non-preemptible nodes |
||
|
• |
instance_type ( Optional[str] ) |
Returns
list of Node objects
getLeader()
Returns
The leader node.
DEFAULT_TASK_COMPLETION_TIMEOUT = 180
ex_create_multiple_nodes(base_name, size, image, number,
location=None, ex_network='default', ex_subnetwork=None,
ex_tags=None, ex_metadata=None, ignore_errors=True,
use_existing_disk=True, poll_interval=2,
external_ip='ephemeral', ex_disk_type='pd-standard',
ex_disk_auto_delete=True, ex_service_accounts=None,
timeout=DEFAULT_TASK_COMPLETION_TIMEOUT, description=None,
ex_can_ip_forward=None, ex_disks_gce_struct=None,
ex_nic_gce_struct=None, ex_on_host_maintenance=None,
ex_automatic_restart=None, ex_image_family=None,
ex_preemptible=None)
Monkey patch to gce.py in libcloud to allow disk and images to be specified. Also changed name to a uuid below. The prefix 'wp' identifies preemptible nodes and 'wn' non-preemptible nodes.
toil.provisioners.node
Attributes
Classes
Module Contents
toil.provisioners.node.a_short_time = 5
toil.provisioners.node.logger
class toil.provisioners.node.Node(publicIP, privateIP, name,
launchTime, nodeType, preemptible, tags=None,
use_private_ip=None)
Parameters
|
• |
publicIP ( str ) |
|||
|
• |
privateIP ( str ) |
|||
|
• |
name ( str ) |
|||
|
• |
launchTime ( Union[datetime.datetime, str] ) |
|||
|
• |
nodeType ( Optional[str] ) |
|||
|
• |
preemptible ( bool ) |
|||
|
• |
tags ( Optional[dict[str, str]] ) |
|||
|
• |
use_private_ip ( Optional[bool] ) |
maxWaitTime
publicIP
privateIP
|
name |
nodeType
preemptible
|
tags |
__str__()
__repr__()
__hash__()
remainingBillingInterval()
If the node has a launch time, this function returns a floating point value between 0 and 1.0 representing how far we are into the current billing cycle for the given instance. If the return value is .25, we are one quarter into the billing cycle, with three quarters remaining before we will be charged again for that instance.
Assumes a
billing cycle of one hour.
Returns
Float from 0 -> 1.0 representing percentage of pre-paid time left in cycle.
Return type
float
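The arithmetic behind remainingBillingInterval can be illustrated with an hourly cycle. The sketch below follows the "how far we are into the current billing cycle" reading of the description; fraction_into_billing_hour is a hypothetical helper, and the current time is passed in explicitly so the example is deterministic.

```python
from datetime import datetime, timedelta, timezone


def fraction_into_billing_hour(launch_time: datetime, now: datetime) -> float:
    """Fraction (0 to 1) of the current one-hour billing cycle already used."""
    elapsed = (now - launch_time).total_seconds()
    # Assumption from the text: billing cycles are exactly one hour long.
    return (elapsed % 3600.0) / 3600.0


launch = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
quarter = fraction_into_billing_hour(launch, launch + timedelta(minutes=15))
```

Here quarter is 0.25: fifteen minutes into the hour, with three quarters of the pre-paid time remaining. The remaining fraction is one minus the value returned.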
waitForNode(role, keyName='core')
Parameters
|
• |
role ( str ) |
|||
|
• |
keyName ( str ) |
Return type
None
copySshKeys(keyName)
Copy authorized_keys file to the core user from the keyName user.
injectFile(fromFile, toFile, role)
Rsync a file to the container with the given role.
extractFile(fromFile, toFile, role)
Rsync a file from the container with the given role.
sshAppliance(*args, **kwargs)
Parameters
|
• |
args -- arguments to execute in the appliance |
||
|
• |
kwargs -- tty=bool tells docker whether or not to create a TTY shell for interactive SSHing. The default value is False. Input=string is passed as input to the Popen call. |
sshInstance(*args, **kwargs)
Run a command on the instance. Returns the binary output of the command.
coreSSH(*args, **kwargs)
If strict=False, strict host key checking will be temporarily disabled. This is provided as a convenience for internal/automated functions and ought to be set to True whenever feasible, or whenever the user is directly interacting with a resource (e.g. rsync-cluster or ssh-cluster). Assumed to be False by default.
kwargs: input,
tty, appliance, collectStdout, sshOptions, strict
Parameters
input ( bytes ) -- UTF-8 encoded input bytes to send to the command
coreRsync(args, applianceName='toil_leader', **kwargs)
Parameters
|
• |
args ( list[str] ) |
|||
|
• |
applianceName ( str ) |
|||
|
• |
kwargs ( Any ) |
Return type
int
Attributes
Exceptions
Functions
Package Contents
toil.provisioners.logger
toil.provisioners.cluster_factory(provisioner,
clusterName=None,
clusterType='mesos', zone=None, nodeStorage=50,
nodeStorageOverrides=None, sseKey=None,
enable_fuse=False)
Find and instantiate the appropriate provisioner instance to make clusters in the given cloud.
Raises
ClusterTypeNotSupportedException if the given provisioner
does not implement clusters of the given type.
Parameters
|
• |
provisioner ( str ) -- The cloud type of the cluster. |
||
|
• |
clusterName ( Optional[str] ) -- The name of the cluster. |
||
|
• |
clusterType ( str ) -- The type of cluster: 'mesos' or 'kubernetes'. |
||
|
• |
zone ( Optional[str] ) -- The cloud zone |
||
|
• |
nodeStorage ( int ) |
||
|
• |
nodeStorageOverrides ( Optional[list[str]] ) |
||
|
• |
sseKey ( Optional[str] ) |
||
|
• |
enable_fuse ( bool ) |
Returns
A cluster object for the cloud type.
Return type
Union[ aws.awsProvisioner.AWSProvisioner , gceProvisioner.GCEProvisioner ]
toil.provisioners.add_provisioner_options(parser)
Parameters
parser ( argparse.ArgumentParser )
Return type
None
toil.provisioners.parse_node_types(node_type_specs)
Parse a specification for zero or more node types.
Takes a comma-separated list of node types. Each node type is a slash-separated list of at least one instance type name (like 'm5a.large' for AWS), and an optional bid in dollars after a colon.
Raises ValueError if a node type cannot be parsed.
Inputs should look something like this:
>>> parse_node_types('c5.4xlarge/c5a.4xlarge:0.42,t2.large')
[({'c5.4xlarge', 'c5a.4xlarge'}, 0.42), ({'t2.large'}, None)]
Parameters
node_type_specs ( Optional[str] ) -- A string defining node types
Returns
a list of node types, where each type is the set of instance types, and the float bid, or None.
Return type
list [ tuple [ set [ str ], Optional[ float ]]]
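The specification format accepted by parse_node_types (comma-separated node types, slash-separated instance types, optional bid after a colon) can be parsed in a few lines. The function below is an independent sketch of that grammar with a hypothetical name; the validation choices in it are assumptions and may differ from Toil's parser.

```python
from typing import Optional


def parse_node_types_sketch(spec: Optional[str]) -> list[tuple[set[str], Optional[float]]]:
    """Parse e.g. 'c5.4xlarge/c5a.4xlarge:0.42,t2.large' into (types, bid) pairs."""
    result: list[tuple[set[str], Optional[float]]] = []
    if not spec:
        return result  # zero node types
    for node_type in spec.split(","):
        names, colon, bid_text = node_type.partition(":")
        if colon and not bid_text:
            raise ValueError(f"missing bid after colon in {node_type!r}")
        instance_types = set(names.split("/"))
        if "" in instance_types:
            raise ValueError(f"empty instance type in {node_type!r}")
        # float() itself raises ValueError for a malformed bid.
        bid = float(bid_text) if colon else None
        result.append((instance_types, bid))
    return result
```

For the input shown in the example above, this sketch produces the same list of (set, bid) pairs.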
toil.provisioners.check_valid_node_types(provisioner, node_types)
Raises if an invalid nodeType
is specified for aws or gce.
Parameters
|
• |
provisioner ( str ) -- 'aws' or 'gce' to specify which cloud provisioner used. |
||
|
• |
node_types ( list[tuple[set[str], Optional[float]]] ) -- A list of node types. Example: [({'t2.micro'}, None), ({'t2.medium'}, 0.5)] |
Returns
Nothing. Raises if any instance type in the node type isn't real.
exception toil.provisioners.NoSuchClusterException(cluster_name)
Bases: Exception
Indicates that
the specified cluster does not exist.
Parameters
cluster_name ( str )
exception toil.provisioners.NoSuchZoneException
Bases: Exception
Indicates that a valid zone could not be found.
exception
toil.provisioners.ClusterTypeNotSupportedException(provisioner_class,
cluster_type)
Bases: Exception
Indicates that a provisioner does not support a given cluster type.
exception
toil.provisioners.ClusterCombinationNotSupportedException(provisioner_class,
cluster_type, architecture, reason=None)
Bases: Exception
Indicates that
a provisioner does not support making a given type of
cluster with a given architecture.
Parameters
|
• |
provisioner_class ( type ) |
|||
|
• |
cluster_type ( str ) |
|||
|
• |
architecture ( str ) |
|||
|
• |
reason ( Optional[str] ) |
toil.realtimeLogger
Implements a real-time UDP-based logging system that user scripts can use for debugging.
Attributes
Classes
Module Contents
toil.realtimeLogger.logger
class toil.realtimeLogger.LoggingDatagramHandler(request,
client_address, server)
Bases: socketserver.BaseRequestHandler
Receive logging messages from the jobs and display them on the leader.
Uses bare JSON
message encoding.
handle()
Handle a single message. SocketServer takes care of splitting out the messages.
Messages are
JSON-encoded logging module records.
Return type
None
class toil.realtimeLogger.JSONDatagramHandler(host, port)
Bases: logging.handlers.DatagramHandler
Send logging records over UDP serialized as JSON.
They have to
fit in a single UDP datagram, so don't try to log more than
64kb at once.
makePickle(record)
Actually, encode the record as
bare JSON instead.
Parameters
record ( logging.LogRecord )
Return type
bytes
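The mechanism described for these two handler classes, JSON-encoded log records sent as single UDP datagrams, can be reproduced with the standard library alone. The sketch below is not Toil's code: DemoReceiver, DemoJSONHandler, demo and the three-field payload are hypothetical, and the server binds to an ephemeral loopback port.

```python
import json
import logging
import logging.handlers
import socketserver
import threading

received = []                # records decoded by the demo server
arrived = threading.Event()  # set once the first datagram has been handled


class DemoReceiver(socketserver.BaseRequestHandler):
    def handle(self):
        data, _socket = self.request  # UDP servers pass (payload, socket)
        received.append(json.loads(data.decode("utf-8")))
        arrived.set()


class DemoJSONHandler(logging.handlers.DatagramHandler):
    def makePickle(self, record):
        # Despite the inherited method name, emit bare JSON instead of a pickle.
        payload = {"name": record.name,
                   "levelname": record.levelname,
                   "message": record.getMessage()}
        return json.dumps(payload).encode("utf-8")


def demo():
    server = socketserver.UDPServer(("127.0.0.1", 0), DemoReceiver)
    host, port = server.server_address
    threading.Thread(target=server.serve_forever, daemon=True).start()
    log = logging.getLogger("demo.realtime")
    log.setLevel(logging.INFO)
    log.propagate = False  # keep the demo message out of other handlers
    handler = DemoJSONHandler(host, port)
    log.addHandler(handler)
    try:
        log.info("hello from a worker")
        arrived.wait(timeout=5)  # UDP is fire-and-forget, so wait briefly
    finally:
        log.removeHandler(handler)
        handler.close()
        server.shutdown()
        server.server_close()
    return list(received)
```

Calling demo() returns the list of decoded records. Because each record must fit in one datagram, large messages would need to be truncated or split by the sender.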
class toil.realtimeLogger.RealtimeLoggerMetaclass
Bases: type
Metaclass for RealtimeLogger that lets the class itself expose logging methods, like RealtimeLogger.warning(), RealtimeLogger.info(), etc.
__getattr__(name)
Fallback to attributes on the
logger.
Parameters
name ( str )
Return type
Any
class
toil.realtimeLogger.RealtimeLogger(batchSystem,
level=defaultLevel)
Provide a logger that logs over UDP to the leader.
To use in a Toil job, do:
>>> from toil.realtimeLogger import RealtimeLogger
>>> RealtimeLogger.info("This logging message goes straight to the leader")
That's all a
user of Toil would need to do. On the leader,
Job.Runner.startToil() automatically starts the UDP server
by using an instance of this class as a context manager.
Parameters
|
• |
batchSystem ( toil.batchSystems.abstractBatchSystem.AbstractBatchSystem ) |
||
|
• |
level ( str ) |
envPrefix = 'TOIL_RT_LOGGING_'
defaultLevel = 'INFO'
|
lock |
loggingServer = None
serverThread = None
initialized = 0
logger = None
classmethod getLogger()
Get the logger that logs real-time to the leader.
Note that if
the returned logger is used on the leader, you will see the
message twice, since it still goes to the normal log
handlers, too.
Return type
logging.Logger
__enter__()
Return type
None
__exit__(exc_type, exc_val, exc_tb)
Parameters
|
• |
exc_type ( Optional[type[BaseException]] ) |
|||
|
• |
exc_val ( Optional[BaseException] ) |
|||
|
• |
exc_tb ( Optional[types.TracebackType] ) |
Return type
None
toil.resource
Attributes
Exceptions
Classes
Module Contents
toil.resource.logger
class toil.resource.Resource
Bases: namedtuple ( 'Resource' , ( 'name' , 'pathHash' , 'url' , 'contentHash' ))
Represents a file or directory that will be deployed to each node before any jobs in the user script are invoked.
Each instance is a namedtuple with the following elements:
The pathHash element contains the MD5 (in hexdigest form) of the path to the resource on the leader node. The path, and therefore its hash, is unique within a job store.
The url element is a "file:" or "http:" URL at which the resource can be obtained.
The contentHash element is an MD5 checksum of the resource, allowing for validation and caching of resources.
If the resource is a regular file, the type attribute will be 'file'.
If the resource
is a directory, the type attribute will be 'dir' and the URL
will point at a ZIP archive of that directory.
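The two hash elements can be illustrated with hashlib. This is a sketch under assumptions: path_hash and content_hash are hypothetical helpers, the path is assumed to be encoded as UTF-8 before hashing, and how Toil derives the content hash for a directory is not shown.

```python
import hashlib


def path_hash(leader_path: str) -> str:
    """MD5 hexdigest of a path string (assumption: UTF-8 encoded)."""
    return hashlib.md5(leader_path.encode("utf-8")).hexdigest()


def content_hash(data: bytes) -> str:
    """MD5 hexdigest of a resource's bytes, usable as a cache or validation key."""
    return hashlib.md5(data).hexdigest()
```

Both functions return 32 hexadecimal characters. Equal paths always give equal path hashes, which is what makes the value usable as an identifier on a worker that never sees the original file.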
resourceEnvNamePrefix = 'JTRES_'
rootDirPathEnvName
classmethod create(jobStore, leaderPath)
Saves the content of the file
or directory at the given path to the given job store and
returns a resource object representing that content for the
purpose of obtaining it again at a generic, public URL. This
method should be invoked on the leader node.
Parameters
|
• |
jobStore ( toil.jobStores.abstractJobStore.AbstractJobStore ) |
||
|
• |
leaderPath ( str ) |
Return type
Resource
refresh(jobStore)
Parameters
jobStore ( toil.jobStores.abstractJobStore.AbstractJobStore )
Return type
Resource
classmethod prepareSystem()
Prepares this system for the
downloading and lookup of resources. This method should only
be invoked on a worker node. It is idempotent but not
thread-safe.
Return type
None
classmethod cleanSystem()
Remove all downloaded,
localized resources.
Return type
None
register()
Register this resource for
later retrieval via lookup(), possibly in a child process.
Return type
None
classmethod lookup(leaderPath)
Return a resource object representing a resource created from a file or directory at the given path on the leader.
This method
should be invoked on the worker. The given path does not
need to refer to an existing file or directory on the
worker, it only identifies the resource within an instance
of toil. This method returns None if no resource for the
given path exists.
Parameters
leaderPath ( str )
Return type
Optional[ Resource ]
download(callback=None)
Download this resource from its URL to a file on the local system.
This method
should only be invoked on a worker node after the node was
setup for accessing resources via prepareSystem().
Parameters
callback ( Optional[Callable[[str], None]] )
Return type
None
property localPath: str
Abstractmethod
Return type
str
Get the path to resource on the worker.
The file or directory at the returned path may or may not yet exist. Invoking download() will ensure that it does.
property localDirPath: str
The path to the directory
containing the resource on the worker.
Return type
str
pickle()
Return type
str
classmethod unpickle(s)
Parameters
s ( str )
Return type
Resource
class toil.resource.FileResource
Bases: Resource
A resource read
from a file on the leader.
property localPath: str
Get the path to resource on the worker.
The file or
directory at the returned path may or may not yet exist.
Invoking download() will ensure that it does.
Return type
str
class toil.resource.DirectoryResource
Bases: Resource
A resource read from a directory on the leader.
The URL will
point to a ZIP archive of the directory. All files in that
directory (and any subdirectories) will be included. The
directory may be a package but it does not need to be.
property localPath: str
Get the path to resource on the worker.
The file or
directory at the returned path may or may not yet exist.
Invoking download() will ensure that it does.
Return type
str
class toil.resource.VirtualEnvResource
Bases: DirectoryResource
A resource read from a virtualenv on the leader.
All modules and packages found in the virtualenv's site-packages directory will be included.
class toil.resource.ModuleDescriptor
Bases: namedtuple ( 'ModuleDescriptor' , ( 'dirPath' , 'name' , 'fromVirtualEnv' ))
A path to a Python module decomposed into a namedtuple of three elements
• dirPath, the path to the directory that should be added to sys.path before importing the module,
• moduleName, the fully qualified name of the module with leading package names separated by dot and
>>> import toil.resource
>>> ModuleDescriptor.forModule('toil.resource')
ModuleDescriptor(dirPath='/.../src', name='toil.resource', fromVirtualEnv=False)
>>> import subprocess, sys, tempfile, os
>>> dirPath = tempfile.mkdtemp()
>>> path = os.path.join( dirPath, 'foo.py' )
>>> with open(path,'w') as f:
...     _ = f.write('from toil.resource import ModuleDescriptor\n'
...                 'print(ModuleDescriptor.forModule(__name__))')
>>> subprocess.check_output([ sys.executable, path ])
b"ModuleDescriptor(dirPath='...', name='foo', fromVirtualEnv=False)\n"
>>> from shutil import rmtree
>>> rmtree( dirPath )
Now test a collision. 'collections' is part of the standard library in Python 2 and 3.
>>> dirPath = tempfile.mkdtemp()
>>> path = os.path.join( dirPath, 'collections.py' )
>>> with open(path,'w') as f:
...     _ = f.write('from toil.resource import ModuleDescriptor\n'
...                 'ModuleDescriptor.forModule(__name__)')
This should fail and return exit status 1 due to the collision with the built-in module:
>>> subprocess.call([ sys.executable, path ])
1
Clean up
>>> rmtree( dirPath )
dirPath: str
name: str
classmethod forModule(name)
Return an instance of this class representing the module of the given name.
If the given
module name is "__main__", it will be translated
to the actual file name of the top-level script without the
.py or .pyc extension. This method assumes that the module
with the specified name has already been loaded.
Parameters
name ( str )
Return type
ModuleDescriptor
property belongsToToil: bool
True if this module is part of
the Toil distribution
Return type
bool
saveAsResourceTo(jobStore)
Store the file containing this
module--or even the Python package directory hierarchy
containing that file--as a resource to the given job store
and return the corresponding resource object. Should only be
called on a leader node.
Parameters
jobStore ( toil.jobStores.abstractJobStore.AbstractJobStore )
Return type
Resource
localize()
Check if this module was saved as a resource.
If it was,
return a new module descriptor that points to a local copy
of that resource. Should only be called on a worker node. On
the leader, this method returns this resource, i.e. self.
Return type
ModuleDescriptor
globalize()
Reverse the effect of
localize().
Return type
ModuleDescriptor
toCommand()
Return type
collections.abc.Sequence [ str ]
classmethod fromCommand(command)
Parameters
command ( collections.abc.Sequence[str] )
Return type
ModuleDescriptor
makeLoadable()
Return type
ModuleDescriptor
load()
Return type
Optional[ types.ModuleType ]
exception toil.resource.ResourceException
Bases: Exception
Common base class for all non-exit exceptions.
toil.server
Submodules
toil.server.api_spec
toil.server.app
Attributes
Functions
Module Contents
toil.server.app.logger
toil.server.app.parser_with_server_options()
Return type
argparse.ArgumentParser
toil.server.app.create_app(args)
Create a
"connexion.FlaskApp" instance with Toil server
configurations.
Parameters
args ( argparse.Namespace )
Return type
connexion.FlaskApp
toil.server.app.start_server(args)
Start a Toil server.
Parameters
args ( argparse.Namespace )
Return type
None
toil.server.celery_app
Attributes
Functions
Module Contents
toil.server.celery_app.create_celery_app()
Return type
celery.Celery
toil.server.celery_app.celery
toil.server.cli
Submodules
toil.server.cli.wes_cwl_runner
Attributes
Classes
Functions
Module Contents
toil.server.cli.wes_cwl_runner.logger
toil.server.cli.wes_cwl_runner.generate_attachment_path_names(paths)
Take in a list of path names and return a list of names with the common path name stripped out, while preserving the input order. This guarantees that there are no relative paths that traverse up.
For example, for the following CWL workflow where "hello.yaml" references a file "message.txt":
~/toil/workflows/hello.cwl
~/toil/input_files/hello.yaml
~/toil/input_files/message.txt
This may be run with the command:
toil-wes-cwl-runner hello.cwl ../input_files/hello.yaml
Where "message.txt" is resolved to "../input_files/message.txt".
We'd send the workflow file as "workflows/hello.cwl", and send the inputs as "input_files/hello.yaml" and "input_files/message.txt".
Parameters
paths ( list[str] ) -- A list of absolute or relative path names. Relative paths are interpreted as relative to the current working directory.
Returns
The common path name and a list of minimal path names.
Return type
tuple [ str , list [ str ]]
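The path stripping described above can be sketched with os.path alone. This is an illustration under stated assumptions, not Toil's implementation: strip_common_path is a hypothetical name, the example paths are made up, POSIX-style paths are assumed, and an empty input list is not handled.

```python
import os


def strip_common_path(paths: list[str]) -> tuple[str, list[str]]:
    """Return the common ancestor directory and each path relative to it."""
    absolute = [os.path.abspath(p) for p in paths]
    # Use the containing directories so a single file still yields a directory.
    common = os.path.commonpath([os.path.dirname(p) for p in absolute])
    # Input order is preserved, and no result can start with "..".
    return common, [os.path.relpath(p, common) for p in absolute]


common, names = strip_common_path([
    "/home/user/toil/workflows/hello.cwl",
    "/home/user/toil/input_files/hello.yaml",
    "/home/user/toil/input_files/message.txt",
])
print(common)
print(names)
```

Because every returned name is relative to a shared ancestor, none of them needs to traverse upward, which matches the guarantee stated in the description.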
class toil.server.cli.wes_cwl_runner.WESClientWithWorkflowEngineParameters(endpoint, auth=None)
Bases: wes_client.util.WESClient
A modified version of the WESClient from the wes-service package that includes workflow_engine_parameters support.
TODO: Propose a PR in wes-service to include workflow_engine_params.
Parameters
• endpoint ( str )
• auth ( Optional[tuple[str, str]] )
get_version(extension, workflow_file)
Determines the version of a
.py, .wdl, or .cwl file.
Parameters
• extension ( str )
• workflow_file ( str )
Return type
str
parse_params(workflow_params_file)
Parse the CWL input file into a
dictionary to be attached to the body of the WES run
request.
Parameters
workflow_params_file ( str ) -- The URL or path to the CWL input file.
Return type
dict [ str , Any]
modify_param_paths(base_dir, workflow_params)
Modify the file paths in the
input workflow parameters to be relative to base_dir.
Parameters
• base_dir ( str ) -- The base directory to make the file paths relative to. This should be the common ancestor of all attached files, which will become the root of the execution folder.
• workflow_params ( dict[str, Any] ) -- A dict containing the workflow parameters.
Return type
None
build_wes_request(workflow_file, workflow_params_file, attachments, workflow_engine_parameters=None)
Build the workflow run request to submit to WES.
Parameters
• workflow_file ( str ) -- The path or URL to the CWL workflow document. Only file:// URLs are supported at the moment.
• workflow_params_file ( Optional[str] ) -- The path or URL to the CWL input file.
• attachments ( Optional[list[str]] ) -- A list of local paths to files that will be uploaded to the server.
• workflow_engine_parameters ( Optional[list[str]] ) -- A list of engine parameters to set along with this workflow run.
Returns
A dictionary of parameters as the body of the request, and an iterable for the pairs of filename and file contents to upload to the server.
Return type
tuple [ dict [ str , str ], collections.abc.Iterable [ tuple [ str , tuple [ str , io.BytesIO ]]]]
run_with_engine_options(workflow_file, workflow_params_file, attachments, workflow_engine_parameters)
Composes and sends a post request that signals the WES server to run a workflow.
Parameters
• workflow_file ( str ) -- The path to the CWL workflow document.
• workflow_params_file ( Optional[str] ) -- The path to the CWL input file.
• attachments ( Optional[list[str]] ) -- A list of local paths to files that will be uploaded to the server.
• workflow_engine_parameters ( Optional[list[str]] ) -- A list of engine parameters to set along with this workflow run.
Returns
The body of the post result as a dictionary.
Return type
dict [ str , Any]
toil.server.cli.wes_cwl_runner.get_deps_from_cwltool(cwl_file, input_file=None)
Return a list of dependencies of the given workflow from cwltool.
Parameters
• cwl_file ( str ) -- The CWL file.
• input_file ( Optional[str] ) -- Omit to get the dependencies from the CWL file. If set, this returns the dependencies from the input file.
Return type
list [ str ]
toil.server.cli.wes_cwl_runner.submit_run(client, cwl_file, input_file=None, engine_options=None)
Given a CWL file, its input files, and an optional list of engine options, submit the CWL workflow to the WES server via the WES client.
This function also attempts to find the attachments from the CWL workflow and its input file, and attach them to the WES run request.
Parameters
• client ( WESClientWithWorkflowEngineParameters ) -- The WES client.
• cwl_file ( str ) -- The path to the CWL workflow document.
• input_file ( Optional[str] ) -- The path to the CWL input file.
• engine_options ( Optional[list[str]] ) -- A list of engine parameters to set along with this workflow run.
Return type
str
toil.server.cli.wes_cwl_runner.poll_run(client, run_id)
Return True if the given
workflow run is in a finished state.
Parameters
• client ( WESClientWithWorkflowEngineParameters )
• run_id ( str )
Return type
bool
toil.server.cli.wes_cwl_runner.print_logs_and_exit(client, run_id)
Fetch the workflow logs from
the WES server, print the results, then exit the program
with the same exit code as the workflow run.
Parameters
• client ( WESClientWithWorkflowEngineParameters ) -- The WES client.
• run_id ( str ) -- The run_id of the target workflow.
Return type
None
toil.server.cli.wes_cwl_runner.main()
Return type
None
toil.server.utils
Attributes
Classes
Functions
Module Contents
toil.server.utils.HAVE_S3 = True
toil.server.utils.logger
toil.server.utils.get_iso_time()
Return the current time in ISO
8601 format.
Return type
str
toil.server.utils.link_file(src, dest)
Create a link to a file from
src to dest.
Parameters
• src ( str )
• dest ( str )
Return type
None
toil.server.utils.download_file_from_internet(src, dest, content_type=None)
Download a file from the Internet and write it to dest.
Parameters
• src ( str )
• dest ( str )
• content_type ( Optional[str] )
Return type
None
toil.server.utils.download_file_from_s3(src, dest, content_type=None)
Download a file from Amazon S3
and write it to dest.
Parameters
• src ( str )
• dest ( str )
• content_type ( Optional[str] )
Return type
None
toil.server.utils.get_file_class(path)
Return the type of the file as
a human readable string.
Parameters
path ( str )
Return type
str
toil.server.utils.safe_read_file(file)
Safely read a file by acquiring
a shared lock to prevent other processes from writing to it
while reading.
Parameters
file ( str )
Return type
Optional[ str ]
toil.server.utils.safe_write_file(file, s)
Safely write to a file by
acquiring an exclusive lock to prevent other processes from
reading and writing to it while writing.
Parameters
• file ( str )
• s ( str )
Return type
None
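The shared-versus-exclusive locking described for safe_read_file() and safe_write_file() can be sketched with fcntl.flock on a POSIX system. The function names read_locked and write_locked are hypothetical, and this is only one way to take such locks; whether Toil uses flock specifically is an assumption of the sketch.

```python
import fcntl
import os
import tempfile
from typing import Optional


def read_locked(path: str) -> Optional[str]:
    """Read a file under a shared lock; return None if it does not exist."""
    try:
        f = open(path, "r")
    except FileNotFoundError:
        return None
    with f:
        fcntl.flock(f, fcntl.LOCK_SH)  # many readers may hold this at once
        try:
            return f.read()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def write_locked(path: str, s: str) -> None:
    """Replace a file's contents under an exclusive lock."""
    # Open without truncating, so nothing is lost before the lock is held.
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks readers and other writers
        try:
            f.seek(0)
            f.truncate()
            f.write(s)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "state")
    missing = read_locked(target)
    write_locked(target, "RUNNING")
    current = read_locked(target)
```

The writer truncates only after it holds the exclusive lock, so a concurrent reader never observes a half-emptied file.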
class toil.server.utils.MemoryStateCache
An in-memory place to store
workflow state.
get(workflow_id, key)
Get a key value from memory.
Parameters
• workflow_id ( str )
• key ( str )
Return type
Optional[ str ]
set(workflow_id, key, value)
Set or clear a key value in
memory.
Parameters
• workflow_id ( str )
• key ( str )
• value ( Optional[str] )
Return type
None
class toil.server.utils.AbstractStateStore
A place for the WES server to keep its state: the set of workflows that exist and whether they are done or not.
This is a key-value store, with keys namespaced by workflow ID. Concurrent access from multiple threads or processes is safe and globally consistent.
Keys and workflow IDs are restricted to [-a-zA-Z0-9_] , because backends may use them as path or URL components.
Key values are either a string, or None if the key is not set.
Workflow existence isn't a thing; nonexistent workflows just have None for all keys.
Note that we don't yet have a cleanup operation: things are stored permanently. Even clearing all the keys may leave data behind.
Also handles storage for a local cache, with a separate key namespace (not a read/write-through cache).
TODO: Can we
replace this with just using a JobStore eventually, when
AWSJobStore no longer needs SimpleDB?
abstract get(workflow_id, key)
Get the value of the given key
for the given workflow, or None if the key is not set for
the workflow.
Parameters
• workflow_id ( str )
• key ( str )
Return type
Optional[ str ]
abstract set(workflow_id, key, value)
Set the value of the given key
for the given workflow. If the value is None, clear the key.
Parameters
• workflow_id ( str )
• key ( str )
• value ( Optional[str] )
Return type
None
read_cache(workflow_id, key)
Read a value from a local
cache, without checking the actual backend.
Parameters
• workflow_id ( str )
• key ( str )
Return type
Optional[ str ]
write_cache(workflow_id, key, value)
Write a value to a local cache,
without modifying the actual backend.
Parameters
• workflow_id ( str )
• key ( str )
• value ( Optional[str] )
Return type
None
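The key-value contract described for AbstractStateStore (keys namespaced by workflow ID, a restricted character set, and None meaning "not set") can be illustrated with a small dictionary-backed class. DictStateStore is a hypothetical name, and the validation step is an assumption: the manual states the restriction but does not say where it is enforced.

```python
import re
from typing import Optional

# Character set taken from the description above.
VALID_NAME = re.compile(r"[-a-zA-Z0-9_]+")


class DictStateStore:
    """Hypothetical in-memory store following the described contract."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], str] = {}

    @staticmethod
    def _check(name: str) -> None:
        if not VALID_NAME.fullmatch(name):
            raise ValueError(f"invalid key or workflow ID: {name!r}")

    def get(self, workflow_id: str, key: str) -> Optional[str]:
        self._check(workflow_id)
        self._check(key)
        # Unknown workflows simply have None for every key.
        return self._data.get((workflow_id, key))

    def set(self, workflow_id: str, key: str, value: Optional[str]) -> None:
        self._check(workflow_id)
        self._check(key)
        if value is None:
            self._data.pop((workflow_id, key), None)  # None clears the key
        else:
            self._data[(workflow_id, key)] = value


store = DictStateStore()
store.set("wf-1", "state", "QUEUED")
first = store.get("wf-1", "state")
other = store.get("wf-2", "state")
store.set("wf-1", "state", None)
cleared = store.get("wf-1", "state")
try:
    store.set("wf/1", "state", "QUEUED")  # "/" is outside the allowed set
    rejected = False
except ValueError:
    rejected = True
```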
class toil.server.utils.MemoryStateStore
Bases: MemoryStateCache , AbstractStateStore
An in-memory place to store workflow state, for testing.
Inherits from MemoryStateCache first to provide implementations for AbstractStateStore.
class toil.server.utils.FileStateStore(url)
Bases: AbstractStateStore
A place to
store workflow state that uses a POSIX-compatible file
system.
Parameters
url ( str )
get(workflow_id, key)
Get a key value from the
filesystem.
Parameters
• workflow_id ( str )
• key ( str )
Return type
Optional[ str ]
set(workflow_id, key, value)
Set or clear a key value on the
filesystem.
Parameters
• workflow_id ( str )
• key ( str )
• value ( Optional[str] )
Return type
None
class toil.server.utils.S3StateStore(url)
Bases: AbstractStateStore
A place to
store workflow state that uses an S3-compatible object
store.
Parameters
url ( str )
get(workflow_id, key)
Get a key value from S3.
Parameters
• workflow_id ( str )
• key ( str )
Return type
Optional[ str ]
set(workflow_id, key, value)
Set or clear a key value on S3.
Parameters
• workflow_id ( str )
• key ( str )
• value ( Optional[str] )
Return type
None
toil.server.utils.state_store_cache: dict [ str , AbstractStateStore ]
toil.server.utils.connect_to_state_store(url)
Connect to a place to store state for workflows, defined by a URL.
URL may be a
local file path or URL or an S3 URL.
Parameters
url ( str )
Return type
AbstractStateStore
class toil.server.utils.WorkflowStateStore(state_store, workflow_id)
Slice of a state store for the
state of a particular workflow.
Parameters
• state_store ( AbstractStateStore )
• workflow_id ( str )
get(key)
Get the given item of workflow
state.
Parameters
key ( str )
Return type
Optional[ str ]
set(key, value)
Set the given item of workflow
state.
Parameters
• key ( str )
• value ( Optional[str] )
Return type
None
read_cache(key)
Read a value from a local
cache, without checking the actual backend.
Parameters
key ( str )
Return type
Optional[ str ]
write_cache(key, value)
Write a value to a local cache,
without modifying the actual backend.
Parameters
• key ( str )
• value ( Optional[str] )
Return type
None
toil.server.utils.connect_to_workflow_state_store(url, workflow_id)
Connect to a place to store
state for the given workflow, in the state store defined by
the given URL.
Parameters
• url ( str ) -- A URL that can be used for connect_to_state_store()
• workflow_id ( str )
Return type
WorkflowStateStore
toil.server.utils.TERMINAL_STATES
toil.server.utils.MAX_CANCELING_SECONDS = 30
class toil.server.utils.WorkflowStateMachine(store)
Class for managing the WES workflow state machine.
This is the authority on the WES "state" of a workflow. You need one to read or change the state.
Guaranteeing that only certain transitions can be observed is possible but not worth it. Instead, we just let updates clobber each other and grab and cache the first terminal state we see forever. If it becomes important that clients never see e.g. CANCELED -> COMPLETE or COMPLETE -> SYSTEM_ERROR, we can implement a real distributed state machine here.
We do handle making sure that tasks don't get stuck in CANCELING.
State can be:
"UNKNOWN" "QUEUED" "INITIALIZING" "RUNNING" "PAUSED" "COMPLETE" "EXECUTOR_ERROR" "SYSTEM_ERROR" "CANCELED" "CANCELING"
Uses the state
store's local cache to prevent needing to read things we've
seen already.
Parameters
store ( WorkflowStateStore )
send_enqueue()
Send an enqueue message that
would move from UNKNOWN to QUEUED.
Return type
None
send_initialize()
Send an initialize message that
would move from QUEUED to INITIALIZING.
Return type
None
send_run()
Send a run message that would
move from INITIALIZING to RUNNING.
Return type
None
send_cancel()
Send a cancel message that
would move to CANCELING from any non-terminal state.
Return type
None
send_canceled()
Send a canceled message that would move to CANCELED from CANCELING.
Return type
None
send_complete()
Send a complete message that
would move from RUNNING to COMPLETE.
Return type
None
send_executor_error()
Send an executor_error message
that would move from QUEUED, INITIALIZING, or RUNNING to
EXECUTOR_ERROR.
Return type
None
send_system_error()
Send a system_error message that would move from QUEUED, INITIALIZING, or RUNNING to SYSTEM_ERROR.
Return type
None
get_current_state()
Get the current state of the
workflow.
Return type
str
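The "let updates clobber each other, then cache the first terminal state" approach described for WorkflowStateMachine can be sketched as follows. StateMachineSketch is a hypothetical class over a plain dict, only three of the send methods are shown, and the TERMINAL set is an assumption, since this manual does not list the contents of TERMINAL_STATES.

```python
from typing import Optional

# Assumed terminal states; TERMINAL_STATES is not spelled out in this manual.
TERMINAL = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


class StateMachineSketch:
    """Hypothetical last-writer-wins state machine with a sticky terminal state."""

    def __init__(self, backend: dict[str, str]) -> None:
        self._backend = backend              # shared storage, may be clobbered
        self._cached: Optional[str] = None   # first terminal state seen locally

    def _send(self, state: str) -> None:
        # No transition checks: updates simply overwrite each other.
        self._backend["state"] = state

    def send_enqueue(self) -> None:
        self._send("QUEUED")

    def send_run(self) -> None:
        self._send("RUNNING")

    def send_complete(self) -> None:
        self._send("COMPLETE")

    def get_current_state(self) -> str:
        if self._cached is not None:
            return self._cached
        state = self._backend.get("state", "UNKNOWN")
        if state in TERMINAL:
            self._cached = state  # remembered for the life of this object
        return state


backend: dict[str, str] = {}
machine = StateMachineSketch(backend)
machine.send_enqueue()
queued = machine.get_current_state()
machine.send_complete()
done = machine.get_current_state()
backend["state"] = "SYSTEM_ERROR"  # a late, conflicting update from elsewhere
still_done = machine.get_current_state()
```

In this sketch a reader that has already observed COMPLETE keeps reporting it even after the backend is overwritten, which is the trade-off the description accepts in place of a real distributed state machine.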
toil.server.wes
Submodules
toil.server.wes.abstract_backend
Attributes
Exceptions
Classes
Functions
Module Contents
toil.server.wes.abstract_backend.logger
toil.server.wes.abstract_backend.TaskLog
exception toil.server.wes.abstract_backend.VersionNotImplementedException(wf_type, version=None, supported_versions=None)
Bases: Exception
Raised when the requested workflow version is not implemented.
Parameters
• wf_type ( str )
• version ( Optional[str] )
• supported_versions ( Optional[list[str]] )
exception toil.server.wes.abstract_backend.MalformedRequestException(message)
Bases: Exception
Raised when the
request is malformed.
Parameters
message ( str )
exception toil.server.wes.abstract_backend.WorkflowNotFoundException
Bases: Exception
Raised when the requested run ID is not found.
exception toil.server.wes.abstract_backend.WorkflowConflictException(run_id)
Bases: Exception
Raised when the
requested workflow is not in the expected state.
Parameters
run_id ( str )
exception toil.server.wes.abstract_backend.OperationForbidden(message)
Bases: Exception
Raised when the
request is forbidden.
Parameters
message ( str )
exception toil.server.wes.abstract_backend.WorkflowExecutionException(message)
Bases: Exception
Raised when an
internal error occurred during the execution of the
workflow.
Parameters
message ( str )
toil.server.wes.abstract_backend.handle_errors(func)
This decorator catches errors
from the wrapped function and returns a JSON formatted error
message with the appropriate status code defined by the
GA4GH WES spec.
Parameters
func ( Callable[Ellipsis, Any] )
Return type
Callable[Ellipsis, Any]
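A decorator of this kind can be sketched as below. The exception classes, the status codes chosen for them, and the shape of the error body (a message plus a numeric status code) are assumptions made for illustration; the names ending in Sketch are hypothetical and are not the classes documented above.

```python
import functools
from typing import Any, Callable


class NotFoundSketch(Exception):
    """Hypothetical stand-in for a 'run ID not found' error."""


class MalformedSketch(Exception):
    """Hypothetical stand-in for a 'malformed request' error."""


def handle_errors_sketch(func: Callable[..., Any]) -> Callable[..., Any]:
    """Turn exceptions raised by func into a (body, status code) pair."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except MalformedSketch as e:
            code, msg = 400, str(e)
        except NotFoundSketch:
            code, msg = 404, "The requested workflow run wasn't found."
        except Exception as e:
            code, msg = 500, f"Internal server error: {e}"
        # Assumed error body shape: a message plus the numeric status code.
        return {"msg": msg, "status_code": code}, code

    return wrapper


@handle_errors_sketch
def get_status(run_id: str) -> dict[str, str]:
    if run_id != "run-1":
        raise NotFoundSketch(run_id)
    return {"run_id": run_id, "state": "RUNNING"}


ok = get_status("run-1")
body, code = get_status("no-such-run")
```

Successful calls pass through untouched, so endpoint functions only need to raise the appropriate exception and leave the response formatting to the wrapper.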
class toil.server.wes.abstract_backend.WESBackend(options)
A class to represent a GA4GH
Workflow Execution Service (WES) API backend. Intended to be
inherited. Subclasses should implement all abstract methods
to handle user requests when they hit different endpoints.
Parameters
options ( list[str] )
options
resolve_operation_id(operation_id)
Map an operationId defined in
the OpenAPI or swagger yaml file to a function.
Parameters
operation_id ( str ) -- The operation ID defined in the specification.
Returns
A function that should be called when the given endpoint is reached.
Return type
Any
abstract get_service_info()
Get information about the Workflow Execution Service.
GET /service-info
Return type
dict [ str , Any]
abstract list_runs(page_size=None, page_token=None)
List the workflow runs.
GET /runs
Parameters
• page_size ( Optional[int] )
• page_token ( Optional[str] )
Return type
dict [ str , Any]
abstract run_workflow()
Run a workflow. This endpoint creates a new workflow run and returns a RunId to monitor its progress.
POST /runs
Return type
dict [ str , str ]
abstract get_run_log(run_id)
Get detailed info about a workflow run.
GET /runs/{run_id}
Parameters
run_id ( str )
Return type
dict [ str , Any]
abstract cancel_run(run_id)
Cancel a running workflow.
POST /runs/{run_id}/cancel
Parameters
run_id ( str )
Return type
dict [ str , str ]
abstract get_run_status(run_id)
Get quick status info about a workflow run, returning a simple result with the overall state of the workflow run.
GET /runs/{run_id}/status
Parameters
run_id ( str )
Return type
dict [ str , str ]
static log_for_run(run_id, message)
Parameters
• run_id ( Optional[str] )
• message ( str )
Return type
None
static secure_path(path)
Parameters
path ( str )
Return type
str
collect_attachments(run_id, temp_dir)
Collect attachments from the
current request by staging uploaded files to temp_dir, and
return the temp_dir and parsed body of the request.
Parameters
• run_id ( Optional[str] ) -- The run ID for logging.
• temp_dir ( Optional[str] ) -- The directory where uploaded files should be staged. If None, a temporary directory is created.
Return type
tuple [ str , dict [ str , Any]]
toil.server.wes.amazon_wes_utils
Attributes
Classes
Functions
Module Contents
toil.server.wes.amazon_wes_utils.logger
toil.server.wes.amazon_wes_utils.NOTICE = Multiline-String
"""
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
"""
class toil.server.wes.amazon_wes_utils.WorkflowPlan
Bases: TypedDict
These functions pass around dicts of a certain type, with data and files keys.
data: DataDict
files: FilesDict
class toil.server.wes.amazon_wes_utils.DataDict
Bases: TypedDict
Under data, there can be:
• workflowUrl (required if no workflowSource): URL to main workflow code.
workflowUrl: str
class toil.server.wes.amazon_wes_utils.FilesDict
Bases: TypedDict
Under files, there can be:
• workflowSource (required if no workflowUrl): Open binary-mode file for the main workflow code.
• workflowInputFiles: List of open binary-mode files for input files. Expected to be JSONs.
• workflowOptions: Open binary-mode file for a JSON of options sent along with the workflow.
• workflowDependencies: Open binary-mode file for the zip the workflow came in, if any.
workflowSource: IO[ bytes ]
workflowInputFiles: list [IO[ bytes ]]
workflowOptions: IO[ bytes ]
workflowDependencies: IO[ bytes ]
toil.server.wes.amazon_wes_utils.parse_workflow_zip_file(file, workflow_type)
Processes a workflow zip bundle
Parameters
• file ( str ) -- String or Path-like path to a workflow.zip file
• workflow_type ( str ) -- String, extension of workflow to expect (e.g. "wdl")
Return type
dict of data and files
If the zip only contains a single file, that file is set as workflowSource.
If the zip contains multiple files with a MANIFEST.json file, the MANIFEST is used to determine appropriate data and file arguments. (See: parse_workflow_manifest_file())
If the zip contains multiple files without a MANIFEST.json file:
• a main workflow file with an extension matching the workflow_type is expected and will be set as workflowSource
• optionally, if inputs*.json files are found in the root level of the zip, they will be set as workflowInputs(_d)* in the order they are found
• optionally, if an options.json file is found in the root level of the zip, it will be set as workflowOptions
If the zip contains multiple files, the original zip is set as workflowDependencies.
toil.server.wes.amazon_wes_utils.parse_workflow_manifest_file(manifest_file)
Reads a MANIFEST.json file for
a workflow zip bundle
Parameters
manifest_file ( str ) -- String or Path-like path to a MANIFEST.json file
Return type
dict of data and files
MANIFEST.json is expected to be formatted like:
{
    "mainWorkflowURL": "relpath/to/workflow",
    "inputFileURLs": [
        "relpath/to/input-file-1",
        "relpath/to/input-file-2",
        "relpath/to/input-file-3"
    ],
    "optionsFileURL": "relpath/to/option-file"
}
The mainWorkflowURL property provides a relative file path in the zip to a workflow file, which will be set as workflowSource.
The inputFileURLs property is optional and provides a list of relative file paths in the zip to input.json files. The list is assumed to be in the order the inputs should be applied - e.g. higher list index is higher priority. If present, it will be used to set workflowInputs(_d) arguments.
The optionsFileURL property is optional and provides a relative file path in the zip to an options.json file. If present, it will be used to set workflowOptions .
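The three manifest properties can be read with the json module as sketched below. This is a simplified illustration rather than the documented function: read_manifest_sketch is a hypothetical name, the entries are treated as plain relative paths and not parsed as URLs, and the sketch returns paths where the real function is documented as returning open binary-mode files.

```python
import json
import os
import tempfile
from typing import Any


def read_manifest_sketch(manifest_path: str) -> dict[str, Any]:
    """Resolve the relative paths in a MANIFEST.json against its directory."""
    base = os.path.dirname(os.path.abspath(manifest_path))
    with open(manifest_path) as f:
        manifest = json.load(f)
    plan: dict[str, Any] = {
        # mainWorkflowURL is the only property treated as required here.
        "workflowSource": os.path.join(base, manifest["mainWorkflowURL"]),
        # Optional list; order is kept so later entries take priority.
        "workflowInputFiles": [
            os.path.join(base, rel) for rel in manifest.get("inputFileURLs", [])
        ],
    }
    if "optionsFileURL" in manifest:
        plan["workflowOptions"] = os.path.join(base, manifest["optionsFileURL"])
    return plan


with tempfile.TemporaryDirectory() as tmp:
    manifest_path = os.path.join(tmp, "MANIFEST.json")
    with open(manifest_path, "w") as f:
        json.dump(
            {"mainWorkflowURL": "main.wdl", "inputFileURLs": ["a.json", "b.json"]}, f
        )
    plan = read_manifest_sketch(manifest_path)
    input_names = [os.path.basename(p) for p in plan["workflowInputFiles"]]
```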
toil.server.wes.amazon_wes_utils.workflow_manifest_url_to_path(url, parent_dir=None)
Interpret a possibly-relative parsed URL, relative to the given parent directory.
Parameters
• url ( urllib.parse.ParseResult )
• parent_dir ( Optional[str] )
Return type
str
toil.server.wes.amazon_wes_utils.task_filter(task, job_status)
AGC requires task names to be annotated with an AWS Batch job ID that they were run under. If it encounters an un-annotated task name, it will crash. See https://github.com/aws/amazon-genomics-cli/issues/494.
This encodes
the AWSBatchJobID annotation, from the
AmazonBatchBatchSystem, into the task name of the given
task, and returns the modified task. If no such annotation
is available, the task is censored and None is returned.
Parameters
• task ( toil.server.wes.abstract_backend.TaskLog )
• job_status ( toil.bus.JobStatus )
Return type
Optional[toil.server.wes.abstract_backend.TaskLog]
toil.server.wes.tasks
Attributes
Classes
Functions
Module Contents
toil.server.wes.tasks.logger
toil.server.wes.tasks.WAIT_FOR_DEATH_TIMEOUT = 20
class toil.server.wes.tasks.ToilWorkflowRunner(base_scratch_dir, state_store_url, workflow_id, request, engine_options)
A class to represent a workflow runner to run the requested workflow.
Responsible for
parsing the user request into a shell command, executing
that command, and collecting the outputs of the resulting
workflow run.
Parameters
• base_scratch_dir ( str )
• state_store_url ( str )
• workflow_id ( str )
• request ( dict[str, Any] )
• engine_options ( list[str] )
scratch_dir
store
state_machine
request
engine_options
wf_type: str
version: str
exec_dir
out_dir
default_job_store
job_store
write_scratch_file(filename, contents)
Write a file to the scratch
directory.
Parameters
• filename ( str )
• contents ( str )
Return type
None
get_state()
Return type
str
write_workflow(src_url)
Fetch the workflow file from
its source and write it to a destination file.
Parameters
src_url ( str )
Return type
str
sort_options(workflow_engine_parameters=None)
Sort the command line arguments
in the order that can be recognized by the workflow
execution engine.
Parameters
workflow_engine_parameters ( Optional[dict[str, Optional[str]]] ) -- User-specified parameters for this particular workflow. Keys are command-line options, and values are option arguments, or None for options that are flags.
Return type
list [ str ]
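The mapping format accepted here (option names as keys, with None marking a flag) can be flattened into an argument list as sketched below. The helper options_to_argv is hypothetical, and the "option=value" spelling is an assumption; the sketch shows only the flattening step, not the engine-specific ordering that sort_options() performs.

```python
from typing import Optional


def options_to_argv(params: dict[str, Optional[str]]) -> list[str]:
    """Flatten an option mapping into command-line arguments."""
    argv = []
    for option, value in params.items():
        if value is None:
            argv.append(option)               # flag without an argument
        else:
            argv.append(f"{option}={value}")  # option with its argument
    return argv


argv = options_to_argv({"--logLevel": "INFO", "--stats": None})
print(argv)
```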
initialize_run()
Write workflow and input files
and construct a list of shell commands to be executed.
Return that list of shell commands that should be executed
in order to complete this workflow run.
Return type
list [ str ]
call_cmd(cmd, cwd)
Calls a command with Popen.
Writes stdout, stderr, and the command to separate files.
Parameters
• cmd ( Union[list[str], str] )
• cwd ( str )
Return type
subprocess.Popen [ bytes ]
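Starting a command with Popen while recording the command line and both output streams in separate files might look like the sketch below. The function name call_cmd_sketch and the file names cmd, stdout, and stderr are assumptions for illustration; the manual does not state which file names the real method uses.

```python
import os
import subprocess
import sys
import tempfile


def call_cmd_sketch(cmd: list[str], cwd: str) -> "subprocess.Popen[bytes]":
    """Start cmd in cwd, recording the command and its output streams."""
    with open(os.path.join(cwd, "cmd"), "w") as f:
        f.write(" ".join(cmd))
    # The child receives its own copies of these descriptors, so the parent
    # can close its handles as soon as Popen returns.
    with open(os.path.join(cwd, "stdout"), "wb") as out, \
            open(os.path.join(cwd, "stderr"), "wb") as err:
        return subprocess.Popen(cmd, cwd=cwd, stdout=out, stderr=err)


with tempfile.TemporaryDirectory() as tmp:
    proc = call_cmd_sketch([sys.executable, "-c", "print('hello')"], tmp)
    exit_code = proc.wait()
    with open(os.path.join(tmp, "stdout")) as f:
        captured = f.read()
    with open(os.path.join(tmp, "cmd")) as f:
        recorded = f.read()
```

Returning the Popen object instead of waiting inside the helper lets the caller poll the process, which is useful when the run may need to be canceled.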
run()
Construct a command to run the requested workflow with the options, run it, and deposit the outputs in the output directory.
Return type
None
write_output_files()
Fetch all the files that this workflow generated and output information about them to outputs.json.
Return type
None
toil.server.wes.tasks.run_wes_task(base_scratch_dir, state_store_url, workflow_id, request, engine_options)
Run a requested workflow.
Parameters
• base_scratch_dir ( str ) -- Directory where the workflow's scratch dir will live, under the workflow's ID.
• state_store_url ( str ) -- URL/path at which the server and Celery task communicate about workflow state.
• workflow_id ( str ) -- ID of the workflow run.
• request ( dict[str, Any] )
• engine_options ( list[str] )
Returns
the state of the workflow run.
Return type
str
toil.server.wes.tasks.run_wes
toil.server.wes.tasks.cancel_run(task_id)
Send a SIGTERM signal to the
process that is running task_id.
Parameters
task_id ( str )
Return type
None
class toil.server.wes.tasks.TaskRunner
Abstraction over the Celery API. Runs our run_wes task and allows canceling it.
We can swap
this out in the server to allow testing without Celery.
static run(args, task_id)
Run the given task args with
the given ID on Celery.
Parameters
• args ( tuple[str, str, str, dict[str, Any], list[str]] )
• task_id ( str )
Return type
None
static cancel(task_id)
Cancel the task with the given
ID on Celery.
Parameters
task_id ( str )
Return type
None
static is_ok(task_id)
Make sure that the task running
system is working for the given task. If the task system has
detected an internal failure, return False.
Parameters
task_id ( str )
Return type
bool
class toil.server.wes.tasks.MultiprocessingTaskRunner
Bases: TaskRunner
Version of TaskRunner that just runs tasks with Multiprocessing.
Can't use
threading because there's no way to send a cancel signal or
exception to a Python thread, if loops in the task (i.e.
ToilWorkflowRunner) don't poll for it.
static set_up_and_run_task(output_path, args)
Set up logging for the process into the given file and then call run_wes_task with the given arguments.
If the process
finishes successfully, it will clean up the log, but if the
process crashes, the caller must clean up the log.
Parameters
• output_path ( str )
• args ( tuple[str, str, str, dict[str, Any], list[str]] )
Return type
None
classmethod run(args, task_id)
Run the given task args with
the given ID.
Parameters
• args ( tuple[str, str, str, dict[str, Any], list[str]] )
• task_id ( str )
Return type
None
classmethod cancel(task_id)
Cancel the task with the given
ID.
Parameters
task_id ( str )
Return type
None
classmethod is_ok(task_id)
Make sure that the task running
system is working for the given task. If the task system has
detected an internal failure, return False.
Parameters
task_id ( str )
Return type
bool
toil.server.wes.toil_backend
Attributes
Classes
Module Contents
toil.server.wes.toil_backend.logger
class toil.server.wes.toil_backend.ToilWorkflow(base_work_dir, state_store_url, run_id)
Parameters
• base_work_dir ( str )
• state_store_url ( str )
• run_id ( str )
base_scratch_dir
state_store_url
run_id
scratch_dir
exec_dir
store
state_machine
fetch_state(key: str, default: str) -> str
fetch_state(key: str, default: None = None) -> str | None
Return the contents of the given key in the workflow's state store. If the key does not exist, the default value is returned.
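The default-on-missing behaviour described above can be sketched with an in-memory stand-in. MemoryStateStore and this free-standing fetch_state are hypothetical and exist only for the example; the real state store is addressed by state_store_url and is not reproduced here.

```python
from typing import Optional


class MemoryStateStore:
    """Hypothetical in-memory stand-in for a workflow state store."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


def fetch_state(
    store: MemoryStateStore, key: str, default: Optional[str] = None
) -> Optional[str]:
    """Return the stored value for key, or default if the key is absent."""
    value = store.get(key)
    return default if value is None else value


store = MemoryStateStore()
store.set("state", "RUNNING")
```

With a str default the result is always a str; with no default the caller has to handle None, which is what the two overloads above express.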
fetch_scratch(filename)
Get a context manager for
either a stream for the given file from the workflow's
scratch directory, or None if it isn't there.
Parameters
filename ( str )
Return type
collections.abc.Generator [Optional[TextIO], None, None]
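A context manager that yields either an open stream or None, as described for fetch_scratch, might look like the sketch below. The function name open_if_present and the throwaway directory are assumptions for illustration, not part of Toil.

```python
import os
import tempfile
from collections.abc import Generator
from contextlib import contextmanager
from typing import Optional, TextIO


@contextmanager
def open_if_present(
    directory: str, filename: str
) -> Generator[Optional[TextIO], None, None]:
    """Yield an open text stream for directory/filename, or None if missing."""
    path = os.path.join(directory, filename)
    if not os.path.exists(path):
        yield None
        return
    with open(path, "r") as stream:
        yield stream


# A temporary directory stands in for the workflow's scratch directory.
with tempfile.TemporaryDirectory() as scratch_dir:
    with open(os.path.join(scratch_dir, "stdout"), "w") as handle:
        handle.write("hello\n")
    with open_if_present(scratch_dir, "stdout") as found:
        found_text = found.read() if found is not None else None
    with open_if_present(scratch_dir, "stderr") as missing:
        missing_result = missing
```

Callers always use the same with statement and check for None inside it, so the file is closed on exit whether or not it existed.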
exists()
Return True if the workflow run
exists.
Return type
bool
get_state()
Return the state of the current
run.
Return type
str
check_on_run(task_runner)
Check to make sure nothing has
gone wrong in the task runner for this workflow. If
something has, log, and fail the workflow with an error.
Parameters
task_runner ( type[toil.server.wes.tasks.TaskRunner] )
Return type
None
set_up_run()
Set up necessary directories
for the run.
Return type
None
clean_up()
Clean directory and files
related to the run.
Return type
None
queue_run(task_runner, request, options)
This workflow should be ready
to run. Hand this to the task system.
Parameters
• task_runner ( type[toil.server.wes.tasks.TaskRunner] )
• request ( dict[str, Any] )
• options ( list[str] )
Return type
None
get_output_files()
Return a collection of output
files that this workflow generated.
Return type
Any
get_stdout_path()
Return the path to the standard
output log, relative to the run's scratch_dir, or None if it
doesn't exist.
Return type
Optional[ str ]
get_stderr_path()
Return the path to the standard error log, relative to the run's scratch_dir, or None if it doesn't exist.
Return type
Optional[ str ]
get_messages_path()
Return the path to the bus
message log, relative to the run's scratch_dir, or None if
it doesn't exist.
Return type
Optional[ str ]
get_task_logs(filter_function=None)
Return all the task log objects for the individual tasks in the workflow.
Task names will
be the job_type values from issued/completed/failed
messages, with annotations from JobAnnotationMessage
messages if available.
Parameters
filter_function ( Optional[Callable[[toil.server.wes.abstract_backend.TaskLog, toil.bus.JobStatus], Optional[toil.server.wes.abstract_backend.TaskLog]]] ) -- If set, will be called with each task log and its job annotations. Returns a modified copy of the task log to actually report, or None if the task log should be omitted.
Return type
list [ dict [ str , Union[ str , int , None]]]
class toil.server.wes.toil_backend.ToilBackend(work_dir, state_store, options, dest_bucket_base, bypass_celery=False, wes_dialect='standard')
Bases: toil.server.wes.abstract_backend.WESBackend
WES backend
implemented for Toil to run CWL, WDL, or Toil workflows.
This class is responsible for validating and executing
submitted workflows.
Parameters
• work_dir ( str )
• state_store ( Optional[str] )
• options ( list[str] )
• dest_bucket_base ( Optional[str] )
• bypass_celery ( bool )
• wes_dialect ( str )
run_id_prefix = 'run-'
task_runner
wes_dialect
dest_bucket_base
work_dir
supported_versions
get_runs()
Generate (run ID, state) pairs for the known workflow runs.
Return type
collections.abc.Generator [ tuple [ str , str ], None, None]
get_state(run_id)
Return the state of the
workflow run with the given run ID. May raise an error if
the workflow does not exist.
Parameters
run_id ( str )
Return type
str
get_service_info()
Get information about the
Workflow Execution Service.
Return type
dict [ str , Any]
list_runs(page_size=None, page_token=None)
List the workflow runs.
Parameters
• page_size ( Optional[int] )
• page_token ( Optional[str] )
Return type
dict [ str , Any]
run_workflow()
Run a workflow.
Return type
dict [ str , str ]
get_run_log(run_id)
Get detailed info about a
workflow run.
Parameters
run_id ( str )
Return type
dict [ str , Any]
cancel_run(run_id)
Cancel a running workflow.
Parameters
run_id ( str )
Return type
dict [ str , str ]
get_run_status(run_id)
Get quick status info about a
workflow run, returning a simple result with the overall
state of the workflow run.
Parameters
run_id ( str )
Return type
dict [ str , str ]
get_stdout(run_id)
Get the stdout of a workflow
run as a static file.
Parameters
run_id ( str )
Return type
Any
get_stderr(run_id)
Get the stderr of a workflow
run as a static file.
Parameters
run_id ( str )
Return type
Any
get_health()
Return successfully if the
server is healthy.
Return type
werkzeug.wrappers.response.Response
get_homepage()
Provide a sensible result for /
other than 404.
Return type
werkzeug.wrappers.response.Response
toil.server.wsgi_app
Classes
Functions
Module Contents
class toil.server.wsgi_app.GunicornApplication(app, options=None)
Bases: gunicorn.app.base.BaseApplication
An entry point to integrate a Gunicorn WSGI server in Python. To start a WSGI application with callable app, run the following code:
GunicornApplication(app, options={
    ...
}).run()
For more details, see: https://docs.gunicorn.org/en/latest/custom.html
Parameters
• app ( object )
• options ( Optional[dict[str, Any]] )
options
application
init(*args)
Parameters
args ( Any )
Return type
None
load_config()
Return type
None
load()
Return type
object
toil.server.wsgi_app.run_app(app, options=None)
Run a Gunicorn WSGI server.
Parameters
• app ( object )
• options ( Optional[dict[str, Any]] )
Return type
None
toil.serviceManager
Attributes
Classes
Module Contents
toil.serviceManager.logger
class toil.serviceManager.ServiceManager(job_store, toil_state)
Manages the scheduling of
services.
Parameters
• job_store ( toil.jobStores.abstractJobStore.AbstractJobStore )
• toil_state ( toil.toilState.ToilState )
services_are_starting(job_id)
Check if services are being
started.
Returns
True if the services for the given job are currently being started, and False otherwise.
Parameters
job_id ( str )
Return type
bool
get_job_count()
Get the total number of jobs we are working on.
(services and
their parent non-service jobs)
Return type
int
start()
Start the service scheduling
thread.
Return type
None
put_client(client_id)
Schedule the services of a job asynchronously.
When the job's services are running, the ID for the job will be returned by toil.serviceManager.ServiceManager.get_ready_client.
Parameters
client_id ( str ) -- ID of job with services to schedule.
Return type
None
get_ready_client(maxWait)
Fetch a ready client, waiting
as needed.
Parameters
maxWait ( float ) -- Time in seconds to wait to get a JobDescription before returning
Returns
the ID of a client whose services are running, or None if no such job is available.
Return type
Optional[ str ]
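The "wait up to maxWait seconds, then return None" contract used by get_ready_client and the similar methods below can be sketched with a standard-library queue. The helper name and the job ID are hypothetical; whether ServiceManager uses queue.Queue internally is an assumption of this sketch.

```python
import queue
from typing import Optional


def get_with_timeout(ready: "queue.Queue[str]", max_wait: float) -> Optional[str]:
    """Return the next ready ID, or None if none arrives within max_wait seconds."""
    try:
        return ready.get(timeout=max_wait)
    except queue.Empty:
        return None


ready_clients: "queue.Queue[str]" = queue.Queue()
ready_clients.put("job-1")  # Hypothetical job ID.

first = get_with_timeout(ready_clients, 0.1)   # An ID is already queued.
second = get_with_timeout(ready_clients, 0.1)  # Nothing left: times out.
```

Returning None instead of raising lets a polling loop in the caller treat "nothing ready yet" as a normal outcome.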
get_unservable_client(maxWait)
Fetch a client whose services failed to start.
Parameters
maxWait ( float ) -- Time in seconds to wait to get a JobDescription before returning
Returns
the ID of a client whose services failed to start, or None if no such job is available.
Return type
Optional[ str ]
get_startable_service(maxWait)
Fetch a service job that is
ready to start.
Parameters
maxWait ( float ) -- Time in seconds to wait to get a job before returning.
Returns
the ID of a service job that the leader can start, or None if no such job exists.
Return type
Optional[ str ]
kill_services(service_ids, error=False)
Stop all the given service
jobs.
Parameters
• service_ids ( collections.abc.Iterable[str] ) -- Service jobStoreIDs to kill
• error ( bool ) -- Whether to signal that the service failed with an error when stopping it.
Return type
None
is_active(service_id)
Return true if the service job
has not been told to terminate.
Parameters
service_id ( str ) -- Service to check on
Return type
bool
is_running(service_id)
Return true if the service job
has started and is active.
Parameters
service_id ( str ) -- Service to check on
Return type
bool
check()
Check on the service manager thread.
Raises
RuntimeError -- If the underlying thread has quit.
Return type
None
shutdown()
Terminate worker threads cleanly; starting and killing all service threads.
Will block
until all services are started and blocked.
Return type
None
toil.statsAndLogging
Attributes
Classes
Functions
Module Contents
toil.statsAndLogging.logger
toil.statsAndLogging.root_logger
toil.statsAndLogging.toil_logger
toil.statsAndLogging.DEFAULT_LOGLEVEL
toil.statsAndLogging.TRACE
class toil.statsAndLogging.StatsAndLogging(jobStore, config)
A thread to aggregate
statistics and logging.
Parameters
• jobStore ( toil.jobStores.abstractJobStore.AbstractJobStore )
• config ( toil.common.Config )
start()
Start the stats and logging
thread.
Return type
None
classmethod formatLogStream(stream, stream_name)
Given a stream of text or bytes, and the job name, job itself, or some other optional stringifyable identity info for the job, return a big text string with the formatted job log, suitable for printing for the user.
We don't want
to prefix every line of the job's log with our own logging
info, or we get prefixes wider than any reasonable terminal
and longer than the messages.
Parameters
• stream ( Union[IO[str], IO[bytes]] ) -- The stream of text or bytes to print for the user.
• stream_name ( str )
Return type
str
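The idea behind formatLogStream (one big string with a single header, rather than a logging prefix on every line) can be sketched as follows. The header wording and the tab indentation are assumptions chosen for the example; the layout Toil actually produces may differ.

```python
import io
from typing import IO, Union


def format_log_stream(stream: Union[IO[str], IO[bytes]], stream_name: str) -> str:
    """Return one string: a header naming the stream, then its lines indented."""
    lines = [f"{stream_name} follows:"]  # Header wording is an assumption.
    for line in stream:
        if isinstance(line, bytes):
            # Accept byte streams too, replacing undecodable bytes.
            line = line.decode("utf-8", errors="replace")
        lines.append("\t" + line.rstrip("\n"))
    return "\n".join(lines)


formatted = format_log_stream(
    io.StringIO("step 1\nstep 2\n"), "Log from job 'example'"
)
```

The caller can then pass the whole string to a single logger call, so the logging prefix appears once instead of on every job log line.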
classmethod logWithFormatting(stream_name, jobLogs, method=logger.debug, message=None)
Parameters
• stream_name ( str )
• jobLogs ( Union[IO[str], IO[bytes]] )
• method ( Callable[[str], None] )
• message ( Optional[str] )
Return type
None
classmethod writeLogFiles(jobNames, jobLogList, config, failed=False)
Parameters
• jobNames ( list[str] )
• jobLogList ( list[str] )
• config ( toil.common.Config )
• failed ( bool )
Return type
None
classmethod statsAndLoggingAggregator(jobStore, stop, config)
The following function is used
for collating stats/reporting log messages from the workers.
Works inside of a thread, collates as long as the stop flag
is not True.
Parameters
• jobStore ( toil.jobStores.abstractJobStore.AbstractJobStore )
• stop ( threading.Event )
• config ( toil.common.Config )
Return type
None
check()
Check on the stats and logging aggregator.
Raises
RuntimeError -- If the underlying thread has quit.
Return type
None
shutdown()
Finish up the stats/logging
aggregation thread.
Return type
None
toil.statsAndLogging.set_log_level(level, set_logger=None)
Sets the root logger level to a
given string level (like "INFO").
Parameters
• level ( str )
• set_logger ( Optional[logging.Logger] )
Return type
None
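Setting a logger's level from a string name such as "INFO" can be done with the standard logging module alone, as in this sketch. The function name set_level_by_name and the logger name "example" are hypothetical, and custom levels such as TRACE are not handled here.

```python
import logging
from typing import Optional


def set_level_by_name(level: str, target: Optional[logging.Logger] = None) -> None:
    """Set the level of target (the root logger by default) from a level name."""
    numeric_level = getattr(logging, level.upper(), None)
    if not isinstance(numeric_level, int):
        raise ValueError(f"Invalid log level: {level}")
    if target is None:
        target = logging.getLogger()  # The root logger.
    target.setLevel(numeric_level)


example_logger = logging.getLogger("example")  # Hypothetical logger name.
set_level_by_name("debug", example_logger)
```

Validating the name before calling setLevel gives a clear error for a typo instead of silently leaving the old level in place.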
toil.statsAndLogging.install_log_color(set_logger=None)
Make logs colored.
Parameters
set_logger ( Optional[logging.Logger] )
Return type
None
toil.statsAndLogging.add_logging_options(parser, default_level=None)
Add logging options to set the
global log level.
Parameters
• default_level ( Optional[int] ) -- A logging level, like logging.INFO, to use as the default.
• parser ( argparse.ArgumentParser )
Return type
None
toil.statsAndLogging.configure_root_logger()
Set up the root logger with handlers and formatting.
Should be
called before any entry point tries to log anything, to
ensure consistent formatting.
Return type
None
toil.statsAndLogging.log_to_file(log_file, log_rotation)
Parameters
• log_file ( Optional[str] )
• log_rotation ( bool )
Return type
None
toil.statsAndLogging.set_logging_from_options(options)
Parameters
options ( Union[toil.common.Config, argparse.Namespace] )
Return type
None
toil.statsAndLogging.suppress_exotic_logging(local_logger)
Attempts to suppress the loggers of all non-Toil packages by setting them to CRITICAL.
For example: 'requests_oauthlib', 'google', 'boto', 'websocket', 'oauthlib', etc.
This will only suppress loggers that have already been instantiated and can be seen in the environment, except for the list declared in "always_suppress".
This is
important because some packages, particularly boto3, are not
always instantiated yet in the environment when this is run,
and so we create the logger and set the level preemptively.
Parameters
local_logger ( str )
Return type
None
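The approach described above (quiet every already-registered foreign logger, and create a fixed list of others ahead of time so they are covered too) can be sketched with the standard logging module. The function name, the "myapp" prefix, and the logger names are hypothetical; this is not the body of suppress_exotic_logging.

```python
import logging


def suppress_other_loggers(own_prefix: str, always_suppress: list[str]) -> None:
    """Set loggers outside own_prefix to CRITICAL, creating the listed ones."""
    # Create and silence these even if their packages are not imported yet.
    for name in always_suppress:
        logging.getLogger(name).setLevel(logging.CRITICAL)
    # Copy the names first: getLogger() may modify the registry while we loop.
    for name in list(logging.root.manager.loggerDict):
        top_level = name.split(".")[0]
        if top_level != own_prefix:
            logging.getLogger(name).setLevel(logging.CRITICAL)


logging.getLogger("noisy_library.http")  # Hypothetical third-party logger.
logging.getLogger("myapp.worker")        # Hypothetical application logger.
suppress_other_loggers("myapp", always_suppress=["not_yet_imported"])
```

Because a logger created later by name is the same object, the level set in advance still applies when the package is eventually imported.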
toil.test
Base testing class for Toil.
Submodules
toil.test.batchSystems
Submodules
toil.test.batchSystems.batchSystemTest
Attributes
Classes
Functions
Module Contents
toil.test.batchSystems.batchSystemTest.logger
toil.test.batchSystems.batchSystemTest.numCores = 2
toil.test.batchSystems.batchSystemTest.preemptible = False
toil.test.batchSystems.batchSystemTest.defaultRequirements
class toil.test.batchSystems.batchSystemTest.BatchSystemPluginTest(methodName='runTest')
Bases: toil.test.ToilTest
Class for
testing batch system plugin functionality.
setUp()
Hook method for setting up the test fixture before exercising it.
tearDown()
Hook method for deconstructing the test fixture after testing it.
test_add_batch_system_factory()
class toil.test.batchSystems.batchSystemTest.hidden
Hide abstract base class from unittest's test case loader. See:
http://stackoverflow.com/questions/1323455/python-unit-test-with-base-and-sub-class#answer-25695512
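The hiding pattern can be sketched on its own. The class names below are hypothetical; the point is that unittest's loader only collects TestCase subclasses it finds as module attributes, so a base class nested inside a plain class is skipped while its concrete subclasses are still collected.

```python
import types
import unittest


class hidden:
    """Plain namespace class; the loader does not look inside it."""

    class AbstractGreetingTest(unittest.TestCase):
        """Generic test; subclasses supply the greeting to check."""

        def make_greeting(self) -> str:
            raise NotImplementedError()

        def test_greeting_is_nonempty(self) -> None:
            self.assertTrue(self.make_greeting())


class EnglishGreetingTest(hidden.AbstractGreetingTest):
    def make_greeting(self) -> str:
        return "hello"


# Build a stand-in module so the loader can be exercised in isolation.
module = types.ModuleType("example_tests")
module.hidden = hidden
module.EnglishGreetingTest = EnglishGreetingTest

suite = unittest.defaultTestLoader.loadTestsFromModule(module)
test_count = suite.countTestCases()
```

Had AbstractGreetingTest been defined at module level, the loader would have collected it as well and its test would fail with NotImplementedError.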
class AbstractBatchSystemTest(methodName='runTest')
Bases: toil.test.ToilTest
A base test case with generic tests that every batch system should pass.
Cannot assume
that the batch system actually executes commands on the
local machine/filesystem.
abstract createBatchSystem()
Return type
toil.batchSystems.abstractBatchSystem.AbstractBatchSystem
supportsWallTime()
classmethod createConfig()
Returns a dummy config for the
batch system tests. We need a workflowID to be set up since
we are running tests without setting up a jobstore. This is
the class version to be used when an instance is not
available.
Return type
toil.common.Config
classmethod setUpClass()
Hook method for setting up class fixture before running tests in the class.
setUp()
Hook method for setting up the test fixture before exercising it.
tearDown()
Hook method for deconstructing the test fixture after testing it.
get_max_startup_seconds()
Get the number of seconds this
test ought to wait for the first job to run. Some batch
systems may need time to scale up.
Return type
int
test_available_cores()
test_run_jobs()
test_set_env()
test_set_job_env()
Test the mechanism for setting per-job environment variables to batch system jobs.
testCheckResourceRequest()
testScalableBatchSystem()
class AbstractBatchSystemJobTest(methodName='runTest')
Bases: toil.test.ToilTest
An abstract
base class for batch system tests that use a full Toil
workflow rather than using the batch system directly.
cpuCount
allocatedCores
sleepTime = 30
abstract getBatchSystemName()
Return type
( str , AbstractBatchSystem )
getOptions(tempDir)
Configures options for Toil workflow and makes job store.
Parameters
tempDir ( str ) -- path to test directory
Returns
Toil options object
setUp()
Hook method for setting up the test fixture before exercising it.
tearDown()
Hook method for deconstructing the test fixture after testing it.
testJobConcurrency()
Tests that the batch system is allocating core resources properly for concurrent tasks.
test_omp_threads()
Test if the OMP_NUM_THREADS env var is set correctly based on jobs.cores.
class AbstractGridEngineBatchSystemTest(methodName='runTest')
Bases: AbstractBatchSystemTest
An abstract class to reduce redundancy between Grid Engine, Slurm, and other similar batch systems
class toil.test.batchSystems.batchSystemTest.KubernetesBatchSystemTest(methodName='runTest')
Bases: hidden
Tests against
the Kubernetes batch system
supportsWallTime()
createBatchSystem()
class toil.test.batchSystems.batchSystemTest.KubernetesBatchSystemBenchTest(methodName='runTest')
Bases: toil.test.ToilTest
Kubernetes
batch system unit tests that don't need to actually talk to
a cluster.
test_preemptability_constraints()
Make sure we generate the right preemptability constraints.
test_label_constraints()
Make sure we generate the right label constraints.
class toil.test.batchSystems.batchSystemTest.AWSBatchBatchSystemTest(methodName='runTest')
Bases: hidden
Tests against
the AWS Batch batch system
supportsWallTime()
createBatchSystem()
get_max_startup_seconds()
Get the number of seconds this
test ought to wait for the first job to run. Some batch
systems may need time to scale up.
Return type
int
class toil.test.batchSystems.batchSystemTest.MesosBatchSystemTest(methodName='runTest')
Bases: hidden , toil.batchSystems.mesos.test.MesosTestSupport
Tests against
the Mesos batch system
classmethod createConfig()
Needs to set mesos_endpoint to localhost for testing, since the default is now the private IP address.
supportsWallTime()
createBatchSystem()
tearDown()
Hook method for deconstructing the test fixture after testing it.
testIgnoreNode()
toil.test.batchSystems.batchSystemTest.write_temp_file(s, temp_dir)
Dump a string into a temp file
and return its path.
Parameters
• s ( str )
• temp_dir ( str )
Return type
str
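A helper with the behaviour described for write_temp_file can be sketched using the standard tempfile module. The name dump_to_temp_file and the use of mkstemp are assumptions for the example, not the actual body of the Toil helper.

```python
import os
import tempfile


def dump_to_temp_file(s: str, temp_dir: str) -> str:
    """Write s to a new uniquely named file in temp_dir and return its path."""
    fd, path = tempfile.mkstemp(dir=temp_dir)
    with os.fdopen(fd, "w") as handle:
        handle.write(s)
    return path


with tempfile.TemporaryDirectory() as temp_dir:
    path = dump_to_temp_file("echo hello\n", temp_dir)
    with open(path) as handle:
        contents = handle.read()
```

mkstemp creates the file atomically with a unique name, so concurrent callers writing into the same directory do not overwrite each other.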
class toil.test.batchSystems.batchSystemTest.SingleMachineBatchSystemTest(methodName='runTest')
Bases: hidden
Tests against
the single-machine batch system
supportsWallTime()
Return type
bool
createBatchSystem()
Return type
toil.batchSystems.abstractBatchSystem.AbstractBatchSystem
testProcessEscape(hide=False)
Test to make sure that child processes and their descendants go away when the Toil workflow stops.
If hide is
true, will try and hide the child processes to make them
hard to stop.
Parameters
hide ( bool )
Return type
None
testHidingProcessEscape()
Test to make sure that child processes and their descendants go away when the Toil workflow stops, even if the job process stops and leaves children.
class toil.test.batchSystems.batchSystemTest.MaxCoresSingleMachineBatchSystemTest(methodName='runTest')
Bases: toil.test.ToilTest
This test ensures that the single machine batch system doesn't exceed the configured number of cores.
classmethod setUpClass()
Hook method for setting up
class fixture before running tests in the class.
Return type
None
setUp()
Hook method for setting up the
test fixture before exercising it.
Return type
None
tearDown()
Hook method for deconstructing
the test fixture after testing it.
Return type
None
scriptCommand()
Return type
str
test()
testServices()
toil.test.batchSystems.batchSystemTest.parentJob(job, cmd)
toil.test.batchSystems.batchSystemTest.childJob(job, cmd)
toil.test.batchSystems.batchSystemTest.grandChildJob(job, cmd)
toil.test.batchSystems.batchSystemTest.greatGrandChild(cmd)
class toil.test.batchSystems.batchSystemTest.Service(cmd)
Bases: toil.job.Job.Service
Abstract class used to define the interface to a service.
Should be subclassed by the user to define services.
Is not executed as a job; runs within a ServiceHostJob.
cmd
start(fileStore)
Start the service.
Parameters
job -- The underlying host job that the service is being run in. Can be used to register deferred functions, or to access the fileStore for creating temporary files.
Returns
An object describing how to access the service. The object must be pickleable and will be used by jobs to access the service (see toil.job.Job.addService() ).
check()
Checks the service is still running.
Raises
exceptions.RuntimeError -- If the service failed, this will cause the service job to be labeled failed.
Returns
True if the service is still running, else False. If False then the service job will be terminated, and considered a success. Important point: if the service job exits due to a failure, it should raise a RuntimeError, not return False!
stop(fileStore)
Stops the service. Function can
block until complete.
Parameters
job -- The underlying host job that the service is being run in. Can be used to register deferred functions, or to access the fileStore for creating temporary files.
class toil.test.batchSystems.batchSystemTest.GridEngineBatchSystemTest(methodName='runTest')
Bases: hidden
Tests against
the GridEngine batch system
createBatchSystem()
Return type
toil.batchSystems.abstractBatchSystem.AbstractBatchSystem
tearDown()
Hook method for deconstructing the test fixture after testing it.
class toil.test.batchSystems.batchSystemTest.SlurmBatchSystemTest(methodName='runTest')
Bases: hidden
Tests against
the Slurm batch system
createBatchSystem()
Return type
toil.batchSystems.abstractBatchSystem.AbstractBatchSystem
tearDown()
Hook method for deconstructing the test fixture after testing it.
class toil.test.batchSystems.batchSystemTest.LSFBatchSystemTest(methodName='runTest')
Bases: hidden
Tests against
the LSF batch system
createBatchSystem()
Return type
toil.batchSystems.abstractBatchSystem.AbstractBatchSystem
class toil.test.batchSystems.batchSystemTest.TorqueBatchSystemTest(methodName='runTest')
Bases: hidden
Tests against
the Torque batch system
createBatchSystem()
Return type
toil.batchSystems.abstractBatchSystem.AbstractBatchSystem
tearDown()
Hook method for deconstructing the test fixture after testing it.
class toil.test.batchSystems.batchSystemTest.HTCondorBatchSystemTest(methodName='runTest')
Bases: hidden
Tests against
the HTCondor batch system
createBatchSystem()
Return type
toil.batchSystems.abstractBatchSystem.AbstractBatchSystem
tearDown()
Hook method for deconstructing the test fixture after testing it.
class toil.test.batchSystems.batchSystemTest.SingleMachineBatchSystemJobTest(methodName='runTest')
Bases: hidden
Tests Toil
workflow against the SingleMachine batch system
getBatchSystemName()
Return type
( str , AbstractBatchSystem )
testConcurrencyWithDisk()
Tests that the batch system is allocating disk resources properly
testNestedResourcesDoNotBlock()
Resources are requested in the order Memory > Cpu > Disk. Test that unavailability of cpus for one job that is scheduled does not block another job that can run.
class toil.test.batchSystems.batchSystemTest.MesosBatchSystemJobTest(methodName='runTest')
Bases: hidden , toil.batchSystems.mesos.test.MesosTestSupport
Tests Toil
workflow against the Mesos batch system
getOptions(tempDir)
Configures options for Toil workflow and makes job store.
Parameters
tempDir ( str ) -- path to test directory
Returns
Toil options object
getBatchSystemName()
Return type
( str , AbstractBatchSystem )
tearDown()
Hook method for deconstructing the test fixture after testing it.
toil.test.batchSystems.batchSystemTest.measureConcurrency(filepath, sleep_time=10)
Run in parallel to determine the number of concurrent tasks. This code was copied from toil.batchSystemTest.MaxCoresSingleMachineBatchSystemTest.
Parameters
• filepath ( str ) -- path to counter file
• sleep_time ( int ) -- number of seconds to sleep before counting down
Returns
max concurrency value
Return type
int
toil.test.batchSystems.batchSystemTest.count(delta, file_path)
Increments the counter file and returns the maximum value the counter has reached. Counter data must be in the form "concurrent tasks,max concurrent tasks" (the counter should be initialized to 0,0).
Parameters
• delta ( int ) -- increment value
• file_path ( str ) -- path to shared counter file
Returns
max concurrent tasks
Return type
int
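The counter-file scheme can be sketched as follows: the file holds two comma-separated integers, the current number of concurrent tasks and the highest value seen so far. The function name update_counter is hypothetical, and the sketch does no file locking, so it is only safe when one process touches the file at a time; a helper shared between concurrent jobs would need a lock around the read-modify-write.

```python
import os
import tempfile


def update_counter(delta: int, file_path: str) -> int:
    """Add delta to the current count and return the updated maximum."""
    with open(file_path, "r+") as handle:
        current, maximum = (int(field) for field in handle.read().split(","))
        current += delta
        maximum = max(maximum, current)
        # Rewrite the file in place with the new "current,max" pair.
        handle.seek(0)
        handle.truncate()
        handle.write(f"{current},{maximum}")
    return maximum


with tempfile.TemporaryDirectory() as temp_dir:
    counter_path = os.path.join(temp_dir, "counter")
    with open(counter_path, "w") as handle:
        handle.write("0,0")  # Initial state: nothing running, maximum of zero.
    # Two tasks start (+1, +1), then both finish (-1, -1).
    results = [update_counter(d, counter_path) for d in (1, 1, -1, -1)]
    with open(counter_path) as handle:
        final_contents = handle.read()
```

Each job increments on entry and decrements on exit, so the maximum left in the file after the workflow is the peak concurrency the batch system allowed.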
toil.test.batchSystems.batchSystemTest.getCounters(path)
toil.test.batchSystems.batchSystemTest.resetCounters(path)
toil.test.batchSystems.batchSystemTest.get_omp_threads()
Return type
str
toil.test.batchSystems.batch_system_plugin_test
Attributes
Classes
Module Contents
toil.test.batchSystems.batch_system_plugin_test.logger
class toil.test.batchSystems.batch_system_plugin_test.FakeBatchSystem(config, maxCores, maxMemory, maxDisk)
Bases: toil.batchSystems.cleanup_support.BatchSystemCleanupSupport
Adds cleanup
support when the last running job leaves a node, for batch
systems that can't provide it using the backing scheduler.
Parameters
• config ( toil.common.Config )
• maxCores ( float )
• maxMemory ( int )
• maxDisk ( int )
classmethod supportsAutoDeployment()
Whether this batch system supports auto-deployment of the user script itself.
If it does, the setUserScript() can be invoked to set the resource object representing the user script.
Note to
implementors: If your implementation returns True here, it
should also override
Return type
bool
issueBatchJob(command, job_desc, job_environment=None)
Issues a job with the specified
command to the batch system and returns a unique job ID
number.
Parameters
• command ( str ) -- the command to execute somewhere to run the Toil worker process
• job_desc ( toil.job.JobDescription ) -- the JobDescription for the job being run
• job_environment ( Optional[dict[str, str]] ) -- a collection of job-specific environment variables to be set on the worker.
Returns
a unique job ID number that can be used to reference the newly issued job
Return type
int
killBatchJobs(jobIDs)
Kills the given job IDs. After
returning, the killed jobs will not appear in the results of
getRunningBatchJobIDs. The killed job will not be returned
from getUpdatedBatchJob.
Parameters
jobIDs ( list[int] ) -- list of IDs of jobs to kill
Return type
None
getIssuedBatchJobIDs()
Gets all currently issued jobs
Returns
A list of jobs (as job ID numbers) currently issued (may be running, or may be waiting to be run). Despite the result being a list, the ordering should not be depended upon.
Return type
list [ int ]
getRunningBatchJobIDs()
Gets a map of jobs as job ID
numbers that are currently running (not just waiting) and
how long they have been running, in seconds.
Returns
dictionary with currently running job ID number keys and how many seconds they have been running as the value
Return type
dict [ int , float ]
getUpdatedBatchJob(maxWait)
Returns information about job that has updated its status (i.e. ceased running, either successfully or with an error). Each such job will be returned exactly once.
Does not return
info for jobs killed by killBatchJobs, although they may
cause None to be returned earlier than maxWait.
Parameters
maxWait ( int ) -- the number of seconds to block, waiting for a result
Returns
If a result is available, returns UpdatedBatchJobInfo. Otherwise it returns None. wallTime is the number of seconds (a strictly positive float) in wall-clock time the job ran for, or None if this batch system does not support tracking wall time.
Return type
Optional[ toil.batchSystems.abstractBatchSystem.UpdatedBatchJobInfo ]
shutdown()
Called at the completion of a
toil invocation. Should cleanly terminate all worker
threads.
Return type
None
classmethod add_options(parser)
If this batch system provides
any command line options, add them to the given parser.
Parameters
parser ( configargparse.ArgumentParser )
Return type
None
classmethod setOptions(setOption)
Process command line or
configuration options relevant to this batch system.
Parameters
setOption ( toil.batchSystems.options.OptionSetter ) -- A function with signature setOption(option_name, parsing_function=None, check_function=None, default=None, env=None) returning nothing, used to update run configuration as a side effect.
Return type
None
class toil.test.batchSystems.batch_system_plugin_test.BatchSystemPluginTest(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running tests you may optionally set the TOIL_TEST_TEMP environment variable to the path of a directory where you want temporary test files to be placed. The directory will be created if it doesn't exist. The path may be relative, in which case it will be assumed to be relative to the project root. If TOIL_TEST_TEMP is not defined, temporary files and directories will be created in the system's default location for such files, and any temporary files or directories left over from tests will be removed automatically during tear down. Otherwise, left-over files will not be removed.
test_batchsystem_plugin_installable()
Test that installing a batch system plugin works.
toil.test.batchSystems.test_gridengine
Classes
Functions
Module Contents
class toil.test.batchSystems.test_gridengine.FakeBatchSystem
Class that implements a minimal Batch System, needed to create a Worker (see below).
config
getWaitDuration()
with_retries(operation, *args, **kwargs)
The grid engine batch system needs a with_retries function when running the GridEngineThread, so fake one
toil.test.batchSystems.test_gridengine.call_qstat_or_qacct(args, **_)
class toil.test.batchSystems.test_gridengine.GridEngineTest(methodName='runTest')
Bases: toil.test.ToilTest
Class for
unit-testing GridEngineBatchSystem
setUp()
Hook method for setting up the test fixture before exercising it.
test_coalesce_job_exit_codes_one_exists()
test_coalesce_job_exit_codes_one_still_running()
test_coalesce_job_exit_codes_many_all_exist()
toil.test.batchSystems.test_lsf_helper
lsfHelper.py shouldn't need a batch system and so the unit tests here should aim to run on any system.
Classes
Module Contents
class toil.test.batchSystems.test_lsf_helper.LSFHelperTest(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running tests you may optionally set the TOIL_TEST_TEMP environment variable to the path of a directory where you want temporary test files to be placed. The directory will be created if it doesn't exist. The path may be relative, in which case it will be assumed to be relative to the project root. If TOIL_TEST_TEMP is not defined, temporary files and directories will be created in the system's default location for such files, and any temporary files or directories left over from tests will be removed automatically during tear down. Otherwise, left-over files will not be removed.
test_parse_mem_and_cmd_from_output()
toil.test.batchSystems.test_slurm
Classes
Functions
Module Contents
toil.test.batchSystems.test_slurm.call_sacct(args, **_)
The arguments passed to call_command when executing sacct are: ['sacct', '-n', '-j', '<comma-separated list of job-ids>', '--format', 'JobIDRaw,State,ExitCode', '-P', '-S', '1970-01-01'] The multi-line output is something like:
1234|COMPLETED|0:0
1234.batch|COMPLETED|0:0
1235|PENDING|0:0
1236|FAILED|0:2
1236.extern|COMPLETED|0:0
Return type
str
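The pipe-delimited output shown above can be parsed as in the following sketch. The function name parse_sacct_output is hypothetical, and this is not the parser used by Toil's Slurm batch system; it assumes the three columns JobIDRaw, State and ExitCode, with ExitCode in the form "<status>:<signal>".

```python
def parse_sacct_output(output: str) -> dict[int, tuple[str, int, int]]:
    """Map each top-level job ID to its (state, exit status, signal).

    Step rows such as "1234.batch" or "1236.extern" are skipped.
    """
    jobs: dict[int, tuple[str, int, int]] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        job_id_raw, state, exit_code = line.strip().split("|")
        if "." in job_id_raw:
            continue  # A job step, not the job itself.
        status, signal_number = exit_code.split(":")
        jobs[int(job_id_raw)] = (state, int(status), int(signal_number))
    return jobs


sample = """1234|COMPLETED|0:0
1234.batch|COMPLETED|0:0
1235|PENDING|0:0
1236|FAILED|0:2
1236.extern|COMPLETED|0:0"""
parsed = parse_sacct_output(sample)
```

Keeping the signal number alongside the exit status matters for rows like job 1236, where the status is 0 but the job was ended by signal 2.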
toil.test.batchSystems.test_slurm.call_scontrol(args, **_)
The arguments passed to call_command when executing scontrol are: ['scontrol', 'show', 'job'] or ['scontrol', 'show', 'job', '<job-id>']
Return type
str
toil.test.batchSystems.test_slurm.call_sacct_raises(*_)
Fake that the sacct command fails by raising a CalledProcessErrorStderr
class toil.test.batchSystems.test_slurm.FakeBatchSystem
Class that implements a minimal Batch System, needed to create a Worker (see below).
config
getWaitDuration()
class toil.test.batchSystems.test_slurm.SlurmTest(methodName='runTest')
Bases: toil.test.ToilTest
Class for
unit-testing SlurmBatchSystem
setUp()
Hook method for setting up the test fixture before exercising it.
test_getJobDetailsFromSacct_one_exists()
test_getJobDetailsFromSacct_one_not_exists()
test_getJobDetailsFromSacct_many_all_exist()
test_getJobDetailsFromSacct_many_some_exist()
test_getJobDetailsFromSacct_many_none_exist()
test_getJobDetailsFromScontrol_one_exists()
test_getJobDetailsFromScontrol_one_not_exists()
Asking for the job details of a single job that scontrol doesn't know about should raise an exception.
test_getJobDetailsFromScontrol_many_all_exist()
test_getJobDetailsFromScontrol_many_some_exist()
test_getJobDetailsFromScontrol_many_none_exist()
test_getJobExitCode_job_exists()
test_getJobExitCode_job_not_exists()
test_getJobExitCode_sacct_raises_job_exists()
This test forces the use of scontrol to get job information, by letting sacct raise an exception.
test_getJobExitCode_sacct_raises_job_not_exists()
This test forces the use of scontrol to get job information, by letting sacct raise an exception. Next, scontrol should also raise because it doesn't know the job.
test_coalesce_job_exit_codes_one_exists()
test_coalesce_job_exit_codes_one_not_exists()
test_coalesce_job_exit_codes_many_all_exist()
test_coalesce_job_exit_codes_some_exists()
test_coalesce_job_exit_codes_sacct_raises_job_exists()
This test forces the use of scontrol to get job information, by letting sacct raise an exception.
test_coalesce_job_exit_codes_sacct_raises_job_not_exists()
This test forces the use of scontrol to get job information, by letting sacct raise an exception. Next, scontrol should also raise because it doesn't know the job.
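The fallback behaviour these tests exercise (try sacct first, use scontrol if sacct raises, and let the error propagate if scontrol also fails) can be sketched as below. All names here are hypothetical stand-ins, and the job details are simplified to a (state, exit status) pair; this is not the SlurmBatchSystem code.

```python
from typing import Callable, Dict, List, Tuple

JobInfo = Tuple[str, int]  # (state, exit status); a simplified stand-in


def get_job_details(
    job_ids: List[int],
    from_sacct: Callable[[List[int]], Dict[int, JobInfo]],
    from_scontrol: Callable[[List[int]], Dict[int, JobInfo]],
) -> Dict[int, JobInfo]:
    """Try the sacct-backed lookup first; fall back to scontrol if it raises."""
    try:
        return from_sacct(job_ids)
    except Exception:
        # sacct failed, so ask scontrol instead. If scontrol also raises,
        # the error propagates to the caller.
        return from_scontrol(job_ids)


def failing_sacct(job_ids: List[int]) -> Dict[int, JobInfo]:
    """Stand-in that always fails, like a fake raising sacct call."""
    raise RuntimeError("sacct failed")


def fake_scontrol(job_ids: List[int]) -> Dict[int, JobInfo]:
    """Stand-in that knows exactly one job and raises for any other."""
    known = {787204: ("COMPLETED", 0)}
    missing = [j for j in job_ids if j not in known]
    if missing:
        raise KeyError(f"unknown job ids: {missing}")
    return {j: known[j] for j in job_ids}
```

Passing the two lookups in as callables makes it easy to substitute fakes, which is the same idea the tests use when they replace the real command calls.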
toil.test.cactus
Submodules
toil.test.cactus.test_cactus_integration
Classes
Module Contents
class toil.test.cactus.test_cactus_integration.CactusIntegrationTest(methodName)
Bases: toil.test.provisioners.clusterTest.AbstractClusterTest
Run the Cactus integration test on a Kubernetes AWS cluster.
clusterName
leaderNodeType = 't2.medium'
clusterType = 'kubernetes'
setUp()
Set up for the test. Must be overridden to call this method and set self.jobStore.
test_cactus_integration()
toil.test.cwl
Submodules
toil.test.cwl.conftest
Attributes
Module Contents
toil.test.cwl.conftest.collect_ignore = ['spec']
toil.test.cwl.cwlTest
Attributes
Classes
Functions
Module Contents
toil.test.cwl.cwlTest.pkg_root
toil.test.cwl.cwlTest.log
toil.test.cwl.cwlTest.CONFORMANCE_TEST_TIMEOUT = 10000
toil.test.cwl.cwlTest.run_conformance_tests(workDir, yml, runner=None, caching=False, batchSystem=None, selected_tests=None, selected_tags=None, skipped_tests=None, extra_args=None, must_support_all_features=False, junit_file=None)
Run the CWL conformance tests.
Parameters
• workDir ( str ) -- Directory to run tests in.
• yml ( str ) -- CWL test list YML to run tests from.
• runner ( Optional[str] ) -- If set, use this cwl runner instead of the default toil-cwl-runner.
• caching ( bool ) -- If True, use Toil file store caching.
• batchSystem ( Optional[str] ) -- If set, use this batch system instead of the default single_machine.
• selected_tests ( Optional[str] ) -- If set, use this description of test numbers to run (comma-separated numbers or ranges).
• selected_tags ( Optional[str] ) -- As an alternative to selected_tests, run tests with the given tags.
• skipped_tests ( Optional[str] ) -- Comma-separated string labels of tests to skip.
• extra_args ( Optional[list[str]] ) -- Provide these extra arguments to runner for each test.
• must_support_all_features ( bool ) -- If set, fail if some CWL optional features are unsupported.
• junit_file ( Optional[str] ) -- JUnit XML file to write test info to.
Return type
None
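The selected_tests parameter is described as comma-separated numbers or ranges. A sketch of how such a description expands into individual test numbers is shown below. The exact syntax accepted by the test harness is not stated here, so this assumes entries of the form N or N-M with inclusive ranges, and the function name is hypothetical.

```python
def expand_test_selection(selection: str) -> list:
    """Expand a string such as "1,3-5" into [1, 3, 4, 5].

    Assumes "N" and inclusive "N-M" entries separated by commas.
    """
    numbers = set()
    for part in selection.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            numbers.update(range(start, end + 1))
        else:
            numbers.add(int(part))
    return sorted(numbers)
```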
toil.test.cwl.cwlTest.TesterFuncType
class toil.test.cwl.cwlTest.CWLWorkflowTest(methodName='runTest')
Bases: toil.test.ToilTest
CWL tests
included in Toil that don't involve the whole CWL
conformance test suite. Tests Toil-specific functions like
URL types supported for inputs.
setUp()
Runs anew before each test to
create farm fresh temp dirs.
Return type
None
tearDown()
Clean up outputs.
Return type
None
test_cwl_cmdline_input()
Test that running a CWL
workflow with inputs specified on the command line passes.
Return type
None
revsort(cwl_filename, tester_fn)
Parameters
• cwl_filename ( str )
• tester_fn ( TesterFuncType )
Return type
None
revsort_no_checksum(cwl_filename, tester_fn)
Parameters
• cwl_filename ( str )
• tester_fn ( TesterFuncType )
Return type
None
download(inputs, tester_fn)
Parameters
• inputs ( str )
• tester_fn ( TesterFuncType )
Return type
None
load_contents(inputs, tester_fn)
Parameters
• inputs ( str )
• tester_fn ( TesterFuncType )
Return type
None
download_directory(inputs, tester_fn)
Parameters
• inputs ( str )
• tester_fn ( TesterFuncType )
Return type
None
download_subdirectory(inputs, tester_fn)
Parameters
• inputs ( str )
• tester_fn ( TesterFuncType )
Return type
None
test_mpi()
Return type
None
test_s3_as_secondary_file()
Return type
None
test_run_revsort()
Return type
None
test_run_revsort_nochecksum()
Return type
None
test_run_revsort_no_container()
Return type
None
test_run_revsort2()
Return type
None
test_run_revsort_debug_worker()
Return type
None
test_run_colon_output()
Return type
None
test_run_dockstore_trs()
Return type
None
test_glob_dir_bypass_file_store()
Return type
None
test_required_input_condition_protection()
Return type
None
test_slurm_node_memory()
Return type
None
test_download_s3()
Return type
None
test_download_http()
Return type
None
test_download_https()
Return type
None
test_download_https_reference()
Return type
None
test_download_file()
Return type
None
test_download_directory_s3()
Return type
None
test_download_directory_s3_reference()
Return type
None
test_download_directory_file()
Return type
None
test_download_subdirectory_s3()
Return type
None
test_download_subdirectory_file()
Return type
None
test_load_contents_s3()
Return type
None
test_load_contents_http()
Return type
None
test_load_contents_https()
Return type
None
test_load_contents_file()
Return type
None
test_bioconda()
Return type
None
test_default_args()
Return type
None
test_biocontainers()
Return type
None
test_cuda()
Return type
None
test_restart()
Enable restarts with
toil-cwl-runner -- run failing test, re-run correct test.
Only implemented for single machine.
Return type
None
test_streamable(extra_args=None)
Test that a file with 'streamable'=True is a named pipe. This is a CWL 1.2 feature.
Parameters
extra_args ( Optional[list[str]] )
Return type
None
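Whether a path is a named pipe can be checked with the standard library, as in the sketch below. This shows only the check itself on a POSIX system. The helper name and file names are made up for the example, and the actual test may verify this differently.

```python
import os
import stat
import tempfile


def is_named_pipe(path: str) -> bool:
    """Return True if path refers to a FIFO (named pipe)."""
    return stat.S_ISFIFO(os.stat(path).st_mode)


with tempfile.TemporaryDirectory() as tmp:
    fifo_path = os.path.join(tmp, "streamed_input")
    regular_path = os.path.join(tmp, "regular_input")
    os.mkfifo(fifo_path)  # POSIX only
    with open(regular_path, "w") as handle:
        handle.write("data")
    fifo_is_pipe = is_named_pipe(fifo_path)
    regular_is_pipe = is_named_pipe(regular_path)
```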
test_streamable_reference()
Test that a streamable file is
a stream even when passed around by URI.
Return type
None
test_preemptible()
Tests that the http://arvados.org/cwl#UsePreemptible extension is supported.
Return type
None
test_preemptible_expression()
Tests that the http://arvados.org/cwl#UsePreemptible extension is validated.
Return type
None
class toil.test.cwl.cwlTest.CWLv10Test(methodName='runTest')
Bases: toil.test.ToilTest
Run the CWL 1.0
conformance tests in various environments.
setUp()
Runs anew before each test to
create farm fresh temp dirs.
Return type
None
tearDown()
Clean up outputs.
Return type
None
test_run_conformance_with_caching()
Return type
None
test_run_conformance(batchSystem=None, caching=False, selected_tests=None, skipped_tests=None, extra_args=None)
Parameters
• batchSystem ( Optional[str] )
• caching ( bool )
• selected_tests ( Optional[str] )
• skipped_tests ( Optional[str] )
• extra_args ( Optional[list[str]] )
Return type
None
test_lsf_cwl_conformance(caching=False)
Parameters
caching ( bool )
Return type
None
test_slurm_cwl_conformance(caching=False)
Parameters
caching ( bool )
Return type
None
test_torque_cwl_conformance(caching=False)
Parameters
caching ( bool )
Return type
None
test_gridengine_cwl_conformance(caching=False)
Parameters
caching ( bool )
Return type
None
test_mesos_cwl_conformance(caching=False)
Parameters
caching ( bool )
Return type
None
test_kubernetes_cwl_conformance(caching=False)
Parameters
caching ( bool )
Return type
None
test_lsf_cwl_conformance_with_caching()
Return type
None
test_slurm_cwl_conformance_with_caching()
Return type
None
test_torque_cwl_conformance_with_caching()
Return type
None
test_gridengine_cwl_conformance_with_caching()
Return type
None
test_mesos_cwl_conformance_with_caching()
Return type
None
test_kubernetes_cwl_conformance_with_caching()
Return type
None
class toil.test.cwl.cwlTest.CWLv11Test(methodName='runTest')
Bases: toil.test.ToilTest
Run the CWL 1.1
conformance tests in various environments.
cwlSpec: str
test_yaml: str
classmethod setUpClass()
Runs anew before each test.
Return type
None
tearDown()
Clean up outputs.
Return type
None
test_run_conformance(caching=False, batchSystem=None, skipped_tests=None, extra_args=None)
Parameters
• caching ( bool )
• batchSystem ( Optional[str] )
• skipped_tests ( Optional[str] )
• extra_args ( Optional[list[str]] )
Return type
None
test_run_conformance_with_caching()
Return type
None
test_kubernetes_cwl_conformance(caching=False)
Parameters
caching ( bool )
Return type
None
test_kubernetes_cwl_conformance_with_caching()
Return type
None
class toil.test.cwl.cwlTest.CWLv12Test(methodName='runTest')
Bases: toil.test.ToilTest
Run the CWL 1.2
conformance tests in various environments.
rootDir: str
cwlSpec: str
test_yaml: str
classmethod setUpClass()
Runs anew before each test.
Return type
None
tearDown()
Clean up outputs.
Return type
None
test_run_conformance(runner=None, caching=False, batchSystem=None, selected_tests=None, skipped_tests=None, extra_args=None, must_support_all_features=False, junit_file=None)
Parameters
• runner ( Optional[str] )
• caching ( bool )
• batchSystem ( Optional[str] )
• selected_tests ( Optional[str] )
• skipped_tests ( Optional[str] )
• extra_args ( Optional[list[str]] )
• must_support_all_features ( bool )
• junit_file ( Optional[str] )
Return type
None
test_run_conformance_with_caching()
Return type
None
test_run_conformance_with_in_place_update()
Make sure that with
--bypass-file-store we properly support in place update on a
single node, and that this doesn't break any other features.
Return type
None
test_kubernetes_cwl_conformance(caching=False, junit_file=None)
Parameters
• caching ( bool )
• junit_file ( Optional[str] )
Return type
None
test_kubernetes_cwl_conformance_with_caching()
Return type
None
test_wes_server_cwl_conformance()
Run the CWL conformance tests via WES. TOIL_WES_ENDPOINT must be specified. If the WES server requires authentication, set TOIL_WES_USER and TOIL_WES_PASSWORD.
To run manually:
TOIL_WES_ENDPOINT=http://localhost:8080 TOIL_WES_USER=test TOIL_WES_PASSWORD=password python -m pytest src/toil/test/cwl/cwlTest.py::CWLv12Test::test_wes_server_cwl_conformance -vv --log-level INFO --log-cli-level INFO
Return type
None
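The environment variables described for the WES test could be read as in the following sketch. The variable names come from the text above. The helper name, the return shape, and the choice to require both user and password before using authentication are assumptions for illustration.

```python
import os
from typing import Mapping, Optional, Tuple


def read_wes_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Return (endpoint, credentials), where credentials may be None."""
    env = os.environ if env is None else env
    endpoint = env.get("TOIL_WES_ENDPOINT")
    if not endpoint:
        raise ValueError("TOIL_WES_ENDPOINT must be specified")
    user = env.get("TOIL_WES_USER")
    password = env.get("TOIL_WES_PASSWORD")
    # Assumption: only use authentication when both values are provided.
    credentials = (user, password) if user and password else None
    return endpoint, credentials
```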
toil.test.cwl.cwlTest.test_workflow_echo_string_scatter_stderr_log_dir(tmp_path)
Parameters
tmp_path ( pathlib.Path )
Return type
None
toil.test.cwl.cwlTest.test_log_dir_echo_no_output(tmp_path)
Parameters
tmp_path ( pathlib.Path )
Return type
None
toil.test.cwl.cwlTest.test_log_dir_echo_stderr(tmp_path)
Parameters
tmp_path ( pathlib.Path )
Return type
None
toil.test.cwl.cwlTest.test_filename_conflict_resolution(tmp_path)
Parameters
tmp_path ( pathlib.Path )
Return type
None
toil.test.cwl.cwlTest.test_filename_conflict_resolution_3_or_more(tmp_path)
Parameters
tmp_path ( pathlib.Path )
Return type
None
toil.test.cwl.cwlTest.test_filename_conflict_detection(tmp_path)
Make sure we don't just stage
files over each other when using a container.
Parameters
tmp_path ( pathlib.Path )
Return type
None
toil.test.cwl.cwlTest.test_filename_conflict_detection_at_root(tmp_path)
Make sure we don't just stage files over each other.
Specifically,
when using a container and the files are at the root of the
work dir.
Parameters
tmp_path ( pathlib.Path )
Return type
None
toil.test.cwl.cwlTest.test_pick_value_with_one_null_value(caplog)
Make sure toil-cwl-runner does not falsely log a warning when pickValue is used but outputSource only contains one null value. See: #3991.
Parameters
caplog ( pytest.LogCaptureFixture )
Return type
None
toil.test.cwl.cwlTest.test_workflow_echo_string()
Return type
None
toil.test.cwl.cwlTest.test_workflow_echo_string_scatter_capture_stdout()
Return type
None
toil.test.cwl.cwlTest.test_visit_top_cwl_class()
Return type
None
toil.test.cwl.cwlTest.test_visit_cwl_class_and_reduce()
Return type
None
toil.test.cwl.cwlTest.test_download_structure(tmp_path)
Make sure that
download_structure makes the right calls to what it thinks
is the file store.
Parameters
tmp_path ( pathlib.Path )
Return type
None
toil.test.cwl.cwlTest.test_import_on_workers()
Return type
None
class toil.test.cwl.cwlTest.ImportWorkersMessageHandler
Bases: _stream_handler
Detect the
import workers log message and set a flag.
detected = False
emit(record)
Emit a record.
If a formatter
is specified, it is used to format the record. The record is
then written to the stream with a trailing newline. If
exception information is present, it is formatted using
traceback.print_exception and appended to the stream. If the
stream has an 'encoding' attribute, it is used to determine
how to do the output to the stream.
Parameters
record ( logging.LogRecord )
Return type
None
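A handler that sets a flag when a particular log message appears can be sketched with the standard logging module. This version subclasses logging.Handler directly rather than the _stream_handler base named above, and the message text it looks for is a placeholder, since the real message is not given here.

```python
import logging


class FlagOnMessageHandler(logging.Handler):
    """Set `detected` once a record containing `needle` is emitted."""

    def __init__(self, needle: str) -> None:
        super().__init__()
        self.needle = needle
        self.detected = False

    def emit(self, record: logging.LogRecord) -> None:
        if self.needle in record.getMessage():
            self.detected = True


demo_logger = logging.getLogger("flag_handler_demo")
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep the demo output quiet
handler = FlagOnMessageHandler("importing files on workers")  # placeholder text
demo_logger.addHandler(handler)

demo_logger.info("starting workflow")
seen_before = handler.detected
demo_logger.info("now importing files on workers")
seen_after = handler.detected
demo_logger.removeHandler(handler)
```

A test can then assert on the flag after the workflow finishes instead of parsing captured log text.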
toil.test.docs
Submodules
toil.test.docs.scriptsTest
Attributes
Classes
Module Contents
toil.test.docs.scriptsTest.pkg_root
class toil.test.docs.scriptsTest.ToilDocumentationTest(methodName='runTest')
Bases: toil.test.ToilTest
Tests for
scripts in the toil tutorials.
classmethod setUpClass()
Hook method for setting up class fixture before running tests in the class.
tearDown()
Hook method for deconstructing
the test fixture after testing it.
Return type
None
checkExitCode(script, extra_args=[])
Parameters
extra_args ( list[str] )
checkExpectedOut(script, expectedOutput)
checkExpectedPattern(script, expectedPattern)
testStats()
testDynamic()
testEncapsulation()
testEncapsulation2()
testHelloworld()
testInvokeworkflow()
testInvokeworkflow2()
testJobFunctions()
testManaging()
testManaging2()
testMultiplejobs()
testMultiplejobs2()
testMultiplejobs3()
testPromises2()
testQuickstart()
testRequirements()
testArguments()
testDocker()
testPromises()
testServices()
testStaging()
toil.test.jobStores
Submodules
toil.test.jobStores.jobStoreTest
Attributes
Classes
Functions
Module Contents
toil.test.jobStores.jobStoreTest.google_retry(x)
toil.test.jobStores.jobStoreTest.logger
toil.test.jobStores.jobStoreTest.tearDownModule()
class toil.test.jobStores.jobStoreTest.AbstractJobStoreTest
Hide abstract base class from unittest's test case loader - http://stackoverflow.com/questions/1323455/python-unit-test-with-base-and-sub-class#answer-25695512
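The nesting trick referred to here can be sketched as follows. The abstract test case lives inside a plain outer class, so it is not a module-level TestCase subclass and the loader does not pick it up on its own. Only concrete subclasses are collected. The class and method names are invented for the example.

```python
import unittest


class AbstractStoreTest:
    """Plain container; not itself a TestCase, so the loader skips it."""

    class Test(unittest.TestCase):
        def make_store(self) -> dict:
            raise NotImplementedError()

        def test_round_trip(self) -> None:
            store = self.make_store()
            store["key"] = "value"
            self.assertEqual(store["key"], "value")


class DictStoreTest(AbstractStoreTest.Test):
    """Concrete test case; inherits test_round_trip from the hidden base."""

    def make_store(self) -> dict:
        return {}


suite = unittest.TestLoader().loadTestsFromTestCase(DictStoreTest)
result = unittest.TestResult()
suite.run(result)
```

Without the nesting, the loader would also try to run the abstract base, whose make_store raises NotImplementedError.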
class Test(methodName='runTest')
Bases: toil.test.ToilTest
classmethod setUpClass()
Hook method for setting up class fixture before running tests in the class.
setUp()
Hook method for setting up the test fixture before exercising it.
tearDown()
Hook method for deconstructing the test fixture after testing it.
testInitialState()
Ensure proper handling of nonexistent files.
testJobCreation()
Test creation of a job.
Does the job exist in the jobstore it is supposed to be in? Are its attributes what is expected?
testConfigEquality()
Ensure that the command line configurations are successfully loaded and stored.
In setUp() self.jobstore1 is created and initialized. In this test, after creating newJobStore, .resume() will look for a previously instantiated job store and load its config options. This is expected to be equal but not the same object.
testJobLoadEquality()
Tests that a job created via one JobStore instance can be loaded from another.
testChildLoadingEquality()
Test that loading a child job operates as expected.
testPersistantFilesToDelete()
Make sure that updating a job persists filesToDelete.
The following demonstrates the job update pattern, where files to be deleted atomically with a job update are referenced in the "filesToDelete" array, which is persisted to disk first. If things go wrong during the update, this list of files to delete is used to ensure that the updated job and the files are never both visible at the same time.
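One reading of this update pattern is sketched below with an in-memory stand-in for a job store. The class, its methods, and the crash flag are hypothetical, and the exact ordering inside Toil's job stores may differ. The point illustrated is that the deletion list is persisted before any file is removed, so an interrupted update can be completed when the job is next loaded.

```python
class TinyJobStore:
    """In-memory stand-in for a job store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.jobs = {}   # job id -> persisted job record
        self.files = {}  # file id -> contents

    def update(self, job_id, new_record, files_to_delete, crash_before_delete=False):
        # Step 1: persist the new record together with the deletion list.
        self.jobs[job_id] = dict(new_record, filesToDelete=list(files_to_delete))
        if crash_before_delete:
            return  # simulate a failure between persisting and deleting
        # Step 2: delete the files and clear the list.
        self._finish_deletions(job_id)

    def _finish_deletions(self, job_id):
        record = self.jobs[job_id]
        for file_id in record["filesToDelete"]:
            self.files.pop(file_id, None)
        record["filesToDelete"] = []

    def load(self, job_id):
        # A non-empty list means an earlier update was interrupted; finish it
        # before handing the job to the caller.
        if self.jobs[job_id]["filesToDelete"]:
            self._finish_deletions(job_id)
        return self.jobs[job_id]


store = TinyJobStore()
store.files["f1"] = "old data"
store.jobs["j"] = {"state": "v1", "filesToDelete": []}
store.update("j", {"state": "v2"}, ["f1"], crash_before_delete=True)
file_survived_crash = "f1" in store.files
job = store.load("j")
```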
testUpdateBehavior()
Tests the proper behavior during updating jobs.
testJobDeletions()
Tests the consequences of deleting jobs.
testSharedFiles()
Tests the sharing of files.
testReadWriteSharedFilesTextMode()
Checks if text mode is compatible for shared file streams.
testReadWriteFileStreamTextMode()
Checks if text mode is compatible for file streams.
testPerJobFiles()
Tests the behavior of files on jobs.
testStatsAndLogging()
Tests behavior of reading and writing stats and logging.
testWriteLogFiles()
Test writing log files.
testBatchCreate()
Test creation of many jobs.
testGrowingAndShrinkingJob()
Make sure jobs update correctly if they grow/shrink.
externalStoreCache
classmethod cleanUpExternalStores()
mpTestPartSize
classmethod makeImportExportTests()
testImportHttpFile()
Test importing a file over HTTP.
testImportFtpFile()
Test importing a file over FTP.
testFileDeletion()
Intended to cover the batch deletion of items in the AWSJobStore, but it doesn't hurt running it on the other job stores.
testMultipartUploads()
This test is meant to cover multi-part uploads in the AWSJobStore but it doesn't hurt running it against the other job stores as well.
testZeroLengthFiles()
Test reading and writing of empty files.
testLargeFile()
Test the reading and writing of large files.
fetch_url(url)
Fetch the given URL. Throw an
error if it cannot be fetched in a reasonable number of
attempts.
Parameters
url ( str )
Return type
None
assertUrl(url)
testCleanCache()
testPartialReadFromStream()
Test whether readFileStream will deadlock on a partial read.
testDestructionOfCorruptedJobStore()
testDestructionIdempotence()
testEmptyFileStoreIDIsReadable()
Simply creates an empty fileStoreID and attempts to read from it.
class toil.test.jobStores.jobStoreTest.AbstractEncryptedJobStoreTest
class Test(methodName='runTest')
Bases: AbstractJobStoreTest
A test of job
stores that use encryption
setUp()
Hook method for setting up the test fixture before exercising it.
tearDown()
Hook method for deconstructing the test fixture after testing it.
testEncrypted()
Create an encrypted file. Read it in encrypted mode then try with encryption off to ensure that it fails.
class toil.test.jobStores.jobStoreTest.FileJobStoreTest(methodName='runTest')
Bases: AbstractJobStoreTest
testPreserveFileName()
Check that the fileID ends with the given file name.
test_jobstore_init_preserves_symlink_path()
Test that if we provide a fileJobStore with a symlink to a directory, it doesn't de-reference it.
test_jobstore_does_not_leak_symlinks()
Test that if we link imports into the FileJobStore, we can't get hardlinks to symlinks.
test_file_link_imports()
Test that imported files are symlinked when expected.
test_symlink_read_control()
Test that files are read by symlink when expected.
class toil.test.jobStores.jobStoreTest.GoogleJobStoreTest(methodName='runTest')
Bases: AbstractJobStoreTest
projectID
headers
class toil.test.jobStores.jobStoreTest.AWSJobStoreTest(methodName='runTest')
Bases: AbstractJobStoreTest
testSDBDomainsDeletedOnFailedJobstoreBucketCreation()
This test ensures that SDB domains bound to a jobstore are deleted if the jobstore bucket failed to be created. We simulate a failed jobstore bucket creation by using a bucket in a different region with the same name.
testInlinedFiles()
testOverlargeJob()
testMultiThreadImportFile()
Tests that importFile is
thread-safe.
Return type
None
class toil.test.jobStores.jobStoreTest.InvalidAWSJobStoreTest(methodName='runTest')
Bases: toil.test.ToilTest
testInvalidJobStoreName()
class toil.test.jobStores.jobStoreTest.EncryptedAWSJobStoreTest(methodName='runTest')
Bases: AWSJobStoreTest, AbstractEncryptedJobStoreTest
class toil.test.jobStores.jobStoreTest.StubHttpRequestHandler(*args, directory=None, **kwargs)
Bases: http.server.SimpleHTTPRequestHandler
Simple HTTP request handler with GET and HEAD commands.
This serves files from the current directory and any of its subdirectories. The MIME type for files is determined by calling the .guess_type() method.
The GET and HEAD requests are identical except that the HEAD request omits the actual contents of the file.
fileContents = 'A good programmer looks both ways before crossing a one-way street'
do_GET()
Serve a GET request.
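A stub handler of this kind can be exercised against a throwaway local server, as in the sketch below. It uses BaseHTTPRequestHandler with a fixed body rather than the SimpleHTTPRequestHandler subclass documented above. The class name, body text, and helper function are all invented for illustration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FixedContentHandler(BaseHTTPRequestHandler):
    """Answer every GET with the same body (hypothetical stand-in)."""

    body = b"stub file contents"

    def do_GET(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(self.body)))
        self.end_headers()
        self.wfile.write(self.body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep test output quiet


def fetch_from_stub_server() -> bytes:
    """Start the stub server on a free port, fetch one URL, and shut down."""
    server = HTTPServer(("127.0.0.1", 0), FixedContentHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_address[1]}/any/path"
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=10) as response:
            return response.read()
    finally:
        server.shutdown()
        server.server_close()
```

Serving a known body from localhost lets an HTTP import test run without depending on an external site.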
toil.test.lib
Submodules
toil.test.lib.aws
Submodules
toil.test.lib.aws.test_iam
Attributes
Classes
Module Contents
toil.test.lib.aws.test_iam.logger
class toil.test.lib.aws.test_iam.IAMTest(methodName='runTest')
Bases: toil.test.ToilTest
Check that
given permissions and associated functions perform correctly
test_permissions_iam()
test_negative_permissions_iam()
test_wildcard_handling()
test_get_policy_permissions()
test_create_delete_iam_role()
toil.test.lib.aws.test_s3
Attributes
Classes
Module Contents
toil.test.lib.aws.test_s3.logger
class toil.test.lib.aws.test_s3.S3Test(methodName='runTest')
Bases: toil.test.ToilTest
Confirm the
workarounds for us-east-1.
s3_resource: mypy_boto3_s3.S3ServiceResource | None
bucket: mypy_boto3_s3.service_resource.Bucket | None
classmethod setUpClass()
Hook method for setting up
class fixture before running tests in the class.
Return type
None
test_create_bucket()
Test bucket creation for
us-east-1.
Return type
None
test_get_bucket_location_public_bucket()
Test getting bucket location for a bucket we don't own.
Return type
None
classmethod tearDownClass()
Hook method for deconstructing
the class fixture after running all tests in the class.
Return type
None
toil.test.lib.aws.test_utils
Attributes
Classes
Module Contents
toil.test.lib.aws.test_utils.logger
class toil.test.lib.aws.test_utils.TagGenerationTest(methodName='runTest')
Bases: toil.test.ToilTest
Test for tag
generation from environment variables
test_build_tag()
test_empty_aws_tags()
test_incorrect_json_object()
test_incorrect_json_emoji()
test_build_tag_with_tags()
toil.test.lib.dockerTest
Attributes
Classes
Module Contents
toil.test.lib.dockerTest.logger
class toil.test.lib.dockerTest.DockerTest(methodName='runTest')
Bases: toil.test.ToilTest
Tests dockerCall and ensures no containers are left around. When
running tests you may optionally set the TOIL_TEST_TEMP
environment variable to the path of a directory where you
want temporary test files to be placed. The directory will be
created if it doesn't exist. The path may be relative, in
which case it is assumed to be relative to the project
root. If TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be removed automatically during
tear down. Otherwise, left-over files will not be removed.
setUp()
Hook method for setting up the test fixture before exercising it.
testDockerClean(caching=False, detached=True, rm=True, deferParam=None)
Run the test container that creates a file in the work dir, and sleeps for 5 minutes. Ensure that the calling job gets SIGKILLed after a minute, leaving behind the spooky/ghost/zombie container. Ensure that the container is killed on batch system shutdown (through the deferParam mechanism).
testDockerClean_CRx_FORGO()
testDockerClean_CRx_STOP()
testDockerClean_CRx_RM()
testDockerClean_CRx_None()
testDockerClean_CxD_FORGO()
testDockerClean_CxD_STOP()
testDockerClean_CxD_RM()
testDockerClean_CxD_None()
testDockerClean_Cxx_FORGO()
testDockerClean_Cxx_STOP()
testDockerClean_Cxx_RM()
testDockerClean_Cxx_None()
testDockerClean_xRx_FORGO()
testDockerClean_xRx_STOP()
testDockerClean_xRx_RM()
testDockerClean_xRx_None()
testDockerClean_xxD_FORGO()
testDockerClean_xxD_STOP()
testDockerClean_xxD_RM()
testDockerClean_xxD_None()
testDockerClean_xxx_FORGO()
testDockerClean_xxx_STOP()
testDockerClean_xxx_RM()
testDockerClean_xxx_None()
testDockerPipeChain(caching=False)
Test for the piping API for dockerCall(). Using this API (activated when a list of argument lists is given as parameters), commands are piped together into a chain. For example, parameters=[ ['printf', 'x\n y\n'], ['wc', '-l'] ] should execute: printf 'x\n y\n' | wc -l
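The idea of turning a list of argument lists into a pipe chain can be illustrated outside Docker with subprocess. This is a sketch of the general technique only. It does not show how dockerCall() builds its command, and the function name is hypothetical.

```python
import subprocess
from typing import List


def run_pipe_chain(commands: List[List[str]]) -> str:
    """Run commands as cmd1 | cmd2 | ... | cmdN and return the final stdout."""
    processes = []
    previous_stdout = None
    for argv in commands:
        process = subprocess.Popen(argv, stdin=previous_stdout, stdout=subprocess.PIPE)
        if previous_stdout is not None:
            # Close our copy so only the downstream command holds the read end.
            previous_stdout.close()
        previous_stdout = process.stdout
        processes.append(process)
    output, _ = processes[-1].communicate()
    for process in processes[:-1]:
        process.wait()
    return output.decode()


line_count = run_pipe_chain([["printf", "x\n y\n"], ["wc", "-l"]])
```

Here line_count holds the output of wc -l for the two lines produced by printf. Surrounding whitespace varies between platforms, so strip it before converting to an integer.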
testDockerPipeChainErrorDetection(caching=False)
By default, executing cmd1 | cmd2 | ... | cmdN will only return an error if cmdN fails. This can lead to all manner of errors being silently missed. This tests to make sure that the piping API for dockerCall() throws an exception if non-last commands in the chain fail.
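The default shell behaviour described here, and the stricter alternative, can be observed with bash directly. The sketch below assumes bash is installed and uses its pipefail option to surface a failure in a non-final command. Whether dockerCall() relies on pipefail or another mechanism is not stated here, and the helper name is made up for the example.

```python
import subprocess


def pipeline_exit_status(script: str, pipefail: bool) -> int:
    """Run a bash pipeline and return its exit status."""
    prefix = "set -o pipefail; " if pipefail else ""
    return subprocess.run(["bash", "-c", prefix + script]).returncode


# "false" fails, but "cat" (the last command) succeeds.
default_status = pipeline_exit_status("false | cat", pipefail=False)
strict_status = pipeline_exit_status("false | cat", pipefail=True)
```

With the default settings the failure of false is hidden and the status is 0. With pipefail enabled the pipeline reports the non-zero status.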
testNonCachingDockerChain()
testNonCachingDockerChainErrorDetection()
testDockerLogs(stream=False, demux=False)
Test for the different log outputs when detach=False.
testDockerLogs_Stream()
testDockerLogs_Demux()
testDockerLogs_Demux_Stream()
toil.test.lib.test_conversions
Attributes
Classes
Module Contents
toil.test.lib.test_conversions.logger
class toil.test.lib.test_conversions.ConversionTest(methodName='runTest')
Bases: toil.test.ToilTest
test_convert()
test_human2bytes()
test_hms_duration_to_seconds()
toil.test.lib.test_ec2
Attributes
Classes
Module Contents
toil.test.lib.test_ec2.logger
class toil.test.lib.test_ec2.FlatcarFeedTest(methodName='runTest')
Bases: toil.test.ToilTest
Test accessing
the Flatcar AMI release feed, independent of the AWS API
test_parse_archive_feed()
Make sure we can get a Flatcar release from the Internet Archive.
test_parse_beta_feed()
Make sure we can get a Flatcar release from the beta channel.
test_parse_stable_feed()
Make sure we can get a Flatcar release from the stable channel.
class toil.test.lib.test_ec2.AMITest(methodName='runTest')
Bases: toil.test.ToilTest
classmethod setUpClass()
Hook method for setting up class fixture before running tests in the class.
test_fetch_flatcar()
test_fetch_arm_flatcar()
Test flatcar AMI finder architecture parameter.
toil.test.lib.test_integration
Attributes
Classes
Module Contents
toil.test.lib.test_integration.logger
class toil.test.lib.test_integration.DockstoreLookupTest(methodName='runTest')
Bases: toil.test.ToilTest
Make sure we
can look up workflows on Dockstore.
read_result(url_or_path)
Read a file or URL.
Binary mode to allow testing for binary file support.
This lets us
test that we have the right workflow contents and not care
how we are being shown them.
Parameters
url_or_path ( str )
Return type
IO[ bytes ]
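A helper with this contract might look like the sketch below. It is an illustration under assumptions, not the test's actual code: only http and https are treated as URLs here, and everything else is opened as a local path.

```python
import urllib.request
from typing import IO
from urllib.parse import urlparse


def read_result(url_or_path: str) -> IO[bytes]:
    """Open a local path or a URL for reading in binary mode (illustrative sketch)."""
    scheme = urlparse(url_or_path).scheme
    if scheme in ("http", "https"):
        # urlopen returns a file-like response object that yields bytes.
        return urllib.request.urlopen(url_or_path)
    # Anything else is treated as a filesystem path.
    return open(url_or_path, "rb")
```

Returning a binary stream in both cases lets the caller compare workflow contents byte for byte, regardless of where they came from.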
test_lookup_from_page_url()
Return type
None
test_lookup_from_trs()
Return type
None
test_lookup_from_trs_cached()
Return type
None
test_lookup_from_trs_with_version()
Return type
None
test_lookup_from_trs_nonexistent_version()
Return type
None
toil.test.lib.test_misc
Attributes
Classes
Module Contents
toil.test.lib.test_misc.logger
class
toil.test.lib.test_misc.UserNameAvailableTest(methodName='runTest')
Bases: toil.test.ToilTest
Make sure we
can get user names when they are available.
test_get_user_name()
class
toil.test.lib.test_misc.UserNameUnvailableTest(methodName='runTest')
Bases: toil.test.ToilTest
Make sure we
can get something for a user name when user names are not
available.
setUp()
Hook method for setting up the test fixture before exercising it.
tearDown()
Hook method for deconstructing the test fixture after testing it.
test_get_user_name()
class
toil.test.lib.test_misc.UserNameVeryBrokenTest(methodName='runTest')
Bases: toil.test.ToilTest
Make sure we
can get something for a user name when user name fetching is
broken in ways we did not expect.
setUp()
Hook method for setting up the test fixture before exercising it.
tearDown()
Hook method for deconstructing the test fixture after testing it.
test_get_user_name()
toil.test.mesos
Submodules
toil.test.mesos.MesosDataStructuresTest
Classes
Module Contents
class
toil.test.mesos.MesosDataStructuresTest.DataStructuresTest(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
testJobQueue(testJobs=1000)
The Mesos JobQueue sorts MesosShape objects by requirement, and this test ensures that the sorting is as expected: non-preemptible job groups first, with priority given to large jobs.
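The ordering this test checks can be illustrated with a plain sort key. The JobShape class below is a hypothetical stand-in, not MesosShape, and the tie-breaking order among memory, cores and disk is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobShape:
    """Hypothetical stand-in for a job's resource requirements (not MesosShape)."""
    preemptible: bool
    memory: int
    cores: float
    disk: int


def queue_order(shapes):
    """Sort non-preemptible shapes first; within each group, larger shapes first."""
    # False sorts before True, so non-preemptible shapes lead.
    # Negating the sizes puts the largest requirements at the front.
    return sorted(shapes, key=lambda s: (s.preemptible, -s.memory, -s.cores, -s.disk))
```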
toil.test.mesos.helloWorld
A simple user script for Toil
Attributes
Functions
Module Contents
toil.test.mesos.helloWorld.childMessage = 'The child job is now running!'
toil.test.mesos.helloWorld.parentMessage = 'The parent job is now running!'
toil.test.mesos.helloWorld.hello_world(job)
toil.test.mesos.helloWorld.hello_world_child(job, hw)
toil.test.mesos.helloWorld.main()
toil.test.mesos.stress
Classes
Functions
Module Contents
toil.test.mesos.stress.touchFile(fileStore)
class toil.test.mesos.stress.LongTestJob(numJobs)
Bases: toil.job.Job
Class
represents a unit of work in toil.
numJobs
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
class toil.test.mesos.stress.LongTestFollowOn
Bases: toil.job.Job
Class
represents a unit of work in toil.
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
class toil.test.mesos.stress.HelloWorldJob(i)
Bases: toil.job.Job
Class represents a unit of work in toil.
i
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
class toil.test.mesos.stress.HelloWorldFollowOn(i)
Bases: toil.job.Job
Class represents a unit of work in toil.
i
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
toil.test.mesos.stress.main(numJobs)
toil.test.options
Submodules
toil.test.options.options
Classes
Module Contents
class toil.test.options.options.OptionsTest(methodName='runTest')
Bases: toil.test.ToilTest
Class to test
functionality of all Toil options
test_default_caching_slurm()
Test to ensure that caching will be set to false when running on Slurm.
test_caching_option_priority()
Test to ensure that the --caching option takes priority over the default_caching() return value.
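The precedence these two tests exercise can be sketched as below. This is not Toil's option-parsing code: effective_caching is a hypothetical name, and the assumption that every batch system other than Slurm defaults to caching enabled is made only for the example.

```python
from typing import Optional


def default_caching(batch_system: str) -> bool:
    """Assumed default: caching off on Slurm, on elsewhere (illustration only)."""
    return batch_system != "slurm"


def effective_caching(batch_system: str, caching: Optional[bool] = None) -> bool:
    """An explicit --caching value wins; otherwise fall back to the default."""
    if caching is None:
        return default_caching(batch_system)
    return caching
```

Using None to mean "not given on the command line" keeps an explicit False distinguishable from an unset option.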
toil.test.provisioners
Submodules
toil.test.provisioners.aws
Submodules
toil.test.provisioners.aws.awsProvisionerTest
Attributes
Classes
Module Contents
toil.test.provisioners.aws.awsProvisionerTest.log
class
toil.test.provisioners.aws.awsProvisionerTest.AWSProvisionerBenchTest(methodName='runTest')
Bases: toil.test.ToilTest
Tests for the
AWS provisioner that don't actually provision anything.
test_AMI_finding()
test_read_write_global_files()
Make sure the _write_file_to_cloud() and _read_file_from_cloud() functions of the AWS provisioner work as intended.
class
toil.test.provisioners.aws.awsProvisionerTest.AbstractAWSAutoscaleTest(methodName)
Bases: toil.test.provisioners.clusterTest.AbstractClusterTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
instanceTypes = ['m5a.large']
clusterName
numWorkers = ['2']
numSamples = 2
spotBid = 0.15
scriptDir = '/tmp/t'
venvDir = '/tmp/venv'
dataDir = '/tmp'
scriptName = 'test_script.py'
script()
Return the full path to the user script on the leader.
data(filename)
Return the full path to the data file with the given name on the leader.
rsyncUtil(src, dest)
getRootVolID()
Return type
str
putScript(content)
Helper method for _getScript to
inject a script file at the configured script path, from
text.
Parameters
content ( str )
class
toil.test.provisioners.aws.awsProvisionerTest.AWSAutoscaleTest(name)
Bases: AbstractAWSAutoscaleTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
clusterName
requestedLeaderStorage = 80
scriptName = 'sort.py'
setUp()
Set up for the test. Must be overridden to call this method and set self.jobStore.
launchCluster()
getRootVolID()
Adds a test to check that the EBS
volume is built with adequate size. Otherwise
functionally equivalent to the parent. Returns the volume ID.
Return type
str
testAutoScale()
testSpotAutoScale()
testSpotAutoScaleBalancingTypes()
class
toil.test.provisioners.aws.awsProvisionerTest.AWSStaticAutoscaleTest(name)
Bases: AWSAutoscaleTest
Runs the tests
on a statically provisioned cluster with autoscaling
enabled.
requestedNodeStorage = 20
launchCluster()
class
toil.test.provisioners.aws.awsProvisionerTest.AWSManagedAutoscaleTest(name)
Bases: AWSAutoscaleTest
Runs the tests
on a self-scaling Kubernetes cluster.
requestedNodeStorage = 20
launchCluster()
class
toil.test.provisioners.aws.awsProvisionerTest.AWSAutoscaleTestMultipleNodeTypes(name)
Bases: AbstractAWSAutoscaleTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
clusterName
setUp()
Set up for the test. Must be overridden to call this method and set self.jobStore.
testAutoScale()
class
toil.test.provisioners.aws.awsProvisionerTest.AWSRestartTest(name)
Bases: AbstractAWSAutoscaleTest
This test
ensures autoscaling works on a restarted Toil run.
clusterName
scriptName = 'restartScript.py'
setUp()
Set up for the test. Must be overridden to call this method and set self.jobStore.
testAutoScaledCluster()
class
toil.test.provisioners.aws.awsProvisionerTest.PreemptibleDeficitCompensationTest(name)
Bases: AbstractAWSAutoscaleTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
clusterName
scriptName = 'userScript.py'
setUp()
Set up for the test. Must be overridden to call this method and set self.jobStore.
test()
toil.test.provisioners.clusterScalerTest
Attributes
Classes
Module Contents
toil.test.provisioners.clusterScalerTest.logger
toil.test.provisioners.clusterScalerTest.c4_8xlarge_preemptible
toil.test.provisioners.clusterScalerTest.c4_8xlarge
toil.test.provisioners.clusterScalerTest.r3_8xlarge
toil.test.provisioners.clusterScalerTest.r5_2xlarge
toil.test.provisioners.clusterScalerTest.r5_4xlarge
toil.test.provisioners.clusterScalerTest.t2_micro
class
toil.test.provisioners.clusterScalerTest.BinPackingTest(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
setUp()
Hook method for setting up the test fixture before exercising it.
testPackingOneShape()
Pack one shape and check that the resulting reservations look sane.
testSorting()
Test that sorting is correct: preemptible, then memory, then cores, then disk, then wallTime.
testAddingInitialNode()
Pack one shape when no nodes are available and confirm that we fit one node properly.
testLowTargetTime()
Test that a low targetTime (0) parallelizes jobs aggressively (1000 queued jobs require 1000 nodes).
Ideally, low targetTime means: Start quickly and maximize parallelization after the cpu/disk/mem have been packed.
Disk/cpu/mem packing is prioritized first, so we set job resource reqs so that each t2.micro (1 cpu/8G disk/1G RAM) can only run one job at a time with its resources.
Each job is parametrized to take 300 seconds, so (the minimum of) 1 of them should fit into each node's 0 second window, so we expect 1000 nodes.
testHighTargetTime()
Test that a high targetTime (3600 seconds) maximizes packing within the targetTime.
Ideally, high targetTime means: Maximize packing within the targetTime after the cpu/disk/mem have been packed.
Disk/cpu/mem packing is prioritized first, so we set job resource reqs so that each t2.micro (1 cpu/8G disk/1G RAM) can only run one job at a time with its resources.
Each job is parametrized to take 300 seconds, so 12 of them should fit into each node's 3600 second window. 1000/12 = 83.33, so we expect 84 nodes.
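The node counts expected by testLowTargetTime and testHighTargetTime follow from simple arithmetic, reproduced here as a sketch. It models only the case described above, where each node can run one job at a time; expected_nodes is a hypothetical helper, not the bin-packing code itself.

```python
import math


def expected_nodes(num_jobs: int, job_seconds: int, target_seconds: int) -> int:
    """Nodes needed when each node runs one job at a time (simplified model)."""
    # At least one job is always placed on a node, even if it overruns the window.
    jobs_per_node = max(1, target_seconds // job_seconds)
    return math.ceil(num_jobs / jobs_per_node)


print(expected_nodes(1000, 300, 0))     # low targetTime: 1000
print(expected_nodes(1000, 300, 3600))  # high targetTime: 84
```

The same model gives 1000 nodes whenever a single job's runtime exceeds the target window, because jobs_per_node never drops below 1.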
testZeroResourceJobs()
Test that jobs requiring zero cpu/disk/mem pack first, regardless of targetTime.
Disk/cpu/mem packing is prioritized first, so we set job resource reqs so that each t2.micro (1 cpu/8G disk/1G RAM) can run a seemingly infinite number of jobs with its resources.
Since all jobs should pack cpu/disk/mem-wise on a t2.micro, we expect only one t2.micro to be provisioned. If we raise this, as in testLowTargetTime, it will launch 1000 t2.micros.
testLongRunningJobs()
Test that jobs with long run times (especially service jobs) are aggressively parallelized.
This is important, because services are one case where the degree of parallelization really, really matters. If you have multiple services, they may all need to be running simultaneously before any real work can be done.
Despite setting globalTargetTime=3600, this should launch 1000 t2.micros because each job's estimated runtime (30000 seconds) extends well beyond 3600 seconds.
run1000JobsOnMicros(jobCores, jobMem, jobDisk, jobTime, globalTargetTime)
Test packing 1000 jobs on t2.micros. Depending on the targetTime and resources, these should pack differently.
testPathologicalCase()
Test a pathological case where only one node can be requested to fit months' worth of jobs.
If the reservation is extended to fit a long job, and the bin-packer naively searches through all the reservation slices to find the first slice that fits, it will happily assign the first slot that fits the job, even if that slot occurs days in the future.
testJobTooLargeForAllNodes()
If a job is too large for all node types, the scaler should print a warning, but definitely not crash.
class
toil.test.provisioners.clusterScalerTest.ClusterScalerTest(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
setUp()
Hook method for setting up the test fixture before exercising it.
testRounding()
Test to make sure the ClusterScaler's rounding rounds properly.
testMaxNodes()
Set the scaler to be very aggressive, give it a ton of jobs, and make sure it doesn't go over maxNodes.
testMinNodes()
Without any jobs queued, the scaler should still estimate "minNodes" nodes.
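Taken together, testMaxNodes and testMinNodes describe a clamp on the scaler's estimate. A minimal sketch, with a hypothetical function name and no claim about how the ClusterScaler implements it internally:

```python
def clamp_node_estimate(estimate: int, min_nodes: int, max_nodes: int) -> int:
    """Keep a node-count estimate within the configured [min_nodes, max_nodes] range."""
    return max(min_nodes, min(estimate, max_nodes))
```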
testPreemptibleDeficitResponse()
When a preemptible deficit was detected by a previous run of the loop, the scaler should add non-preemptible nodes to compensate in proportion to preemptibleCompensation.
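One way to read "in proportion to preemptibleCompensation" is sketched below. The function name, the rounding step, and the 0.0 to 1.0 range check are assumptions for illustration; the scaler's actual computation may differ.

```python
def compensated_estimate(
    non_preemptible_estimate: int,
    preemptible_deficit: int,
    preemptible_compensation: float,
) -> int:
    """Add a share of the missing preemptible nodes to the non-preemptible estimate."""
    if not 0.0 <= preemptible_compensation <= 1.0:
        raise ValueError("preemptible_compensation must be between 0.0 and 1.0")
    # Assumed rounding: nearest whole node.
    extra = int(round(preemptible_deficit * preemptible_compensation))
    return non_preemptible_estimate + extra
```

With a compensation of 0.0 the deficit is ignored, and with 1.0 every missing preemptible node is replaced by a non-preemptible one.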
testPreemptibleDeficitIsSet()
Make sure that updateClusterSize sets the preemptible deficit if it can't launch preemptible nodes properly. That way, the deficit can be communicated to the next run of estimateNodeCount.
testNoLaunchingIfDeltaAlreadyMet()
Check that the scaler doesn't try to launch "0" more instances if the delta was able to be met by unignoring nodes.
testBetaInertia()
test_overhead_accounting_large()
If a node has a certain raw memory or disk capacity, that won't all be available when it actually comes up; some disk and memory will be used by the OS, and the backing scheduler (Mesos, Kubernetes, etc.).
Make sure this overhead is accounted for for large nodes.
test_overhead_accounting_small()
If a node has a certain raw memory or disk capacity, that won't all be available when it actually comes up; some disk and memory will be used by the OS, and the backing scheduler (Mesos, Kubernetes, etc.).
Make sure this overhead is accounted for for small nodes.
test_overhead_accounting_observed()
If a node has a certain raw memory or disk capacity, that won't all be available when it actually comes up; some disk and memory will be used by the OS, and the backing scheduler (Mesos, Kubernetes, etc.).
Make sure this overhead is accounted for so that real-world observed failures cannot happen again.
class
toil.test.provisioners.clusterScalerTest.ScalerThreadTest(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
testClusterScaling()
Test scaling for a batch of non-preemptible jobs and no preemptible jobs (makes debugging easier).
testClusterScalingMultipleNodeTypes()
testClusterScalingWithPreemptibleJobs()
Test scaling simultaneously for a batch of preemptible and non-preemptible jobs.
class
toil.test.provisioners.clusterScalerTest.MockBatchSystemAndProvisioner(config, secondsPerJob)
Bases: toil.batchSystems.abstractBatchSystem.AbstractScalableBatchSystem , toil.provisioners.abstractProvisioner.AbstractProvisioner
Mimics a leader, job batcher, provisioner and scalable batch system.
config
secondsPerJob
provisioner
batchSystem
jobQueue
updatedJobsQueue
jobBatchSystemIDToIssuedJob
totalJobs = 0
totalWorkerTime = 0.0
toilMetrics = None
nodesToWorker
workers
maxWorkers
running = False
leaderThread
toilState
start()
shutDown()
nodeInUse(nodeIP)
Can be used to determine if a
worker node is running any tasks. If the node doesn't
exist, this function should simply return False.
Parameters
nodeIP -- The worker node's private IP address
Returns
True if the worker node has been issued any tasks, else False
ignoreNode(nodeAddress)
Stop sending jobs to this node.
Used in autoscaling when the autoscaler is ready to
terminate a node, but jobs are still running. This allows
the node to be terminated after the current jobs have
finished.
Parameters
nodeAddress -- IP address of node to ignore.
unignoreNode(nodeAddress)
Stop ignoring this address, presumably after a node with this address has been terminated. This allows for the possibility of a new node having the same address as a terminated one.
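The ignoreNode() and unignoreNode() contract amounts to maintaining a set of addresses that must not receive new jobs. The class below is a hypothetical sketch of that bookkeeping, not Toil's batch system; the can_schedule_on method is invented for the example.

```python
class NodeTracker:
    """Hypothetical bookkeeping for ignored nodes; not Toil's batch system."""

    def __init__(self) -> None:
        self._ignored: set[str] = set()

    def ignoreNode(self, nodeAddress: str) -> None:
        # Stop sending new jobs here; jobs already running may finish.
        self._ignored.add(nodeAddress)

    def unignoreNode(self, nodeAddress: str) -> None:
        # discard() is a no-op if the address was never ignored.
        self._ignored.discard(nodeAddress)

    def can_schedule_on(self, nodeAddress: str) -> bool:
        return nodeAddress not in self._ignored
```

Unignoring after termination matters because a replacement node may be assigned the same IP address as the one that was shut down.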
supportedClusterTypes()
Get all the cluster types that this provisioner implementation supports.
createClusterSettings()
Initialize class for a new cluster, to be deployed, when running outside the cloud.
readClusterSettings()
Initialize class from an existing cluster. This method assumes that the instance we are running on is the leader.
Implementations must call _setLeaderWorkerAuthentication().
setAutoscaledNodeTypes(node_types)
Set node types, shapes and spot
bids for Toil-managed autoscaling.
Parameters
node_types ( list[tuple[set[toil.provisioners.abstractProvisioner.Shape], Optional[float]]] ) -- A list of node types, as parsed with parse_node_types.
getProvisionedWorkers(instance_type=None, preemptible=None)
Returns a list of Node objects,
each representing a worker node in the cluster
Parameters
preemptible -- If True only return preemptible nodes else return non-preemptible nodes
Returns
list of Node
terminateNodes(nodes)
Terminate the nodes represented
by given Node objects
Parameters
nodes -- list of Node objects
remainingBillingInterval(node)
addJob(jobShape, preemptible=False)
Add a job to the job queue
getNumberOfJobsIssued(preemptible=None)
getJobs()
getNodes(preemptible=False, timeout=600)
Returns a dictionary mapping
node identifiers of preemptible or non-preemptible nodes to
NodeInfo objects, one for each node.
Parameters
• preemptible ( Optional[bool] ) -- If True (False) only (non-)preemptible nodes will be returned. If None, all nodes will be returned.
• timeout ( int )
addNodes(nodeTypes, numNodes, preemptible)
Used to add worker nodes to the
cluster
Parameters
• numNodes -- The number of nodes to add
• preemptible -- whether or not the nodes will be preemptible
• spotBid -- The bid for preemptible nodes if applicable (this can be set in config, also).
• nodeTypes ( set[str] )
Returns
number of nodes successfully added
Return type
int
getNodeShape(nodeType, preemptible=False)
The shape of a preemptible or
non-preemptible node managed by this provisioner. The node
shape defines key properties of a machine, such as its
number of cores or the time between billing intervals.
Parameters
instance_type ( str ) -- Instance type name to return the shape of.
getWorkersInCluster(nodeShape)
launchCluster(leaderNodeType, keyName, userTags=None, vpcSubnet=None, leaderStorage=50, nodeStorage=50, botoPath=None, **kwargs)
Initialize a cluster and create a leader node.
Implementations
must call _setLeaderWorkerAuthentication() with the leader
so that workers can be launched.
Parameters
• leaderNodeType -- The leader instance.
• leaderStorage -- The amount of disk to allocate to the leader in gigabytes.
• owner -- Tag identifying the owner of the instances.
destroyCluster()
Terminates all nodes in the
specified cluster and cleans up all resources associated
with the cluster.
Parameters
clusterName -- identifier of the cluster to terminate.
Return type
None
getLeader()
Returns
The leader node.
getNumberOfNodes(nodeType=None, preemptible=None)
toil.test.provisioners.clusterTest
Attributes
Classes
Module Contents
toil.test.provisioners.clusterTest.log
class
toil.test.provisioners.clusterTest.AbstractClusterTest(methodName)
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
Parameters
methodName ( str )
keyName
clusterName
leaderNodeType = 't2.medium'
clusterType = 'mesos'
zone
region
aws
venvDir = '/tmp/venv'
python()
Return the full path to the
venv Python on the leader.
Return type
str
pip()
Return the full path to the venv pip on the leader.
Return type
str
destroyCluster()
Destroy the cluster we built, if it exists.
Succeeds if the
cluster does not currently exist.
Return type
None
setUp()
Set up for the test. Must be
overridden to call this method and set self.jobStore.
Return type
None
tearDown()
Hook method for deconstructing
the test fixture after testing it.
Return type
None
sshUtil(command)
Run the given command on the
cluster. Raise subprocess.CalledProcessError if it fails.
Parameters
command ( list[str] )
Return type
None
rsync_util(from_file, to_file)
Transfer a file to/from the cluster.
The
cluster-side path should have a ':' in front of it.
Parameters
• from_file ( str )
• to_file ( str )
Return type
None
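The ':' prefix convention for cluster-side paths can be illustrated with a small parser. The function split_rsync_arg is hypothetical and exists only to make the convention concrete; it is not part of the test class.

```python
def split_rsync_arg(arg: str) -> tuple[bool, str]:
    """Return (is_cluster_side, path) for an argument using the ':' prefix convention."""
    if arg.startswith(":"):
        # A leading ':' marks a path on the cluster; strip it to get the path.
        return True, arg[1:]
    return False, arg
```

Under this convention a call such as rsync_util("script.py", ":/tmp/t/script.py") would copy a local file to the cluster, and swapping the two arguments would copy it back; the paths here are made up for the example.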
createClusterUtil(args=None)
Parameters
args ( Optional[list[str]] )
Return type
None
launchCluster()
Return type
None
class toil.test.provisioners.clusterTest.CWLOnARMTest(methodName)
Bases: AbstractClusterTest
Run the CWL 1.2
conformance tests on ARM specifically.
Parameters
methodName ( str )
clusterName
leaderNodeType = 't4g.2xlarge'
clusterType = 'kubernetes'
cwl_test_dir = '/tmp/toil/cwlTests'
setUp()
Set up for the test. Must be
overridden to call this method and set self.jobStore.
Return type
None
test_cwl_on_arm()
Return type
None
toil.test.provisioners.gceProvisionerTest
Attributes
Classes
Module Contents
toil.test.provisioners.gceProvisionerTest.log
class
toil.test.provisioners.gceProvisionerTest.AbstractGCEAutoscaleTest(methodName)
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
projectID
sshUtil(command)
rsyncUtil(src, dest)
destroyClusterUtil()
createClusterUtil(args=None)
cleanJobStoreUtil()
keyName
botoDir
googleZone = 'us-west1-a'
leaderInstanceType = 'n1-standard-1'
instanceTypes = ['n1-standard-2']
numWorkers = ['2']
numSamples = 2
spotBid = 0.15
setUp()
Hook method for setting up the test fixture before exercising it.
tearDown()
Hook method for deconstructing the test fixture after testing it.
launchCluster()
class toil.test.provisioners.gceProvisionerTest.GCEAutoscaleTest(name)
Bases: AbstractGCEAutoscaleTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
clusterName
requestedLeaderStorage = 80
setUp()
Hook method for setting up the test fixture before exercising it.
launchCluster()
testAutoScale()
testSpotAutoScale()
class
toil.test.provisioners.gceProvisionerTest.GCEStaticAutoscaleTest(name)
Bases: GCEAutoscaleTest
Runs the tests
on a statically provisioned cluster with autoscaling
enabled.
requestedNodeStorage = 20
launchCluster()
class
toil.test.provisioners.gceProvisionerTest.GCEAutoscaleTestMultipleNodeTypes(name)
Bases: AbstractGCEAutoscaleTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
clusterName
setUp()
Hook method for setting up the test fixture before exercising it.
testAutoScale()
class toil.test.provisioners.gceProvisionerTest.GCERestartTest(name)
Bases: AbstractGCEAutoscaleTest
This test
ensures autoscaling works on a restarted Toil run.
clusterName
setUp()
Hook method for setting up the test fixture before exercising it.
testAutoScaledCluster()
toil.test.provisioners.provisionerTest
Attributes
Classes
Module Contents
toil.test.provisioners.provisionerTest.log
class
toil.test.provisioners.provisionerTest.ProvisionerTest(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
test_node_type_parsing()
Return type
None
toil.test.provisioners.restartScript
Attributes
Functions
Module Contents
toil.test.provisioners.restartScript.f0(job)
toil.test.provisioners.restartScript.parser
toil.test.server
Submodules
toil.test.server.serverTest
Attributes
Classes
Module Contents
toil.test.server.serverTest.logger
class
toil.test.server.serverTest.ToilServerUtilsTest(methodName='runTest')
Bases: toil.test.ToilTest
Tests for the
utility functions used by the Toil server.
test_workflow_canceling_recovery()
Make sure that a workflow in CANCELING state will be recovered to a terminal state eventually even if the workflow runner Celery task goes away without flipping the state.
class toil.test.server.serverTest.hidden
class AbstractStateStoreTest(methodName='runTest')
Bases: toil.test.ToilTest
Basic tests for
state stores.
abstract get_state_store()
Make a state store to test, on
a single fixed URL.
Return type
AbstractStateStore
test_state_store()
Make sure that the state store
under test can store and load keys.
Return type
None
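Nesting an abstract test case inside a plain container class, as done with hidden here, keeps unittest discovery from collecting the abstract case directly while still letting concrete subclasses inherit its tests. The sketch below shows the pattern with a toy in-memory store; DictStateStore and its get/set signatures are assumptions for illustration, not Toil's AbstractStateStore.

```python
import unittest
from typing import Optional


class DictStateStore:
    """Toy in-memory store used only for this illustration."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], str] = {}

    def set(self, workflow_id: str, key: str, value: Optional[str]) -> None:
        # Setting a key to None deletes it.
        if value is None:
            self._data.pop((workflow_id, key), None)
        else:
            self._data[(workflow_id, key)] = value

    def get(self, workflow_id: str, key: str) -> Optional[str]:
        return self._data.get((workflow_id, key))


class hidden:
    # Not a TestCase itself, so test loaders skip it and do not look inside.
    class AbstractStoreTest(unittest.TestCase):
        def get_state_store(self):
            raise NotImplementedError

        def test_state_store(self) -> None:
            store = self.get_state_store()
            store.set("workflow1", "state", "RUNNING")
            self.assertEqual(store.get("workflow1", "state"), "RUNNING")
            store.set("workflow1", "state", None)
            self.assertIsNone(store.get("workflow1", "state"))


class DictStateStoreTest(hidden.AbstractStoreTest):
    def get_state_store(self):
        return DictStateStore()
```

Each concrete subclass only has to supply get_state_store(); the shared test body then runs once per backend.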
class
toil.test.server.serverTest.FileStateStoreTest(methodName='runTest')
Bases: hidden.AbstractStateStoreTest
Test file-based
state storage.
setUp()
Hook method for setting up the
test fixture before exercising it.
Return type
None
get_state_store()
Make a state store to test, on
a single fixed local path.
Return type
AbstractStateStore
class
toil.test.server.serverTest.FileStateStoreURLTest(methodName='runTest')
Bases: hidden
Test file-based
state storage using URLs instead of local paths.
setUp()
Hook method for setting up the
test fixture before exercising it.
Return type
None
get_state_store()
Make a state store to test, on
a single fixed URL.
Return type
AbstractStateStore
class toil.test.server.serverTest.BucketUsingTest(methodName='runTest')
Bases: toil.test.ToilTest
Base class for
tests that need a bucket.
region: str | None
s3_resource: mypy_boto3_s3.S3ServiceResource | None
bucket: mypy_boto3_s3.service_resource.Bucket | None
bucket_name: str | None
classmethod setUpClass()
Set up the class with a single
pre-existing AWS bucket for all tests.
Return type
None
classmethod tearDownClass()
Hook method for deconstructing
the class fixture after running all tests in the class.
Return type
None
class
toil.test.server.serverTest.AWSStateStoreTest(methodName='runTest')
Bases: hidden , BucketUsingTest
Test AWS-based
state storage.
bucket_path = 'prefix/of/keys'
get_state_store()
Make a state store to test, on
a single fixed URL.
Return type
AbstractStateStore
test_state_store_paths()
Make sure that the S3 state store puts things in the right places.
We don't really care about the exact internal structure, but we do care about actually being under the path we are supposed to use.
Return type
None
class
toil.test.server.serverTest.AbstractToilWESServerTest(*args,
**kwargs)
Bases: toil.test.ToilTest
Class for
server tests that provides a self.app in testing mode.
setUp()
Hook method for setting up the
test fixture before exercising it.
Return type
None
tearDown()
Hook method for deconstructing
the test fixture after testing it.
Return type
None
class
toil.test.server.serverTest.ToilWESServerBenchTest(*args,
**kwargs)
Bases: AbstractToilWESServerTest
Tests for
Toil's Workflow Execution Service API that don't run
workflows.
test_home()
Test the homepage endpoint.
Return type
None
test_health()
Test the health check endpoint.
Return type
None
test_get_service_info()
Test the GET /service-info
endpoint.
Return type
None
class
toil.test.server.serverTest.ToilWESServerWorkflowTest(*args,
**kwargs)
Bases: AbstractToilWESServerTest
Tests of the
WES server running workflows.
run_zip_workflow(zip_path, include_message=True,
include_params=True)
We have several zip file tests; this submits a zip file and makes sure it ran OK.
If
include_message is set to False, don't send a
"message" argument in workflow_params. If
include_params is also set to False, don't send
workflow_params at all.
Parameters
• zip_path ( str )
• include_message ( bool )
• include_params ( bool )
Return type
None
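The effect of the two flags on the submitted form fields can be sketched as below. This is a hedged illustration, not the test's actual code: the function name build_form_fields and the message text are hypothetical, and the sketch assumes that include_params=False drops workflow_params regardless of include_message.

```python
import json


def build_form_fields(include_message=True, include_params=True):
    """Build the non-file form fields for a run request (hypothetical helper)."""
    fields = {}
    if include_params:
        params = {}
        if include_message:
            # Only send a "message" argument when asked to.
            params["message"] = "Hello, world!"
        # workflow_params is sent as a JSON-encoded string.
        fields["workflow_params"] = json.dumps(params)
    # With include_params=False, workflow_params is not sent at all.
    return fields
```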
test_run_workflow_relative_url_no_attachments_fails()
Test run example CWL workflow
from relative workflow URL but with no attachments.
Return type
None
test_run_workflow_relative_url()
Test run example CWL workflow
from relative workflow URL.
Return type
None
test_run_workflow_https_url()
Test run example CWL workflow
from the Internet.
Return type
None
test_run_workflow_single_file_zip()
Test run example CWL workflow
from single-file ZIP.
Return type
None
test_run_workflow_multi_file_zip()
Test run example CWL workflow
from multi-file ZIP.
Return type
None
test_run_workflow_manifest_zip()
Test run example CWL workflow
from ZIP with manifest.
Return type
None
test_run_workflow_inputs_zip()
Test run example CWL workflow
from ZIP without manifest but with inputs.
Return type
None
test_run_workflow_manifest_and_inputs_zip()
Test run example CWL workflow
from ZIP with manifest and inputs.
Return type
None
test_run_workflow_no_params_zip()
Test run example CWL workflow
from ZIP without workflow_params.
Return type
None
test_run_and_cancel_workflows()
Run two workflows, cancel one
of them, and make sure they all exist.
Return type
None
class
toil.test.server.serverTest.ToilWESServerCeleryWorkflowTest(*args,
**kwargs)
Bases: ToilWESServerWorkflowTest
End-to-end workflow-running tests against Celery.
class
toil.test.server.serverTest.ToilWESServerCeleryS3StateWorkflowTest(*args,
**kwargs)
Bases: ToilWESServerWorkflowTest , BucketUsingTest
Test the server
with Celery and state stored in S3.
setUp()
Hook method for setting up the
test fixture before exercising it.
Return type
None
toil.test.sort
Submodules
toil.test.sort.restart_sort
A demonstration of toil. Sorts the lines of a file into ascending order by doing a parallel merge sort. This is an intentionally buggy version that doesn't include restart() for testing purposes.
Attributes
Functions
Module Contents
toil.test.sort.restart_sort.defaultLines
= 1000
toil.test.sort.restart_sort.defaultLineLen = 50
toil.test.sort.restart_sort.sortMemory = '600M'
toil.test.sort.restart_sort.setup(job, inputFile, N,
downCheckpoints,
options)
Sets up the sort. Returns the FileID of the sorted file
toil.test.sort.restart_sort.down(job,
inputFileStoreID, N, path,
downCheckpoints, options, memory=sortMemory)
Input is a file, a subdivision size N, and a path in the hierarchy of jobs. If the range is larger than the threshold N, the range is divided recursively and a follow-on job is created to merge the results back together; otherwise the file is sorted and placed in the output.
toil.test.sort.restart_sort.up(job,
inputFileID1, inputFileID2, path,
options, memory=sortMemory)
Merges the two files and places them in the output.
toil.test.sort.restart_sort.sort(file)
Sorts the given file.
toil.test.sort.restart_sort.merge(fileHandle1,
fileHandle2,
outputFileHandle)
Merges together two files maintaining sorted order.
All handles must be text-mode streams.
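A two-way merge of sorted text streams can be sketched with the standard library as shown below. This is a minimal illustration under the stated requirement that all handles are text-mode streams; the name merge_sorted_streams is hypothetical and the body is not the module's own code.

```python
import heapq
import io


def merge_sorted_streams(handle1, handle2, output_handle):
    """Merge two sorted text streams line by line into output_handle."""
    # heapq.merge lazily yields lines from both inputs in ascending order,
    # so neither input has to be loaded into memory in full.
    for line in heapq.merge(handle1, handle2):
        output_handle.write(line)


left = io.StringIO("apple\ncherry\n")
right = io.StringIO("banana\ndate\n")
out = io.StringIO()
merge_sorted_streams(left, right, out)
```

After the call, out holds the four lines in ascending order.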
toil.test.sort.restart_sort.copySubRangeOfFile(inputFile,
fileStart,
fileEnd)
Copies the range (in bytes) between fileStart and fileEnd to the given output file handle.
toil.test.sort.restart_sort.getMidPoint(file, fileStart, fileEnd)
Finds the point in the file to split. Returns an int i such that fileStart <= i < fileEnd
toil.test.sort.restart_sort.makeFileToSort(fileName,
lines=defaultLines, lineLen=defaultLineLen)
toil.test.sort.restart_sort.main(options=None)
toil.test.sort.sort
A demonstration of toil. Sorts the lines of a file into ascending order by doing a parallel merge sort.
Attributes
Functions
Module Contents
toil.test.sort.sort.defaultLines
= 1000
toil.test.sort.sort.defaultLineLen = 50
toil.test.sort.sort.sortMemory = '600M'
toil.test.sort.sort.setup(job, inputFile, N,
downCheckpoints, options)
Sets up the sort. Returns the FileID of the sorted file
toil.test.sort.sort.down(job,
inputFileStoreID, N, path,
downCheckpoints, options, memory=sortMemory)
Input is a file, a subdivision size N, and a path in the hierarchy of jobs. If the range is larger than the threshold N, the range is divided recursively and a follow-on job is created to merge the results back together; otherwise the file is sorted and placed in the output.
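The divide-or-sort decision described above can be sketched in plain Python, without any Toil jobs. This is only an in-memory analogy: sort_range is a hypothetical name, the recursive calls stand in for child jobs, and the final merge stands in for the follow-on job.

```python
import heapq


def sort_range(lines, n):
    """Sort lines with a threshold-based divide-and-conquer merge sort."""
    # Guard against n < 1 so that the recursion always terminates.
    if len(lines) <= max(n, 1):
        # Small enough: sort directly (the leaf case).
        return sorted(lines)
    mid = len(lines) // 2
    # Divide: in a workflow, each half would be handled by a child job.
    left = sort_range(lines[:mid], n)
    right = sort_range(lines[mid:], n)
    # Merge: in a workflow, this would be the follow-on job.
    return list(heapq.merge(left, right))
```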
toil.test.sort.sort.up(job,
inputFileID1, inputFileID2, path, options,
memory=sortMemory)
Merges the two files and places them in the output.
toil.test.sort.sort.sort(file)
Sorts the given file.
toil.test.sort.sort.merge(fileHandle1, fileHandle2, outputFileHandle)
Merges together two files maintaining sorted order.
All handles must be text-mode streams.
toil.test.sort.sort.copySubRangeOfFile(inputFile, fileStart, fileEnd)
Copies the range (in bytes) between fileStart and fileEnd to the given output file handle.
toil.test.sort.sort.getMidPoint(file, fileStart, fileEnd)
Finds the point in the file to split. Returns an int i such that fileStart <= i < fileEnd
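One way to find such a split point is to jump to the middle of the range and read on to the end of the current line. The sketch below assumes that the byte range holds complete, newline-terminated lines and that the returned offset is that of a newline byte; find_split_point is a hypothetical name and this is not necessarily how getMidPoint is implemented.

```python
def find_split_point(path, start, end):
    """Return the offset i of a newline byte such that start <= i < end."""
    with open(path, "rb") as f:
        mid = (start + end) // 2
        f.seek(mid)
        line = f.readline()
        if mid + len(line) < end:
            # The line containing mid ends before the range does: split there.
            return mid + len(line) - 1
        # Otherwise fall back to the end of the first line in the range.
        f.seek(start)
        line = f.readline()
        return start + len(line) - 1
```

Splitting on a newline keeps every line intact in exactly one of the two halves.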
toil.test.sort.sort.makeFileToSort(fileName,
lines=defaultLines,
lineLen=defaultLineLen)
toil.test.sort.sort.main(options=None)
toil.test.sort.sortTest
Attributes
Classes
Functions
Module Contents
toil.test.sort.sortTest.logger
toil.test.sort.sortTest.defaultLineLen
toil.test.sort.sortTest.defaultLines
toil.test.sort.sortTest.defaultN
toil.test.sort.sortTest.runMain(options)
Make sure the output file is deleted every time main is run.
class toil.test.sort.sortTest.SortTest(methodName='runTest')
Bases: toil.test.ToilTest , toil.batchSystems.mesos.test.MesosTestSupport
Tests Toil by
sorting a file in parallel on various combinations of job
stores and batch systems.
setUp()
Hook method for setting up the test fixture before exercising it.
tearDown()
Hook method for deconstructing the test fixture after testing it.
testAwsSingle()
testAwsMesos()
testFileMesos()
testGoogleSingle()
testGoogleMesos()
testFileSingle()
testFileSingleNonCaching()
testFileSingleCheckpoints()
testFileSingle10000()
testFileGridEngine()
testFileTorqueEngine()
testNo = 5
testSort()
testMerge()
testCopySubRangeOfFile()
testGetMidPoint()
toil.test.src
Submodules
toil.test.src.autoDeploymentTest
Attributes
Classes
Module Contents
toil.test.src.autoDeploymentTest.logger
class
toil.test.src.autoDeploymentTest.AutoDeploymentTest(methodName='runTest')
Bases: toil.test.ApplianceTestSupport
Tests various
auto-deployment scenarios. Using the appliance, i.e. a
docker container, for these tests allows for running worker
processes on the same node as the leader process while
keeping their file systems separate from each other and the
leader process. Separate file systems are crucial to prove
that auto-deployment does its job.
setUp()
Hook method for setting up the test fixture before exercising it.
sitePackages
testRestart()
Test whether auto-deployment works on restart.
testSplitRootPackages()
Test whether auto-deployment works with a virtualenv in which jobs are defined in completely separate branches of the package hierarchy. Initially, auto-deployment did deploy the entire virtualenv but jobs could only be defined in one branch of the package hierarchy. We define a branch as the maximum set of fully qualified package paths that share the same first component. In other words, a.b and a.c are in the same branch, while a.b and d.c are not.
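The definition of a branch given above amounts to comparing the first component of two dotted package paths. A minimal sketch, with the hypothetical helper name same_branch:

```python
def same_branch(package_a, package_b):
    """Return True if two dotted package paths share their first component."""
    return package_a.split(".")[0] == package_b.split(".")[0]
```

With this definition, same_branch("a.b", "a.c") is true and same_branch("a.b", "d.c") is false, matching the example in the text.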
testUserTypesInJobFunctionArgs()
Test encapsulated, function-wrapping jobs where the function arguments reference user-defined types.
Mainly written to cover https://github.com/BD2KGenomics/toil/issues/1259 , but it also revealed https://github.com/BD2KGenomics/toil/issues/1278 .
testDeferralWithConcurrentEncapsulation()
Ensure that the following DAG succeeds:
              ┌───────────┐
              │ Root (W1) │
              └───────────┘
                    │
         ┌──────────┴─────────┐
         ▼                    ▼
┌────────────────┐ ┌────────────────────┐
│ Deferring (W2) │ │ Encapsulating (W3) │═══════════════╗
└────────────────┘ └────────────────────┘               ║
                              │                         ║
                              ▼                         ▼
                    ┌───────────────────┐      ┌────────────────┐
                    │ Encapsulated (W3) │      │ Follow-on (W6) │
                    └───────────────────┘      └────────────────┘
                              │                         │
                      ┌───────┴────────┐                │
                      ▼                ▼                ▼
              ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
              │ Dummy 1 (W4) │ │ Dummy 2 (W5) │ │  Last (W6)   │
              └──────────────┘ └──────────────┘ └──────────────┘
The Wn numbers denote the worker processes that a particular job is run in. Deferring adds a deferred function and then runs for a long time. The deferred function will be present in the cache state for the duration of Deferring. Follow-on is the generic Job instance that's added by encapsulating a job. It runs on the same worker node but in a separate worker process, as the first job in that worker. Because …
1. it is the first job in its worker process (the user script has not been made available on the sys.path by a previous job in that worker) and
2. it shares the cache state with the Deferring job and
3. it is an instance of Job (and so does not introduce the user script to sys.path itself),
… it might cause problems with deserializing a deferred function defined in the user script.
Encapsulated has two children to ensure that Follow-on is run in a separate worker.
testDeferralWithFailureAndEncapsulation()
Ensure that the following DAG succeeds:
              ┌───────────┐
              │ Root (W1) │
              └───────────┘
                    │
         ┌──────────┴─────────┐
         ▼                    ▼
┌────────────────┐ ┌────────────────────┐
│ Deferring (W2) │ │ Encapsulating (W3) │═══════════════════════╗
└────────────────┘ └────────────────────┘                       ║
                              │                                 ║
                              ▼                                 ▼
                    ┌───────────────────┐              ┌────────────────┐
                    │ Encapsulated (W3) │════════════╗ │ Follow-on (W7) │
                    └───────────────────┘            ║ └────────────────┘
                              │                      ║
                       ┌──────┴──────┐               ║
                       ▼             ▼               ▼
                ┌────────────┐┌────────────┐ ┌──────────────┐
                │ Dummy (W4) ││ Dummy (W5) │ │ Trigger (W6) │
                └────────────┘└────────────┘ └──────────────┘
Trigger causes Deferring to crash. Follow-on runs next, detects Deferring's left-overs and runs the deferred function. Follow-on is an instance of Job and the first job in its worker process. This test ensures that despite these circumstances, the user script is loaded before the deferred functions defined in it are run.
Encapsulated has two children to ensure that Follow-on is run in a new worker. That's the only way to guarantee that the user script has not been loaded yet, which would cause the test to succeed coincidentally. We want to test that auto-deploying and loading of the user script are done properly before deferred functions are being run and before any jobs have been executed by that worker.
toil.test.src.busTest
Attributes
Classes
Functions
Module Contents
toil.test.src.busTest.logger
class
toil.test.src.busTest.MessageBusTest(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
test_enum_ints_in_file()
Make sure writing bus messages
to files works with enums.
Return type
None
test_cross_thread_messaging()
Make sure message bus works
across threads.
Return type
None
test_restart_without_bus_path()
Test the ability to restart a
workflow when the message bus path used by the previous
attempt is gone.
Return type
None
toil.test.src.busTest.failing_job_fn(job)
This function is guaranteed to
fail.
Parameters
job ( toil.job.Job )
Return type
None
toil.test.src.checkpointTest
Classes
Module Contents
class toil.test.src.checkpointTest.CheckpointTest(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
testCheckpointNotRetried()
A checkpoint job should not be retried if the workflow has a retryCount of 0.
testCheckpointRetriedOnce()
A checkpoint job should be retried exactly once if the workflow has a retryCount of 1.
testCheckpointedRestartSucceeds()
A checkpointed job should succeed on restart of a failed run if its child job succeeds.
class
toil.test.src.checkpointTest.CheckRetryCount(numFailuresBeforeSuccess)
Bases: toil.job.Job
Fail N times,
succeed on the next try.
numFailuresBeforeSuccess
getNumRetries(fileStore)
Mark a retry in the fileStore, and return the number of retries so far.
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
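The fail-N-times pattern and the retryCount behaviour exercised by the checkpoint tests above can be sketched without Toil. This is an in-memory analogy only: FailNTimes and run_with_retries are hypothetical names, and the real job records its retries in the fileStore rather than on the object.

```python
class FailNTimes:
    """Callable that raises on its first N calls and succeeds afterwards."""

    def __init__(self, num_failures_before_success):
        self.num_failures_before_success = num_failures_before_success
        self.attempts = 0

    def __call__(self):
        self.attempts += 1
        if self.attempts <= self.num_failures_before_success:
            raise RuntimeError(f"planned failure {self.attempts}")
        return self.attempts


def run_with_retries(task, retry_count):
    """Run task once, then retry up to retry_count more times on failure."""
    for attempt in range(retry_count + 1):
        try:
            return task()
        except RuntimeError:
            if attempt == retry_count:
                # Retries are exhausted: let the failure propagate.
                raise
```

A task that fails once succeeds with a retry count of 1 and fails with a retry count of 0.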
class
toil.test.src.checkpointTest.AlwaysFail(memory=None,
cores=None,
disk=None, accelerators=None, preemptible=None,
preemptable=None,
unitName='', checkpoint=False, displayName='',
descriptionClass=None,
local=None, files=None)
Bases: toil.job.Job
Class
represents a unit of work in toil.
Parameters
• memory ( Optional[ParseableIndivisibleResource] )
• cores ( Optional[ParseableDivisibleResource] )
• disk ( Optional[ParseableIndivisibleResource] )
• accelerators ( Optional[ParseableAcceleratorRequirement] )
• preemptible ( Optional[ParseableFlag] )
• preemptable ( Optional[ParseableFlag] )
• unitName ( Optional[str] )
• checkpoint ( Optional[bool] )
• displayName ( Optional[str] )
• descriptionClass ( Optional[type] )
• local ( Optional[bool] )
• files ( Optional[set[toil.fileStores.FileID]] )
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
class toil.test.src.checkpointTest.CheckpointFailsFirstTime
Bases: toil.job.Job
Class
represents a unit of work in toil.
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
class
toil.test.src.checkpointTest.FailOnce(memory=None,
cores=None,
disk=None, accelerators=None, preemptible=None,
preemptable=None,
unitName='', checkpoint=False, displayName='',
descriptionClass=None,
local=None, files=None)
Bases: toil.job.Job
Fail the first
time the workflow is run, but succeed thereafter.
Parameters
• memory ( Optional[ParseableIndivisibleResource] )
• cores ( Optional[ParseableDivisibleResource] )
• disk ( Optional[ParseableIndivisibleResource] )
• accelerators ( Optional[ParseableAcceleratorRequirement] )
• preemptible ( Optional[ParseableFlag] )
• preemptable ( Optional[ParseableFlag] )
• unitName ( Optional[str] )
• checkpoint ( Optional[bool] )
• displayName ( Optional[str] )
• descriptionClass ( Optional[type] )
• local ( Optional[bool] )
• files ( Optional[set[toil.fileStores.FileID]] )
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
toil.test.src.deferredFunctionTest
Attributes
Classes
Module Contents
toil.test.src.deferredFunctionTest.logger
class
toil.test.src.deferredFunctionTest.DeferredFunctionTest(methodName='runTest')
Bases: toil.test.ToilTest
Test the
deferred function system.
jobStoreType = 'file'
setUp()
Hook method for setting up the test fixture before exercising it.
testDeferredFunctionRunsWithMethod()
Refer to the docstring in _testDeferredFunctionRuns. Test with a method.
testDeferredFunctionRunsWithClassMethod()
Refer to the docstring in _testDeferredFunctionRuns. Test with a class method.
testDeferredFunctionRunsWithLambda()
Refer to the docstring in _testDeferredFunctionRuns. Test with a lambda.
testDeferredFunctionRunsWithFailures()
Create 2 non-local files to use as flags. Create a job that registers a function that deletes one non-local file. If that file exists, the job SIGKILLs itself. If it doesn't exist, the job registers a second deferred function to delete the second non-local file and exits normally.
Initially the first file exists, so the job should SIGKILL itself and neither deferred function will run (in fact, the second should not even be registered). On the restart, the first deferred function should run and the first file should not exist, but the second one should. We assert the presence of the second, then register the second deferred function and exit normally. At the end of the test, neither file should exist.
Incidentally, this also tests for multiple registered deferred functions, and the case where a deferred function fails (since the first file doesn't exist on the retry).
testNewJobsCanHandleOtherJobDeaths()
Create 2 non-local files and then create 2 jobs. The first job registers a deferred function to delete the second non-local file, deletes the first non-local file, and then kills itself. The second job waits for the first file to be deleted, then sleeps for a few seconds and then spawns a child. The child of the second job does nothing. However, starting it should handle the untimely demise of the first job and run the registered deferred function that deletes the second file. We assert the absence of the two files at the end of the run.
testBatchSystemCleanupCanHandleWorkerDeaths()
Create some non-local files. Create a job that registers a deferred function to delete the file and then kills its worker.
Assert that the file is missing after the pipeline fails, because we're using a single-machine batch system and the leader's batch system cleanup will find and run the deferred function.
toil.test.src.dockerCheckTest
Classes
Module Contents
class
toil.test.src.dockerCheckTest.DockerCheckTest(methodName='runTest')
Bases: toil.test.ToilTest
Tests checking
whether a docker image exists or not.
testOfficialUbuntuRepo()
Image exists. This should pass.
testBroadDockerRepo()
Image exists. This should pass.
testBroadDockerRepoBadTag()
Bad tag. This should raise.
testNonexistentRepo()
Bad image. This should raise.
testToilQuayRepo()
Image exists. Should pass.
testBadQuayRepoNTag()
Bad repo and tag. This should raise.
testBadQuayRepo()
Bad repo. This should raise.
testBadQuayTag()
Bad tag. This should raise.
testGoogleRepo()
Image exists. Should pass.
testBadGoogleRepo()
Bad repo and tag. This should raise.
testApplianceParser()
Test that a specified appliance is parsed correctly.
toil.test.src.environmentTest
Attributes
Classes
Functions
Module Contents
toil.test.src.environmentTest.logger
class
toil.test.src.environmentTest.EnvironmentTest(methodName='runTest')
Bases: toil.test.ToilTest
Test to make sure that Toil's environment variable save and restore system (environment.pickle) works.
The environment
should be captured once at the start of the workflow and
should be sent through based on that, not based on the
leader's current environment when the job is launched.
test_environment()
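The capture-once behaviour described above can be sketched with a pickled snapshot of the environment. This is a simplified illustration, not Toil's environment.pickle code: capture_environment and load_environment are hypothetical names, and the snapshot file path is chosen by the caller.

```python
import os
import pickle


def capture_environment(path, environ=None):
    """Save a snapshot of the environment to path, once, at workflow start."""
    snapshot = dict(os.environ if environ is None else environ)
    with open(path, "wb") as f:
        pickle.dump(snapshot, f)


def load_environment(path):
    """Load the saved snapshot, ignoring the current process environment."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Because the snapshot is a copy, later changes to the original environment do not show up in what load_environment returns.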
toil.test.src.environmentTest.signal_leader(job)
Make a file in the file store that the leader can see.
toil.test.src.environmentTest.check_environment(job, try_name)
Fail if the test environment is
wrong.
Parameters
try_name ( str )
toil.test.src.environmentTest.wait_a_bit(job)
Toil job that waits.
toil.test.src.environmentTest.check_environment_repeatedly(job)
Toil job that checks the environment, waits, and checks it again, as separate invocations.
toil.test.src.environmentTest.main(options=None)
Run the actual workflow with
the given options.
Parameters
options ( Optional[argparse.Namespace] )
toil.test.src.fileStoreTest
Attributes
Classes
Module Contents
toil.test.src.fileStoreTest.testingIsAutomatic
= True
toil.test.src.fileStoreTest.logger
class toil.test.src.fileStoreTest.hidden
Hiding the abstract test
classes from the Unittest loader so it can be inherited in
different test suites for the different job stores.
class AbstractFileStoreTest(methodName='runTest')
Bases: toil.test.ToilTest
An abstract
base class for testing the various general functions
described in
:class:toil.fileStores.abstractFileStore.AbstractFileStore
jobStoreType = None
setUp()
Hook method for setting up the test fixture before exercising it.
create_file(content,
executable=False)
testToilIsNotBroken()
Runs a simple DAG to test whether any features other than caching were broken.
testFileStoreLogging()
Write a couple of files to the jobstore. Delete a couple of them. Read back written and locally deleted files.
testFileStoreOperations()
Write a couple of files to the jobstore. Delete a couple of them. Read back written and locally deleted files.
testWriteReadGlobalFilePermissions()
Ensures that uploaded files preserve their file permissions when they are downloaded again. This function checks that a written executable file maintains its executability after being read.
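The property being checked, that an executable file is still executable after a write and read round trip, can be illustrated with plain file copies. This sketch assumes a POSIX file system and uses a local directory as a stand-in for the job store; is_executable and round_trip are hypothetical names.

```python
import os
import shutil
import stat


def is_executable(path):
    """Return True if the owner-execute bit is set on path."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)


def round_trip(src, store_dir, dest):
    """Copy src into store_dir and back out to dest, keeping permission bits."""
    stored = os.path.join(store_dir, "stored")
    # shutil.copy copies the file contents and the permission mode.
    shutil.copy(src, stored)
    shutil.copy(stored, dest)
    return dest
```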
testWriteExportFileCompatibility()
Ensures that files created in a job preserve their executable permissions when they are exported from the leader.
testImportReadFileCompatibility()
Ensures that files imported to the leader preserve their executable permissions when they are read by the fileStore.
testReadWriteFileStreamTextMode()
Checks if text mode is compatible with file streams.
class AbstractNonCachingFileStoreTest(methodName='runTest')
Bases: AbstractFileStoreTest
Abstract tests
for the various functions in
:class:toil.fileStores.nonCachingFileStore.NonCachingFileStore.
These tests are general enough that they can also be used
for :class:toil.fileStores.CachingFileStore.
setUp()
Hook method for setting up the test fixture before exercising it.
class AbstractCachingFileStoreTest(methodName='runTest')
Bases: AbstractFileStoreTest
Abstract tests
for the various cache-related functions in
:class:toil.fileStores.cachingFileStore.CachingFileStore.
setUp()
Hook method for setting up the test fixture before exercising it.
testExtremeCacheSetup()
Try to create the cache with bad worker active and then have 10 child jobs try to run in the chain. This tests whether the cache is created properly even when the job crashes randomly.
testCacheEvictionPartialEvict()
Ensure the cache eviction happens as expected. Two files (20MB and 30MB) are written sequentially into the job store in separate jobs. The cache max is forcibly set to 50MB. A third job requests 10MB of disk, requiring eviction of the first file. Ensure that the behavior is as expected.
testCacheEvictionTotalEvict()
Ensure the cache eviction happens as expected. Two files (20MB and 30MB) are written sequentially into the job store in separate jobs. The cache max is forcibly set to 50MB. A third job requests 10MB of disk, requiring eviction of the first file. Ensure that the behavior is as expected.
testCacheEvictionFailCase()
Ensure the cache eviction happens as expected. Two files (20MB and 30MB) are written sequentially into the job store in separate jobs. The cache max is forcibly set to 50MB. A third job requests 10MB of disk, requiring eviction of the first file. Ensure that the behavior is as expected.
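The arithmetic of the eviction scenario (20MB and 30MB cached, a 50MB limit, and a job asking for 10MB) can be sketched with a toy cache model. This is not the CachingFileStore implementation: SimpleCache is a hypothetical class, and evicting the oldest file first is an assumption made for the illustration.

```python
from collections import OrderedDict

MB = 1024 * 1024


class SimpleCache:
    """Toy size-bounded cache that evicts the oldest files first."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.files = OrderedDict()  # name -> size in bytes, oldest first

    def used(self):
        return sum(self.files.values())

    def add_file(self, name, size):
        self.files[name] = size

    def reserve(self, job_bytes):
        """Evict files until job_bytes fit; return the evicted names."""
        evicted = []
        while self.files and self.used() + job_bytes > self.max_bytes:
            name, _ = self.files.popitem(last=False)
            evicted.append(name)
        if self.used() + job_bytes > self.max_bytes:
            raise RuntimeError("request does not fit even in an empty cache")
        return evicted


cache = SimpleCache(50 * MB)
cache.add_file("first", 20 * MB)
cache.add_file("second", 30 * MB)
evicted = cache.reserve(10 * MB)
```

Here the cache is full at 50MB, so the 10MB request forces the 20MB file out and leaves the 30MB file in place.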
testAsyncWriteWithCaching()
Ensure the Async Writing of files happens as expected. The first Job forcefully modifies the cache size to 1GB. The second asks for 1GB of disk and writes a 900MB file into cache then rewrites it to the job store triggering an async write since the two unique jobstore IDs point to the same local file. Also, the second write is not cached since the first was written to cache, and there "isn't enough space" to cache the second. Immediately assert that the second write isn't cached, and is being asynchronously written to the job store.
Attempting to get the file from the jobstore should not fail.
testWriteNonLocalFileToJobStore()
Write a file not in localTempDir to the job store. Such a file should not be cached. Ensure the file is not cached.
testWriteLocalFileToJobStore()
Write a file from the localTempDir to the job store. Such a file will be cached by default. Ensure the file is cached.
testReadCacheMissFileFromJobStoreWithoutCachingReadFile()
Read a file from the file store that does not have a corresponding cached copy. Do not cache the read file. Ensure the number of links on the file are appropriate.
testReadCacheMissFileFromJobStoreWithCachingReadFile()
Read a file from the file store that does not have a corresponding cached copy. Cache the read file. Ensure the number of links on the file are appropriate.
testReadCachHitFileFromJobStore()
Read a file from the file store that has a corresponding cached copy. Ensure the number of links on the file are appropriate.
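The link counts mentioned in the three read tests above can be inspected with os.stat. The sketch below assumes that the count in question is the hard-link count reported by the file system and that the platform supports hard links; the file names and link_count are hypothetical.

```python
import os
import tempfile


def link_count(path):
    """Return the number of hard links that point at path's inode."""
    return os.stat(path).st_nlink


with tempfile.TemporaryDirectory() as work_dir:
    cached = os.path.join(work_dir, "cached_copy")
    with open(cached, "w") as f:
        f.write("data\n")
    before = link_count(cached)  # a freshly created file has a single link

    # Give a job its own name for the same bytes instead of copying them.
    job_view = os.path.join(work_dir, "job_view")
    os.link(cached, job_view)
    after = link_count(cached)  # both names now share one inode
```

A file that is shared through a hard link reports a higher count than one that was copied, which is what makes the link count a useful check.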
testMultipleJobsReadSameCacheHitGlobalFile()
Write a local file to the job store (hence adding a copy to cache), then have 10 jobs read it. Assert cached file size never goes up, assert unused job required disk space is always (a multiple of job reqs) - (number of current file readers * filesize) . At the end, assert the cache shows unused job-required space = 0.
testMultipleJobsReadSameCacheMissGlobalFile()
Write a non-local file to the job store (hence no cached copy), then have 10 jobs read it. Assert cached file size never goes up, assert unused job required disk space is always (a multiple of job reqs) - (number of current file readers * filesize) . At the end, assert the cache shows unused job-required space = 0.
testFileStoreExportFile()
testReturnFileSizes()
Write a couple of files to the jobstore. Delete a couple of them. Read back written and locally deleted files. Ensure that after every step the cache is in a valid state.
testReturnFileSizesWithBadWorker()
Write a couple of files to the jobstore. Delete a couple of them. Read back written and locally deleted files. Ensure that after every step the cache is in a valid state.
testControlledFailedWorkerRetry()
Conduct a couple of job store operations. Then die. Ensure that the restarted job is tracking values in the cache state file appropriately.
testRemoveLocalMutablyReadFile()
If a mutably read file is deleted by the user, it is ok.
testRemoveLocalImmutablyReadFile()
If an immutably read file is deleted by the user, it is not ok.
testDeleteLocalFile()
Test the deletion capabilities of deleteLocalFile
testSimultaneousReadsUncachedStream()
Test many simultaneous read attempts on a file created via a stream directly to the job store.
class
toil.test.src.fileStoreTest.NonCachingFileStoreTestWithFileJobStore(methodName='runTest')
Bases: hidden
Abstract tests
for the various functions in
:class:toil.fileStores.nonCachingFileStore.NonCachingFileStore.
These tests are general enough that they can also be used
for :class:toil.fileStores.CachingFileStore.
jobStoreType = 'file'
class
toil.test.src.fileStoreTest.CachingFileStoreTestWithFileJobStore(methodName='runTest')
Bases: hidden
Abstract tests
for the various cache-related functions in
:class:toil.fileStores.cachingFileStore.CachingFileStore.
jobStoreType = 'file'
class
toil.test.src.fileStoreTest.NonCachingFileStoreTestWithAwsJobStore(methodName='runTest')
Bases: hidden
Abstract tests
for the various functions in
:class:toil.fileStores.nonCachingFileStore.NonCachingFileStore.
These tests are general enough that they can also be used
for :class:toil.fileStores.CachingFileStore.
jobStoreType = 'aws'
class
toil.test.src.fileStoreTest.CachingFileStoreTestWithAwsJobStore(methodName='runTest')
Bases: hidden
Abstract tests
for the various cache-related functions in
:class:toil.fileStores.cachingFileStore.CachingFileStore.
jobStoreType = 'aws'
class
toil.test.src.fileStoreTest.NonCachingFileStoreTestWithGoogleJobStore(methodName='runTest')
Bases: hidden
Abstract tests
for the various functions in
:class:toil.fileStores.nonCachingFileStore.NonCachingFileStore.
These tests are general enough that they can also be used
for :class:toil.fileStores.CachingFileStore.
jobStoreType = 'google'
class
toil.test.src.fileStoreTest.CachingFileStoreTestWithGoogleJobStore(methodName='runTest')
Bases: hidden
Abstract tests
for the various cache-related functions in
:class:toil.fileStores.cachingFileStore.CachingFileStore.
jobStoreType = 'google'
toil.test.src.helloWorldTest
Classes
Functions
Module Contents
class toil.test.src.helloWorldTest.HelloWorldTest(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
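The TOIL_TEST_TEMP rules above can be sketched as follows. This is a minimal illustration, not the code used by toil.test.ToilTest; the function name resolve_test_temp_dir and its parameters are assumptions made for the example.

```python
import os
import tempfile


def resolve_test_temp_dir(project_root, environ=None):
    """Return (directory, remove_on_teardown) per the rules described above."""
    environ = os.environ if environ is None else environ
    configured = environ.get("TOIL_TEST_TEMP")
    if configured is None:
        # Not defined: use the system default location and clean up afterwards.
        return tempfile.mkdtemp(prefix="toil-test-"), True
    # A relative path is interpreted relative to the project root.
    if os.path.isabs(configured):
        path = configured
    else:
        path = os.path.join(project_root, configured)
    os.makedirs(path, exist_ok=True)  # created if it doesn't exist
    return path, False  # left-over files are kept
```

The second element of the returned tuple tells the caller whether left-over files should be removed during tear down.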
testHelloWorld()
class toil.test.src.helloWorldTest.HelloWorld
Bases: toil.job.Job
Class
represents a unit of work in toil.
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
toil.test.src.helloWorldTest.childFn(job)
class toil.test.src.helloWorldTest.FollowOn(fileId)
Bases: toil.job.Job
Class represents a unit of work in toil.
fileId
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
toil.test.src.importExportFileTest
Classes
Module Contents
class
toil.test.src.importExportFileTest.ImportExportFileTest(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
setUp()
Hook method for setting up the test fixture before exercising it.
create_file(content,
executable=False)
test_import_export_restart_true()
test_import_export_restart_false()
test_basic_import_export()
Ensures that uploaded files preserve their file permissions when they are downloaded again. This function checks that an imported executable file maintains its executability after being exported.
class
toil.test.src.importExportFileTest.RestartingJob(msg_portion_file_id,
trigger_file_id, message_portion_2)
Bases: toil.job.Job
Class
represents a unit of work in toil.
msg_portion_file_id
trigger_file_id
message_portion_2
run(file_store)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
toil.test.src.jobDescriptionTest
Classes
Module Contents
class
toil.test.src.jobDescriptionTest.JobDescriptionTest(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
setUp()
Hook method for setting up the test fixture before exercising it.
tearDown()
Hook method for deconstructing the test fixture after testing it.
testJobDescription()
Tests the public interface of a JobDescription.
testJobDescriptionSequencing()
toil.test.src.jobEncapsulationTest
Classes
Functions
Module Contents
class
toil.test.src.jobEncapsulationTest.JobEncapsulationTest(methodName='runTest')
Bases: toil.test.ToilTest
Tests for the EncapsulationJob class.
testEncapsulation()
Tests the Job.encapsulation method, which uses the EncapsulationJob class.
testAddChildEncapsulate()
Make sure that the encapsulate child does not have two parents with unique roots.
toil.test.src.jobEncapsulationTest.noOp()
toil.test.src.jobEncapsulationTest.encapsulatedJobFn(job,
string,
outFile)
toil.test.src.jobFileStoreTest
Attributes
Classes
Functions
Module Contents
toil.test.src.jobFileStoreTest.logger
toil.test.src.jobFileStoreTest.PREFIX_LENGTH = 200
class
toil.test.src.jobFileStoreTest.JobFileStoreTest(methodName='runTest')
Bases: toil.test.ToilTest
Tests for the methods defined in
:class:toil.fileStores.abstractFileStore.AbstractFileStore.
testCachingFileStore()
testNonCachingFileStore()
testJobFileStore()
Tests the case in which about half the files are cached.
testJobFileStoreWithBadWorker()
Tests the case in which about half the files are cached and the worker is randomly failing.
toil.test.src.jobFileStoreTest.fileTestJob(job,
inputFileStoreIDs,
testStrings, chainLength)
Test job exercises toil.fileStores.abstractFileStore.AbstractFileStore functions
toil.test.src.jobFileStoreTest.fileStoreString
= 'Testing
writeGlobalFile'
toil.test.src.jobFileStoreTest.streamingFileStoreString =
'Testing
writeGlobalFileStream'
toil.test.src.jobFileStoreTest.simpleFileStoreJob(job)
toil.test.src.jobFileStoreTest.fileStoreChild(job, testID1,
testID2)
toil.test.src.jobServiceTest
Attributes
Classes
Functions
Module Contents
toil.test.src.jobServiceTest.logger
class
toil.test.src.jobServiceTest.JobServiceTest(methodName='runTest')
Bases: toil.test.ToilTest
Tests for the Job.Service class.
testServiceSerialization()
Tests that a service can receive a promise without producing a serialization error.
testService(checkpoint=False)
Tests the creation of a Job.Service with random failures of the worker.
testServiceDeadlock()
Creates a job with more services than maxServices, checks that deadlock is detected.
testServiceWithCheckpoints()
Tests the creation of a Job.Service with random failures of the worker, making the root job use checkpointing to restart the subtree.
testServiceRecursive(checkpoint=True)
Tests the creation of a Job.Service, creating a chain of services and accessing jobs. Randomly fails the worker.
testServiceParallelRecursive(checkpoint=True)
Tests the creation of a Job.Service, creating parallel chains of services and accessing jobs. Randomly fails the worker.
runToil(rootJob,
retryCount=1, badWorker=0.5,
badWorkedFailInterval=0.1, maxServiceJobs=sys.maxsize,
deadlockWait=60, max_attempts=50)
class
toil.test.src.jobServiceTest.PerfectServiceTest(methodName='runTest')
Bases: JobServiceTest
Tests for the Job.Service class.
runToil(*args, **kwargs)
Let us run all the tests in the other service test class, but without worker failures.
toil.test.src.jobServiceTest.serviceTest(job, outFile, messageInt)
Creates one service and one accessing job, which communicate with two files to establish that both run concurrently.
toil.test.src.jobServiceTest.serviceTestRecursive(job,
outFile,
messages)
Creates a chain of services and accessing jobs, each paired together.
toil.test.src.jobServiceTest.serviceTestParallelRecursive(job,
outFiles, messageBundles)
Creates multiple chains of services and accessing jobs.
class
toil.test.src.jobServiceTest.ToyService(messageInt, *args,
**kwargs)
Bases: toil.job.Job.Service
Abstract class used to define the interface to a service.
Should be subclassed by the user to define services.
Is not executed
as a job; runs within a ServiceHostJob.
messageInt
start(job)
Start the service.
Parameters
job -- The underlying host job that the service is being run in. Can be used to register deferred functions, or to access the fileStore for creating temporary files.
Returns
An object describing how to access the service. The object must be pickleable and will be used by jobs to access the service (see toil.job.Job.addService() ).
stop(job)
Stops the service. Function can
block until complete.
Parameters
job -- The underlying host job that the service is being run in. Can be used to register deferred functions, or to access the fileStore for creating temporary files.
check()
Checks the service is still running.
Raises
exceptions.RuntimeError -- If the service failed, this will cause the service job to be labeled failed.
Returns
True if the service is still running, else False. If False then the service job will be terminated, and considered a success. Important point: if the service job exits due to a failure, it should raise a RuntimeError, not return False!
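The contract of check() (raise on failure, return False on a clean stop) can be sketched without Toil. The class SketchService and its two event attributes are hypothetical names for this example; a real service would subclass toil.job.Job.Service.

```python
import threading


class SketchService:
    """Hypothetical stand-in; it does not subclass toil.job.Job.Service."""

    def __init__(self):
        self.terminate = threading.Event()  # set when the service is asked to stop
        self.error = threading.Event()      # set by the worker if it crashes

    def check(self):
        if self.error.is_set():
            # Failure: raise so that the service job is labeled failed.
            raise RuntimeError("service worker failed")
        # Clean stop: False means "terminated, and considered a success".
        return not self.terminate.is_set()
```

Returning False after a crash would hide the failure, which is why the error flag is tested first.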
static
serviceWorker(jobStore, terminate, error, inJobStoreID,
outJobStoreID, messageInt)
toil.test.src.jobServiceTest.serviceAccessor(job,
communicationFiles,
outFile, to_subtract)
Writes a random integer i into the inJobStoreFileID file, then tries 10 times to read from outJobStoreFileID to get a pair of integers, the first equal to i, the second written into the outputFile.
class
toil.test.src.jobServiceTest.ToySerializableService(messageInt,
*args, **kwargs)
Bases: toil.job.Job.Service
Abstract class used to define the interface to a service.
Should be subclassed by the user to define services.
Is not executed
as a job; runs within a ServiceHostJob.
messageInt
start(job)
Start the service.
Parameters
job -- The underlying host job that the service is being run in. Can be used to register deferred functions, or to access the fileStore for creating temporary files.
Returns
An object describing how to access the service. The object must be pickleable and will be used by jobs to access the service (see toil.job.Job.addService() ).
stop(job)
Stops the service. Function can
block until complete.
Parameters
job -- The underlying host job that the service is being run in. Can be used to register deferred functions, or to access the fileStore for creating temporary files.
check()
Checks the service is still running.
Raises
exceptions.RuntimeError -- If the service failed, this will cause the service job to be labeled failed.
Returns
True if the service is still running, else False. If False then the service job will be terminated, and considered a success. Important point: if the service job exits due to a failure, it should raise a RuntimeError, not return False!
toil.test.src.jobServiceTest.fnTest(strings, outputFile)
Function concatenates the strings together and writes them to the output file
toil.test.src.jobTest
Attributes
Classes
Functions
Module Contents
toil.test.src.jobTest.logger
class
toil.test.src.jobTest.JobTest(methodName='runTest')
Bases: toil.test.ToilTest
Tests the job
class.
classmethod setUpClass()
Hook method for setting up class fixture before running tests in the class.
testStatic()
Create a DAG of jobs non-dynamically and run it. DAG is:
A -> F
\-------
B -> D  \
 \       \
  ------- C -> E
Follow on is marked by ->
testStatic2()
Create a DAG of jobs non-dynamically and run it. DAG is:
A -> F
\-------
B -> D  \
 \       \
  ------- C -> E
Follow on is marked by ->
testTrivialDAGConsistency()
testDAGConsistency()
testSiblingDAGConsistency()
Slightly more complex case. The stranded job's predecessors are siblings instead of parent/child.
testDeadlockDetection()
Randomly generate job graphs with various types of cycle in them and check that they cause an exception properly. Also check that multiple roots cause a deadlock exception.
testNewCheckpointIsLeafVertexNonRootCase()
Test for issue #1465: Detection of checkpoint jobs that are not leaf vertices identifies leaf vertices incorrectly
Test verification of new checkpoint jobs being leaf vertices, starting with the following baseline workflow:
Parent
|
Child # Checkpoint=True
testNewCheckpointIsLeafVertexRootCase()
Test for issue #1466: Detection of checkpoint jobs that are not leaf vertices omits the workflow root job
Test verification of a new checkpoint job being leaf vertex, starting with a baseline workflow of a single, root job:
Root # Checkpoint=True
runNewCheckpointIsLeafVertexTest(createWorkflowFn)
Test verification that a
checkpoint job is a leaf vertex using both valid and invalid
cases.
Parameters
createWorkflowFn --
function to create a new workflow and return a tuple of:
0. the workflow root job
1. a checkpoint job to test within the workflow
runCheckpointVertexTest(workflowRootJob,
checkpointJob,
checkpointJobService=None, checkpointJobChild=None,
checkpointJobFollowOn=None, expectedException=None)
Modifies the checkpoint job according to the given parameters then runs the workflow, checking for the expected exception, if any.
testEvaluatingRandomDAG()
Randomly generate test input then check that the job graph can be run successfully, using the existence of promises to validate the run.
static
getRandomEdge(nodeNumber)
static makeRandomDAG(nodeNumber)
Makes a random DAG with "nodeNumber" nodes in which all nodes are connected. Return value is a list of edges, each of the form (a, b), where a and b are integers with 0 <= a, b < nodeNumber referring to nodes, and the edge is from a to b.
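One way to generate such a graph is sketched below. It is an illustration under stated assumptions, not the body of makeRandomDAG: the name make_random_dag and the rng parameter are hypothetical, and edges are returned as a set rather than a list.

```python
import random


def make_random_dag(node_number, rng=random):
    """Return a set of edges (a, b) with 0 <= a < b < node_number."""
    # Spanning tree: every node i > 0 gets one parent with a smaller index,
    # so all nodes are connected and no cycle can form.
    edges = {(rng.randrange(i), i) for i in range(1, node_number)}
    # Add a random number of extra forward edges.
    max_edges = node_number * (node_number - 1) // 2
    target = rng.randint(len(edges), max_edges)
    while len(edges) < target:
        a, b = sorted(rng.sample(range(node_number), 2))
        edges.add((a, b))
    return edges
```

Because every edge points from a lower to a higher index, the result is acyclic by construction.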
static getAdjacencyList(nodeNumber, edges)
Make adjacency list representation of edges
reachable(node, adjacencyList, followOnAdjacencyList=None)
Find the set of nodes reachable from this node (including the node). Return is a set of integers.
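The adjacency list representation and the reachability search can be sketched as below. The function names are hypothetical, and the optional followOnAdjacencyList argument of the real method is left out of this sketch.

```python
def get_adjacency_list(node_number, edges):
    """Return a list whose entry a is the set of nodes b with an edge a -> b."""
    adjacency = [set() for _ in range(node_number)]
    for a, b in edges:
        adjacency[a].add(b)
    return adjacency


def reachable(node, adjacency):
    """Return the set of nodes reachable from node, including node itself."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(adjacency[current])
    return seen
```

An explicit stack is used instead of recursion so that deep graphs do not hit Python's recursion limit.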
addRandomFollowOnEdges(childAdjacencyList)
Adds random follow on edges to the graph, represented as an adjacency list. The follow on edges are returned as a set and their augmented edges are added to the adjacency list.
makeJobGraph(nodeNumber,
childEdges, followOnEdges, outPath,
addServices=True)
Converts a DAG into a job graph. childEdges and followOnEdges are the lists of child and followOn edges.
isAcyclic(adjacencyList)
Returns true if there are no cycles in the graph, which is represented as an adjacency list.
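A cycle check over an adjacency list can be sketched with a depth-first search that marks the nodes on the current path. The function is_acyclic below is a hypothetical example, not the test's own code; in line with the method name, it returns True when no cycle is found.

```python
def is_acyclic(adjacency):
    """Return True if the directed graph has no cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = [WHITE] * len(adjacency)

    def has_cycle_from(node):
        color[node] = GREY
        for successor in adjacency[node]:
            if color[successor] == GREY:
                return True  # back edge to a node on the current path
            if color[successor] == WHITE and has_cycle_from(successor):
                return True
        color[node] = BLACK
        return False

    return not any(
        color[n] == WHITE and has_cycle_from(n) for n in range(len(adjacency))
    )
```

The search is recursive, so it is suited to the small graphs used in tests rather than to very deep ones.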
toil.test.src.jobTest.simpleJobFn(job,
value)
toil.test.src.jobTest.fn1Test(string, outputFile)
Function appends the next character after the last character in the given string to the string, writes the string to a file, and returns it. For example, if string is "AB", we will write and return "ABC".
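The behavior described for fn1Test can be sketched as follows. The name append_next_char is hypothetical, and "next character" is assumed here to mean the next Unicode code point.

```python
import os
import tempfile


def append_next_char(string, output_file):
    """Append the character after the last one, write the result, and return it."""
    extended = string + chr(ord(string[-1]) + 1)
    with open(output_file, "w") as handle:
        handle.write(extended)
    return extended


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "out.txt")
    result = append_next_char("AB", out)  # "AB" becomes "ABC"
    with open(out) as handle:
        written = handle.read()
```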
toil.test.src.jobTest.fn2Test(pStrings, s, outputFile)
Function concatenates the strings in pStrings and s, in that order, and writes the result to the output file. Returns s.
toil.test.src.jobTest.trivialParent(job)
toil.test.src.jobTest.parent(job)
toil.test.src.jobTest.diamond(job)
toil.test.src.jobTest.child(job)
toil.test.src.jobTest.errorChild(job)
class toil.test.src.jobTest.TrivialService(message, *args,
**kwargs)
Bases: toil.job.Job.Service
Abstract class used to define the interface to a service.
Should be subclassed by the user to define services.
Is not executed
as a job; runs within a ServiceHostJob.
message
start(job)
Start the service.
Parameters
job -- The underlying host job that the service is being run in. Can be used to register deferred functions, or to access the fileStore for creating temporary files.
Returns
An object describing how to access the service. The object must be pickleable and will be used by jobs to access the service (see toil.job.Job.addService() ).
stop(job)
Stops the service. Function can
block until complete.
Parameters
job -- The underlying host job that the service is being run in. Can be used to register deferred functions, or to access the fileStore for creating temporary files.
check()
Checks the service is still running.
Raises
exceptions.RuntimeError -- If the service failed, this will cause the service job to be labeled failed.
Returns
True if the service is still running, else False. If False then the service job will be terminated, and considered a success. Important point: if the service job exits due to a failure, it should raise a RuntimeError, not return False!
toil.test.src.miscTests
Attributes
Classes
Module Contents
toil.test.src.miscTests.log
class
toil.test.src.miscTests.MiscTests(methodName='runTest')
Bases: toil.test.ToilTest
This class
contains miscellaneous tests that don't have enough content
to be their own test file, and that don't logically fit in
with any of the other test suites.
setUp()
Hook method for setting up the test fixture before exercising it.
testIDStability()
testGetSizeOfDirectoryWorks()
A test to make sure toil.common.getDirSizeRecursively does not underestimate the amount of disk space needed.
Disk space allocation varies from system to system. The computed value should always be equal to or slightly greater than the creation value. This test generates a number of random directories and randomly sized files to test this using getDirSizeRecursively.
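A size estimate that never falls below the bytes written can be sketched as below. This is not the implementation of toil.common.getDirSizeRecursively; the name dir_size_recursively and the choice to take the larger of apparent size and allocated blocks are assumptions made for the example.

```python
import os


def dir_size_recursively(dir_path):
    """Return a conservative estimate, in bytes, of the space used under dir_path."""
    total = 0
    for root, dirs, files in os.walk(dir_path):
        for name in dirs + files:
            st = os.lstat(os.path.join(root, name))
            # st_blocks counts 512-byte units; it is absent on some platforms.
            allocated = getattr(st, "st_blocks", 0) * 512
            total += max(st.st_size, allocated)
    return total
```

Taking the maximum per entry keeps the estimate at or above the creation value even where small files are stored without allocated blocks.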
test_atomic_install()
test_atomic_install_dev()
test_atomic_context_ok()
test_atomic_context_error()
test_call_command_ok()
test_call_command_err()
class toil.test.src.miscTests.TestPanic(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
test_panic_by_hand()
test_panic()
test_panic_with_secondary()
test_nested_panic()
try_and_panic_by_hand()
try_and_panic()
try_and_panic_with_secondary()
try_and_nested_panic_with_secondary()
toil.test.src.promisedRequirementTest
Attributes
Classes
Functions
Module Contents
toil.test.src.promisedRequirementTest.log
class toil.test.src.promisedRequirementTest.hidden
Hide abstract base class from unittest's test case loader.
http://stackoverflow.com/questions/1323455/python-unit-test-with-base-and-sub-class#answer-25695512
class
AbstractPromisedRequirementsTest(methodName='runTest')
Bases: toil.test.batchSystems.batchSystemTest.hidden.AbstractBatchSystemJobTest
An abstract
base class for testing Toil workflows with promised
requirements.
testConcurrencyDynamic()
Asserts that promised core resources are allocated properly using a dynamic Toil workflow
testConcurrencyStatic()
Asserts that promised core resources are allocated properly using a static DAG
getOptions(tempDir, caching=True)
Configures options for Toil workflow and makes job store.
Parameters
tempDir ( str ) -- path to test directory
Returns
Toil options object
getCounterPath(tempDir)
Returns path to a counter file.
Parameters
tempDir ( str ) -- path to test directory
Returns
path to counter file
testPromisesWithJobStoreFileObjects(caching=True)
Check whether FileID objects are being pickled properly when used as return values of functions. Then ensure that lambdas of promised FileID objects can be used to describe the requirements of a subsequent job. This type of operation will be used commonly in Toil scripts.
Returns
None
testPromisesWithNonCachingFileStore()
testPromiseRequirementRaceStatic()
Checks for a race condition when using promised requirements and child job functions.
toil.test.src.promisedRequirementTest.maxConcurrency(job,
cpuCount,
filename, coresPerJob)
Returns the max number of
concurrent tasks when using a PromisedRequirement instance
to allocate the number of cores per job.
Parameters
• cpuCount ( int ) -- number of available cpus
• filename ( str ) -- path to counter file
• coresPerJob ( int ) -- number of cores assigned to each job
Returns
max concurrency value ( int )
toil.test.src.promisedRequirementTest.getOne()
toil.test.src.promisedRequirementTest.getThirtyTwoMb()
toil.test.src.promisedRequirementTest.logDiskUsage(job,
funcName,
sleep=0)
Logs the job's disk usage to
master and sleeps for specified amount of time.
Returns
job function's disk usage
class
toil.test.src.promisedRequirementTest.SingleMachinePromisedRequirementsTest(methodName='runTest')
Bases: hidden
Tests against
the SingleMachine batch system
getBatchSystemName()
Return type
( str , AbstractBatchSystem )
tearDown()
Hook method for deconstructing the test fixture after testing it.
class
toil.test.src.promisedRequirementTest.MesosPromisedRequirementsTest(methodName='runTest')
Bases: hidden , toil.batchSystems.mesos.test.MesosTestSupport
Tests against
the Mesos batch system
getOptions(tempDir, caching=True)
Configures options for Toil workflow and makes job store.
Parameters
tempDir ( str ) -- path to test directory
Returns
Toil options object
getBatchSystemName()
Return type
( str , AbstractBatchSystem )
tearDown()
Hook method for deconstructing the test fixture after testing it.
toil.test.src.promisesTest
Classes
Functions
Module Contents
class
toil.test.src.promisesTest.CachedUnpicklingJobStoreTest(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running tests you may optionally set the TOIL_TEST_TEMP environment variable to the path of a directory where you want temporary test files to be placed. The directory will be created if it doesn't exist. The path may be relative, in which case it will be assumed to be relative to the project root. If TOIL_TEST_TEMP is not defined, temporary files and directories will be created in the system's default location for such files, and any temporary files or directories left over from tests will be automatically removed during tear down. Otherwise, left-over files will not be removed.
test()
Runs two identical Toil workflows with different job store paths
toil.test.src.promisesTest.parent(job)
toil.test.src.promisesTest.child()
class
toil.test.src.promisesTest.ChainedIndexedPromisesTest(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running tests you may optionally set the TOIL_TEST_TEMP environment variable to the path of a directory where you want temporary test files to be placed. The directory will be created if it doesn't exist. The path may be relative, in which case it will be assumed to be relative to the project root. If TOIL_TEST_TEMP is not defined, temporary files and directories will be created in the system's default location for such files, and any temporary files or directories left over from tests will be automatically removed during tear down. Otherwise, left-over files will not be removed.
test()
toil.test.src.promisesTest.a(job)
toil.test.src.promisesTest.b(job)
toil.test.src.promisesTest.c()
class
toil.test.src.promisesTest.PathIndexingPromiseTest(methodName='runTest')
Bases: toil.test.ToilTest
Test support for indexing promises of arbitrarily nested data structures of lists, dicts and tuples, or any other object supporting the __getitem__() protocol.
test()
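Indexing into an arbitrarily nested structure by a path of keys can be sketched on plain data, independent of Toil's promise machinery. The helper resolve_path and the sample data are hypothetical.

```python
from functools import reduce
from operator import getitem


def resolve_path(value, path):
    """Apply each index in path in turn, i.e. value[p0][p1]...[pn]."""
    return reduce(getitem, path, value)


# A dict holding a list of tuples, each ending in a dict.
nested = {"samples": [("a", {"size": 1}), ("b", {"size": 2})]}
size_of_b = resolve_path(nested, ["samples", 1, 1, "size"])
```

Any object that supports the __getitem__() protocol can appear at any level of the path; an empty path returns the value itself.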
toil.test.src.promisesTest.d(job)
toil.test.src.promisesTest.e()
toil.test.src.realtimeLoggerTest
Classes
Module Contents
class
toil.test.src.realtimeLoggerTest.RealtimeLoggerTest(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
testRealtimeLogger()
class toil.test.src.realtimeLoggerTest.MessageDetector
Bases: logging.StreamHandler
Detect the
secret message and set a flag.
detected = False
overLogged = False
emit(record)
Emit a record.
If a formatter is specified, it is used to format the record. The record is then written to the stream with a trailing newline. If exception information is present, it is formatted using traceback.print_exception and appended to the stream. If the stream has an 'encoding' attribute, it is used to determine how to do the output to the stream.
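A handler that sets flags instead of writing to a stream can be sketched with the standard logging module. The class FlagHandler, the logger name and the message strings are hypothetical; the messages that MessageDetector actually looks for are not given in this section.

```python
import logging


class FlagHandler(logging.StreamHandler):
    """Hypothetical handler: records whether particular messages were seen."""

    def __init__(self, wanted, unwanted):
        super().__init__()
        self.wanted = wanted
        self.unwanted = unwanted
        self.detected = False    # saw the message we were waiting for
        self.overLogged = False  # saw a message that should have been filtered out

    def emit(self, record):
        message = record.getMessage()
        if message == self.wanted:
            self.detected = True
        if message == self.unwanted:
            self.overLogged = True


logger = logging.getLogger("flag-handler-sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = FlagHandler("info message", "debug message")
logger.addHandler(handler)
logger.info("info message")
logger.debug("debug message")  # below the logger's level, so never emitted
```

Overriding emit() replaces the default behavior of formatting the record and writing it to the stream.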
class toil.test.src.realtimeLoggerTest.LogTest
Bases: toil.job.Job
Class
represents a unit of work in toil.
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
toil.test.src.regularLogTest
Attributes
Classes
Module Contents
toil.test.src.regularLogTest.logger
class
toil.test.src.regularLogTest.RegularLogTest(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
setUp()
Hook method for setting up the
test fixture before exercising it.
Return type
None
testLogToMaster()
testWriteLogs()
testWriteGzipLogs()
testMultipleLogToMaster()
testRegularLog()
toil.test.src.resourceTest
Classes
Functions
Module Contents
toil.test.src.resourceTest.tempFileContaining(content, suffix='')
Write a file with the given contents, and keep it on disk as long as the context is active.
Parameters
• content ( str ) -- The contents of the file.
• suffix ( str ) -- The extension to use for the temporary file.
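A context manager with this behavior can be sketched as follows. It illustrates the pattern only and is not the helper from toil.test.src.resourceTest; the name temp_file_containing is hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_file_containing(content, suffix=""):
    """Yield the path of a temporary file holding content; delete it on exit."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
        yield path
    finally:
        os.unlink(path)


with temp_file_containing("hello", suffix=".txt") as path:
    with open(path, encoding="utf-8") as handle:
        seen = handle.read()
    existed = os.path.exists(path)
```

The file is closed before its path is yielded, so the caller can reopen it by name, and the finally clause removes it even if the body raises.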
class toil.test.src.resourceTest.ResourceTest(methodName='runTest')
Bases: toil.test.ToilTest
Test module
descriptors and resources derived from them.
testStandAlone()
testPackage()
testVirtualEnv()
testStandAloneInPackage()
testBuiltIn()
testNonPyStandAlone()
Asserts that Toil enforces the user script to have a .py or .pyc extension because that's the only way auto-deployment can re-import the module on a worker. See
https://github.com/BD2KGenomics/toil/issues/631 and https://github.com/BD2KGenomics/toil/issues/858
toil.test.src.restartDAGTest
Attributes
Classes
Functions
Module Contents
toil.test.src.restartDAGTest.logger
class
toil.test.src.restartDAGTest.RestartDAGTest(methodName='runTest')
Bases: toil.test.ToilTest
Tests that
restarted job DAGs don't run children of jobs that failed in
the first run till the parent completes successfully in the
restart.
setUp()
Hook method for setting up the test fixture before exercising it.
tearDown()
Hook method for deconstructing the test fixture after testing it.
testRestartedWorkflowSchedulesCorrectJobsOnFailedParent()
testRestartedWorkflowSchedulesCorrectJobsOnKilledParent()
toil.test.src.restartDAGTest.passingFn(job, fileName=None)
This function is guaranteed to
pass as it does nothing out of the ordinary. If fileName is
provided, it will be created.
Parameters
fileName ( str ) -- The name of a file that must be created if provided.
toil.test.src.restartDAGTest.failingFn(job, failType, fileName)
This function is guaranteed to
fail via a raised assertion, or an os.kill
Parameters
• job -- Job
• failType ( str ) -- 'raise' or 'kill'
• fileName ( str ) -- The name of a file that must be created.
toil.test.src.resumabilityTest
Classes
Functions
Module Contents
class
toil.test.src.resumabilityTest.ResumabilityTest(methodName='runTest')
Bases: toil.test.ToilTest
https://github.com/BD2KGenomics/toil/issues/808
test()
Tests that a toil workflow that fails once can be resumed without a NoSuchJobException.
test_chaining()
Tests that a job which is chained to and fails can resume and succeed.
toil.test.src.resumabilityTest.parent(job)
Set up a bunch of dummy child jobs, and a bad job that needs to be restarted as the follow on.
toil.test.src.resumabilityTest.chaining_parent(job)
Set up a failing job to chain to.
toil.test.src.resumabilityTest.goodChild(job)
Does nothing.
toil.test.src.resumabilityTest.badChild(job)
Fails the first time it's run, succeeds the second time.
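A function that fails on its first run and succeeds on the next can be sketched with a marker file. How badChild itself records the first attempt is not described here, so the marker file and the name fail_once are assumptions made for the example.

```python
import os
import tempfile


def fail_once(marker_path):
    """Raise on the first call for a given marker path, succeed afterwards."""
    if not os.path.exists(marker_path):
        # Record the attempt before failing so that the retry can see it.
        with open(marker_path, "w") as handle:
            handle.write("failed once\n")
        raise RuntimeError("first attempt fails on purpose")
    return "succeeded on retry"


with tempfile.TemporaryDirectory() as tmp:
    marker = os.path.join(tmp, "marker")
    try:
        fail_once(marker)
        first_failed = False
    except RuntimeError:
        first_failed = True
    second = fail_once(marker)
```

The state has to live outside the process, because a resumed workflow runs the job again in a fresh worker.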
toil.test.src.retainTempDirTest
Classes
Functions
Module Contents
class
toil.test.src.retainTempDirTest.CleanWorkDirTest(methodName='runTest')
Bases: toil.test.ToilTest
Tests for
:class:toil.fileStores.abstractFileStore.AbstractFileStore
setUp()
Hook method for setting up the test fixture before exercising it.
tearDown()
Hook method for deconstructing the test fixture after testing it.
testNever()
testAlways()
testOnErrorWithError()
testOnErrorWithNoError()
testOnSuccessWithError()
testOnSuccessWithSuccess()
toil.test.src.retainTempDirTest.tempFileTestJob(job)
toil.test.src.retainTempDirTest.tempFileTestErrorJob(job)
toil.test.src.systemTest
Classes
Module Contents
class toil.test.src.systemTest.SystemTest(methodName='runTest')
Bases: toil.test.ToilTest
Test various
assumptions about the operating system's behavior.
testAtomicityOfNonEmptyDirectoryRenames()
toil.test.src.threadingTest
Attributes
Classes
Module Contents
toil.test.src.threadingTest.log
class
toil.test.src.threadingTest.ThreadingTest(methodName='runTest')
Bases: toil.test.ToilTest
Test Toil
threading/synchronization tools.
testGlobalMutexOrdering()
testLastProcessStanding()
toil.test.src.toilContextManagerTest
Classes
Functions
Module Contents
class
toil.test.src.toilContextManagerTest.ToilContextManagerTest(methodName='runTest')
Bases: toil.test.ToilTest
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running
tests you may optionally set the TOIL_TEST_TEMP environment
variable to the path of a directory where you want temporary
test files to be placed. The directory will be created if it
doesn't exist. The path may be relative, in which case it
will be assumed to be relative to the project root. If
TOIL_TEST_TEMP is not defined, temporary files and
directories will be created in the system's default location
for such files, and any temporary files or directories left
over from tests will be automatically removed during
tear down. Otherwise, left-over files will not be removed.
setUp()
Hook method for setting up the test fixture before exercising it.
tearDown()
Hook method for deconstructing the test fixture after testing it.
testContextManger()
testNoContextManger()
testExportAfterFailedExport()
class toil.test.src.toilContextManagerTest.HelloWorld
Bases: toil.job.Job
Class
represents a unit of work in toil.
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
toil.test.src.toilContextManagerTest.childFn(job)
class toil.test.src.toilContextManagerTest.FollowOn(fileId)
Bases: toil.job.Job
Class represents a unit of work in toil.
fileId
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
toil.test.src.userDefinedJobArgTypeTest
Classes
Functions
Module Contents
class toil.test.src.userDefinedJobArgTypeTest.UserDefinedJobArgTypeTest(methodName='runTest')
Bases: toil.test.ToilTest
Test for issue #423 (Toil can't unpickle classes defined in user scripts) and variants thereof.
https://github.com/BD2KGenomics/toil/issues/423
setUp()
Hook method for setting up the test fixture before exercising it.
testJobFunction()
Test with first job being a function
testJobClass()
Test with first job being an instance of a class
testJobFunctionFromMain()
Test with first job being a function defined in __main__
testJobClassFromMain()
Test with first job being an instance of a class defined in __main__
class toil.test.src.userDefinedJobArgTypeTest.JobClass(level, foo)
Bases: toil.job.Job
Class represents a unit of work in toil.
level
foo
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
toil.test.src.userDefinedJobArgTypeTest.jobFunction(job, level, foo)
class toil.test.src.userDefinedJobArgTypeTest.Foo
original_id
assertIsCopy()
toil.test.src.userDefinedJobArgTypeTest.main()
toil.test.src.workerTest
Classes
Module Contents
class toil.test.src.workerTest.WorkerTests(methodName='runTest')
Bases: toil.test.ToilTest
Test
miscellaneous units of the worker.
setUp()
Hook method for setting up the test fixture before exercising it.
testNextChainable()
Make sure chainable/non-chainable jobs are identified correctly.
toil.test.utils
Submodules
toil.test.utils.toilDebugTest
Attributes
Classes
Functions
Module Contents
toil.test.utils.toilDebugTest.logger
toil.test.utils.toilDebugTest.workflow_debug_jobstore()
Return type
str
toil.test.utils.toilDebugTest.testJobStoreContents()
Test toilDebugFile.printContentsOfJobStore().
Runs a workflow that imports 'B.txt' and 'mkFile.py' into the jobStore. 'A.txt', 'C.txt', 'ABC.txt' are then created. This checks to make sure these contents are found in the jobStore and printed.
toil.test.utils.toilDebugTest.fetchFiles(symLink, jobStoreDir, outputDir)
Fn for testFetchJobStoreFiles() and testFetchJobStoreFilesWSymlinks().
Runs a workflow that imports 'B.txt' and 'mkFile.py' into the jobStore. 'A.txt', 'C.txt', 'ABC.txt' are then created. This test then attempts to get a list of these files and copy them over into our output directory from the jobStore, confirm that they are present, and then delete them.
Parameters
• symLink ( bool )
• jobStoreDir ( str )
• outputDir ( str )
toil.test.utils.toilDebugTest.testFetchJobStoreFiles()
Test
toilDebugFile.fetchJobStoreFiles() symlinks.
Return type
None
class toil.test.utils.toilDebugTest.DebugJobTest(methodName='runTest')
Bases: toil.test.ToilTest
Test the toil
debug-job command.
test_run_job()
Make sure that we can use toil debug-job to try and run a job in-process.
test_print_job_info()
Make sure that we can use --printJobInfo to get information on a job from a job store.
test_retrieve_task_directory()
Make sure that we can use --retrieveTaskDirectory to get the input files for a job.
toil.test.utils.toilKillTest
Attributes
Classes
Module Contents
toil.test.utils.toilKillTest.logger
toil.test.utils.toilKillTest.pkg_root
class toil.test.utils.toilKillTest.ToilKillTest(*args, **kwargs)
Bases: toil.test.ToilTest
A set of test
cases for "toil kill".
job_store
setUp()
Shared test variables.
tearDown()
Default tearDown for unittest.
test_cwl_toil_kill()
Test "toil kill" on a CWL workflow with a 100 second sleep.
class toil.test.utils.toilKillTest.ToilKillTestWithAWSJobStore(*args, **kwargs)
Bases: ToilKillTest
A set of test
cases for "toil kill" using the AWS job store.
setUp()
Shared test variables.
toil.test.utils.utilsTest
Attributes
Classes
Functions
Module Contents
toil.test.utils.utilsTest.pkg_root
toil.test.utils.utilsTest.logger
class toil.test.utils.utilsTest.UtilsTest(methodName='runTest')
Bases: toil.test.ToilTest
Tests the
utilities that toil ships with, e.g. stats and status, in
conjunction with restart functionality.
setUp()
Hook method for setting up the test fixture before exercising it.
tearDown()
Hook method for deconstructing the test fixture after testing it.
property toilMain
property cleanCommand
property statsCommand
statusCommand(failIfNotComplete=False)
test_config_functionality()
Ensure that creating and reading back the config file works
testAWSProvisionerUtils()
Runs a number of the cluster utilities in sequence.
Launches a
cluster with custom tags. Verifies the tags exist. ssh's
into the cluster. Does some weird string comparisons. Makes
certain that TOIL_WORKDIR is set as expected in the ssh'ed
cluster. Rsyncs a file and verifies it exists on the leader.
Destroys the cluster.
testUtilsSort()
Tests the status and stats commands of the toil command line utility using the sort example with the --restart flag.
testUtilsStatsSort()
Tests the stats commands on a complete run of the stats test.
testUnicodeSupport()
testMultipleJobsPerWorkerStats()
Tests case where multiple jobs are run on 1 worker to ensure that all jobs report back their data
check_status(status, status_fn, process=None, seconds=20)
testGetPIDStatus()
Test that ToilStatus.getPIDStatus() behaves as expected.
testGetStatusFailedToilWF()
Test that ToilStatus.getStatus() behaves as expected with a failing Toil workflow. While this workflow could be called by importing and invoking its main function, doing so would remove the opportunity to test the 'RUNNING' functionality of getStatus().
testGetStatusFailedCWLWF()
Test that ToilStatus.getStatus() behaves as expected with a failing CWL workflow.
testGetStatusSuccessfulCWLWF()
Test that ToilStatus.getStatus() behaves as expected with a successful CWL workflow.
testPrintJobLog(mock_print)
Test that ToilStatus.printJobLog() reads the log from a failed command without error.
testRestartAttribute()
Test that the job store is only destroyed when we observe a successful workflow run. The following simulates a failing workflow that attempts to resume without restart(). In this case, the job store should not be destroyed until restart() is called.
toil.test.utils.utilsTest.printUnicodeCharacter()
class toil.test.utils.utilsTest.RunTwoJobsPerWorker
Bases: toil.job.Job
Runs child job
with same resources as self in an attempt to chain the jobs
on the same worker
run(fileStore)
Override this function to
perform work and dynamically create successor jobs.
Parameters
fileStore -- Used to create local and globally sharable temporary files and to send log messages to the leader process.
Returns
The return value of the function can be passed to other jobs by means of toil.job.Job.rv() .
toil.test.wdl
Submodules
toil.test.wdl.wdltoil_test
Attributes
Classes
Module Contents
toil.test.wdl.wdltoil_test.logger
class toil.test.wdl.wdltoil_test.BaseWDLTest(methodName='runTest')
Bases: toil.test.ToilTest
Base test class
for WDL tests.
setUp()
Runs anew before each test to
create farm fresh temp dirs.
Return type
None
tearDown()
Hook method for deconstructing
the test fixture after testing it.
Return type
None
toil.test.wdl.wdltoil_test.WDL_CONFORMANCE_TEST_REPO = 'https://github.com/DataBiosphere/wdl-conformance-tests.git'
toil.test.wdl.wdltoil_test.WDL_CONFORMANCE_TEST_COMMIT = 'baf44bcc7e6f6927540adf77d91b26a5558ae4b7'
toil.test.wdl.wdltoil_test.WDL_CONFORMANCE_TESTS_UNSUPPORTED_BY_TOIL = [16, 21, 64, 77]
toil.test.wdl.wdltoil_test.WDL_UNIT_TESTS_UNSUPPORTED_BY_TOIL = [14, 19, 52, 58, 59, 66, 67, 68, 69, 87, 97, 105, 107, 108, 109, 110, 120, 131, 134, 144]
class toil.test.wdl.wdltoil_test.WDLConformanceTests(methodName='runTest')
Bases: BaseWDLTest
WDL conformance
tests for Toil.
wdl_dir = 'wdl-conformance-tests'
classmethod setUpClass()
Hook method for setting up
class fixture before running tests in the class.
Return type
None
check(p)
Make sure a call completed or
explain why it failed.
Parameters
p ( subprocess.CompletedProcess )
Return type
None
test_unit_tests_v11()
test_conformance_tests_v10()
test_conformance_tests_v11()
test_conformance_tests_integration()
classmethod tearDownClass()
Hook method for deconstructing
the class fixture after running all tests in the class.
Return type
None
class toil.test.wdl.wdltoil_test.WDLTests(methodName='runTest')
Bases: BaseWDLTest
Tests for
Toil's MiniWDL-based implementation.
classmethod setUpClass()
Runs once for all tests.
Return type
None
test_MD5sum()
Test if Toil produces the same outputs as known good outputs for WDL's GATK tutorial #1.
test_url_to_file()
Test if web URL strings can be coerced to usable Files.
test_wait()
Test if Bash "wait" works in WDL scripts.
test_all_call_outputs()
Test if Toil can collect all call outputs from a workflow that doesn't expose them.
test_croo_detection()
Test if Toil can detect and do something sensible with Cromwell Output Organizer workflows.
test_caching()
Test if Toil can cache task runs.
test_url_to_optional_file()
Test if missing and error-producing URLs are handled correctly for optional File? values.
test_missing_output_directory()
Test if Toil can run a WDL workflow into a new directory.
test_miniwdl_self_test(extra_args=None)
Test if the MiniWDL self test
runs and produces the expected output.
Parameters
extra_args ( Optional[list[str]] )
Return type
None
test_miniwdl_self_test_by_reference()
Test if the MiniWDL self test
works when passing input files by URL reference.
Return type
None
test_dockstore_trs(extra_args=None)
Parameters
extra_args ( Optional[list[str]] )
Return type
None
test_giraffe_deepvariant()
Test if Giraffe and GPU DeepVariant run. This could take 25 minutes.
test_giraffe()
Test if Giraffe runs. This could take 12 minutes. Also we scale it down but it still demands lots of memory.
test_gs_uri()
Test if Toil can access Google Storage URIs.
class toil.test.wdl.wdltoil_test.WDLToilBenchTests(methodName='runTest')
Bases: toil.test.ToilTest
Tests for
Toil's MiniWDL-based implementation that don't run
workflows.
test_coalesce()
Test if WDLSectionJob can coalesce WDL decls.
White box test; will need to be changed or removed if the WDL interpreter changes.
make_string_expr(to_parse)
Parse pseudo-WDL for testing
whitespace removal.
Parameters
to_parse ( str )
Return type
WDL.Expr.String
test_remove_common_leading_whitespace()
Make sure leading whitespace removal works properly.
test_choose_human_readable_directory()
Test to make sure that we pick sensible but non-colliding directories to put files in.
test_uri_packing()
Test to make sure Toil URI packing brings through the required information.
test_disk_parse()
Test to make sure the disk parsing is correct
toil.test.wdl.wdltoil_test_kubernetes
Classes
Module Contents
class toil.test.wdl.wdltoil_test_kubernetes.WDLKubernetesClusterTest(name)
Bases: toil.test.provisioners.clusterTest.AbstractClusterTest
Ensure WDL
works on the Kubernetes batchsystem.
clusterName
leaderNodeType = 't2.medium'
instanceTypes = ['t2.medium']
clusterType = 'kubernetes'
setUp()
Set up for the test. Must be
overridden to call this method and set self.jobStore.
Return type
None
launchCluster()
Return type
None
test_wdl_kubernetes_cluster()
Test that a WDL workflow works on a Kubernetes cluster. Launches a cluster with 1 worker. This runs a WDL workflow that performs an image pull on the worker.
Attributes
Classes
Functions
Package Contents
toil.test.logger
toil.test.get_data(filename)
Returns an absolute path for a
file from this package.
Parameters
filename ( str )
Return type
str
class toil.test.ToilTest(methodName='runTest')
Bases: unittest.TestCase
A common base class for Toil tests.
Please have every test case directly or indirectly inherit this one.
When running tests you may optionally set the TOIL_TEST_TEMP environment variable to the path of a directory where you want temporary test files to be placed. The directory will be created if it doesn't exist. The path may be relative, in which case it is assumed to be relative to the project root. If TOIL_TEST_TEMP is not defined, temporary files and directories are created in the system's default location for such files, and any temporary files or directories left over from tests are removed automatically during tear down. Otherwise, left-over files are not removed.
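The TOIL_TEST_TEMP rules above can be sketched as a small helper. This is a minimal illustration, not Toil's implementation: the function name resolve_test_temp_sketch, its arguments, and its return shape are all hypothetical.

```python
import os
import tempfile


def resolve_test_temp_sketch(project_root, environ=os.environ):
    """Return (directory, remove_leftovers) following the rules described above.

    Hypothetical helper for illustration only; not part of toil.test.
    """
    configured = environ.get("TOIL_TEST_TEMP")
    if configured is None:
        # No override: use the system default and clean up during tear down.
        return tempfile.gettempdir(), True
    if not os.path.isabs(configured):
        # Relative paths are taken relative to the project root.
        configured = os.path.join(project_root, configured)
    os.makedirs(configured, exist_ok=True)  # created if it doesn't exist
    # An explicit directory was requested, so left-over files are kept.
    return configured, False
```

Passing the environment as an argument keeps the sketch easy to exercise without modifying the real process environment.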
setup_method(method)
Parameters
method ( Any )
Return type
None
classmethod setUpClass()
Hook method for setting up
class fixture before running tests in the class.
Return type
None
classmethod tearDownClass()
Hook method for deconstructing
the class fixture after running all tests in the class.
Return type
None
setUp()
Hook method for setting up the
test fixture before exercising it.
Return type
None
tearDown()
Hook method for deconstructing
the test fixture after testing it.
Return type
None
classmethod awsRegion()
Pick an appropriate AWS region.
Use us-west-2
unless running on EC2, in which case use the region in which
the instance is located
Return type
str
toil.test.MT
toil.test.get_temp_file(suffix='', rootDir=None)
Return a string representing a
temporary file, that must be manually deleted.
Parameters
• suffix ( str )
• rootDir ( Optional[str] )
Return type
str
toil.test.needs_env_var(var_name, comment=None)
Use as a decorator before test
classes or methods to run only if the given environment
variable is set. Can include a comment saying what the
variable should be set to.
Parameters
• var_name ( str )
• comment ( Optional[str] )
Return type
Callable[[MT], MT]
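A decorator of this kind can be built on the standard unittest.skipUnless. The sketch below is an assumption about the general pattern, not the toil.test source; the name needs_env_var_sketch and the wording of the skip reason are hypothetical.

```python
import os
import unittest


def needs_env_var_sketch(var_name, comment=None):
    """Skip the decorated test class or method unless var_name is set.

    Hypothetical stand-in for illustration; the environment is checked
    once, at decoration time.
    """
    def decorator(test_item):
        hint = f" ({comment})" if comment else ""
        reason = f"Set {var_name}{hint} to include this test."
        return unittest.skipUnless(os.getenv(var_name), reason)(test_item)
    return decorator
```

The many needs_* decorators listed below follow the same shape, differing only in the condition they check.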
toil.test.needs_rsync3(test_item)
Decorate classes or methods that depend on any features from rsync version 3.0.0+.
Necessary because utilsTest.testAWSProvisionerUtils() uses option --protect-args which is only available in rsync 3
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_online(test_item)
Use as a decorator before test
classes or methods to run only if we are meant to talk to
the Internet.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_aws_s3(test_item)
Use as a decorator before test
classes or methods to run only if AWS S3 is usable.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_aws_ec2(test_item)
Use as a decorator before test
classes or methods to run only if AWS EC2 is usable.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_aws_batch(test_item)
Use as a decorator before test
classes or methods to run only if AWS Batch is usable.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_google_storage(test_item)
Use as a decorator before test
classes or methods to run only if Google Cloud is installed
and we ought to be able to access public Google Storage
URIs.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_google_project(test_item)
Use as a decorator before test
classes or methods to run only if we have a Google Cloud
project set.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_gridengine(test_item)
Use as a decorator before test
classes or methods to run only if GridEngine is installed.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_torque(test_item)
Use as a decorator before test
classes or methods to run only if PBS/Torque is installed.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_kubernetes_installed(test_item)
Use as a decorator before test
classes or methods to run only if Kubernetes is installed.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_kubernetes(test_item)
Use as a decorator before test
classes or methods to run only if Kubernetes is installed
and configured.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_mesos(test_item)
Use as a decorator before test
classes or methods to run only if Mesos is installed.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_slurm(test_item)
Use as a decorator before test
classes or methods to run only if Slurm is installed.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_htcondor(test_item)
Use as a decorator before test
classes or methods to run only if HTCondor is installed.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_lsf(test_item)
Use as a decorator before test
classes or methods to only run them if LSF is installed.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_java(test_item)
Use as a test decorator to run
only if java is installed.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_docker(test_item)
Use as a decorator before test
classes or methods to only run them if docker is installed
and docker-based tests are enabled.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_singularity(test_item)
Use as a decorator before test
classes or methods to only run them if singularity is
installed.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_singularity_or_docker(test_item)
Use as a decorator before test
classes or methods to only run them if docker is installed
and docker-based tests are enabled, or if Singularity is
installed.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_local_cuda(test_item)
Use as a decorator before test
classes or methods to only run them if a CUDA setup legible
to cwltool (i.e. providing userspace nvidia-smi) is present.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_docker_cuda(test_item)
Use as a decorator before test
classes or methods to only run them if a CUDA setup is
available through Docker.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_encryption(test_item)
Use as a decorator before test
classes or methods to only run them if PyNaCl is installed
and configured.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_cwl(test_item)
Use as a decorator before test
classes or methods to only run them if CWLTool is installed
and configured.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_wdl(test_item)
Use as a decorator before test
classes or methods to only run them if miniwdl is installed
and configured.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_server(test_item)
Use as a decorator before test
classes or methods to only run them if Connexion is
installed.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_celery_broker(test_item)
Use as a decorator before test
classes or methods to run only if RabbitMQ is set up to take
Celery jobs.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_wes_server(test_item)
Use as a decorator before test
classes or methods to run only if a WES server is available
to run against.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_local_appliance(test_item)
Use as a decorator before test
classes or methods to only run them if the Toil appliance
Docker image is downloaded.
Parameters
test_item ( MT )
Return type
MT
toil.test.needs_fetchable_appliance(test_item)
Use as a decorator before test
classes or methods to only run them if the Toil appliance
Docker image is able to be downloaded from the Internet.
Parameters
test_item ( MT )
Return type
MT
toil.test.integrative(test_item)
Use this to decorate integration tests so as to skip them during regular builds.
We define
integration tests as A) involving other, non-Toil software
components that we develop and/or B) having a higher cost
(time or money).
Parameters
test_item ( MT )
Return type
MT
toil.test.slow(test_item)
Use this decorator to identify
tests that are slow and not critical. Skip if
TOIL_TEST_QUICK is true.
Parameters
test_item ( MT )
Return type
MT
toil.test.methodNamePartRegex
toil.test.timeLimit(seconds)
Use to limit the execution time of a function.
Raises an exception if the execution of the function takes more than the specified amount of time. See <http://stackoverflow.com/a/601168>.
Parameters
seconds ( int ) -- maximum allowable time, in seconds
Return type
collections.abc.Generator [None, None, None]
>>> import time
>>> with timeLimit(2):
...     time.sleep(1)
>>> import time
>>> with timeLimit(1):
...     time.sleep(2)
Traceback (most recent call last):
    ...
RuntimeError: Timed out
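One common way to build such a context manager is with a Unix alarm signal. The following is a sketch under that assumption, with the hypothetical name time_limit_sketch; it only works on Unix-like systems and only in the main thread, and it uses an interval timer so that fractional seconds are accepted.

```python
import signal
import time
from contextlib import contextmanager


@contextmanager
def time_limit_sketch(seconds):
    """Raise RuntimeError if the with-block runs longer than `seconds`.

    Illustrative only: relies on SIGALRM (Unix, main thread).
    """
    def on_alarm(signum, frame):
        raise RuntimeError("Timed out")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

Restoring the previous handler in the finally clause keeps the sketch from interfering with other code that uses SIGALRM.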
toil.test.make_tests(generalMethod, targetClass, **kwargs)
This method dynamically generates test methods using the generalMethod as a template. Each generated function is the result of a unique combination of parameters applied to the generalMethod. Each of the parameters has a corresponding string that will be used to name the method. These generated functions are named in the scheme: test_[generalMethodName]___[firstParameterName]_[someValueName]__[secondParameterName]_...
The arguments following the generalMethodName should be a series of one or more dictionaries of the form {str : type, ...} where the key represents the name of the value. The names will be used to represent the permutation of values passed for each parameter in the generalMethod.
The generated
method names will list the parameters in lexicographic order
by parameter name.
Parameters
• generalMethod -- A method that will be parameterized with values passed as kwargs. Note that the generalMethod must be a regular method.
• targetClass -- This represents the class to which the generated test methods will be bound. If no targetClass is specified the class of the generalMethod is assumed the target.
• kwargs -- a series of dictionaries defining values, and their respective names where each keyword is the name of a parameter in generalMethod.
>>> class Foo:
...     def has(self, num, letter):
...         return num, letter
...
...     def hasOne(self, num):
...         return num
>>> class Bar(Foo):
...     pass
>>> make_tests(Foo.has, Bar, num={'one':1, 'two':2}, letter={'a':'a', 'b':'b'})
>>> b = Bar()
Note that letter comes lexicographically before num and so appears first in the generated method names.
>>> assert b.test_has__letter_a__num_one() == b.has(1, 'a')
>>> assert b.test_has__letter_b__num_one() == b.has(1, 'b')
>>> assert b.test_has__letter_a__num_two() == b.has(2, 'a')
>>> assert b.test_has__letter_b__num_two() == b.has(2, 'b')
>>> f = Foo()
>>> hasattr(f, 'test_has__num_one__letter_a')  # should be false because Foo has no test methods
False
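The naming scheme can be illustrated without generating any methods. The helper below is hypothetical (generated_test_names is not part of toil.test) and follows the two-underscore separators seen in the doctest names above; it only lists the names that one call would produce.

```python
from itertools import product


def generated_test_names(method_name, **kwargs):
    """List names of the form test_<method>__<param>_<valueName>__...

    Hypothetical illustration: parameters are ordered lexicographically
    by name, and every combination of value names yields one test name.
    """
    params = sorted(kwargs)  # lexicographic order by parameter name
    names = []
    for combo in product(*(sorted(kwargs[p]) for p in params)):
        parts = [f"{p}_{value_name}" for p, value_name in zip(params, combo)]
        names.append(f"test_{method_name}__" + "__".join(parts))
    return names
```

For the Foo.has example, the keyword arguments num and letter give four names, each starting with the letter part because "letter" sorts before "num".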
class toil.test.ApplianceTestSupport(methodName='runTest')
Bases: ToilTest
A Toil test that runs a user script on a minimal cluster of appliance containers.
i.e. one leader
container and one worker container.
class Appliance(outer, mounts, cleanMounts=False)
Bases: toil.lib.threading.ExceptionalThread
A thread whose join() method re-raises exceptions raised during run(). While join() is idempotent, the exception is only re-raised during the first invocation of join() that successfully joined the thread. If join() times out, no exception will be re-raised even though an exception might already have occurred in run().
When subclassing this thread, override tryRun() instead of run().
>>> def f():
...     assert 0
>>> t = ExceptionalThread(target=f)
>>> t.start()
>>> t.join()
Traceback (most recent call last):
    ...
AssertionError
>>> class MyThread(ExceptionalThread):
...     def tryRun( self ):
...         assert 0
>>> t = MyThread()
>>> t.start()
>>> t.join()
Traceback (most recent call last):
    ...
AssertionError
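The behaviour described above can be sketched with the standard threading module. This is an illustrative reimplementation under assumed names (ReRaisingThread, try_run), not the toil.lib.threading source.

```python
import threading


class ReRaisingThread(threading.Thread):
    """Hypothetical sketch: join() re-raises an exception raised in run()."""

    exc = None

    def run(self):
        try:
            self.try_run()
        except BaseException as e:
            self.exc = e  # remember the failure for join()

    def try_run(self):
        # Default behaviour mirrors threading.Thread.run(): call the target.
        super().run()

    def join(self, timeout=None):
        super().join(timeout)
        # Only raise once the thread has really finished, and only once.
        if not self.is_alive() and self.exc is not None:
            exc, self.exc = self.exc, None
            raise exc
```

Clearing the stored exception after raising it is what makes a second join() return quietly, matching the "first invocation" wording above.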
Parameters
• outer ( ApplianceTestSupport )
• mounts ( dict[str, str] )
• cleanMounts ( bool )
lock
outer
mounts
cleanMounts
containerName
popen: subprocess.Popen[bytes] | None = None
__enter__()
Return type
Appliance
__exit__(exc_type, exc_val, exc_tb)
Parameters
• exc_type ( type[BaseException] )
• exc_val ( Exception )
• exc_tb ( Any )
Return type
Literal[False]
tryRun()
Return type
None
runOnAppliance(*args, **kwargs)
Parameters
• args ( str )
• kwargs ( Any )
Return type
None
writeToAppliance(path, contents)
Parameters
• path ( str )
• contents ( Any )
Return type
None
deployScript(path, packagePath, script)
Deploy a Python module on the appliance.
Parameters
• path ( str ) -- the path (absolute or relative to the WORKDIR of the appliance container) to the root of the package hierarchy where the given module should be placed. The given directory should be on the Python path.
• packagePath ( str ) -- the desired fully qualified module name (dotted form) of the module
• script ( str|callable ) -- the contents of the Python module. If a callable is given, its source code will be extracted. This is a convenience that lets you embed user scripts into test code as a nested function.
Return type
None
class LeaderThread(outer, mounts, cleanMounts=False)
Bases: Appliance
A thread whose join() method re-raises exceptions raised during run(). While join() is idempotent, the exception is only re-raised during the first invocation of join() that successfully joined the thread. If join() times out, no exception will be re-raised even though an exception might already have occurred in run().
When subclassing this thread, override tryRun() instead of run().
>>> def f():
...     assert 0
>>> t = ExceptionalThread(target=f)
>>> t.start()
>>> t.join()
Traceback (most recent call last):
    ...
AssertionError
>>> class MyThread(ExceptionalThread):
...     def tryRun( self ):
...         assert 0
>>> t = MyThread()
>>> t.start()
>>> t.join()
Traceback (most recent call last):
    ...
AssertionError
Parameters
• outer ( ApplianceTestSupport )
• mounts ( dict[str, str] )
• cleanMounts ( bool )
class WorkerThread(outer, mounts, numCores)
Bases: Appliance
A thread whose join() method re-raises exceptions raised during run(). While join() is idempotent, the exception is only re-raised during the first invocation of join() that successfully joined the thread. If join() times out, no exception will be re-raised even though an exception might already have occurred in run().
When subclassing this thread, override tryRun() instead of run().
>>> def f():
...     assert 0
>>> t = ExceptionalThread(target=f)
>>> t.start()
>>> t.join()
Traceback (most recent call last):
    ...
AssertionError
>>> class MyThread(ExceptionalThread):
...     def tryRun( self ):
...         assert 0
>>> t = MyThread()
>>> t.start()
>>> t.join()
Traceback (most recent call last):
    ...
AssertionError
Parameters
• outer ( ApplianceTestSupport )
• mounts ( dict[str, str] )
• numCores ( int )
numCores
toil.toilState
Attributes
Classes
Module Contents
toil.toilState.logger
class toil.toilState.ToilState(jobStore)
Holds the leader's scheduling information.
But only that which does not need to be persisted back to the JobStore (such as information on completed and outstanding predecessors).
Holds the true single copies of all JobDescription objects that the Leader and ServiceManager will use. The leader and service manager shouldn't do their own load() and update() calls on the JobStore; they should go through this class.
Everything in the leader should reference JobDescriptions by ID.
Only holds
JobDescription objects, not Job objects, and those
JobDescription objects only exist in single copies.
Parameters
jobStore ( toil.jobStores.abstractJobStore.AbstractJobStore )
bus
successor_to_predecessors: dict[str, set[str]]
successorCounts: dict[str, int]
service_to_client: dict[str, str]
servicesIssued: dict[str, set[str]]
jobs_issued: set[str]
totalFailedJobs: set[str]
hasFailedSuccessors: set[str]
failedSuccessors: set[str]
jobsToBeScheduledWithMultiplePredecessors: set[str]
load_workflow(rootJob, jobCache=None)
Load the workflow rooted at the given job.
If jobs are loaded that have updated and need to be dealt with by the leader, JobUpdatedMessage messages will be sent to the message bus.
The jobCache is a map from jobStoreID to JobDescription or None. It is used to speed up the building of the state when loading initially from the JobStore, and is not preserved.
Parameters
• rootJob ( toil.job.JobDescription ) -- The description for the root job of the workflow being run.
• jobCache ( Optional[dict[str, toil.job.JobDescription]] ) -- A dict to cache downloaded job descriptions in, keyed by ID.
Return type
None
job_exists(job_id)
Test if the given job exists now.
Returns True if the given job exists right now, and false if it hasn't been created or it has been deleted elsewhere.
Doesn't
guarantee that the job will or will not be gettable, if
racing another process, or if it is still cached.
Parameters
job_id ( str )
Return type
bool
get_job(job_id)
Get the one true copy of the
JobDescription with the given ID.
Parameters
job_id ( str )
Return type
toil.job.JobDescription
commit_job(job_id)
Save back any modifications made to a JobDescription.
(one retrieved
from get_job())
Parameters
job_id ( str )
Return type
None
delete_job(job_id)
Destroy a JobDescription.
May raise an
exception if the job could not be cleaned up (i.e. files
belonging to it failed to delete).
Parameters
job_id ( str )
Return type
None
reset_job(job_id)
Discard any local modifications to a JobDescription.
Will make
modifications from other hosts visible.
Parameters
job_id ( str )
Return type
None
reset_job_expecting_change(job_id, timeout)
Discard any local modifications to a JobDescription.
Will make modifications from other hosts visible.
Will wait for up to timeout seconds for a modification (or deletion) from another host to actually be visible.
Always replaces the JobDescription with what is stored in the job store, even if no modification ends up being visible.
Returns True if
an update was detected in time, and False otherwise.
Parameters
• job_id ( str )
• timeout ( float )
Return type
bool
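The wait-with-timeout behaviour described for reset_job_expecting_change() can be sketched as a polling loop. Everything here is hypothetical (wait_for_change, read_version, and the notion of a comparable "version"); how the real method detects a change is not stated in this section.

```python
import time


def wait_for_change(read_version, old_version, timeout, poll_interval=0.01):
    """Poll read_version() until it differs from old_version or time runs out.

    Returns (latest_version, changed). The latest value is always returned,
    even when no change was seen, mirroring "always replaces" above.
    Hypothetical helper for illustration only.
    """
    deadline = time.monotonic() + timeout
    latest = read_version()
    while latest == old_version and time.monotonic() < deadline:
        time.sleep(poll_interval)
        latest = read_version()
    return latest, latest != old_version
```

Using time.monotonic() rather than wall-clock time keeps the deadline stable if the system clock is adjusted while waiting.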
successors_pending(predecessor_id, count)
Remember that the given job has the given number more pending successors.
(that have not
yet succeeded or failed.)
Parameters
|
• |
predecessor_id ( str ) |
|||
|
• |
count ( int ) |
Return type
None
successor_returned(predecessor_id)
Remember that the given job has one fewer pending successors.
(because one
has succeeded or failed.)
Parameters
predecessor_id ( str )
Return type
None
count_pending_successors(predecessor_id)
Count number of pending successors of the given job.
Pending
successors are those which have not yet succeeded or failed.
Parameters
predecessor_id ( str )
Return type
int
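The three successor-tracking methods above amount to a per-job counter. A minimal sketch, assuming a plain dictionary keyed by predecessor ID (the class name PendingSuccessorCounter and the error raised on underflow are assumptions, not ToilState internals):

```python
from collections import defaultdict


class PendingSuccessorCounter:
    """Hypothetical sketch of per-predecessor pending-successor bookkeeping."""

    def __init__(self):
        self._counts = defaultdict(int)  # predecessor ID -> pending successors

    def successors_pending(self, predecessor_id, count):
        # Record `count` more successors that have not yet succeeded or failed.
        self._counts[predecessor_id] += count

    def successor_returned(self, predecessor_id):
        # One successor has succeeded or failed.
        if self._counts.get(predecessor_id, 0) <= 0:
            raise RuntimeError(f"No pending successors for {predecessor_id}")
        self._counts[predecessor_id] -= 1
        if self._counts[predecessor_id] == 0:
            del self._counts[predecessor_id]  # keep the map small

    def count_pending_successors(self, predecessor_id):
        return self._counts.get(predecessor_id, 0)
```

A predecessor whose count reaches zero has no outstanding successors left, which is the condition a scheduler would check before moving on.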
toil.utils
Submodules
toil.utils.toilClean
Delete a job store used by a previous Toil workflow invocation.
Attributes
Functions
Module Contents
toil.utils.toilClean.logger
toil.utils.toilClean.main()
Return type
None
toil.utils.toilConfig
Create a config file with all default Toil options.
Attributes
Functions
Module Contents
toil.utils.toilConfig.logger
toil.utils.toilConfig.main()
Return type
None
toil.utils.toilDebugFile
Debug tool for copying files contained in a toil jobStore.
Attributes
Functions
Module Contents
toil.utils.toilDebugFile.logger
toil.utils.toilDebugFile.fetchJobStoreFiles(jobStore, options)
Takes a list of file names as
glob patterns, searches for these within a given directory,
and attempts to take all of the files found and copy them
into options.localFilePath.
Parameters
• jobStore ( toil.jobStores.fileJobStore.FileJobStore ) -- A fileJobStore object.
• options.fetch -- List of file glob patterns to search for in the jobStore and copy into options.localFilePath.
• options.localFilePath -- Local directory to copy files into.
• options.jobStore -- The path to the jobStore directory.
• options ( argparse.Namespace )
Return type
None
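The glob-and-copy behaviour described for fetchJobStoreFiles can be sketched with the standard library alone. The helper below is hypothetical (copy_matching_files is not part of Toil) and searches an ordinary directory rather than a real job store.

```python
import glob
import os
import shutil
import tempfile


def copy_matching_files(search_dir: str, patterns: list[str], dest_dir: str) -> list[str]:
    """Copy files under search_dir whose names match any glob pattern into dest_dir."""
    copied = []
    for pattern in patterns:
        # recursive=True lets "**" match zero or more nested directories.
        for path in glob.glob(os.path.join(search_dir, "**", pattern), recursive=True):
            if os.path.isfile(path):
                shutil.copy(path, dest_dir)
                copied.append(os.path.basename(path))
    return sorted(copied)


# Build a throwaway directory tree to search.
source = tempfile.mkdtemp()
target = tempfile.mkdtemp()
os.makedirs(os.path.join(source, "nested"))
for name in ("a.txt", os.path.join("nested", "b.txt"), "c.log"):
    with open(os.path.join(source, name), "w") as handle:
        handle.write("data")

copied_names = copy_matching_files(source, ["*.txt"], target)
```

Files with the same basename in different subdirectories would overwrite each other in this sketch; it makes no attempt to handle that case.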
toil.utils.toilDebugFile.printContentsOfJobStore(job_store,
job_id=None)
Print the names of all files contained in the job store if job_id is not given; otherwise print only the names of files for the specific job for which a match can be found. Also creates a log file of these file names in the current directory.
Parameters
• job_store ( toil.jobStores.fileJobStore.FileJobStore ) -- Job store to ask for files from.
• job_id ( Optional[str] ) -- Default is None, which prints out all files in the jobStore. If specified, it will print all jobStore files that have been written to the jobStore by that job.
Return type
None
toil.utils.toilDebugFile.main()
Return type
None
toil.utils.toilDebugJob
Debug tool for running a toil job locally.
Attributes
Functions
Module Contents
toil.utils.toilDebugJob.logger
toil.utils.toilDebugJob.main()
Return type
None
toil.utils.toilDestroyCluster
Terminates the specified cluster and associated resources.
Attributes
Functions
Module Contents
toil.utils.toilDestroyCluster.logger
toil.utils.toilDestroyCluster.main()
Return type
None
toil.utils.toilKill
Kills rogue toil processes.
Attributes
Functions
Module Contents
toil.utils.toilKill.logger
toil.utils.toilKill.main()
Return type
None
toil.utils.toilLaunchCluster
Launches a toil leader instance with the specified provisioner.
Attributes
Functions
Module Contents
toil.utils.toilLaunchCluster.build_tag_dict_from_env: dict [ str , str ]
toil.utils.toilLaunchCluster.logger
toil.utils.toilLaunchCluster.create_tags_dict(tags)
Parameters
tags ( list[str] )
Return type
dict [ str , str ]
toil.utils.toilLaunchCluster.main()
Return type
None
toil.utils.toilMain
Functions
Module Contents
toil.utils.toilMain.main()
Return type
None
toil.utils.toilMain.get_or_die(module, name)
Get an object from a module or
complain that it is missing.
Parameters
• module ( types.ModuleType )
• name ( str )
Return type
Any
toil.utils.toilMain.loadModules()
Return type
dict [ str , types.ModuleType ]
toil.utils.toilMain.printHelp(modules)
Parameters
modules ( dict[str, types.ModuleType] )
Return type
None
toil.utils.toilMain.printVersion()
Return type
None
toil.utils.toilRsyncCluster
Rsyncs into the toil appliance container running on the leader of the cluster.
Attributes
Functions
Module Contents
toil.utils.toilRsyncCluster.logger
toil.utils.toilRsyncCluster.main()
Return type
None
toil.utils.toilServer
CLI entry for the Toil servers.
Attributes
Functions
Module Contents
toil.utils.toilServer.logger
toil.utils.toilServer.main()
Return type
None
toil.utils.toilSshCluster
SSH into the toil appliance container running on the leader of the cluster.
Attributes
Functions
Module Contents
toil.utils.toilSshCluster.logger
toil.utils.toilSshCluster.main()
Return type
None
toil.utils.toilStats
Reports statistical data about a given Toil workflow.
Attributes
Classes
Functions
Module Contents
toil.utils.toilStats.logger
toil.utils.toilStats.CATEGORIES = ['time', 'clock', 'wait', 'memory', 'disk']
toil.utils.toilStats.CATEGORY_UNITS
toil.utils.toilStats.TITLES
toil.utils.toilStats.TIME_CATEGORIES
toil.utils.toilStats.SPACE_CATEGORIES
toil.utils.toilStats.COMPUTED_CATEGORIES
toil.utils.toilStats.LONG_FORMS
class toil.utils.toilStats.ColumnWidths
Convenience object that stores
the width of columns for printing. Helps make things pretty.
categories
fields_count = ['count', 'min', 'med', 'ave', 'max', 'total']
fields = ['min', 'med', 'ave', 'max', 'total']
data: dict [ str , int ]
title(category)
Return the total printed length
of this category item.
Parameters
category ( str )
Return type
int
get_width(category, field)
Parameters
• category ( str )
• field ( str )
Return type
int
set_width(category, field, width)
Parameters
• category ( str )
• field ( str )
• width ( int )
Return type
None
report()
Return type
None
toil.utils.toilStats.pad_str(s, field=None)
Pad the beginning of a string
with spaces, if necessary.
Parameters
• s ( str )
• field ( Optional[int] )
Return type
str
toil.utils.toilStats.pretty_space(k, field=None, alone=False)
Given input k as kibibytes,
return a nicely formatted string.
Parameters
• k ( float )
• field ( Optional[int] )
• alone ( bool )
Return type
str
toil.utils.toilStats.pretty_time(t, field=None, unit='s', alone=False)
Given input t as seconds,
return a nicely formatted string.
Parameters
• t ( float )
• field ( Optional[int] )
• unit ( str )
• alone ( bool )
Return type
str
toil.utils.toilStats.report_unit(unit)
Format a unit name for display.
Parameters
unit ( str )
Return type
str
toil.utils.toilStats.report_time(t,
options, field=None, unit='s',
alone=False)
Given t seconds, report back
the correct format as string.
Parameters
• t ( float )
• options ( argparse.Namespace )
• field ( Optional[int] )
• unit ( str )
• alone ( bool )
Return type
str
toil.utils.toilStats.report_space(k,
options, field=None, unit='KiB',
alone=False)
Given k kibibytes, report back the correct format as string.
If unit is set to B, convert to KiB first.
Parameters
• k ( float )
• options ( argparse.Namespace )
• field ( Optional[int] )
• unit ( str )
• alone ( bool )
Return type
str
toil.utils.toilStats.report_number(n, field=None, nan_value='NaN')
Given a number, report back the correct format as string.
If it is a NaN or None, use nan_value to represent it instead.
Parameters
• n ( Union[int, float, None] )
• field ( Optional[int] )
• nan_value ( str )
Return type
str
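The NaN and None handling described for report_number, combined with the left-padding that pad_str describes, can be sketched as follows. The name sketch_report_number and the use of str() for ordinary numbers are assumptions; the real function may format values differently.

```python
import math
from typing import Optional, Union


def sketch_report_number(n: Union[int, float, None],
                         field: Optional[int] = None,
                         nan_value: str = "NaN") -> str:
    """Format n as text, substituting nan_value for None or NaN."""
    if n is None or math.isnan(n):
        text = nan_value
    else:
        # Assumption: plain str() formatting for real numbers.
        text = str(n)
    # Pad the beginning with spaces when a field width is given.
    return text.rjust(field) if field is not None else text


missing = sketch_report_number(None)
not_a_number = sketch_report_number(float("nan"), nan_value="-")
padded = sketch_report_number(7, field=4)
plain = sketch_report_number(2.5)
```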
toil.utils.toilStats.report(v,
category, options, field=None,
alone=False)
Report a value of the given category formatted as a string.
Uses the given field width if set.
If alone is set, the field is being formatted outside a table and might need a unit.
Parameters
• v ( float )
• category ( str )
• options ( argparse.Namespace )
• field ( Optional[int] )
Return type
str
toil.utils.toilStats.sprint_tag(key, tag, options, columnWidths=None)
Generate a pretty-print ready
string from a JTTag().
Parameters
• key ( str )
• tag ( toil.lib.expando.Expando )
• options ( argparse.Namespace )
• columnWidths ( Optional[ColumnWidths] )
Return type
str
toil.utils.toilStats.decorate_title(category, title, options)
Add extra parts to the category titles.
Add units to
title if they won't appear in the formatted values. Add a
marker to TITLE if the TITLE is sorted on.
Parameters
• category ( str )
• title ( str )
• options ( argparse.Namespace )
Return type
str
toil.utils.toilStats.decorate_subheader(category,
columnWidths,
options)
Add a marker to the correct
field if the TITLE is sorted on.
Parameters
• category ( str )
• columnWidths ( ColumnWidths )
• options ( argparse.Namespace )
Return type
str
toil.utils.toilStats.get(tree, name)
Return a float value attribute
NAME from TREE.
Parameters
• tree ( toil.lib.expando.Expando )
• name ( str )
Return type
float
toil.utils.toilStats.sort_jobs(jobTypes, options)
Return jobTypes, sorted.
Parameters
• jobTypes ( list[Any] )
• options ( argparse.Namespace )
Return type
list [Any]
toil.utils.toilStats.report_pretty_data(root,
worker, job, job_types,
options)
Print the important bits out.
Parameters
• root ( toil.lib.expando.Expando )
• worker ( toil.lib.expando.Expando )
• job ( toil.lib.expando.Expando )
• job_types ( list[Any] )
• options ( argparse.Namespace )
Return type
str
toil.utils.toilStats.compute_column_widths(job_types,
worker, job,
options)
Return a ColumnWidths() object
with the correct max widths.
Parameters
• job_types ( list[Any] )
• worker ( toil.lib.expando.Expando )
• job ( toil.lib.expando.Expando )
• options ( argparse.Namespace )
Return type
ColumnWidths
toil.utils.toilStats.update_column_widths(tag, cw, options)
Update the column width
attributes for this tag's fields.
Parameters
• tag ( toil.lib.expando.Expando )
• cw ( ColumnWidths )
• options ( argparse.Namespace )
Return type
None
toil.utils.toilStats.build_element(element, items, item_name, defaults)
Create an element for output.
Parameters
• element ( toil.lib.expando.Expando )
• items ( list[toil.job.Job] )
• item_name ( str )
• defaults ( dict[str, float] )
Return type
toil.lib.expando.Expando
toil.utils.toilStats.create_summary(element,
containingItems,
containingItemName, count_contained)
Figure out how many jobs (or contained items) ran on each worker (or containing item).
Stick a bunch of xxx_number_per_xxx stats into element to describe this.
Parameters
• count_contained ( Callable[[toil.lib.expando.Expando], int] ) -- function that maps from containing item to number of contained items.
• element ( toil.lib.expando.Expando )
• containingItems ( list[toil.lib.expando.Expando] )
• containingItemName ( str )
Return type
None
toil.utils.toilStats.get_stats(jobStore)
Sum together all the stats information in the job store.
Produces one object containing lists of the values from all the summed objects.
Parameters
jobStore ( toil.jobStores.abstractJobStore.AbstractJobStore )
Return type
toil.lib.expando.Expando
toil.utils.toilStats.process_data(config, stats)
Collate the stats and report.
Parameters
• config ( toil.common.Config )
• stats ( toil.lib.expando.Expando )
Return type
toil.lib.expando.Expando
toil.utils.toilStats.report_data(tree, options)
Parameters
• tree ( toil.lib.expando.Expando )
• options ( argparse.Namespace )
Return type
None
toil.utils.toilStats.sort_category_choices
toil.utils.toilStats.sort_field_choices = ['min', 'med', 'ave', 'max', 'total']
toil.utils.toilStats.add_stats_options(parser)
Parameters
parser ( argparse.ArgumentParser )
Return type
None
toil.utils.toilStats.main()
Reports stats on the workflow,
use with --stats option to toil.
Return type
None
toil.utils.toilStatus
Tool for reporting on job status.
Attributes
Classes
Functions
Module Contents
toil.utils.toilStatus.logger
class toil.utils.toilStatus.ToilStatus(jobStoreName,
specifiedJobs=None)
Tool for reporting on job
status.
Parameters
• jobStoreName ( str )
• specifiedJobs ( Optional[list[str]] )
jobStoreName
jobStore
message_bus_path
print_dot_chart()
Print a dot output graph
representing the workflow.
Return type
None
printJobLog()
Takes a list of jobs, finds
their log files, and prints them to the terminal.
Return type
None
printJobChildren()
Takes a list of jobs, and
prints their successors.
Return type
None
printAggregateJobStats(properties, childNumber)
Prints each job's ID, log file,
remaining tries, and other properties.
Parameters
• properties ( list[set[str]] ) -- A set of string flag names for each job in self.jobsToReport.
• childNumber ( list[int] ) -- A list of child counts for each job in self.jobsToReport.
Return type
None
report_on_jobs()
Gathers information about jobs, such as their child jobs and status.
Returns jobStats
Dict containing some lists of jobs by category, and some lists of job properties for each job in self.jobsToReport.
Return type
dict [ str , Any]
static getPIDStatus(jobStoreName)
Determine the status of a process with a particular local pid.
Checks to see if a process exists or not.
Returns
A string indicating the status of the PID of the workflow as stored in the jobstore.
Return type
str
Parameters
jobStoreName ( str )
static getStatus(jobStoreName)
Determine the status of a workflow.
If the jobstore does not exist, this returns 'QUEUED', assuming it has not been created yet.
Checks for the existence of files created in toil.Leader.run(). In toil.Leader.run(), if a workflow completes with failed jobs, 'failed.log' is created; otherwise 'succeeded.log' is written. If neither of these exists, the leader is still running jobs.
Returns
A string indicating the status of the workflow. ['COMPLETED', 'RUNNING', 'ERROR', 'QUEUED']
Return type
str
Parameters
jobStoreName ( str )
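The decision logic described for getStatus can be sketched against an ordinary directory. This is an illustration only: workflow_status is a hypothetical helper, a local directory stands in for the job store, and the mapping of 'failed.log' to 'ERROR' and 'succeeded.log' to 'COMPLETED' is inferred from the list of return values rather than stated outright.

```python
import os
import tempfile


def workflow_status(job_store_dir: str) -> str:
    """Map marker files in a (hypothetical) job store directory to a status string."""
    if not os.path.isdir(job_store_dir):
        # No job store yet: assume the workflow has not started.
        return "QUEUED"
    if os.path.exists(os.path.join(job_store_dir, "succeeded.log")):
        return "COMPLETED"
    if os.path.exists(os.path.join(job_store_dir, "failed.log")):
        return "ERROR"
    # Neither marker exists, so the leader is presumably still running jobs.
    return "RUNNING"


job_store = tempfile.mkdtemp()
status_while_running = workflow_status(job_store)

# Simulate a workflow that finished with failed jobs.
open(os.path.join(job_store, "failed.log"), "w").close()
status_after_failure = workflow_status(job_store)

status_without_store = workflow_status(os.path.join(job_store, "missing"))
```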
print_running_jobs()
Prints a list of the currently running jobs.
Return type
None
fetchRootJob()
Fetches the root job from the jobStore that provides context for all other jobs.
Exactly the same as the jobStore.loadRootJob() function, but with a different exit message if the root job is not found (indicating the workflow ran successfully to completion and certain stats cannot be gathered from it meaningfully such as which jobs are left to run).
Raises
JobException -- if the root job does not exist.
Return type
toil.job.JobDescription
fetchUserJobs(jobs)
Takes a user input array of
jobs, verifies that they are in the jobStore and returns the
array of jobsToReport.
Parameters
jobs ( list ) -- A list of jobs to be verified.
Returns jobsToReport
A list of jobs which are verified to be in the jobStore.
Return type
list [ toil.job.JobDescription ]
traverseJobGraph(rootJob,
jobsToReport=None,
foundJobStoreIDs=None)
Find all current jobs in the jobStore and return them as a list.
Parameters
• rootJob ( toil.job.JobDescription ) -- The root job of the workflow.
• jobsToReport ( list ) -- A list of jobNodes to be added to and returned.
• foundJobStoreIDs ( set ) -- A set of jobStoreIDs used to keep track of jobStoreIDs encountered in traversal.
Returns jobsToReport
The list of jobs currently in the job graph.
Return type
list [ toil.job.JobDescription ]
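The roles of jobsToReport and foundJobStoreIDs in traverseJobGraph can be illustrated with a small depth-first walk. This is a sketch under simplifying assumptions: job IDs are plain strings, the graph is a dictionary of successor lists, and traverse_job_graph is a hypothetical stand-in rather than the method itself.

```python
from typing import Optional


def traverse_job_graph(root: str, successors: dict[str, list[str]],
                       jobs_to_report: Optional[list[str]] = None,
                       found_ids: Optional[set[str]] = None) -> list[str]:
    """Depth-first walk that lists each reachable job ID exactly once."""
    if jobs_to_report is None:
        jobs_to_report = []
    if found_ids is None:
        found_ids = set()
    if root in found_ids:
        # Already visited through another predecessor; do not report twice.
        return jobs_to_report
    found_ids.add(root)
    jobs_to_report.append(root)
    for successor in successors.get(root, []):
        traverse_job_graph(successor, successors, jobs_to_report, found_ids)
    return jobs_to_report


# Job "c" has two predecessors but should be reported once.
graph = {"root": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
reachable = traverse_job_graph("root", graph)
```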
toil.utils.toilStatus.main()
Reports the state of a Toil
workflow.
Return type
None
toil.utils.toilUpdateEC2Instances
Updates Toil's internal list of EC2 instance types.
Attributes
Functions
Module Contents
toil.utils.toilUpdateEC2Instances.logger
toil.utils.toilUpdateEC2Instances.internet_connection()
Returns True if there is an
internet connection present, and False otherwise.
Return type
bool
toil.utils.toilUpdateEC2Instances.main()
Return type
None
toil.version
Attributes
Module Contents
toil.version.baseVersion = '8.0.0'
toil.version.cgcloudVersion = '1.6.0a1.dev393'
toil.version.version = '8.0.0-d2ae0ea9ab49f238670dbf6aafd20de7afdd8514'
toil.version.cacheTag = 'cache-local-py3.12'
toil.version.mainCacheTag = 'cache-master-py3.12'
toil.version.distVersion = '8.0.0'
toil.version.exactPython = 'python3'
toil.version.python = 'python3'
toil.version.dockerTag = '8.0.0-d2ae0ea9ab49f238670dbf6aafd20de7afdd8514-py3.11'
toil.version.currentCommit = 'd2ae0ea9ab49f238670dbf6aafd20de7afdd8514'
toil.version.dockerRegistry = 'quay.io/ucsc_cgl'
toil.version.dockerName = 'toil'
toil.version.dirty = False
toil.version.cwltool_version = '3.1.20250110105449'
toil.wdl
Submodules
toil.wdl.utils
Functions
Module Contents
toil.wdl.utils.get_version(iterable)
Get the version of the WDL
document.
Parameters
iterable ( collections.abc.Iterable[str] ) -- An iterable that contains the lines of a WDL document.
Returns
The WDL version used in the workflow.
Return type
str
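A minimal sketch of how a version could be read from the lines of a WDL document. The helper name sketch_get_version is hypothetical, and two details are assumptions rather than documented behaviour of get_version: that the version statement is the first line that is neither blank nor a comment, and that a document without one is treated as draft-2.

```python
from collections.abc import Iterable


def sketch_get_version(lines: Iterable[str]) -> str:
    """Return the declared WDL version, or 'draft-2' if none is declared."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            # Skip blank lines and comments before the version statement.
            continue
        if stripped.startswith("version "):
            return stripped[len("version "):].strip()
        # First real statement is not a version statement.
        break
    return "draft-2"


declared = sketch_get_version(["# example", "", "version 1.1", "workflow w {}"])
undeclared = sketch_get_version(["workflow w {}"])
```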
toil.wdl.wdltoil
Attributes
Exceptions
Classes
Functions
Module Contents
toil.wdl.wdltoil.logger
class toil.wdl.wdltoil.ReadableFileObj
Bases: Protocol
Protocol that is more specific than what file_digest takes as an argument. Also guarantees a read() method.
Would extend
the protocol from Typeshed for hashlib but those are only
declared for 3.11+.
readinto(buf, /)
Parameters
buf ( bytearray )
Return type
int
readable()
Return type
bool
read(number)
Parameters
number ( int )
Return type
bytes
class toil.wdl.wdltoil.FileDigester
Bases: Protocol
Protocol for
the features we need from hashlib.file_digest.
__call__(__f, __alg_name)
Parameters
• __f ( ReadableFileObj )
• __alg_name ( str )
Return type
hashlib._Hash
toil.wdl.wdltoil.file_digest: FileDigester
toil.wdl.wdltoil.WDLContext
exception
toil.wdl.wdltoil.InsufficientMountDiskSpace(mount_targets,
desired_bytes, available_bytes)
Bases: Exception
Common base
class for all non-exit exceptions.
Parameters
• mount_targets ( list[str] )
• desired_bytes ( int )
• available_bytes ( int )
toil.wdl.wdltoil.wdl_error_reporter(task,
exit=False,
log=logger.critical)
Run code in a context where WDL
errors will be reported with pretty formatting.
Parameters
• task ( str )
• exit ( bool )
• log ( Callable[[str], None] )
Return type
Generator[None]
toil.wdl.wdltoil.F
toil.wdl.wdltoil.report_wdl_errors(task, exit=False,
log=logger.critical)
Create a decorator to report WDL errors with the given task message.
Decorator can then be applied to a function, and if a WDL error happens it will say that it could not {task}.
Parameters
• task ( str )
• exit ( bool )
• log ( Callable[[str], None] )
Return type
Callable[[F], F]
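The relationship between a context manager such as wdl_error_reporter and a decorator factory such as report_wdl_errors can be sketched with the standard library. Everything named here is hypothetical: SketchError stands in for WDL errors, and error_reporter and report_errors are illustrative, not the Toil functions. The sketch re-raises the error after logging and leaves out the exit option.

```python
import contextlib
import functools
from collections.abc import Callable, Generator
from typing import Any, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


class SketchError(Exception):
    """Hypothetical stand-in for an error raised while handling WDL."""


@contextlib.contextmanager
def error_reporter(task: str, log: Callable[[str], None]) -> Generator[None, None, None]:
    """Run a block of code, logging a readable message if SketchError escapes."""
    try:
        yield
    except SketchError as error:
        log(f"Could not {task}: {error}")
        raise


def report_errors(task: str, log: Callable[[str], None]) -> Callable[[F], F]:
    """Create a decorator that wraps a function in error_reporter."""
    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            with error_reporter(task, log):
                return func(*args, **kwargs)
        return wrapper  # type: ignore[return-value]
    return decorator


messages: list[str] = []


@report_errors("parse the input", log=messages.append)
def parse(text: str) -> int:
    if not text.isdigit():
        raise SketchError(f"{text!r} is not a number")
    return int(text)


result = parse("42")
try:
    parse("abc")
except SketchError:
    pass
```

Collecting messages in a list keeps the example checkable; in practice the log callable would be something like logger.critical, as in the documented default.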
toil.wdl.wdltoil.remove_common_leading_whitespace(expression,
tolerate_blanks=True, tolerate_dedents=False,
tolerate_all_whitespace=True, debug=False)
Remove "common leading whitespace" as defined in the WDL 1.1 spec.
See < https://github.com/openwdl/wdl/blob/main/versions/1.1/SPEC.md#stripping-leading-whitespace >.
Operates on a
WDL.Expr.String expression that has already been parsed.
Parameters
• tolerate_blanks ( bool ) -- If True, don't allow totally blank lines to zero the common whitespace.
• tolerate_dedents ( bool ) -- If True, remove as much of the whitespace on the first indented line as is found on subsequent lines, regardless of whether later lines are out-dented relative to it.
• tolerate_all_whitespace ( bool ) -- If True, don't allow all-whitespace lines to reduce the common whitespace prefix.
• debug ( bool ) -- If True, the function will show its work by logging at debug level.
• expression ( WDL.Expr.String )
Return type
WDL.Expr.String
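The idea of removing common leading whitespace can be shown on a plain string. This sketch is much simpler than the function above: it works on text rather than a parsed WDL.Expr.String, ignores placeholders, and models only the tolerate_blanks option. strip_common_indent is a hypothetical name.

```python
import os


def strip_common_indent(text: str, tolerate_blanks: bool = True) -> str:
    """Remove the leading whitespace shared by the lines of text."""
    lines = text.split("\n")
    prefixes = []
    for line in lines:
        if tolerate_blanks and not line.strip():
            # Blank lines do not get to zero the common whitespace.
            continue
        prefixes.append(line[: len(line) - len(line.lstrip(" \t"))])
    # Longest whitespace prefix shared by every considered line.
    common = os.path.commonprefix(prefixes)
    return "\n".join(
        line[len(common):] if line.startswith(common) else line.lstrip(" \t")
        for line in lines
    )


command = "    echo hi\n\n      echo nested\n"
dedented = strip_common_indent(command)
untouched = strip_common_indent(command, tolerate_blanks=False)
```

With tolerate_blanks disabled, the empty line contributes an empty prefix, so nothing is stripped.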
async toil.wdl.wdltoil.toil_read_source(uri, path, importer)
Implementation of a MiniWDL read_source function that can use any filename or URL supported by Toil.
Needs to be
async because MiniWDL will await its result.
Parameters
• uri ( str )
• path ( list[str] )
• importer ( WDL.Tree.Document | None )
Return type
WDL.Tree.ReadSourceResult
toil.wdl.wdltoil.virtualized_equal(value1, value2)
Check if two WDL values are equal when taking into account file virtualization.
Treats virtualized and non-virtualized Files referring to the same underlying file as equal.
Parameters
• value1 ( WDL.Value.Base ) -- WDL value
• value2 ( WDL.Value.Base ) -- WDL value
Returns
Whether the two values are equal with file virtualization accounted for
Return type
bool
toil.wdl.wdltoil.WDLBindings
toil.wdl.wdltoil.combine_bindings(all_bindings)
Combine variable bindings from
multiple predecessor tasks into one set for the current
task.
Parameters
all_bindings ( Sequence[WDLBindings] )
Return type
WDLBindings
toil.wdl.wdltoil.log_bindings(log_function, message, all_bindings)
Log bindings to the console,
even if some are still promises.
Parameters
• log_function ( Callable[Ellipsis, None] ) -- Function (like logger.info) to call to log data
• message ( str ) -- Message to log before the bindings
• all_bindings ( Sequence[toil.job.Promised[WDLBindings]] ) -- A list of bindings or promises for bindings, to log
Return type
None
toil.wdl.wdltoil.get_supertype(types)
Get the supertype that can hold
values of all the given types.
Parameters
types ( Sequence[WDL.Type.Base] )
Return type
WDL.Type.Base
toil.wdl.wdltoil.for_each_node(root)
Iterate over all WDL workflow
nodes in the given node, including inputs, internal nodes of
conditionals and scatters, and gather nodes.
Parameters
root ( WDL.Tree.WorkflowNode )
Return type
Iterator[WDL.Tree.WorkflowNode]
toil.wdl.wdltoil.recursive_dependencies(root)
Get the combined workflow_node_dependencies of root and everything under it, which are not on anything in that subtree.
Useful because
section nodes can have internal nodes with dependencies not
reflected in those of the section node itself.
Parameters
root ( WDL.Tree.WorkflowNode )
Return type
set [ str ]
toil.wdl.wdltoil.parse_disks(spec, disks_spec)
Parse a WDL disk spec into a disk mount specification.
Parameters
• spec ( str ) -- Disks spec to parse.
• disks_spec ( list[WDL.Value.String] | str ) -- All disks spec as specified in the WDL file. Only used for better error messages.
Returns
Specified mount point (None if omitted or local-disk), number of units, size of unit (ex GB).
Return type
tuple [ str | None, float , str ]
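A rough sketch of parsing a single disk spec into the documented triple of mount point, number of units and unit. The accepted grammar is an assumption: whitespace-separated tokens in any order, a fixed set of unit names, and GiB as the unit when none is given. sketch_parse_disks and KNOWN_UNITS are hypothetical names, and the real parse_disks may accept or reject different inputs.

```python
from typing import Optional

# Assumed set of recognised unit names.
KNOWN_UNITS = {"MB", "MiB", "GB", "GiB", "TB", "TiB"}


def sketch_parse_disks(spec: str) -> tuple[Optional[str], float, str]:
    """Split a disk spec into (mount point or None, size, unit)."""
    mount_point: Optional[str] = None
    size: Optional[float] = None
    unit = "GiB"  # assumed default unit
    for part in spec.split():
        if part in KNOWN_UNITS:
            unit = part
        elif part == "local-disk":
            # local-disk means no specific mount point was requested.
            mount_point = None
        else:
            try:
                size = float(part)
            except ValueError:
                # Anything else is treated as the mount point.
                mount_point = part
    if size is None:
        raise ValueError(f"No size found in disk spec {spec!r}")
    return mount_point, size, unit


local = sketch_parse_disks("local-disk 10 GB")
mounted = sketch_parse_disks("/mnt/data 2.5 TiB")
bare = sketch_parse_disks("20")
```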
toil.wdl.wdltoil.pack_toil_uri(file_id,
task_path, dir_id,
file_basename)
Encode a Toil file ID and metadata about who wrote it as a URI.
The URI will start with the scheme in TOIL_URI_SCHEME.
Parameters
• file_id ( toil.fileStores.FileID )
• task_path ( str )
• dir_id ( uuid.UUID )
• file_basename ( str )
Return type
str
toil.wdl.wdltoil.unpack_toil_uri(toil_uri)
Unpack a URI made by pack_toil_uri to retrieve the FileID and the basename (no path prefix) that the file is supposed to have.
Parameters
toil_uri ( str )
Return type
tuple [ toil.fileStores.FileID , str , str , str ]
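The pack and unpack pair can be pictured as a reversible encoding of four fields into one string. The sketch below is not Toil's encoding: the scheme "sketchfile:" is invented, the file ID is a plain string rather than a FileID, and percent-encoding with "/" separators is just one workable choice.

```python
import uuid
from urllib.parse import quote, unquote

# Hypothetical scheme; the real functions use TOIL_URI_SCHEME.
SKETCH_SCHEME = "sketchfile:"


def pack_uri(file_id: str, task_path: str, dir_id: uuid.UUID, file_basename: str) -> str:
    """Encode a file ID and its metadata as a single URI string."""
    parts = [file_id, task_path, str(dir_id), file_basename]
    # safe="" also escapes "/", so it can serve as the field separator.
    return SKETCH_SCHEME + "/".join(quote(part, safe="") for part in parts)


def unpack_uri(uri: str) -> tuple[str, str, str, str]:
    """Recover (file ID, task path, directory ID, basename) from a packed URI."""
    if not uri.startswith(SKETCH_SCHEME):
        raise ValueError(f"Not a sketch URI: {uri!r}")
    parts = uri[len(SKETCH_SCHEME):].split("/")
    if len(parts) != 4:
        raise ValueError(f"Wrong number of fields in {uri!r}")
    file_id, task_path, dir_id, file_basename = (unquote(part) for part in parts)
    return file_id, task_path, dir_id, file_basename


directory_id = uuid.uuid4()
packed = pack_uri("files/no-job/abc", "wf.task", directory_id, "reads 1.fq")
unpacked = unpack_uri(packed)
```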
toil.wdl.wdltoil.SHARED_PATH_ATTR = '_shared_fs_path'
toil.wdl.wdltoil.clone_metadata(old_file, new_file)
Copy all Toil metadata from one
WDL File to another.
Parameters
• old_file ( WDL.Value.File )
• new_file ( WDL.Value.File )
Return type
None
toil.wdl.wdltoil.set_file_value(file, new_value)
Return a copy of a WDL File
with all metadata intact but the value changed.
Parameters
• file ( WDL.Value.File )
• new_value ( str )
Return type
WDL.Value.File
toil.wdl.wdltoil.set_file_nonexistent(file, nonexistent)
Return a copy of a WDL File
with all metadata intact but the nonexistent flag set to the
given value.
Parameters
• file ( WDL.Value.File )
• nonexistent ( bool )
Return type
WDL.Value.File
toil.wdl.wdltoil.get_file_nonexistent(file)
Return the nonexistent flag for
a file.
Parameters
file ( WDL.Value.File )
Return type
bool
toil.wdl.wdltoil.set_file_virtualized_value(file, virtualized_value)
Return a copy of a WDL File
with all metadata intact but the virtualized_value attribute
set to the given value.
Parameters
• file ( WDL.Value.File )
• virtualized_value ( str )
Return type
WDL.Value.File
toil.wdl.wdltoil.get_file_virtualized_value(file)
Get the virtualized storage
location for a file.
Parameters
file ( WDL.Value.File )
Return type
Optional[ str ]
toil.wdl.wdltoil.get_shared_fs_path(file)
If a File has a shared filesystem path, get that path.
This will be
the path the File was initially imported from, or the path
that it has in the call cache.
Parameters
file ( WDL.Value.File )
Return type
Optional[ str ]
toil.wdl.wdltoil.set_shared_fs_path(file, path)
Return a copy of the given File associated with the given shared filesystem path.
This should be
the path it was initially imported from, or the path that it
has in the call cache.
Parameters
• file ( WDL.Value.File )
• path ( str )
Return type
WDL.Value.File
toil.wdl.wdltoil.view_shared_fs_paths(bindings)
Given WDL bindings, return a
copy where all files have their shared filesystem paths as
their values.
Parameters
bindings ( WDL.Env.Bindings[WDL.Value.Base] )
Return type
WDL.Env.Bindings[WDL.Value.Base]
toil.wdl.wdltoil.poll_execution_cache(node, bindings)
Return the cached result of calling this workflow or task, and its key.
Returns None and the key if the cache has no result for us.
Deals in
un-namespaced bindings.
Parameters
• node ( Union[WDL.Tree.Workflow, WDL.Tree.Task] )
• bindings ( WDLBindings )
Return type
tuple [WDLBindings | None, str ]
toil.wdl.wdltoil.fill_execution_cache(cache_key,
output_bindings,
file_store, wdl_options, miniwdl_logger=None,
miniwdl_config=None)
Cache the result of calling a workflow or task.
Deals in
un-namespaced bindings.
Returns
possibly modified bindings to continue on with, that may reference the cache.
Parameters
• cache_key ( str )
• output_bindings ( WDLBindings )
• file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
• wdl_options ( WDLContext )
• miniwdl_logger ( Optional[logging.Logger] )
• miniwdl_config ( Optional[WDL.runtime.config.Loader] )
Return type
WDLBindings
toil.wdl.wdltoil.DirectoryNamingStateDict
toil.wdl.wdltoil.choose_human_readable_directory(root_dir,
source_task_path, parent_id, state)
Select a good directory to save files from a task and source directory in.
The directories
involved may not exist.
Parameters
• root_dir ( str ) -- Directory that the path will be under
• source_task_path ( str ) -- The dotted WDL name of whatever generated the file. We assume this is an acceptable filename component.
• parent_id ( str ) -- UUID of the directory that the file came from. All files with the same parent ID will be placed as sibling files in a shared parent directory.
• state ( DirectoryNamingStateDict ) -- A state dict that must be passed to repeated calls.
Return type
str
toil.wdl.wdltoil.evaluate_decls_to_bindings(decls,
all_bindings,
standard_library, include_previous=False,
drop_missing_files=False)
Evaluate decls with a given bindings environment and standard library.
Creates a new bindings object that only contains the bindings from the given decls. Guarantees that each decl in decls can access the variables defined by the previous ones.
Parameters
• decls ( list[WDL.Tree.Decl] ) -- Decls to evaluate.
• all_bindings ( WDL.Env.Bindings[WDL.Value.Base] ) -- Environment to use when evaluating decls.
• standard_library ( ToilWDLStdLibBase ) -- Standard library.
• include_previous ( bool ) -- Whether to include the existing environment in the new returned environment. This will be false for outputs, where only defined decls should be included.
• drop_missing_files ( bool ) -- Whether to coerce nonexistent files to null. The coerced elements will be checked that the transformation is valid. Currently should only be enabled in output sections; see https://github.com/openwdl/wdl/issues/673#issuecomment-2248828116
Returns
New bindings object.
Return type
WDL.Env.Bindings[WDL.Value.Base]
class toil.wdl.wdltoil.NonDownloadingSize
Bases: WDL.StdLib._Size
WDL size() implementation that avoids downloading files.
MiniWDL's default size() implementation downloads the whole file to get its size. We want to be able to get file sizes from code running on the leader, where there may not be space to download the whole file. So we override the fancy class that implements it so that we can handle sizes for FileIDs using the FileID's stored size info.
toil.wdl.wdltoil.extract_workflow_inputs(environment)
Parameters
environment ( WDLBindings )
Return type
list [ str ]
toil.wdl.wdltoil.convert_files(environment,
file_to_id, file_to_data,
task_path)
Resolve relative-URI files in the given environment and convert the file values to a new value made from a given mapping.
Will return bindings with file values set to their corresponding relative-URI.
Parameters
• environment ( WDLBindings ) -- Bindings to evaluate on
• file_to_id ( Dict[str, toil.fileStores.FileID] )
• file_to_data ( Dict[str, toil.job.FileMetadata] )
• task_path ( str )
Returns
new bindings object
Return type
WDLBindings
toil.wdl.wdltoil.convert_remote_files(environment,
file_source,
task_path, search_paths=None, import_remote_files=True,
execution_dir=None)
Resolve relative-URI files in the given environment and import all files.
Returns an
environment where each File's value is set to the URI it was
found at, its virtualized value is set to what it was loaded
into the filestore as (if applicable), and its shared
filesystem path is set if it came from the local filesystem.
Parameters
• environment ( WDLBindings ) -- Bindings to evaluate on
• file_source ( toil.jobStores.abstractJobStore.AbstractJobStore ) -- Context to search for files with
• task_path ( str ) -- Dotted WDL name of the user-level code doing the importing (probably the workflow name).
• search_paths ( Optional[list[str]] ) -- If set, try resolving input location relative to the URLs or directories in this list.
• import_remote_files ( bool ) -- If set, import files from remote locations. Else leave them as URI references.
• execution_dir ( Optional[str] )
Return type
WDLBindings
class
toil.wdl.wdltoil.ToilWDLStdLibBase(file_store, wdl_options,
share_files_with=None)
Bases: WDL.StdLib.Base
Standard
library implementation for WDL as run on Toil.
Parameters
• file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
• wdl_options ( WDLContext )
• share_files_with ( ToilWDLStdLibBase | None )
size
property execution_dir: str | None
Return type
str | None
property task_path: str
Return type
str
get_local_paths()
Get all the local paths of
files devirtualized (or virtualized) through the stdlib.
Return type
list [ str ]
static
devirtualize_to(filename, dest_dir, file_source, state,
wdl_options, devirtualized_to_virtualized=None,
virtualized_to_devirtualized=None, export=None)
Download or export a WDL virtualized filename/URL to the given directory.
The destination directory must already exist. No other devirtualize_to call may be writing to it, including the case of another workflow writing the same task to the same place in the call cache at the same time.
Makes sure sibling files stay siblings and files with the same name don't clobber each other. Called from within this class for tasks, and statically at the end of the workflow for outputs.
Returns the local path to the file. If the file is already a local path, or if it already has an entry in virtualized_to_devirtualized, that path will be re-used instead of creating a new copy in dest_dir.
The input
filename could already be devirtualized. In this case, the
filename should not be added to the cache.
Parameters
• state ( DirectoryNamingStateDict ) -- State dict which must be shared among successive calls into a dest_dir.
• wdl_options ( WDLContext ) -- WDL options to carry through.
• export ( bool | None ) -- Always create exported copies of files rather than views that a FileStore might clean up.
• filename ( str )
• dest_dir ( str )
• file_source ( toil.fileStores.abstractFileStore.AbstractFileStore | toil.common.Toil )
• devirtualized_to_virtualized ( dict[str, str] | None )
• virtualized_to_devirtualized ( dict[str, str] | None )
Return type
str
class toil.wdl.wdltoil.ToilWDLStdLibWorkflow(*args, **kwargs)
Bases: ToilWDLStdLibBase
Standard library implementation for workflow scope.
Handles
deduplicating files generated by write_* calls at workflow
scope with copies already in the call cache, so that tasks
that depend on them can also be fulfilled from the cache.
Parameters
• args ( Any )
• kwargs ( Any )
class
toil.wdl.wdltoil.ToilWDLStdLibTaskCommand(file_store,
container,
wdl_options)
Bases: ToilWDLStdLibBase
Standard library implementation to use inside a WDL task command evaluation.
Expects all the
filenames in variable bindings to be container-side paths;
these are the "virtualized" filenames, while the
"devirtualized" filenames are host-side paths.
Parameters
|
• |
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore ) |
||
|
• |
container ( WDL.runtime.task_container.TaskContainer ) |
||
|
• |
wdl_options ( WDLContext ) |
container
class
toil.wdl.wdltoil.ToilWDLStdLibTaskOutputs(file_store,
stdout_path, stderr_path, file_to_mountpoint, wdl_options,
share_files_with=None)
Bases: ToilWDLStdLibBase , WDL.StdLib.TaskOutputs
Standard
library implementation for WDL as run on Toil, with
additional functions only allowed in task output sections.
Parameters
|
• |
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore ) |
||
|
• |
stdout_path ( str ) |
||
|
• |
stderr_path ( str ) |
||
|
• |
file_to_mountpoint ( dict[str, str] ) |
||
|
• |
wdl_options ( WDLContext ) |
||
|
• |
share_files_with ( ToilWDLStdLibBase | None ) |
stdout_used()
Return True if the standard
output was read by the WDL.
Return type
bool
stderr_used()
Return True if the standard
error was read by the WDL.
Return type
bool
toil.wdl.wdltoil.evaluate_named_expression(context,
name,
expected_type, expression, environment, stdlib)
Evaluate an expression when we
know the name of it.
Parameters
|
• |
context ( WDL.Error.SourceNode | WDL.Error.SourcePosition ) |
||
|
• |
name ( str ) |
||
|
• |
expected_type ( WDL.Type.Base | None ) |
||
|
• |
expression ( WDL.Expr.Base | None ) |
||
|
• |
environment ( WDLBindings ) |
||
|
• |
stdlib ( WDL.StdLib.Base ) |
Return type
WDL.Value.Base
toil.wdl.wdltoil.evaluate_decl(node, environment, stdlib)
Evaluate the expression of a
declaration node, or raise an error.
Parameters
|
• |
node ( WDL.Tree.Decl ) |
|||
|
• |
environment ( WDLBindings ) |
|||
|
• |
stdlib ( WDL.StdLib.Base ) |
Return type
WDL.Value.Base
toil.wdl.wdltoil.evaluate_call_inputs(context,
expressions,
environment, stdlib, inputs_dict=None)
Evaluate a bunch of expressions
with names, and make them into a fresh set of bindings.
inputs_dict
is a mapping of variable names to their
expected type for the input decls in a task.
Parameters
|
• |
context ( WDL.Error.SourceNode | WDL.Error.SourcePosition ) |
||
|
• |
expressions ( dict[str, WDL.Expr.Base] ) |
||
|
• |
environment ( WDLBindings ) |
||
|
• |
stdlib ( WDL.StdLib.Base ) |
||
|
• |
inputs_dict ( dict[str, WDL.Type.Base] | None ) |
Return type
WDLBindings
toil.wdl.wdltoil.evaluate_defaultable_decl(node, environment, stdlib)
If the name of the declaration
is already defined in the environment, return its value.
Otherwise, return the evaluated expression.
Parameters
|
• |
node ( WDL.Tree.Decl ) |
|||
|
• |
environment ( WDLBindings ) |
|||
|
• |
stdlib ( WDL.StdLib.Base ) |
Return type
WDL.Value.Base
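The lookup-then-evaluate rule described for evaluate_defaultable_decl can be sketched with plain dictionaries. This is only an illustration: value_or_default, the dict environment, and the callable default are hypothetical stand-ins for the WDL declaration node and WDLBindings, not Toil's API.

```python
from typing import Callable, Dict


def value_or_default(name: str, environment: Dict[str, object],
                     evaluate: Callable[[], object]) -> object:
    """Return the value already bound to name, else evaluate the default."""
    if name in environment:
        # A caller-supplied value takes priority over the declaration's expression.
        return environment[name]
    # Nothing was supplied, so fall back to evaluating the expression.
    return evaluate()


env = {"threads": 8}
supplied = value_or_default("threads", env, lambda: 1)
defaulted = value_or_default("memory_gb", env, lambda: 4)
```

Passing the default as a callable means the expression is evaluated only when no value was supplied.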
toil.wdl.wdltoil.devirtualize_files(environment, stdlib)
Make sure all the File values embedded in the given bindings point to files that are actually available to command line commands. The same virtual file always maps to the same devirtualized filename, even with duplicates.
Parameters
|
• |
environment ( WDLBindings ) |
|||
|
• |
stdlib ( ToilWDLStdLibBase ) |
Return type
WDLBindings
toil.wdl.wdltoil.virtualize_files(environment,
stdlib,
enforce_existence=True)
Make sure all the File values
embedded in the given bindings point to files that are
usable from other machines.
Parameters
|
• |
environment ( WDLBindings ) |
|||
|
• |
stdlib ( ToilWDLStdLibBase ) |
|||
|
• |
enforce_existence ( bool ) |
Return type
WDLBindings
toil.wdl.wdltoil.add_paths(task_container, host_paths)
Based on WDL.runtime.task_container.add_paths from miniwdl. Maps the host paths to the container paths.
Parameters
|
• |
task_container ( WDL.runtime.task_container.TaskContainer ) |
|||
|
• |
host_paths ( Iterable[str] ) |
Return type
None
toil.wdl.wdltoil.drop_if_missing(file, standard_library)
Return None if a file doesn't exist, or its path if it does.
filename represents a URI or file name belonging to a WDL value of type value_type. work_dir represents the current working directory of the job and is where all relative paths will be interpreted from.
Parameters
|
• |
file ( WDL.Value.File ) |
|||
|
• |
standard_library ( ToilWDLStdLibBase ) |
Return type
WDL.Value.File | None
toil.wdl.wdltoil.drop_missing_files(environment, standard_library)
Make sure all the File values embedded in the given bindings point to files that exist, or are null.
Files must not
be virtualized.
Parameters
|
• |
environment ( WDLBindings ) |
|||
|
• |
standard_library ( ToilWDLStdLibBase ) |
Return type
WDLBindings
toil.wdl.wdltoil.get_file_paths_in_bindings(environment)
Get the paths of all files in the bindings. Doesn't guarantee that duplicates are removed.
TODO:
Duplicative with WDL.runtime.task._fspaths, except that is
internal and supports Directory objects.
Parameters
environment ( WDLBindings )
Return type
list [ str ]
toil.wdl.wdltoil.map_over_files_in_bindings(environment, transform)
Run all File values embedded in the given bindings through the given transformation function.
The transformation function must not mutate the original File.
TODO: Replace
with WDL.Value.rewrite_env_paths or WDL.Value.rewrite_files
Parameters
|
• |
environment ( WDLBindings ) |
||
|
• |
transform ( Callable[[WDL.Value.File], WDL.Value.File | None] ) |
Return type
WDLBindings
toil.wdl.wdltoil.map_over_files_in_binding(binding, transform)
Run all File values' types and values embedded in the given binding's value through the given transformation function.
The
transformation function must not mutate the original File.
Parameters
|
• |
binding ( WDL.Env.Binding[WDL.Value.Base] ) |
||
|
• |
transform ( Callable[[WDL.Value.File], WDL.Value.File | None] ) |
Return type
WDL.Env.Binding[WDL.Value.Base]
toil.wdl.wdltoil.map_over_typed_files_in_value(value, transform)
Run all File values embedded in the given value through the given transformation function.
The transformation function must not mutate the original File.
If the transform returns None, the file value is changed to Null.
The transform has access to the type information for the value, so it knows if it may return None, depending on if the value is optional or not.
The transform is allowed to return None only if the mapping result won't actually be used, to allow for scans. So error checking needs to be part of the transform itself.
Parameters
|
• |
value ( WDL.Value.Base ) |
||
|
• |
transform ( Callable[[WDL.Value.File], WDL.Value.File | None] ) |
Return type
WDL.Value.Base
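The non-mutating mapping contract described above can be modelled on ordinary nested Python containers. This is a minimal sketch under stated assumptions: FileRef, map_files and drop_tmp are hypothetical names, and lists and dicts stand in for WDL arrays and maps. It does not use the real WDL.Value classes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FileRef:
    """Hypothetical stand-in for a WDL File value."""
    path: str


def map_files(value, transform: Callable[[FileRef], Optional[FileRef]]):
    """Rebuild a nested value, passing every FileRef through transform."""
    if isinstance(value, FileRef):
        return transform(value)  # a None result models a WDL null
    if isinstance(value, list):
        return [map_files(item, transform) for item in value]
    if isinstance(value, dict):
        return {key: map_files(item, transform) for key, item in value.items()}
    return value  # non-file leaves pass through unchanged


def drop_tmp(file: FileRef) -> Optional[FileRef]:
    # Return a new FileRef instead of mutating the original; drop files under /tmp.
    return None if file.path.startswith("/tmp/") else FileRef(file.path)


original = {"reads": [FileRef("/data/a.fq"), FileRef("/tmp/b.fq")], "n": 2}
mapped = map_files(original, drop_tmp)
```

Because the containers are rebuilt rather than edited in place, the original value is still intact afterwards, which matches the requirement that the transform must not mutate the original File.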
toil.wdl.wdltoil.ensure_null_files_are_nullable(value,
original_value,
expected_type)
Run through all nested values embedded in the given value and check that the null values are valid.
If a null value is found that does not have a valid corresponding expected_type, raise an error.
(This is currently only used to check that null values arising from File coercion are in locations with a nullable File? type. If this is to be used elsewhere, the error message should be changed to describe the appropriate types and not just talk about files.)
For example: If one of the nested values is null but the equivalent nested expected_type is not optional, a FileNotFoundError will be raised.
Parameters
• value ( WDL.Value.Base ) -- WDL base value to check. This is the WDL value that has been transformed and has the null elements.
• original_value ( WDL.Value.Base ) -- The original WDL base value prior to the transformation. Only used for error messages.
• expected_type ( WDL.Type.Base ) -- The WDL type of the value.
Return type
None
class toil.wdl.wdltoil.WDLBaseJob(wdl_options, **kwargs)
Bases: toil.job.Job
Base job class for all WDL-related jobs.
Responsible for post-processing returned bindings, to do things like add in null values for things not defined in a section. Post-processing operations can be added onto any job before it is saved, and will be applied as long as the job's run method calls postprocess().
Also
responsible for remembering the Toil WDL configuration keys
and values.
Parameters
|
• |
wdl_options ( WDLContext ) |
|||
|
• |
kwargs ( Any ) |
run(file_store)
Run a WDL-related job.
Remember to decorate non-trivial overrides with report_wdl_errors().
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
Any
then_underlay(underlay)
Apply an underlay of backup
bindings to the result.
Parameters
underlay ( toil.job.Promised[WDLBindings] )
Return type
None
then_remove(remove)
Remove the given bindings from
the result.
Parameters
remove ( toil.job.Promised[WDLBindings] )
Return type
None
then_namespace(namespace)
Put the result bindings into a
namespace.
Parameters
namespace ( str )
Return type
None
then_overlay(overlay)
Overlay the given bindings on
top of the (possibly namespaced) result.
Parameters
overlay ( toil.job.Promised[WDLBindings] )
Return type
None
postprocess(bindings)
Apply queued changes to bindings.
Should be
applied by subclasses' run() implementations to their return
values.
Parameters
bindings ( WDLBindings )
Return type
WDLBindings
defer_postprocessing(other)
Give our postprocessing steps to a different job.
Use this when
you are returning a promise for bindings, on the job that
issues the promise.
Parameters
other ( WDLBaseJob )
Return type
None
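The queue-then-apply pattern behind then_underlay(), then_remove(), then_namespace(), then_overlay() and postprocess() can be sketched over plain dictionaries. PostprocessingQueue and the dotted-name namespacing are assumptions made for illustration. The real class works on WDLBindings and promises, and its internals are not shown in this manual.

```python
from typing import Callable, Dict, List

Bindings = Dict[str, object]


class PostprocessingQueue:
    """Hypothetical model of queued post-processing steps over plain dicts."""

    def __init__(self) -> None:
        self._steps: List[Callable[[Bindings], Bindings]] = []

    def then_underlay(self, underlay: Bindings) -> None:
        # Backup bindings: the result wins wherever both define a name.
        self._steps.append(lambda b: {**underlay, **b})

    def then_remove(self, remove: Bindings) -> None:
        self._steps.append(lambda b: {k: v for k, v in b.items() if k not in remove})

    def then_namespace(self, namespace: str) -> None:
        self._steps.append(lambda b: {f"{namespace}.{k}": v for k, v in b.items()})

    def then_overlay(self, overlay: Bindings) -> None:
        # Overlay bindings win wherever both define a name.
        self._steps.append(lambda b: {**b, **overlay})

    def postprocess(self, bindings: Bindings) -> Bindings:
        # Apply the queued steps in the order they were added.
        for step in self._steps:
            bindings = step(bindings)
        return bindings


queue = PostprocessingQueue()
queue.then_underlay({"out": None, "log": None})
queue.then_namespace("call")
queue.then_overlay({"x": 1})
result = queue.postprocess({"out": "a.txt"})
```

In this model the steps only take effect when postprocess() is called, which mirrors the note that a job's run method must call postprocess() on its return value.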
class
toil.wdl.wdltoil.WDLTaskWrapperJob(task, prev_node_results,
task_id, wdl_options, **kwargs)
Bases: WDLBaseJob
Job that determines the resources needed to run a WDL job.
Responsible for evaluating the input declarations for unspecified inputs, evaluating the runtime section, and scheduling or chaining to the real WDL job.
All bindings
are in terms of task-internal names.
Parameters
|
• |
task ( WDL.Tree.Task ) |
|||
|
• |
prev_node_results ( Sequence[toil.job.Promised[WDLBindings]] ) |
|||
|
• |
task_id ( list[str] ) |
|||
|
• |
wdl_options ( WDLContext ) |
|||
|
• |
kwargs ( Any ) |
run(file_store)
Evaluate inputs and runtime and
schedule the task.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
toil.job.Promised[WDLBindings]
class
toil.wdl.wdltoil.WDLTaskJob(task, task_internal_bindings,
runtime_bindings, task_id, mount_spec, wdl_options,
cache_key=None,
**kwargs)
Bases: WDLBaseJob
Job that runs a WDL task.
Responsible for re-evaluating input declarations for unspecified inputs, evaluating the runtime section, re-scheduling if resources are not available, running any command, and evaluating the outputs.
All bindings
are in terms of task-internal names.
Parameters
|
• |
task ( WDL.Tree.Task ) |
||
|
• |
task_internal_bindings ( toil.job.Promised[WDLBindings] ) |
||
|
• |
runtime_bindings ( toil.job.Promised[WDLBindings] ) |
||
|
• |
task_id ( list[str] ) |
||
|
• |
mount_spec ( dict[str | None, int] ) |
||
|
• |
wdl_options ( WDLContext ) |
||
|
• |
cache_key ( str | None ) |
||
|
• |
kwargs ( Any ) |
INJECTED_MESSAGE_DIR = '.toil_wdl_runtime'
add_injections(command_string, task_container)
Inject extra Bash code from the Toil WDL runtime into the command for the container.
Currently
doesn't implement the MiniWDL plugin system, but does add
resource usage monitoring to Docker containers.
Parameters
|
• |
command_string ( str ) |
|||
|
• |
task_container ( WDL.runtime.task_container.TaskContainer ) |
Return type
str
handle_injection_messages(outputs_library)
Handle any data received from
injected runtime code in the container.
Parameters
outputs_library ( ToilWDLStdLibTaskOutputs )
Return type
None
handle_message_file(file_path)
Handle a message file received from in-container injected code.
Takes the
host-side path of the file.
Parameters
file_path ( str )
Return type
None
can_fake_root()
Determine if --fakeroot is
likely to work for Singularity.
Return type
bool
can_mount_proc()
Determine if --containall will work for Singularity. On Kubernetes, this will result in an "operation not permitted" error. See: https://github.com/apptainer/singularity/issues/5857
So if Kubernetes is detected, return False.
Return type
bool
ensure_mount_point(file_store, mount_spec)
Ensure the mount point sources are available.
Will check if the mount point source has the requested amount of space available.
Note: We are depending on Toil's job scheduling backend to error when the sum of multiple mount points' disk requests is greater than the total available. For example, if a task has two mount points that request 100 GB each but there is only 100 GB available, the df check may pass, but Toil should fail to schedule the jobs internally.
Parameters
• mount_spec ( dict[str | None, int] ) -- Mount specification from the disks attribute in the WDL task. Is a dict where key is the mount point target and value is the size
• file_store ( toil.fileStores.abstractFileStore.AbstractFileStore ) -- File store to create a tmp directory for the mount point source
Returns
Dict mapping mount point target to mount point source
Return type
dict [ str , str ]
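A per-mount-point free-space check of the kind described above might look like the following sketch. has_free_space is a hypothetical helper, not part of Toil, and the standard library's shutil.disk_usage stands in for whatever df-style check the real method performs.

```python
import shutil
import tempfile


def has_free_space(directory: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding directory has enough free bytes."""
    return shutil.disk_usage(directory).free >= required_bytes


with tempfile.TemporaryDirectory() as source:
    # Each request is checked on its own, so two requests that individually fit
    # can both pass even when their sum exceeds the free space.
    small_ok = has_free_space(source, 0)
    huge_ok = has_free_space(source, 10**30)
```

The independent checks illustrate the caveat in the note: enforcing the combined total is left to the scheduler.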
run(file_store)
Actually run the task.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
toil.job.Promised[WDLBindings]
class
toil.wdl.wdltoil.WDLWorkflowNodeJob(node, prev_node_results,
wdl_options, **kwargs)
Bases: WDLBaseJob
Job that
evaluates a WDL workflow node.
Parameters
|
• |
node ( WDL.Tree.WorkflowNode ) |
|||
|
• |
prev_node_results ( Sequence[toil.job.Promised[WDLBindings]] ) |
|||
|
• |
wdl_options ( WDLContext ) |
|||
|
• |
kwargs ( Any ) |
run(file_store)
Actually execute the workflow
node.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
toil.job.Promised[WDLBindings]
class
toil.wdl.wdltoil.WDLWorkflowNodeListJob(nodes,
prev_node_results,
wdl_options, **kwargs)
Bases: WDLBaseJob
Job that
evaluates a list of WDL workflow nodes, which are in the
same scope and in a topological dependency order, and which
do not call out to any other workflows or tasks or sections.
Parameters
|
• |
nodes ( list[WDL.Tree.WorkflowNode] ) |
|||
|
• |
prev_node_results ( Sequence[toil.job.Promised[WDLBindings]] ) |
|||
|
• |
wdl_options ( WDLContext ) |
|||
|
• |
kwargs ( Any ) |
run(file_store)
Actually execute the workflow
nodes.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
toil.job.Promised[WDLBindings]
class
toil.wdl.wdltoil.WDLCombineBindingsJob(prev_node_results,
**kwargs)
Bases: WDLBaseJob
Job that
collects the results from WDL workflow nodes and combines
their environment changes.
Parameters
|
• |
prev_node_results ( Sequence[toil.job.Promised[WDLBindings]] ) |
|||
|
• |
kwargs ( Any ) |
run(file_store)
Aggregate incoming results.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
WDLBindings
class toil.wdl.wdltoil.WDLWorkflowGraph(nodes)
Represents a graph of WDL WorkflowNodes.
Operates at a certain level of instantiation (i.e. sub-sections are represented by single nodes).
Assumes all
relevant nodes are provided; dependencies outside the
provided nodes are assumed to be satisfied already.
Parameters
nodes ( Sequence[WDL.Tree.WorkflowNode] )
real_id(node_id)
Map multiple IDs for what we consider the same node to one ID.
This
elides/resolves gathers.
Parameters
node_id ( str )
Return type
str
is_decl(node_id)
Return True if a node represents a WDL declaration, and False otherwise.
Parameters
node_id ( str )
Return type
bool
get(node_id)
Get a node by ID.
Parameters
node_id ( str )
Return type
WDL.Tree.WorkflowNode
get_dependencies(node_id)
Get all the nodes that a node depends on, recursively (into the node if it has a body) but not transitively.
Produces
dependencies after resolving gathers and internal-to-section
dependencies, on nodes that are also in this graph.
Parameters
node_id ( str )
Return type
set [ str ]
get_transitive_dependencies(node_id)
Get all the nodes that a node
depends on, transitively.
Parameters
node_id ( str )
Return type
set [ str ]
topological_order()
Get a topological order of the
nodes, based on their dependencies.
Return type
list [ str ]
leaves()
Get all the workflow node IDs
that have no dependents in the graph.
Return type
list [ str ]
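The graph queries listed for WDLWorkflowGraph (transitive dependencies, a topological order, and leaves) can be illustrated on a small dependency dictionary. The node IDs and helper functions below are hypothetical, and the standard library's graphlib is used in place of Toil's own traversal.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical graph: node ID -> IDs of the nodes it depends on directly.
dependencies: Dict[str, Set[str]] = {
    "decl_a": set(),
    "call_b": {"decl_a"},
    "call_c": {"decl_a"},
    "decl_d": {"call_b", "call_c"},
}


def transitive_dependencies(node_id: str) -> Set[str]:
    """Collect everything reachable by following dependency edges."""
    seen: Set[str] = set()
    stack = list(dependencies[node_id])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dependencies[current])
    return seen


def topological_order() -> List[str]:
    """Order the nodes so that every node comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def leaves() -> List[str]:
    """Return the nodes that no other node depends on."""
    depended_on = set().union(*dependencies.values())
    return [node for node in dependencies if node not in depended_on]


order = topological_order()
```

Several valid orders may exist for the two middle nodes; only the relative position of a node and its dependencies is guaranteed.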
class toil.wdl.wdltoil.WDLSectionJob(wdl_options, **kwargs)
Bases: WDLBaseJob
Job that can
create more graph for a section of the workflow.
Parameters
|
• |
wdl_options ( WDLContext ) |
|||
|
• |
kwargs ( Any ) |
static coalesce_nodes(order, section_graph)
Given a topological order of
WDL workflow node IDs, produce a list of lists of IDs, still
in topological order, where each list of IDs can be run
under a single Toil job.
Parameters
|
• |
order ( list[str] ) |
|||
|
• |
section_graph ( WDLWorkflowGraph ) |
Return type
list [ list [ str ]]
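One way to picture the grouping that coalesce_nodes produces is to merge consecutive runs of nodes that are cheap enough to share a job. This is a sketch under an explicit assumption: it treats "can be run under a single Toil job" as "is a plain declaration", decided by a hypothetical can_share_job predicate. The actual criteria in Toil may differ.

```python
from typing import Callable, List


def coalesce(order: List[str],
             can_share_job: Callable[[str], bool]) -> List[List[str]]:
    """Group consecutive shareable node IDs while preserving the given order."""
    groups: List[List[str]] = []
    previous_shareable = False
    for node_id in order:
        shareable = can_share_job(node_id)
        if shareable and previous_shareable:
            groups[-1].append(node_id)  # extend the current run
        else:
            groups.append([node_id])  # start a new group
        previous_shareable = shareable
    return groups


order = ["decl_a", "decl_b", "call_c", "decl_d", "decl_e", "scatter_f"]
groups = coalesce(order, lambda node_id: node_id.startswith("decl_"))
flattened = [node_id for group in groups for node_id in group]
```

Flattening the groups gives back the input order, so the result is still topologically sorted.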
create_subgraph(nodes,
gather_nodes, environment,
local_environment=None, subscript=None)
Make a Toil job to evaluate a
subgraph inside a workflow or workflow section.
Returns
a child Job that will return the aggregated environment after running all the things in the section.
Parameters
• gather_nodes ( Sequence[WDL.Tree.Gather] ) -- Names exposed by these will always be defined with something, even if the code that defines them does not actually run.
• environment ( WDLBindings ) -- Bindings in this environment will be used to evaluate the subgraph and will be passed through.
• local_environment ( WDLBindings | None ) -- Bindings in this environment will be used to evaluate the subgraph but will go out of scope at the end of the section.
• subscript ( int | None ) -- If the subgraph is being evaluated multiple times, this should be a disambiguating integer for logging.
• nodes ( Sequence[WDL.Tree.WorkflowNode] )
Return type
WDLBaseJob
make_gather_bindings(gathers, undefined)
Given a collection of Gathers, create bindings from every identifier gathered, to the given "undefined" placeholder (which would be Null for a single execution of the body, or an empty array for a completely unexecuted scatter).
These bindings can be overlaid with bindings from the actual execution, so that references to names defined in unexecuted code get a proper default undefined value, and not a KeyError at runtime.
The information to do this comes from MiniWDL's "gathers" system: https://miniwdl.readthedocs.io/en/latest/WDL.html#WDL.Tree.WorkflowSection.gathers
TODO: This approach will scale O(n^2) when run on n nested conditionals, because generating these bindings for the outer conditional will visit all the bindings from the inner ones.
Parameters
|
• |
gathers ( Sequence[WDL.Tree.Gather] ) |
|||
|
• |
undefined ( WDL.Value.Base ) |
Return type
WDLBindings
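The placeholder-then-overlay idea behind make_gather_bindings can be shown with dictionaries. The helper names and the example conditional are hypothetical, and None plays the role of the "undefined" placeholder for a single skipped execution.

```python
from typing import Dict, Iterable, Optional


def make_placeholder_bindings(names: Iterable[str],
                              undefined: object) -> Dict[str, object]:
    """Bind every gathered name to the given placeholder value."""
    return {name: undefined for name in names}


def run_conditional(condition: bool) -> Dict[str, Optional[int]]:
    defaults = make_placeholder_bindings(["doubled", "tripled"], None)
    # The body only produces bindings when the condition holds.
    body_results = {"doubled": 4, "tripled": 6} if condition else {}
    # Real results are overlaid on top of the placeholders.
    return {**defaults, **body_results}


taken = run_conditional(True)
skipped = run_conditional(False)
```

Looking up a name from the skipped branch yields the placeholder instead of raising KeyError.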
class
toil.wdl.wdltoil.WDLScatterJob(scatter, prev_node_results,
wdl_options, **kwargs)
Bases: WDLSectionJob
Job that
evaluates a scatter in a WDL workflow. Runs the body for
each value in an array, and makes arrays of the new bindings
created in each instance of the body. If an instance of the
body doesn't create a binding, it gets a null value in the
corresponding array.
Parameters
|
• |
scatter ( WDL.Tree.Scatter ) |
|||
|
• |
prev_node_results ( Sequence[toil.job.Promised[WDLBindings]] ) |
|||
|
• |
wdl_options ( WDLContext ) |
|||
|
• |
kwargs ( Any ) |
run(file_store)
Run the scatter.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
toil.job.Promised[WDLBindings]
class
toil.wdl.wdltoil.WDLArrayBindingsJob(input_bindings,
base_bindings, **kwargs)
Bases: WDLBaseJob
Job that takes all new bindings created in an array of input environments, relative to a base environment, and produces bindings where each new binding name is bound to an array of the values in all the input environments.
Useful for
producing the results of a scatter.
Parameters
|
• |
input_bindings ( Sequence[toil.job.Promised[WDLBindings]] ) |
|||
|
• |
base_bindings ( WDLBindings ) |
|||
|
• |
kwargs ( Any ) |
run(file_store)
Actually produce the
array-ified bindings now that promised values are available.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
WDLBindings
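The array-building step that turns per-iteration scatter results into arrays can be sketched as follows. arrayify_new_bindings is a hypothetical function over plain dicts. It assumes that a name missing from one iteration contributes None at that position, as described for WDLScatterJob.

```python
from typing import Dict, List


def arrayify_new_bindings(input_bindings: List[Dict[str, object]],
                          base_bindings: Dict[str, object]) -> Dict[str, List[object]]:
    """Collect each name not in the base environment into one array per name."""
    new_names: List[str] = []
    for env in input_bindings:
        for name in env:
            if name not in base_bindings and name not in new_names:
                new_names.append(name)
    # dict.get() returns None when an iteration did not define the name.
    return {name: [env.get(name) for env in input_bindings] for name in new_names}


base = {"samples": ["s1", "s2", "s3"]}
per_iteration = [
    {**base, "bam": "s1.bam", "qc": "s1.qc"},
    {**base, "bam": "s2.bam"},
    {**base, "bam": "s3.bam", "qc": "s3.qc"},
]
arrays = arrayify_new_bindings(per_iteration, base)
```

Names already present in the base environment are left out, so only bindings created inside the scatter body become arrays.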
class
toil.wdl.wdltoil.WDLConditionalJob(conditional,
prev_node_results, wdl_options, **kwargs)
Bases: WDLSectionJob
Job that
evaluates a conditional in a WDL workflow.
Parameters
|
• |
conditional ( WDL.Tree.Conditional ) |
|||
|
• |
prev_node_results ( Sequence[toil.job.Promised[WDLBindings]] ) |
|||
|
• |
wdl_options ( WDLContext ) |
|||
|
• |
kwargs ( Any ) |
run(file_store)
Run the conditional.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
toil.job.Promised[WDLBindings]
class
toil.wdl.wdltoil.WDLWorkflowJob(workflow, prev_node_results,
workflow_id, wdl_options, **kwargs)
Bases: WDLSectionJob
Job that
evaluates an entire WDL workflow.
Parameters
|
• |
workflow ( WDL.Tree.Workflow ) |
|||
|
• |
prev_node_results ( Sequence[toil.job.Promised[WDLBindings]] ) |
|||
|
• |
workflow_id ( list[str] ) |
|||
|
• |
wdl_options ( WDLContext ) |
|||
|
• |
kwargs ( Any ) |
run(file_store)
Run the workflow. Return the
result of the workflow.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
toil.job.Promised[WDLBindings]
class
toil.wdl.wdltoil.WDLOutputsJob(workflow, bindings,
wdl_options,
cache_key=None, **kwargs)
Bases: WDLBaseJob
Job which evaluates an outputs section for a workflow.
Returns an
environment with just the outputs bound, in no namespace.
Parameters
|
• |
workflow ( WDL.Tree.Workflow ) |
|||
|
• |
bindings ( toil.job.Promised[WDLBindings] ) |
|||
|
• |
wdl_options ( WDLContext ) |
|||
|
• |
cache_key ( str | None ) |
|||
|
• |
kwargs ( Any ) |
run(file_store)
Make bindings for the outputs.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
WDLBindings
class
toil.wdl.wdltoil.WDLStartJob(target, inputs, wdl_options,
**kwargs)
Bases: WDLSectionJob
Job that
evaluates an entire WDL workflow, and returns the workflow
outputs namespaced with the workflow name. Inputs may or may
not be namespaced with the workflow name; both forms are
accepted.
Parameters
|
• |
target ( WDL.Tree.Workflow | WDL.Tree.Task ) |
|||
|
• |
inputs ( toil.job.Promised[WDLBindings] ) |
|||
|
• |
wdl_options ( WDLContext ) |
|||
|
• |
kwargs ( Any ) |
run(file_store)
Actually build the subgraph.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
toil.job.Promised[WDLBindings]
class
toil.wdl.wdltoil.WDLInstallImportsJob(task_path, inputs,
import_data, **kwargs)
Bases: toil.job.Job
Class
represents a unit of work in toil.
Parameters
|
• |
task_path ( str ) |
||
|
• |
inputs ( WDLBindings ) |
||
|
• |
import_data ( toil.job.Promised[Tuple[Dict[str, toil.fileStores.FileID], Dict[str, toil.job.FileMetadata]]] ) |
||
|
• |
kwargs ( Any ) |
run(file_store)
Convert the filenames in the workflow inputs into URIs. Returns a promise of the transformed workflow inputs.
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
toil.job.Promised[WDLBindings]
class
toil.wdl.wdltoil.WDLImportWrapper(target, inputs,
wdl_options,
inputs_search_path, import_remote_files,
import_workers_threshold,
import_workers_disk, **kwargs)
Bases: WDLSectionJob
Job to organize importing files on workers instead of the leader. Responsible for extracting filenames and metadata, calling ImportsJob, applying imports to input bindings, and scheduling the start workflow job.
This class is only used when runImportsOnWorkers is enabled.
Parameters
|
• |
target ( Union[WDL.Tree.Workflow, WDL.Tree.Task] ) |
|||
|
• |
inputs ( WDLBindings ) |
|||
|
• |
wdl_options ( WDLContext ) |
|||
|
• |
inputs_search_path ( list[str] ) |
|||
|
• |
import_remote_files ( bool ) |
|||
|
• |
import_workers_threshold ( toil.job.ParseableIndivisibleResource ) |
|||
|
• |
import_workers_disk ( toil.job.ParseableIndivisibleResource ) |
|||
|
• |
kwargs ( Any ) |
run(file_store)
Run a WDL-related job.
Remember to decorate non-trivial overrides with report_wdl_errors().
Parameters
file_store ( toil.fileStores.abstractFileStore.AbstractFileStore )
Return type
toil.job.Promised[WDLBindings]
toil.wdl.wdltoil.make_root_job(target,
inputs, inputs_search_path,
toil, wdl_options, options)
Parameters
|
• |
target ( WDL.Tree.Workflow | WDL.Tree.Task ) |
|||
|
• |
inputs ( WDLBindings ) |
|||
|
• |
inputs_search_path ( list[str] ) |
|||
|
• |
toil ( toil.common.Toil ) |
|||
|
• |
wdl_options ( WDLContext ) |
|||
|
• |
options ( configargparse.Namespace ) |
Return type
WDLSectionJob
toil.wdl.wdltoil.main()
A Toil workflow to interpret
WDL input files.
Return type
None
toil.worker
Attributes
Classes
Functions
Module Contents
toil.worker.logger
class toil.worker.StatsDict(*args, **kwargs)
Bases: toil.lib.expando.MagicExpando
Subclass of
MagicExpando for type-checking purposes.
jobs: list[toil.lib.expando.MagicExpando]
toil.worker.nextChainable(predecessor, job_store, config)
Returns the next chainable
job's JobDescription after the given predecessor
JobDescription, if one exists, or None if the chain must
terminate.
Parameters
• predecessor ( toil.job.JobDescription ) -- The job to chain from
• job_store ( toil.jobStores.abstractJobStore.AbstractJobStore ) -- The JobStore to fetch JobDescriptions from.
• config ( toil.common.Config ) -- The configuration for the current run.
Return type
Optional[ toil.job.JobDescription ]
toil.worker.workerScript(job_store,
config, job_name, job_store_id,
redirect_output_to_log_file=True,
local_worker_temp_dir=None,
debug_flags=None)
Worker process script, runs a
job.
Parameters
• job_store ( toil.jobStores.abstractJobStore.AbstractJobStore ) -- The JobStore to fetch JobDescriptions from.
• config ( toil.common.Config ) -- The configuration for the current run.
• job_name ( str ) -- The "job name" (a user friendly name) of the job to be run
• job_store_id ( str ) -- The job store ID of the job to be run
• redirect_output_to_log_file ( bool ) -- If False, log directly to the console instead of capturing job output.
• local_worker_temp_dir ( Optional[str] ) -- The directory for the worker to work in. May be recursively removed after the job runs.
• debug_flags ( Optional[set[str]] ) -- Flags to set on each job before running it.
Returns
1 if a job failed, or 0 if all jobs succeeded
Return type
int
toil.worker.parse_args(args)
Parse command-line arguments to
the worker.
Parameters
args ( list[str] )
Return type
Any
toil.worker.in_contexts(contexts)
Unpickle and enter all the
pickled, base64-encoded context managers in the given list.
Then do the body, then leave them all.
Parameters
contexts ( list[str] )
Return type
collections.abc.Iterator [None]
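Entering a list of pickled, base64-encoded context managers can be sketched with contextlib.ExitStack. The function names below are hypothetical, and this is not Toil's implementation. Note that unpickling is only safe for data from a trusted source.

```python
import base64
import contextlib
import pickle
from typing import Iterator, List


@contextlib.contextmanager
def enter_pickled_contexts(contexts: List[str]) -> Iterator[None]:
    """Decode and enter each context manager, run the body, then leave them all."""
    with contextlib.ExitStack() as stack:
        for encoded in contexts:
            manager = pickle.loads(base64.b64decode(encoded))
            stack.enter_context(manager)
        yield  # the body of the caller's with-statement runs here


def encode_context(manager: object) -> str:
    """Pickle and base64-encode a context manager for transport as text."""
    return base64.b64encode(pickle.dumps(manager)).decode("ascii")


# contextlib.suppress is used here only because it is picklable and has a visible effect.
encoded = [encode_context(contextlib.suppress(ValueError))]
outcome = "not run"
with enter_pickled_contexts(encoded):
    outcome = "body ran"
    raise ValueError("swallowed by the unpickled suppress() context")
```

ExitStack leaves the contexts in reverse order of entry, and it does so even if the body raises.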
toil.worker.main(argv=None)
Parameters
argv ( Optional[list[str]] )
Return type
None
Attributes
Exceptions
Functions
Package Contents
toil.log
toil.which(cmd, mode=os.F_OK | os.X_OK, path=None)
Return the path which conforms to the given mode on the PATH.
[Copy-pasted in from python3.6's shutil.which().]
mode defaults to os.F_OK | os.X_OK.
path defaults to the result of os.environ.get("PATH"), or can be overridden with a custom search path.
Returns
The path found, or None.
Return type
Optional[ str ]
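Since the description says this function was copied from shutil.which(), the standard library version is used below to illustrate the custom search path behaviour. The tool name and temporary directory are made up for the example, and a POSIX system with an exec-permitting temporary directory is assumed.

```python
import os
import shutil
import stat
import tempfile

with tempfile.TemporaryDirectory() as bin_dir:
    # Create a small executable script in a private directory.
    tool = os.path.join(bin_dir, "hypothetical-tool")
    with open(tool, "w") as handle:
        handle.write("#!/bin/sh\nexit 0\n")
    os.chmod(tool, os.stat(tool).st_mode | stat.S_IXUSR)

    # Search only bin_dir instead of the PATH environment variable.
    found = shutil.which("hypothetical-tool", path=bin_dir)
    missing = shutil.which("no-such-tool", path=bin_dir)
    found_matches = found == tool
```

A command that is not found gives None, not an exception, so callers need to check the result before using it.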
toil.toilPackageDirPath()
Return the absolute path of the directory that corresponds to the top-level toil package.
The return
value is guaranteed to end in '/toil'.
Return type
str
toil.inVirtualEnv()
Test if we are inside a
virtualenv or Conda virtual environment.
Return type
bool
toil.resolveEntryPoint(entryPoint)
Find the path to the given entry point that should work on a worker.
Returns
The path found, which may be an absolute or a relative path.
Parameters
entryPoint ( str )
Return type
str
toil.physicalMemory()
Calculate the total amount of physical memory, in bytes.
>>> n = physicalMemory()
>>> n > 0
True
>>> n == physicalMemory()
True
Return type
int
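One common way to compute total physical memory on a POSIX system is to multiply the page size by the number of physical pages. The sketch below assumes both sysconf names are available on the platform. Whether Toil computes the value this way is not stated here.

```python
import os


def physical_memory_bytes() -> int:
    """Total physical memory in bytes (assumes POSIX sysconf support)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


total = physical_memory_bytes()
```

As in the doctest above, the value is positive and stable between calls on the same machine.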
toil.physicalDisk(directory)
Parameters
directory ( str )
Return type
int
toil.applianceSelf(forceDockerAppliance=False)
Return the fully qualified name of the Docker image to start Toil appliance containers from.
The result is determined by the current version of Toil and three environment variables: TOIL_DOCKER_REGISTRY , TOIL_DOCKER_NAME and TOIL_APPLIANCE_SELF .
TOIL_DOCKER_REGISTRY specifies an account on a publicly hosted docker registry like Quay or Docker Hub. The default is UCSC's CGL account on Quay.io where the Toil team publishes the official appliance images.
TOIL_DOCKER_NAME specifies the base name of the image. The default of toil will be adequate in most cases.
TOIL_APPLIANCE_SELF fully qualifies the appliance image, complete with registry, image name and version tag, overriding both TOIL_DOCKER_NAME and TOIL_DOCKER_REGISTRY as well as the version tag of the image. Setting TOIL_APPLIANCE_SELF will not be necessary in most cases.
Parameters
forceDockerAppliance ( bool )
Return type
str
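The precedence among the three environment variables can be sketched as below. appliance_image is a hypothetical helper that takes the environment as a mapping. The default registry string and the version tag values are assumptions made for illustration, and how Toil derives the tag from its own version is not shown.

```python
from typing import Mapping


def appliance_image(env: Mapping[str, str], version_tag: str) -> str:
    """Compose an image name, letting TOIL_APPLIANCE_SELF override everything."""
    override = env.get("TOIL_APPLIANCE_SELF")
    if override:
        return override  # already fully qualified, including the tag
    # Assumed defaults, based on the description above.
    registry = env.get("TOIL_DOCKER_REGISTRY", "quay.io/ucsc_cgl")
    name = env.get("TOIL_DOCKER_NAME", "toil")
    return f"{registry}/{name}:{version_tag}"


default_image = appliance_image({}, "1.2.3")
custom_name = appliance_image({"TOIL_DOCKER_NAME": "toil-dev"}, "1.2.3")
overridden = appliance_image(
    {"TOIL_APPLIANCE_SELF": "example.org/me/toil:custom", "TOIL_DOCKER_NAME": "ignored"},
    "1.2.3",
)
```

Passing the environment in as a mapping keeps the sketch testable without touching the real process environment.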
toil.customDockerInitCmd()
Return the custom command set by the TOIL_CUSTOM_DOCKER_INIT_COMMAND environment variable.
The custom docker command is run prior to running the workers and/or the primary node's services.
This can be
useful for doing any custom initialization on instances
(e.g. authenticating to private docker registries). Any
single quotes are escaped and the command cannot contain a
set of blacklisted chars (newline or tab).
Returns
The custom command, or an empty string is returned if the environment variable is not set.
Return type
str
toil.customInitCmd()
Return the custom command set by the TOIL_CUSTOM_INIT_COMMAND environment variable.
The custom init command is run prior to running Toil appliance itself in workers and/or the primary node (i.e. this is run one stage before TOIL_CUSTOM_DOCKER_INIT_COMMAND ).
This can be useful for doing any custom initialization on instances (e.g. authenticating to private docker registries). Any single quotes are escaped and the command cannot contain a set of blacklisted chars (newline or tab).
Returns
The custom command, or an empty string if the environment variable is not set.
Return type
str
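The validation and quoting described for both custom command functions might look like this sketch. sanitize_init_command is a hypothetical name, and the particular escaping shown (closing the quote, emitting an escaped quote, and reopening) is an assumption about how single quotes could be escaped, not a statement of Toil's exact behaviour.

```python
import shlex


def sanitize_init_command(command: str) -> str:
    """Reject newlines and tabs, then escape single quotes for shell embedding."""
    for forbidden in ("\n", "\t"):
        if forbidden in command:
            raise ValueError("command must not contain newline or tab characters")
    # Inside a single-quoted shell string, ' is written as '\''.
    return command.replace("'", "'\\''")


escaped = sanitize_init_command("echo 'hi'")
# Wrapping the escaped text in single quotes recovers the original command.
round_trip = shlex.split("'" + escaped + "'")

try:
    sanitize_init_command("echo one\necho two")
    rejected = False
except ValueError:
    rejected = True
```

An unset variable would simply correspond to passing an empty string, which comes back unchanged.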
toil.lookupEnvVar(name, envName, defaultValue)
Look up environment variables
that control Toil and log the result.
Parameters
• name ( str ) -- the human readable name of the variable
• envName ( str ) -- the name of the environment variable to look up
• defaultValue ( str ) -- the fall-back value
Returns
the value of the environment variable, or the default value if the variable is not set
Return type
str
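A look-up-and-log helper of this shape can be sketched with os.environ and the logging module. lookup_env_var, the logger name, and the HYPOTHETICAL_TOIL_SETTING variable are all invented for the example.

```python
import logging
import os

logger = logging.getLogger("env_lookup_example")


def lookup_env_var(name: str, env_name: str, default_value: str) -> str:
    """Return the variable's value, or the default, logging which one was used."""
    value = os.environ.get(env_name)
    if value is None:
        logger.debug("Using default %s of %s as %s is not set.",
                     name, default_value, env_name)
        return default_value
    logger.debug("Overriding %s of %s with %s from %s.",
                 name, default_value, value, env_name)
    return value


os.environ.pop("HYPOTHETICAL_TOIL_SETTING", None)
fallback = lookup_env_var("example setting", "HYPOTHETICAL_TOIL_SETTING", "default")
os.environ["HYPOTHETICAL_TOIL_SETTING"] = "custom"
overridden = lookup_env_var("example setting", "HYPOTHETICAL_TOIL_SETTING", "default")
del os.environ["HYPOTHETICAL_TOIL_SETTING"]
```

Logging the human readable name alongside the variable name makes it easier to see, after the fact, which settings were overridden from the environment.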
toil.checkDockerImageExists(appliance)
Attempt to check a registry URL for the existence of a docker image with a given tag.
Parameters
appliance ( str ) -- The url of a docker image's registry (with a tag) of the form: 'quay.io/<repo_path>:<tag>' or '<repo_path>:<tag>'. Examples: 'quay.io/ucsc_cgl/toil:latest', 'ubuntu:latest', or 'broadinstitute/genomes-in-the-cloud:2.0.0'.
Returns
Raises an exception if the docker image cannot be found or is invalid. Otherwise, it will return the appliance string.
Return type
str
toil.parseDockerAppliance(appliance)
Derive parsed registry, image reference, and tag from a docker image string.
Example: "quay.io/ucsc_cgl/toil:latest" should return: "quay.io", "ucsc_cgl/toil", "latest"
If a registry is not defined, the default is: "docker.io". If a tag is not defined, the default is: "latest".
Parameters
appliance ( str ) -- The full url of the docker image originally specified by the user (or the default). e.g. "quay.io/ucsc_cgl/toil:latest"
Returns
registryName, imageName, tag
Return type
tuple [ str , str , str ]
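The splitting rules and defaults described above can be sketched in a few lines. parse_image_reference is a hypothetical function, not Toil's parser. It assumes that a first path component containing a dot or colon (or equal to localhost) names a registry.

```python
from typing import Tuple


def parse_image_reference(appliance: str) -> Tuple[str, str, str]:
    """Split an image string into (registry, image, tag), applying defaults."""
    # A colon after the last slash separates the tag from the name.
    last_slash = appliance.rfind("/")
    colon = appliance.rfind(":")
    if colon > last_slash:
        name, tag = appliance[:colon], appliance[colon + 1:]
    else:
        name, tag = appliance, "latest"

    # Assumption: a registry host contains a dot or a port, or is "localhost".
    first, separator, rest = name.partition("/")
    if separator and ("." in first or ":" in first or first == "localhost"):
        return first, rest, tag
    return "docker.io", name, tag


quay = parse_image_reference("quay.io/ucsc_cgl/toil:latest")
hub = parse_image_reference("broadinstitute/genomes-in-the-cloud:2.0.0")
bare = parse_image_reference("ubuntu")
```

Comparing the position of the last colon with the last slash keeps a registry port such as localhost:5000 from being mistaken for a tag.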
toil.checkDockerSchema(appliance)
exception toil.ApplianceImageNotFound(origAppliance, url,
statusCode)
Bases: docker.errors.ImageNotFound
Error raised when using TOIL_APPLIANCE_SELF results in an HTTP error.
Parameters
• origAppliance ( str ) -- The full url of the docker image originally specified by the user (or the default). e.g. "quay.io/ucsc_cgl/toil:latest"
• url ( str ) -- The URL at which the image's manifest is supposed to appear
• statusCode ( int ) -- the failing HTTP status code returned by the URL
toil.KNOWN_EXTANT_IMAGES
toil.requestCheckRegularDocker(origAppliance, registryName, imageName, tag)
Check if an image exists using the requests library.
URL is based on the docker v2 schema.
This has the following format: https://{websitehostname}.io/v2/{repo}/manifests/{tag}
Does not work with the official (docker.io) site, because they require an OAuth token, so a separate check is done for docker.io images.
Parameters
• origAppliance ( str ) -- The full url of the docker image originally specified by the user (or the default). For example, quay.io/ucsc_cgl/toil:latest.
• registryName ( str ) -- The url of a docker image's registry. For example, quay.io.
• imageName ( str ) -- The image, including path and excluding the tag. For example, ucsc_cgl/toil.
• tag ( str ) -- The tag used at that docker image's registry. For example, latest.
Raises
ApplianceImageNotFound if no match is found.
Returns
True if a match is found.
Return type
bool
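The manifest URL format above can be illustrated with a short sketch. It is not Toil's code: manifest_url, check_image_exists, and ImageNotFoundSketch are hypothetical names, and the HTTP request is injected as a callable so the example runs without network access. Treating any status of 400 or above as "not found" is also an assumption.

```python
from typing import Callable


class ImageNotFoundSketch(Exception):
    """Stand-in for an image-not-found error (hypothetical, for illustration only)."""


def manifest_url(registry_name: str, image_name: str, tag: str) -> str:
    """Build a docker v2 manifest URL for the given registry, image, and tag."""
    return f"https://{registry_name}/v2/{image_name}/manifests/{tag}"


def check_image_exists(registry_name: str, image_name: str, tag: str,
                       head_status: Callable[[str], int]) -> bool:
    """Return True if the manifest URL answers with a non-error status."""
    url = manifest_url(registry_name, image_name, tag)
    status = head_status(url)  # e.g. the status code of an HTTP HEAD request
    if status >= 400:
        raise ImageNotFoundSketch(f"{url} returned HTTP {status}")
    return True


if __name__ == "__main__":
    # A fake transport that always answers 200, so no network is needed.
    print(check_image_exists("quay.io", "ucsc_cgl/toil", "latest", lambda url: 200))
```

With the requests library installed, one could pass lambda url: requests.head(url).status_code as head_status to perform a real check.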
toil.requestCheckDockerIo(origAppliance, imageName, tag)
Check docker.io to see if an image exists using the requests library.
URL is based on the docker v2 schema. Requires that an access token be fetched first.
Parameters
• origAppliance ( str ) -- The full url of the docker image originally specified by the user (or the default). For example, ubuntu:latest.
• imageName ( str ) -- The image, including path and excluding the tag. For example, ubuntu.
• tag ( str ) -- The tag used at that docker image's registry. For example, latest.
Raises
ApplianceImageNotFound if no match is found.
Returns
True if a match is found.
Return type
bool
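The token-then-manifest flow can be sketched as below. None of the endpoints are given in this manual: the auth.docker.io token URL, the registry-1.docker.io manifest host, and the "library/" prefix for official images are assumptions based on Docker Hub's public registry behavior, and the function names are hypothetical. The sketch only builds the URLs and the request header; it performs no network requests.

```python
from typing import Dict, Tuple


def docker_io_urls(image_name: str, tag: str) -> Tuple[str, str]:
    """Return (token_url, manifest_url) for a docker.io image (illustrative sketch)."""
    # Assumption: official images such as "ubuntu" live under the "library/" namespace.
    if "/" not in image_name:
        image_name = "library/" + image_name
    # Assumption: Docker Hub issues pull-scoped tokens from this endpoint.
    token_url = ("https://auth.docker.io/token?service=registry.docker.io"
                 f"&scope=repository:{image_name}:pull")
    manifest = f"https://registry-1.docker.io/v2/{image_name}/manifests/{tag}"
    return token_url, manifest


def bearer_header(token: str) -> Dict[str, str]:
    """Build the Authorization header to send with the manifest request."""
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    token_url, manifest = docker_io_urls("ubuntu", "latest")
    print(token_url)
    print(manifest)
```

A caller would first fetch the token from the token URL, then request the manifest URL with the bearer header and treat an error status as a missing image.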
toil.logProcessContext(config)
Parameters
config ( common.Config )
Return type
None
toil.cache_path = '~/.cache/aws/cached_temporary_credentials'
[1] Created with sphinx-autoapi
AUTHOR
UCSC Computational Genomics Lab
COPYRIGHT
2015 – 2025 UCSC Computational Genomics Lab