Adding support for execution on slurm grids
- Adding support for SlurmTaskExecutor in `daggyd` if `DAGGY_ENABLE_SLURM` is defined.
- Renaming some test cases
- Enabling compile-time slurm support
- Adding slurm documentation
README.md (48 lines changed)
@@ -321,14 +321,52 @@ jobs on slurm with a specific set of restrictions, or allow for local execution
| pool | Names the executor the DAG should run on |
| poolParameters | Any parameters the executor accepts that might modify how a task is run |

Default Job Values
------------------

A DAG can be submitted with an extra `jobDefaults` section. Its values fill in defaults for every task that does not override them. This is useful for cases like Slurm execution, where tasks often share the same memory and runtime requirements.

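The defaulting rule can be sketched in Python. This is a minimal sketch, assuming that defaults are merged key by key into each task's `job` object and that a value set on the task always wins; `apply_job_defaults` is a hypothetical helper, not part of daggy, and the sample keys are borrowed from the Slurm executor's `job` fields.

```python
from typing import Any, Dict


def apply_job_defaults(job: Dict[str, Any], job_defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `job` with any missing keys filled from `job_defaults`.

    Hypothetical helper: assumes a shallow, key-by-key merge in which
    values set on the task itself take precedence over the defaults.
    """
    merged = dict(job_defaults)  # start from the DAG-wide defaults
    merged.update(job)           # task-specific values override them
    return merged


# Shared Slurm requirements, as they might appear in a `jobDefaults` section.
job_defaults = {"minMemoryMB": "1024", "timeLimitSeconds": "3600"}

# This task only overrides the time limit.
task_job = {"command": ["/usr/bin/echo", "param1"], "timeLimitSeconds": "60"}

print(apply_job_defaults(task_job, job_defaults))
```
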
Executors
=========

Different executors require different structures for the `job` task member.

Local Executor (ForkingTaskExecutor)
------------------------------------

The ForkingTaskExecutor runs tasks on the local machine: it forks a process to run each task and uses threads to monitor completion and capture output.

| Field | Sample | Description |
|---------|--------|--------------|
| command | `[ "/usr/bin/echo", "param1" ]` | The command to run on the local host, as a program followed by its arguments |

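As a rough illustration of the fork-and-monitor pattern described above (not daggy's implementation), the following Python sketch starts the `command` argument list as a child process and uses a separate thread to wait for it and collect its stdout and stderr. `run_local_task` and `TaskResult` are hypothetical names, and the sample uses the Python interpreter instead of `/usr/bin/echo` so it runs on any platform.

```python
import subprocess
import sys
import threading
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskResult:
    returncode: int
    stdout: str
    stderr: str


def run_local_task(command: List[str]) -> TaskResult:
    """Run `command` (program plus arguments) as a child process.

    A monitor thread waits for the child to exit and captures its
    stdout/stderr, mirroring the fork-then-monitor pattern.
    """
    # The argument list is passed as-is, without a shell.
    proc = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    result: Optional[TaskResult] = None

    def monitor() -> None:
        nonlocal result
        # communicate() drains both pipes and waits for the process to exit.
        out, err = proc.communicate()
        result = TaskResult(proc.returncode, out, err)

    watcher = threading.Thread(target=monitor)
    watcher.start()
    watcher.join()  # a real executor would keep doing other work here
    if result is None:
        raise RuntimeError("monitor thread did not record a result")
    return result


if __name__ == "__main__":
    # Portable stand-in for the `[ "/usr/bin/echo", "param1" ]` sample.
    print(run_local_task([sys.executable, "-c", "print('param1')"]))
```
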
Slurm Executor (SlurmTaskExecutor)
----------------------------------

The Slurm executor requires that the daggy server be running on a node capable of submitting Slurm jobs.

To enable Slurm support, configure the project with `cmake -DDAGGY_ENABLE_SLURM=ON ..`.

Required `job` config values:

| Field | Sample | Description |
|---------|--------|--------------|
| command | `[ "/usr/bin/echo", "param1" ]` | The command to run on a Slurm host |
| minCPUs | `"1"` | Minimum number of CPUs required |
| minMemoryMB | `"1"` | Minimum memory required, in MB |
| minTmpDiskMB | `"1"` | Minimum temporary disk space required, in MB |
| priority | `"100"` | Slurm job priority |
| timeLimitSeconds | `"100"` | Maximum number of seconds the job is allowed to run |
| userID | `"1002"` | Numeric UID that the job should run as |
| workDir | `"/tmp/"` | Working directory for the job |
| tmpDir | `"/tmp/"` | Directory for temporary files, including captured stdout/stderr |

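A client could check a Slurm `job` object against the required fields in the table above before submitting a DAG. The sketch below is a hypothetical pre-submission check, not part of daggy; it only verifies that every required field is present and assumes `command` must be a non-empty list of strings, as in the sample.

```python
from typing import Any, Dict, List

# Required `job` fields for the Slurm executor, taken from the table above.
SLURM_REQUIRED_FIELDS = [
    "command", "minCPUs", "minMemoryMB", "minTmpDiskMB", "priority",
    "timeLimitSeconds", "userID", "workDir", "tmpDir",
]


def slurm_job_problems(job: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with a Slurm `job` object.

    Hypothetical pre-submission check: an empty list means every required
    field is present and `command` is a non-empty list of strings.
    """
    problems = [f"missing required field: {field}"
                for field in SLURM_REQUIRED_FIELDS if field not in job]
    command = job.get("command")
    # `command` is an argument vector, e.g. ["/usr/bin/echo", "param1"].
    if command is not None and not (
        isinstance(command, list)
        and command
        and all(isinstance(arg, str) for arg in command)
    ):
        problems.append("command must be a non-empty list of strings")
    return problems


# The sample values from the table.
sample_job = {
    "command": ["/usr/bin/echo", "param1"],
    "minCPUs": "1",
    "minMemoryMB": "1",
    "minTmpDiskMB": "1",
    "priority": "100",
    "timeLimitSeconds": "100",
    "userID": "1002",
    "workDir": "/tmp/",
    "tmpDir": "/tmp/",
}
print(slurm_job_problems(sample_job))
```
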
Daggy submits the `command` to run, capturing its output in `${tmpDir}/${taskName}_{RANDOM}.{stderr,stdout}`. Once the task has completed, those files are read and their contents stored in the AttemptRecord for later retrieval.

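The output naming scheme can be sketched as follows. The format of the `{RANDOM}` component is not specified here, so this sketch assumes an 8-character random hex token; `slurm_output_paths` and the task name `extract` are hypothetical, not daggy's code.

```python
import os
import secrets
from typing import Tuple


def slurm_output_paths(tmp_dir: str, task_name: str) -> Tuple[str, str]:
    """Build (stdout, stderr) capture paths of the form
    ${tmpDir}/${taskName}_{RANDOM}.{stdout,stderr}.

    Assumption: {RANDOM} is rendered as 8 random hex characters.
    """
    token = secrets.token_hex(4)  # 4 random bytes -> 8 hex characters
    stem = os.path.join(tmp_dir, f"{task_name}_{token}")
    return stem + ".stdout", stem + ".stderr"


out_path, err_path = slurm_output_paths("/tmp/", "extract")
print(out_path, err_path)
```
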
Because daggy reads these output files itself, the `tmpDir` directory **must be readable by the daggy engine**. In a distributed environment, this means `tmpDir` should be on a shared filesystem. Otherwise, daggy will not capture the job output, although the output will still be available wherever Slurm wrote it.

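Before submitting jobs, a deployment could sanity-check that the daggy server itself can read a given `tmpDir`. This hypothetical check, `tmpdir_is_usable`, only tests local permissions from the server's point of view: it cannot tell whether the directory is the same shared filesystem that the Slurm compute nodes write to.

```python
import os


def tmpdir_is_usable(tmp_dir: str) -> bool:
    """Return True if `tmp_dir` is an existing directory that this process
    can read from and traverse (hypothetical pre-flight check).
    """
    # R_OK allows reading the captured files; X_OK allows traversing the directory.
    return os.path.isdir(tmp_dir) and os.access(tmp_dir, os.R_OK | os.X_OK)


print(tmpdir_is_usable("/tmp/"))
```
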