Daggy Runner
daggyr is a REST server process that acts as a remote task executor.
Running it
daggyr # That's it: listens on 127.0.0.1:2504 and runs with a local executor
daggyr -d # Daemonize
daggyr --config FILE # Run with a config file
Capacity and Allocation
On startup, a server's capacity is determined automatically. The capacities are:
| Capacity | Determined by | Default | Notes |
|---|---|---|---|
| cores | std::thread::hardware_concurrency() | max(1, max - 2) | A value of 0 will mean all cores |
| memory_mb | sysinfo.h | max(100, totalram * 0.75) | totalram is converted to MB |
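The defaults in the table can be sketched as follows. This is a minimal illustration, not daggyr's actual code: the names `Capacity`, `default_cores`, `default_memory_mb`, and `detect_capacity` are hypothetical, and the sketch assumes that 1 MB means 1024 * 1024 bytes and that the inner `max` in the cores formula is the detected core count. The "0 means all cores" note is not modeled.

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>
#if defined(__linux__)
#include <sys/sysinfo.h>
#endif

// Hypothetical container for the two capacities listed in the table.
struct Capacity {
    std::uint64_t cores;
    std::uint64_t memory_mb;
};

// max(1, detected - 2), written so that unsigned subtraction cannot underflow
// when fewer than two cores are detected (or detection returns 0).
std::uint64_t default_cores(std::uint64_t detected) {
    return detected > 2 ? detected - 2 : 1;
}

// max(100, 75% of total RAM), with total RAM converted from bytes to MB.
// Assumption: 1 MB = 1024 * 1024 bytes.
std::uint64_t default_memory_mb(std::uint64_t total_bytes) {
    const std::uint64_t total_mb = total_bytes / (1024 * 1024);
    return std::max<std::uint64_t>(100, total_mb * 3 / 4);
}

#if defined(__linux__)
// Linux-only detection using the two sources named in the table.
Capacity detect_capacity() {
    struct sysinfo info {};
    std::uint64_t total_bytes = 0;
    if (sysinfo(&info) == 0) {
        // totalram is expressed in units of mem_unit bytes.
        total_bytes = static_cast<std::uint64_t>(info.totalram) * info.mem_unit;
    }
    return Capacity{default_cores(std::thread::hardware_concurrency()),
                    default_memory_mb(total_bytes)};
}
#endif
```

Under these assumptions, a machine with 8 cores and 16 GiB of RAM would default to 6 cores and 12288 MB.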
When a daggyd process is selecting a runner to send a task to, it will
query the current capacities, and choose the runner that:
- Can satisfy the requirements of the task
- Has the lowest impact, where impact is the largest relative drop in available capacity across all capacities.
For instance, if a job were submitted that requires 2 cores and 5g of memory, and three runners reported the following capacities:
| Runner | free_cores | impact_cores | free_memory | impact_memory | max_impact |
|---|---|---|---|---|---|
| 1 | 70 | 2.9% | 20g | 25.00% | 25% |
| 2 | 4 | 50.0% | 80g | 6.25% | 50% |
| 3 | 10 | 20.0% | 30g | 16.67% | 20% |
Runner 3 would be selected. Even though it doesn't have the most memory or CPU capacity, allocating the job to it minimizes the impact on overall availability.
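The selection rule can be sketched as a small function. This is an illustration under stated assumptions rather than daggyd's implementation: `Resources`, `relative_drop`, `max_impact`, and `select_runner` are hypothetical names, memory is expressed in MB, and ties go to the first runner encountered.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical pair of capacities, used both for a runner's free
// capacity and for a task's requirements.
struct Resources {
    double cores;
    double memory_mb;
};

// Fraction of the free capacity that the task would consume.
// Only called when free >= need, so free is positive whenever need is.
double relative_drop(double need, double free) {
    if (need <= 0.0) return 0.0;
    return need / free;
}

// Impact of a task on a runner: the largest relative drop across capacities.
double max_impact(const Resources& free, const Resources& need) {
    return std::max(relative_drop(need.cores, free.cores),
                    relative_drop(need.memory_mb, free.memory_mb));
}

// Returns an iterator to the runner with the lowest impact among those that
// can satisfy the task, or runners.end() if none can.
std::vector<Resources>::const_iterator
select_runner(const std::vector<Resources>& runners, const Resources& need) {
    auto best = runners.end();
    double best_impact = 0.0;
    for (auto it = runners.begin(); it != runners.end(); ++it) {
        if (it->cores < need.cores || it->memory_mb < need.memory_mb) {
            continue;  // cannot satisfy the requirements
        }
        const double impact = max_impact(*it, need);
        if (best == runners.end() || impact < best_impact) {
            best = it;
            best_impact = impact;
        }
    }
    return best;
}
```

With the three runners from the table and a task needing 2 cores and 5g, the returned iterator points at the third entry, matching the worked example.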
Submission and Execution
Tasks submitted to the runner will be executed with cgroups to enforce limits.
Jobs are submitted asynchronously. The client is responsible for polling
GET /api/v1/task/:task_id to retrieve the resulting
TaskAttempt.
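A client-side polling loop might look like the sketch below. Only the route comes from this README. Everything else is an assumption for illustration: `task_path` and `poll_task` are hypothetical names, and `fetch` stands in for whatever HTTP client is used, returning true and filling `body` once the runner reports a finished TaskAttempt. How the runner signals "not finished yet" is not specified here, so that decision is left to `fetch`.

```cpp
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Builds the request path for GET /api/v1/task/:task_id.
std::string task_path(const std::string& task_id) {
    return "/api/v1/task/" + task_id;
}

// Hypothetical fetch callback: performs the GET and returns true once the
// task has finished, writing the response body into its second argument.
using Fetch = std::function<bool(const std::string& path, std::string& body)>;

// Polls until the task completes or max_attempts is reached.
// Returns true and fills `result` on completion, false otherwise.
bool poll_task(const std::string& task_id,
               const Fetch& fetch,
               std::string& result,
               int max_attempts,
               std::chrono::milliseconds interval) {
    const std::string path = task_path(task_id);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (fetch(path, result)) {
            return true;
        }
        // Sleep between attempts, but not after the final one.
        if (attempt + 1 < max_attempts) {
            std::this_thread::sleep_for(interval);
        }
    }
    return false;
}
```

Bounding the number of attempts keeps a caller from waiting forever on a runner that was killed.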
Runners are stateless: killing one kills any tasks it is running and discards any stored results.
Config Files
{
"web-threads": 50,
"port": 2504,
"ip": "localhost",
"capacity_overrides": {
"cores": 10,
"memory_mb": 100
}
}
The auto-discovered capacities can be overridden with capacity_overrides.
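How overrides could be layered on top of the detected values is sketched below. This is an assumption about the behavior, not daggyr's code: `Capacity` and `apply_overrides` are hypothetical names, the overrides are taken as already parsed from the config into a map, and any special handling of a value of 0 is left out.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical container for the two capacities.
struct Capacity {
    std::uint64_t cores;
    std::uint64_t memory_mb;
};

// Replaces each detected capacity with the value from capacity_overrides,
// if present. Keys that are absent leave the detected value untouched.
Capacity apply_overrides(Capacity detected,
                         const std::map<std::string, std::uint64_t>& overrides) {
    const auto cores = overrides.find("cores");
    if (cores != overrides.end()) {
        detected.cores = cores->second;
    }
    const auto memory = overrides.find("memory_mb");
    if (memory != overrides.end()) {
        detected.memory_mb = memory->second;
    }
    return detected;
}
```

With the example config above, a runner that detected 6 cores and 12288 MB would instead advertise 10 cores and 100 MB.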