Squashed commit of the following: commit 69d5ef7a256b86a86d46e5ae374c00fded1497ea Author: Ian Roddis <tech@kinesin.ca> Date: Thu Dec 16 12:15:55 2021 -0400 Updating readme commit 94a9f676d0f9cc0b55cdc18c4927eaea40d82c77 Author: Ian Roddis <tech@kinesin.ca> Date: Thu Dec 16 12:05:36 2021 -0400 Fixing serialization of attempt records when querying entire dag commit 945e5f90b24abf07c9af1bc4c6bbcb33e93b8069 Author: Ian Roddis <tech@kinesin.ca> Date: Thu Dec 16 11:37:59 2021 -0400 Compiles cleanly... commit 8b23e46081d47fb80dc1a2d998fc6dc4bbf301a8 Author: Ian Roddis <tech@kinesin.ca> Date: Thu Dec 16 10:43:03 2021 -0400 Adding in missing source file to cmake build list commit 6d10d9791206e2bc15788beadeea580b8e43a853 Author: Ian Roddis <tech@kinesin.ca> Date: Thu Dec 16 10:41:43 2021 -0400 Adding new executors commit 42a2c67f4d6ae99df95d917c8621d78cd99837a1 Author: Ian Roddis <tech@kinesin.ca> Date: Thu Dec 16 10:27:14 2021 -0400 Fixing missing curl cmake dependency commit 394bc4c5d51ecee7bf14712f719c8bf7e97fb0fa Author: Ian Roddis <tech@kinesin.ca> Date: Thu Dec 16 10:21:58 2021 -0400 Fixing missing curl cmake dependency commit dd9efc8e7e7770ea1bcbccb70a1af9cfcff0414c Author: Ian Roddis <tech@kinesin.ca> Date: Wed Dec 15 17:15:38 2021 -0400 Checkpointing progress commit 3b3b55d6037bb96e46de6763f486f4ecb92fe6a0 Author: Ian Roddis <tech@kinesin.ca> Date: Wed Dec 15 14:21:18 2021 -0400 updating readme commit 303027c11452941b2a0c0d1b04ac5942e79efd74 Author: Ian Roddis <tech@kinesin.ca> Date: Wed Dec 15 14:17:16 2021 -0400 Namespacing daggyd Adding more error checking around deserialization of parameters Adding tests for runner agent commit c592eaeba12e2a449bae401e8c1d9ed236416d52 Author: Ian Roddis <tech@kinesin.ca> Date: Wed Dec 15 11:20:21 2021 -0400 Checkpointing work commit fb1862d1cefe2b53a98659cce3c8c73d88bf5d84 Author: Ian Roddis <tech@kinesin.ca> Date: Wed Dec 15 09:52:29 2021 -0400 Copying daggyd for daggyr template, adding in basic routes
# Daggy Runner

`daggyr` is a REST server process that acts as a remote task executor.

# Running it

```bash
daggyr                 # Listen on 127.0.0.1:2504 and run tasks with a local executor
daggyr -d              # Daemonize
daggyr --config FILE   # Run with a config file
```

# Capacity and Allocation

On startup, a runner determines its capacities automatically:

| Capacity  | Determined by                         | Default                     | Notes                         |
|-----------|---------------------------------------|-----------------------------|-------------------------------|
| cores     | `std::thread::hardware_concurrency()` | `max(1, total_cores - 2)`   | A value of 0 means all cores  |
| memory_mb | `sysinfo()` from `<sys/sysinfo.h>`    | `max(100, totalram * 0.75)` | `totalram` is converted to MB |
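The defaults in the table can be sketched in C++ as follows. This is an illustrative reconstruction, not daggyr's actual code: the helper names `default_cores`, `default_memory_mb`, and `detect_capacity` and the `Capacity` struct are hypothetical, and the 0.75 factor is applied with integer arithmetic (rounding down). It is Linux-only because it relies on `sysinfo()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <thread>
#include <sys/sysinfo.h>

// Hypothetical result type for the discovered capacities.
struct Capacity {
    unsigned cores;
    std::uint64_t memory_mb;
};

// Default core capacity: max(1, total_cores - 2).
// Signed arithmetic so that 0 or 1 detected cores cannot wrap around.
unsigned default_cores(unsigned total_cores) {
    long usable = static_cast<long>(total_cores) - 2;
    return static_cast<unsigned>(std::max(1L, usable));
}

// Default memory capacity: max(100, totalram * 0.75), in MB (rounded down).
std::uint64_t default_memory_mb(std::uint64_t total_mb) {
    return std::max<std::uint64_t>(100, total_mb * 3 / 4);
}

// Probe the host and apply the defaults.
Capacity detect_capacity() {
    // May return 0 if the core count cannot be determined.
    unsigned total_cores = std::thread::hardware_concurrency();

    struct sysinfo info {};
    std::uint64_t total_mb = 0;
    if (sysinfo(&info) == 0) {
        // totalram is expressed in units of mem_unit bytes; convert to MB.
        total_mb = static_cast<std::uint64_t>(info.totalram) * info.mem_unit / (1024 * 1024);
    }
    return {default_cores(total_cores), default_memory_mb(total_mb)};
}
```

Doing the subtraction in signed arithmetic matters: `std::thread::hardware_concurrency()` returns an `unsigned` that can be 0 when the core count is unknown, so computing `total_cores - 2` directly would wrap around to a huge value.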

When a `daggyd` process is selecting a runner to send a task to, it queries the
runners' current capacities and chooses the runner that:

- Can satisfy the requirements of the task
- Has the lowest impact, where a runner's impact is the largest relative drop in its available capacity across all capacities
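A minimal C++ sketch of this selection rule follows. The `Resources` and `Runner` types and the `fraction`, `impact`, and `select_runner` functions are hypothetical; the README does not say how `daggyd` represents capacities or breaks ties (this sketch keeps the first runner found).

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical types: a task's requirements, or a runner's free capacity.
struct Resources {
    double cores;
    double memory_mb;
};

struct Runner {
    int id;
    Resources free;  // currently available capacity reported by the runner
};

// Fraction of the free amount that the task would consume (0 if it needs none).
double fraction(double needed, double free) {
    assert(needed <= 0.0 || free > 0.0);  // callers check capacity first
    return needed <= 0.0 ? 0.0 : needed / free;
}

// Impact: the largest relative drop in available capacity across all capacities.
double impact(const Resources& req, const Resources& free) {
    return std::max(fraction(req.cores, free.cores),
                    fraction(req.memory_mb, free.memory_mb));
}

// Pick the runner that can satisfy `req` with the lowest impact.
// Returns std::nullopt when no runner has enough free capacity.
std::optional<int> select_runner(const Resources& req, const std::vector<Runner>& runners) {
    std::optional<int> best;
    double best_impact = 0.0;
    for (const Runner& r : runners) {
        if (r.free.cores < req.cores || r.free.memory_mb < req.memory_mb) {
            continue;  // cannot satisfy the task's requirements
        }
        const double i = impact(req, r.free);
        if (!best || i < best_impact) {  // strict '<' keeps the first runner on ties
            best = r.id;
            best_impact = i;
        }
    }
    return best;
}
```

With the capacities from the example below (memory in MB, so 5 GB is 5120), `select_runner({2, 5120}, {{1, {70, 20480}}, {2, {4, 81920}}, {3, {10, 30720}}})` returns runner 3.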

For instance, if a job that requires 2 cores and 5 GB of memory were submitted,
and three runners reported the following free capacities:

| Runner | free_cores | impact_cores | free_memory | impact_memory | max_impact |
|--------|------------|--------------|-------------|---------------|------------|
| 1      | 70         | 2.86%        | 20 GB       | 25.00%        | 25.00%     |
| 2      | 4          | 50.00%       | 80 GB       | 6.25%         | 50.00%     |
| 3      | 10         | 20.00%       | 30 GB       | 16.67%        | 20.00%     |

Runner 3 would be selected, because its max impact (20%) is the lowest of the three.
Even though it has neither the most free memory nor the most free cores, allocating
the job to it minimizes the impact on overall availability.
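In symbols (notation introduced here for illustration, not taken from daggy's source), with $r_c$ the job's requirement for capacity $c$ and $f_{k,c}$ runner $k$'s free amount of it, the rule and the numbers above work out as:

```math
\begin{aligned}
\text{impact}_k &= \max_{c \in \{\text{cores},\ \text{memory}\}} \frac{r_c}{f_{k,c}} \\
k^\ast &= \operatorname*{arg\,min}_{k \,:\, f_{k,c} \ge r_c \ \forall c} \ \text{impact}_k \\
\text{impact}_1 &= \max\left(\tfrac{2}{70}, \tfrac{5}{20}\right) = 0.25 \\
\text{impact}_2 &= \max\left(\tfrac{2}{4}, \tfrac{5}{80}\right) = 0.50 \\
\text{impact}_3 &= \max\left(\tfrac{2}{10}, \tfrac{5}{30}\right) = 0.20
\end{aligned}
```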

# Submission and Execution

Tasks submitted to the runner are executed inside [cgroups](https://www.man7.org/linux/man-pages/man7/cgroups.7.html)
to enforce resource limits.
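The README does not say which cgroup version or controllers daggyr uses. Purely as an illustration, assuming cgroup v2, a task's core and memory requirements could be mapped onto the `cpu.max` and `memory.max` interface files as below; the helper names (`cpu_max_value`, `memory_max_value`, `apply_limits`), the 100 ms period, and the directory layout are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// cgroup v2 cpu.max format: "<quota_us> <period_us>".
// N cores means the group may use N * period_us of CPU time per period.
std::string cpu_max_value(double cores, std::uint64_t period_us = 100000) {
    assert(cores > 0.0);
    auto quota_us = static_cast<std::uint64_t>(cores * period_us);
    return std::to_string(quota_us) + " " + std::to_string(period_us);
}

// cgroup v2 memory.max takes a limit in bytes.
std::string memory_max_value(std::uint64_t memory_mb) {
    return std::to_string(memory_mb * 1024 * 1024);
}

// Write both limits into an existing cgroup directory, e.g. /sys/fs/cgroup/<group>
// (illustrative path; requires permission to modify that cgroup).
bool apply_limits(const std::string& cgroup_dir, double cores, std::uint64_t memory_mb) {
    std::ofstream cpu(cgroup_dir + "/cpu.max");
    std::ofstream mem(cgroup_dir + "/memory.max");
    if (!cpu || !mem) {
        return false;  // directory missing or not writable
    }
    cpu << cpu_max_value(cores);
    mem << memory_max_value(memory_mb);
    // The kernel may reject a value on write, so check the flushes.
    return static_cast<bool>(cpu.flush()) && static_cast<bool>(mem.flush());
}
```

Because `cpu.max` expresses CPU time as a quota per period, 2 cores with a 100 ms period becomes `200000 100000`.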

Jobs are submitted asynchronously: the client is responsible for polling
`GET /api/v1/task/:task_id` to retrieve the resulting `TaskAttempt`.
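A client-side polling loop might look like the following sketch. `poll_task_attempt` is hypothetical, and `fetch` stands in for one HTTP GET against the route above, made with whatever HTTP client you use. The sketch assumes `fetch` returns `std::nullopt` while the task is still running and the serialized `TaskAttempt` once it has finished; the README does not specify the route's behaviour before completion.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <optional>
#include <string>
#include <thread>

// Hypothetical poller for GET /api/v1/task/:task_id.
std::optional<std::string> poll_task_attempt(
        const std::function<std::optional<std::string>()>& fetch,
        std::chrono::milliseconds interval,
        int max_polls) {
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; ++i) {
        if (std::optional<std::string> attempt = fetch()) {
            return attempt;  // task finished: hand back the TaskAttempt
        }
        if (i + 1 < max_polls) {
            std::this_thread::sleep_for(interval);  // wait before the next request
        }
    }
    // Gave up. The task may still be running, or the runner may have been
    // restarted; runners are stateless, so results would then be lost.
    return std::nullopt;
}
```

Bounding the number of polls keeps a client from waiting forever on a runner that was killed mid-task.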

Runners are **stateless**: killing one kills any running tasks, and any stored
results are lost.

# Config Files

```json
{
  "web-threads": 50,
  "port": 2504,
  "ip": "localhost",
  "capacity_overrides": {
    "cores": 10,
    "memory_mb": 100
  }
}
```

Capacities set in `capacity_overrides` take precedence over the auto-discovered values.
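Applying the overrides can be sketched in C++ as below. The `Capacity` and `CapacityOverrides` types and `apply_overrides` are hypothetical, and the sketch assumes that a key missing from `capacity_overrides` leaves the discovered value unchanged, which the README does not state.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical auto-discovered capacities.
struct Capacity {
    unsigned cores;
    std::uint64_t memory_mb;
};

// Hypothetical in-memory form of the "capacity_overrides" object.
// An empty optional means the key was absent from the config file.
struct CapacityOverrides {
    std::optional<unsigned> cores;
    std::optional<std::uint64_t> memory_mb;
};

// Replace each discovered capacity that has an override; keep the rest.
Capacity apply_overrides(Capacity discovered, const CapacityOverrides& overrides) {
    if (overrides.cores) {
        discovered.cores = *overrides.cores;
    }
    if (overrides.memory_mb) {
        discovered.memory_mb = *overrides.memory_mb;
    }
    return discovered;
}
```

With the example config, a runner that discovered 6 cores and 12288 MB would advertise 10 cores and 100 MB.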