Ian Roddis 8d00621908 Adding support for remote execution daemons.
2021-12-16 12:16:12 -04:00


Daggy Runner

daggyr is a REST server process that acts as a remote task executor.

Running it

daggyr    # That's it: listens on 127.0.0.1:2504 and runs with a local executor
daggyr -d # Daemonize

daggyr --config FILE # Run with a config file

Capacity and Allocation

On startup, a server's capacity is determined automatically. The capacities are:

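| Capacity  | Determined by                       | Default                   | Notes                            |
|-----------|-------------------------------------|---------------------------|----------------------------------|
| cores     | std::thread::hardware_concurrency() | max(1, max - 2)           | A value of 0 will mean all cores |
| memory_mb | sysinfo.h                           | max(100, totalram * 0.75) | totalram is converted to MB      |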
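
The defaults in the table can be sketched as two small functions. This is an illustration in Python, not daggyr's actual code: the function and constant names are hypothetical, and the sketch assumes that `totalram` is reported in bytes and that 1 MB means 1024 * 1024 bytes.

```python
# Hypothetical names; only the two formulas come from the table above.
MIN_CORES = 1
RESERVED_CORES = 2
MIN_MEMORY_MB = 100
MEMORY_FRACTION = 0.75
BYTES_PER_MB = 1024 * 1024  # assumption: the README does not define "MB"


def default_cores(hardware_threads: int) -> int:
    """Default core capacity: max(1, max - 2)."""
    return max(MIN_CORES, hardware_threads - RESERVED_CORES)


def default_memory_mb(total_ram_bytes: int) -> int:
    """Default memory capacity in MB: max(100, totalram * 0.75)."""
    total_mb = total_ram_bytes / BYTES_PER_MB
    return max(MIN_MEMORY_MB, int(total_mb * MEMORY_FRACTION))
```

Under these assumptions, a host with 8 hardware threads and 16 GiB of RAM would advertise 6 cores and 12288 MB, while a single-core host would still advertise 1 core.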

When a daggyd process selects a runner to send a task to, it queries each runner's current capacities and chooses the runner that:

  • Can satisfy the requirements of the task
  • Has the lowest impact, where a runner's impact is the largest relative drop in available capacity across all of its capacities.

For instance, if a job were submitted that requires 2 cores and 5g of memory, and three runners reported the following capacities:

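| Runner | free_cores | impact_cores | free_memory | impact_memory | max_impact |
|--------|------------|--------------|-------------|---------------|------------|
| 1      | 70         | 2.86%        | 20g         | 25.00%        | 25%        |
| 2      | 4          | 50.00%       | 80g         | 6.25%         | 50%        |
| 3      | 10         | 20.00%       | 30g         | 16.67%        | 20%        |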

Runner 3 would be selected. Even though it doesn't have the most memory or CPU capacity, allocating the job to it minimizes the impact to the overall availability.
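
The selection rule above can be sketched in a few lines of Python. This is a minimal illustration of the "lowest maximum impact" rule as described here, not daggyd's actual implementation. The `Capacity` type and function names are hypothetical, and memory is expressed in gigabytes only to match the example table.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Capacity:
    """Hypothetical container for a runner's free capacity or a task's needs."""
    cores: float
    memory_gb: float


def _ratio(need: float, free: float) -> float:
    # A zero requirement has no impact, even on a fully used resource.
    return 0.0 if need == 0 else need / free


def impact(free: Capacity, need: Capacity) -> Optional[float]:
    """Largest relative drop across all capacities, or None if the task does not fit."""
    if need.cores > free.cores or need.memory_gb > free.memory_gb:
        return None
    return max(_ratio(need.cores, free.cores),
               _ratio(need.memory_gb, free.memory_gb))


def select_runner(runners: Dict[str, Capacity], need: Capacity) -> Optional[str]:
    """Return the name of the runner with the lowest impact, or None if none fits."""
    best_name, best_impact = None, None
    for name, free in runners.items():
        score = impact(free, need)
        if score is None:
            continue  # this runner cannot satisfy the task
        if best_impact is None or score < best_impact:
            best_name, best_impact = name, score
    return best_name


runners = {
    "1": Capacity(cores=70, memory_gb=20),
    "2": Capacity(cores=4, memory_gb=80),
    "3": Capacity(cores=10, memory_gb=30),
}
chosen = select_runner(runners, Capacity(cores=2, memory_gb=5))
```

With the three runners from the table and a task needing 2 cores and 5g, the maximum impacts are 25%, 50%, and 20%, so `chosen` is runner `"3"`. Ties go to the first runner encountered, which is a choice of this sketch rather than documented behavior.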

Submission and Execution

Tasks submitted to the runner will be executed with cgroups to enforce limits.

Jobs are submitted asynchronously. The client is responsible for polling GET /api/v1/task/:task_id to retrieve the resulting TaskAttempt.
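
A client-side polling loop for this endpoint might look like the sketch below. Only the route `/api/v1/task/:task_id` comes from this README. The sketch assumes the response body is JSON, and because the README does not say how a finished task is distinguished from a running one, that check is left to a caller-supplied `is_done` function. All function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable


def task_url(base_url: str, task_id: str) -> str:
    """Build the polling URL for a task on a runner."""
    return f"{base_url.rstrip('/')}/api/v1/task/{task_id}"


def fetch_attempt(base_url: str, task_id: str) -> dict:
    """GET the task once. Assumption: the runner replies with a JSON object."""
    with urllib.request.urlopen(task_url(base_url, task_id), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_until_done(fetch: Callable[[], dict],
                    is_done: Callable[[dict], bool],
                    interval_s: float = 1.0,
                    timeout_s: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch() repeatedly until is_done(result) is true or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        attempt = fetch()
        if is_done(attempt):
            return attempt
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval_s)
```

A caller would pass something like `lambda: fetch_attempt("http://127.0.0.1:2504", task_id)` as `fetch`, together with an `is_done` check that matches the real response format. Keeping the HTTP call separate from the loop makes the loop easy to exercise without a running daggyr.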

Runners are stateless: killing a runner kills any tasks it is running, and any results it has stored are lost.

Config Files

{
  "web-threads": 50,
  "port":  2504,
  "ip": "localhost",
  "capacity_overrides": {
    "cores": 10,
    "memory_mb": 100
  }
}

Values under capacity_overrides replace the corresponding auto-discovered capacities.
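
How overrides combine with auto-discovered values can be sketched as a simple merge. This Python illustration is not daggyr's code. The function name is hypothetical, and rejecting unknown capacity names is a choice made for the sketch, not documented behavior.

```python
import json
from typing import Dict

# The example config from this README.
EXAMPLE_CONFIG = """
{
  "web-threads": 50,
  "port": 2504,
  "ip": "localhost",
  "capacity_overrides": {
    "cores": 10,
    "memory_mb": 100
  }
}
"""


def apply_overrides(discovered: Dict[str, int], config: dict) -> Dict[str, int]:
    """Return discovered capacities with any capacity_overrides applied on top."""
    overrides = config.get("capacity_overrides", {})
    unknown = set(overrides) - set(discovered)
    if unknown:
        # Sketch-only behavior: fail loudly on misspelled capacity names.
        raise ValueError(f"unknown capacity override(s): {sorted(unknown)}")
    return {**discovered, **overrides}


config = json.loads(EXAMPLE_CONFIG)
capacities = apply_overrides({"cores": 6, "memory_mb": 12288}, config)
```

Here a host that auto-discovered 6 cores and 12288 MB would advertise 10 cores and 100 MB after the overrides are applied. A config without a `capacity_overrides` key leaves the discovered values unchanged.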