Ian Roddis 57e93b5045 Simplifying daggyr server, and returning to a
task submit / task poll model.

2022-02-02 21:12:05 -04:00

Daggy Runner

daggyr is a REST server process that acts as a remote task executor.

Running it

daggyr    # That's it; listens on 127.0.0.1:2504 and runs with a local executor
daggyr -d # Daemonize

daggyr --config FILE # Run with a config file

Capacity and Allocation

On startup, a server's capacity is determined automatically. The capacities are:

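| Capacity  | Determined by                       | Default                   | Notes                            |
|-----------|-------------------------------------|---------------------------|----------------------------------|
| cores     | std::thread::hardware_concurrency() | max(1, max - 2)           | A value of 0 will mean all cores |
| memory_mb | sysinfo.h                           | max(100, totalram * 0.75) | totalram is converted to MB      |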
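The default formulas in the table can be sketched as follows. This is a Python illustration only, not daggyr's C++ code: the function names are hypothetical, `os.cpu_count()` and `os.sysconf()` stand in for `std::thread::hardware_concurrency()` and `sysinfo.h`, and "MB" is assumed to mean 2^20 bytes.

```python
import os

MIB = 1024 * 1024  # assumption: "MB" in the table means 2**20 bytes


def default_cores(hardware_cores: int) -> int:
    """max(1, max - 2): hold back two cores, but never report fewer than one."""
    return max(1, hardware_cores - 2)


def default_memory_mb(total_ram_bytes: int) -> int:
    """max(100, totalram * 0.75), with totalram converted to MB first."""
    total_mb = total_ram_bytes // MIB
    return max(100, int(total_mb * 0.75))


def discover_capacity() -> dict:
    """Probe the local machine (Linux; Python stand-ins for the C++ calls)."""
    hardware_cores = os.cpu_count() or 1  # cpu_count() may return None
    total_ram_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    return {
        "cores": default_cores(hardware_cores),
        "memory_mb": default_memory_mb(total_ram_bytes),
    }
```

Under these assumptions, an 8-core machine with 16 GiB of RAM would report 6 cores and 12288 MB.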

When a daggyd process selects a runner for a task, it queries each runner's current capacities and chooses the runner that:

  • Can satisfy the requirements of the task
  • Has the lowest impact, where a runner's impact is the largest relative drop in available capacity across all of its capacities.

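For instance, suppose a job that requires 2 cores and 5g of memory is submitted, and three runners report the following capacities:

| Runner | free_cores | impact_cores | free_memory | impact_memory | max_impact |
|--------|------------|--------------|-------------|---------------|------------|
| 1      | 70         | 2.9%         | 20g         | 25.00%        | 25%        |
| 2      | 4          | 50.0%        | 80g         | 6.25%         | 50%        |
| 3      | 10         | 20.0%        | 30g         | 16.67%        | 20%        |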

Runner 3 would be selected because its max_impact (20%) is the lowest of the three. Even though it has neither the most memory nor the most CPU capacity, allocating the job to it minimizes the impact on overall availability.
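The selection rule can be sketched in a few lines of Python. This is an illustration of the rule as described here, not daggyd's actual implementation: the `Runner` class, the function names, the `memory_gb` key, and the tie-breaking behaviour (first runner wins) are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Runner:
    name: str
    free: Dict[str, float]  # currently available capacity per resource


def max_impact(runner: Runner, required: Dict[str, float]) -> Optional[float]:
    """Largest relative drop across capacities, or None if the task does not fit."""
    impacts = []
    for capacity, needed in required.items():
        available = runner.free.get(capacity, 0)
        if needed > available:
            return None  # runner cannot satisfy this requirement
        if needed > 0:
            impacts.append(needed / available)
    return max(impacts, default=0.0)


def select_runner(runners: List[Runner], required: Dict[str, float]) -> Optional[Runner]:
    """Pick the fitting runner with the lowest max impact (first one wins ties)."""
    best, best_impact = None, None
    for runner in runners:
        impact = max_impact(runner, required)
        if impact is None:
            continue
        if best_impact is None or impact < best_impact:
            best, best_impact = runner, impact
    return best


# The three runners from the example, with memory expressed in GB.
RUNNERS = [
    Runner("1", {"cores": 70, "memory_gb": 20}),
    Runner("2", {"cores": 4, "memory_gb": 80}),
    Runner("3", {"cores": 10, "memory_gb": 30}),
]
JOB = {"cores": 2, "memory_gb": 5}
```

Calling `select_runner(RUNNERS, JOB)` returns the runner named "3", matching the example.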

Submission and Execution

Tasks submitted to the runner will be executed with cgroups to enforce limits.

Jobs are submitted asynchronously; the client is responsible for polling GET /api/v1/task/:task_id to retrieve the resulting TaskAttempt.
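A client-side polling loop might look like the sketch below. Only the endpoint path comes from this README; everything else is an assumption made for illustration: that the TaskAttempt is returned as JSON, that an HTTP error status means "no result yet", and the names `task_url`, `http_fetch`, and `poll_task`. Check the server's actual responses before relying on it.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def task_url(base_url: str, task_id: str) -> str:
    """Build the polling URL for a task (endpoint path taken from the README)."""
    return f"{base_url.rstrip('/')}/api/v1/task/{task_id}"


def http_fetch(url: str) -> Optional[dict]:
    """Hypothetical fetcher: assumes a JSON body, and that an HTTP error
    status means the result is not available yet."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return json.load(response)
    except urllib.error.HTTPError:
        return None


def poll_task(
    base_url: str,
    task_id: str,
    fetch: Callable[[str], Optional[dict]] = http_fetch,
    interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until fetch() returns a result, or give up after max_polls tries."""
    url = task_url(base_url, task_id)
    for attempt in range(max_polls):
        result = fetch(url)
        if result is not None:
            return result
        if attempt + 1 < max_polls:
            sleep(interval)
    raise TimeoutError(f"no result for task {task_id} after {max_polls} polls")
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without a running daggyr; for example, `poll_task("http://127.0.0.1:2504", task_id)` would poll a local runner once per second.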

Runners are stateless: killing one kills any tasks it is running, and any results it has stored are lost.

Config Files

{
  "web-threads": 50,
  "port":  2504,
  "ip": "localhost",
  "capacity_overrides": {
    "cores": 10,
    "memory_mb": 100
  }
}

The auto-discovered capacities can be overridden with capacity_overrides.
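One plausible reading of the override behaviour, sketched in Python: each key in `capacity_overrides` replaces the auto-discovered value, and capacities that are not listed keep their discovered value. The per-key merge and the function name are assumptions; the README only states that capacities can be overridden.

```python
import json

CONFIG_TEXT = """
{
  "web-threads": 50,
  "port": 2504,
  "ip": "localhost",
  "capacity_overrides": {
    "cores": 10,
    "memory_mb": 100
  }
}
"""


def apply_capacity_overrides(discovered: dict, config: dict) -> dict:
    """Return a copy of the discovered capacities with any overrides applied."""
    capacities = dict(discovered)  # leave the caller's dict untouched
    capacities.update(config.get("capacity_overrides", {}))
    return capacities


discovered = {"cores": 6, "memory_mb": 12288}  # illustrative auto-discovered values
effective = apply_capacity_overrides(discovered, json.loads(CONFIG_TEXT))
print(effective)  # {'cores': 10, 'memory_mb': 100}
```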