Daggy Runner

daggyr is a REST server process that acts as a remote task executor.

Running it

daggyr    # That's it; it will listen on 127.0.0.1:2504 and run with a local executor
daggyr -d # Daemonize

daggyr --config FILE # Run with a config file

Capacity and Allocation

On startup, a server's capacity is determined automatically. The capacities are:

Capacity    Determined by                          Default                       Notes
cores       std::thread::hardware_concurrency()    max(1, detected cores - 2)    A value of 0 means all cores
memory_mb   sysinfo.h                              max(100, totalram * 0.75)     totalram is converted to MB
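
For reference, here is a minimal C++ sketch of how these defaults could be derived on Linux. It is an illustration only (the names and structure are not daggyr's actual code), but it uses the same primitives named in the table above:

// capacity_probe.cpp -- illustrative sketch, not daggyr's implementation
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <thread>
#include <sys/sysinfo.h>   // sysinfo(), as referenced in the table above

int main() {
    // cores: leave two cores for the runner itself, but never report fewer than 1
    unsigned detected = std::thread::hardware_concurrency();  // may be 0 if unknown
    unsigned cores = std::max(1u, detected > 2 ? detected - 2 : 1u);

    // memory_mb: 75% of total RAM, converted to MB, with a floor of 100 MB
    struct sysinfo si {};
    sysinfo(&si);
    std::uint64_t total_mb =
        static_cast<std::uint64_t>(si.totalram) * si.mem_unit / (1024 * 1024);
    std::uint64_t memory_mb = std::max<std::uint64_t>(100, total_mb * 3 / 4);

    std::cout << "cores: " << cores << ", memory_mb: " << memory_mb << std::endl;
    return 0;
}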

When a daggyd process is selecting a runner to send a task to, it will query the current capacities, and choose the runner that:

  • Can satisfy the requirements of the task
  • Has the lowest impact, where a runner's impact is its largest relative drop in available capacity across all capacity types.

For instance, if a job were submitted that requires 2 cores and 5g of memory, and three runners reported the following capacities:

Runner   free_cores   impact_cores   free_memory   impact_memory   max_impact
1        70           2.86%          20g           25.00%          25%
2        4            50.00%         80g           6.25%           50%
3        10           20.00%         30g           16.67%          20%

Runner 3 would be selected. Even though it doesn't have the most memory or CPU capacity, allocating the job to it minimizes the impact to the overall availability.
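
As a minimal sketch of this selection rule (hypothetical types and names, not the actual daggyd scheduler), using the numbers from the example above:

// pick_runner.cpp -- illustrative sketch of impact-based runner selection
#include <algorithm>
#include <iostream>
#include <optional>
#include <vector>

struct Runner {
    int id;
    double free_cores;
    double free_memory_gb;
};

// Impact of a task on a runner: the largest relative drop across all capacities.
// Returns nothing if the runner cannot satisfy the task's requirements at all.
std::optional<double> impact(const Runner& r, double need_cores, double need_memory_gb) {
    if (need_cores > r.free_cores || need_memory_gb > r.free_memory_gb)
        return std::nullopt;
    return std::max(need_cores / r.free_cores, need_memory_gb / r.free_memory_gb);
}

int main() {
    std::vector<Runner> runners = {{1, 70, 20}, {2, 4, 80}, {3, 10, 30}};
    const double need_cores = 2, need_memory_gb = 5;

    const Runner* best = nullptr;
    double best_impact = 0;
    for (const auto& r : runners) {
        auto i = impact(r, need_cores, need_memory_gb);
        if (i && (!best || *i < best_impact)) {
            best = &r;
            best_impact = *i;
        }
    }
    if (best)
        std::cout << "selected runner " << best->id << " (impact "
                  << best_impact * 100 << "%)" << std::endl;   // prints runner 3, 20%
    return 0;
}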

Submission and Execution

Tasks submitted to the runner will be executed with cgroups to enforce limits.

Jobs are submitted asynchronously; the client is expected to poll GET /api/v1/task/:task_id to retrieve the resulting TaskAttempt.

Runners are stateless, meaning that killing one will kill any running tasks and any stored results will be lost.
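
As an illustration of that polling workflow, here is a hedged C++ client sketch using libcurl. The task id and the shape of a finished TaskAttempt are assumptions; only the route and the default listen address come from this README:

// poll_task.cpp -- illustrative client sketch; build with: g++ poll_task.cpp -lcurl
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <curl/curl.h>

// libcurl write callback: append the response body to a std::string
static size_t write_body(char* data, size_t size, size_t nmemb, void* userp) {
    static_cast<std::string*>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

int main(int argc, char** argv) {
    std::string task_id = argc > 1 ? argv[1] : "example-task-id";  // hypothetical id
    std::string url = "http://127.0.0.1:2504/api/v1/task/" + task_id;

    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();

    // Poll a few times; a real client would keep polling until the returned
    // TaskAttempt indicates the task has finished (its exact fields are not shown here).
    for (int attempt = 0; attempt < 5; ++attempt) {
        std::string body;
        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_body);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
        if (curl_easy_perform(curl) == CURLE_OK) {
            long code = 0;
            curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &code);
            std::cout << "HTTP " << code << ": " << body << std::endl;
        }
        std::this_thread::sleep_for(std::chrono::seconds(2));
    }

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}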

Config Files

{
  "web-threads": 50,
  "port":  2504,
  "ip": "localhost",
  "capacity_overrides": {
    "cores": 10,
    "memory_mb": 100
  }
}

Capacities can be overridden, replacing the auto-discovered values.