Daggy Runner

daggyr is a REST server process that acts as a remote task executor.

Running it

daggyr    # That's it; it will listen on 127.0.0.1:2504 and run with a local executor
daggyr -d # Daemonize

daggyr --config FILE # Run with a config file

Capacity and Allocation

On startup, a server's capacity is determined automatically. The capacities are:

Capacity    Determined by                          Default                       Notes
cores       std::thread::hardware_concurrency()    max(1, detected cores - 2)    A value of 0 means all cores
memory_mb   sysinfo.h                              max(100, totalram * 0.75)     totalram is converted to MB
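
For reference, here is a minimal C++ sketch of how these defaults could be derived on Linux. It is an illustration only (the names and structure are not daggyr's actual code), but it uses the same primitives named in the table above:

// capacity_probe.cpp -- illustrative sketch, not daggyr's implementation
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <thread>
#include <sys/sysinfo.h>   // sysinfo(), as referenced in the table above

int main() {
    // cores: leave two cores for the runner itself, but never report fewer than 1
    unsigned detected = std::thread::hardware_concurrency();  // may be 0 if unknown
    unsigned cores = std::max(1u, detected > 2 ? detected - 2 : 1u);

    // memory_mb: 75% of total RAM, converted to MB, with a floor of 100 MB
    struct sysinfo si {};
    sysinfo(&si);
    std::uint64_t total_mb =
        static_cast<std::uint64_t>(si.totalram) * si.mem_unit / (1024 * 1024);
    std::uint64_t memory_mb = std::max<std::uint64_t>(100, total_mb * 3 / 4);

    std::cout << "cores: " << cores << ", memory_mb: " << memory_mb << std::endl;
    return 0;
}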

When a daggyd process is selecting a runner to send a task to, it will query the current capacities, and choose the runner that:

  • Can satisfy the requirements of the task
  • Has the lowest impact, where a runner's impact is its largest relative drop in available capacity across all capacity types.

For instance, if a job were submitted that requires 2 cores and 5g of memory, and three runners reported the following capacities:

Runner   free_cores   impact_cores   free_memory   impact_memory   max_impact
1        70           2.86%          20g           25.00%          25%
2        4            50.00%         80g           6.25%           50%
3        10           20.00%         30g           16.67%          20%

Runner 3 would be selected. Even though it doesn't have the most memory or CPU capacity, allocating the job to it minimizes the impact to the overall availability.
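
As a minimal sketch of this selection rule (hypothetical types and names, not the actual daggyd scheduler), using the numbers from the example above:

// pick_runner.cpp -- illustrative sketch of impact-based runner selection
#include <algorithm>
#include <iostream>
#include <optional>
#include <vector>

struct Runner {
    int id;
    double free_cores;
    double free_memory_gb;
};

// Impact of a task on a runner: the largest relative drop across all capacities.
// Returns nothing if the runner cannot satisfy the task's requirements at all.
std::optional<double> impact(const Runner& r, double need_cores, double need_memory_gb) {
    if (need_cores > r.free_cores || need_memory_gb > r.free_memory_gb)
        return std::nullopt;
    return std::max(need_cores / r.free_cores, need_memory_gb / r.free_memory_gb);
}

int main() {
    std::vector<Runner> runners = {{1, 70, 20}, {2, 4, 80}, {3, 10, 30}};
    const double need_cores = 2, need_memory_gb = 5;

    const Runner* best = nullptr;
    double best_impact = 0;
    for (const auto& r : runners) {
        auto i = impact(r, need_cores, need_memory_gb);
        if (i && (!best || *i < best_impact)) {
            best = &r;
            best_impact = *i;
        }
    }
    if (best)
        std::cout << "selected runner " << best->id << " (impact "
                  << best_impact * 100 << "%)" << std::endl;   // prints runner 3, 20%
    return 0;
}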

Submission and Execution

Tasks submitted to the runner will be executed with cgroups to enforce limits.

Jobs are submitted asynchronously; the client is expected to poll GET /api/v1/task/:task_id to retrieve the resulting TaskAttempt.

Runners are stateless, meaning that killing one will kill any running tasks and any stored results will be lost.
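
As an illustration of that polling workflow, here is a hedged C++ client sketch using libcurl. The task id and the shape of a finished TaskAttempt are assumptions; only the route and the default listen address come from this README:

// poll_task.cpp -- illustrative client sketch; build with: g++ poll_task.cpp -lcurl
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <curl/curl.h>

// libcurl write callback: append the response body to a std::string
static size_t write_body(char* data, size_t size, size_t nmemb, void* userp) {
    static_cast<std::string*>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

int main(int argc, char** argv) {
    std::string task_id = argc > 1 ? argv[1] : "example-task-id";  // hypothetical id
    std::string url = "http://127.0.0.1:2504/api/v1/task/" + task_id;

    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();

    // Poll a few times; a real client would keep polling until the returned
    // TaskAttempt indicates the task has finished (its exact fields are not shown here).
    for (int attempt = 0; attempt < 5; ++attempt) {
        std::string body;
        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_body);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
        if (curl_easy_perform(curl) == CURLE_OK) {
            long code = 0;
            curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &code);
            std::cout << "HTTP " << code << ": " << body << std::endl;
        }
        std::this_thread::sleep_for(std::chrono::seconds(2));
    }

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}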

Config Files

{
  "web-threads": 50,
  "port":  2504,
  "ip": "localhost",
  "capacity_overrides": {
    "cores": 10,
    "memory_mb": 100
  }
}

Capacities can be overridden, replacing the auto-discovered values.