2026-05-02 23:08:30 -03:00

# Sovereign
Sovereign is an Ansible project that deploys a complete self-hosted infrastructure stack for small businesses using Docker and Docker Compose on a single Linux host. Every service sits behind Traefik (TLS via Let's Encrypt), authenticates through Authentik (OIDC/OAuth2), and ships logs to Graylog (GELF UDP).
## Table of Contents
- [Services](#services)
- [Requirements](#requirements)
- [Quick Start](#quick-start)
- [New Tenant Setup](#new-tenant-setup)
- [Configuration Reference](#configuration-reference)
- [Deployment](#deployment)
- [Testing](#testing)
- [Maintenance](#maintenance)
- [Architecture Notes](#architecture-notes)
---
## Services
| Role | Service | URL |
|---------------|-------------------------------------|------------------------------------|
| `common` | Traefik (reverse proxy + TLS) | `traefik.<domain>` |
| `dns` | BIND9 (authoritative nameserver) | `ns1.<domain>` |
| `graylog` | Graylog + OpenSearch + MongoDB | `logs.<domain>` |
| `authentik` | Authentik (identity provider) | `auth.<domain>` |
| `minio` | MinIO (object storage) | `s3.<domain>`, `minio.<domain>` |
| `nextcloud` | Nextcloud + MariaDB + Redis | `cloud.<domain>` |
| `stalwart` | Stalwart Mail (SMTP/IMAP) | `mail.<domain>` |
| `roundcube` | Roundcube (webmail) | `webmail.<domain>` |
| `matrix` | Synapse + Element | `matrix.<domain>`, `chat.<domain>` |
| `jitsi` | Jitsi Meet | `meet.<domain>` |
| `headscale` | Headscale (WireGuard mesh VPN) | `headscale.<domain>` |
| `wazuh` | Wazuh Manager + Indexer + Dashboard | `wazuh.<domain>` |
| `vaultwarden` | Vaultwarden + PostgreSQL | `vault.<domain>` |
| `forgejo` | Forgejo + PostgreSQL | `git.<domain>` |
| `uptimekuma` | Uptime Kuma (uptime monitoring) | `status.<domain>` |
| `automatisch` | Automatisch (workflow automation) | `automate.<domain>` |
| `twenty` | Twenty CRM + PostgreSQL + Redis | `crm.<domain>` |
| `website` | Nginx (static website) | `<domain>` |
---
## Requirements
**Control machine** (where you run Ansible):
- Python 3.9+
- Ansible 8+ — installed via `pip install -r requirements.txt` (see [Installing Dependencies](#installing-dependencies))
- Ansible collections (see [Installing Dependencies](#installing-dependencies))
**Target host**:
- Ubuntu 22.04 or 24.04 (amd64)
- Root or sudo access
- Ports 80/TCP, 443/TCP, 53/TCP+UDP, 51820/UDP (WireGuard), and 2222/TCP (Forgejo SSH, see `forgejo_ssh_port`) open
- Domain registered with a registrar that supports custom nameservers and glue records
### Installing Dependencies
Install Python packages (Ansible, Molecule, and linting tools) and Ansible collections:
```bash
pip install -r requirements.txt
ansible-galaxy collection install -r requirements.yml
```
Python packages (`requirements.txt`): `ansible`, `molecule`, `ansible-lint`, `yamllint`.
Ansible collections (`requirements.yml`): `community.docker >=3.0.0`, `community.general >=8.0.0`, `ansible.posix >=1.5.0`.
---
## Quick Start
```bash
# 1. Clone the repo
git clone <repo-url> sovereign && cd sovereign
# 2. Install Python packages and Ansible collections
pip install -r requirements.txt
ansible-galaxy collection install -r requirements.yml
# 3. Set the target host connection details
export SOVEREIGN_HOST=203.0.113.10
export SOVEREIGN_USER=ubuntu
export SOVEREIGN_SSH_KEY=~/.ssh/id_rsa
# 4. Generate a complete, deployment-ready config
python3 configure.py
# 5. Deploy
ansible-playbook playbooks/site.yml
```
---
## New Tenant Setup
Each deployment is controlled entirely by `inventories/production/group_vars/all.yml`. The recommended way to create this file for a new tenant is with the interactive configurator script.
### Using the configurator (recommended)
```bash
python3 configure.py
```
The script walks you through each configuration section, prompts for the handful of deployment-specific values (domain name, organisation name, server IP, admin email), and auto-generates every password and cryptographic secret using a cryptographically secure random source. It then writes a complete `group_vars/all.yml` with no `changeme_*` placeholders left, and prints a credential summary to the terminal.
You can also write to a custom path or pipe the YAML to stdout:
```bash
python3 configure.py -o /path/to/all.yml # custom output path
python3 configure.py --stdout > all.yml # print YAML; prompts go to stderr
just configure # shorthand via Justfile
just configure-to /path/to/all.yml
```
The configurator prompts for the following values (all others take secure defaults):
| Prompt | Notes |
|--------|-------|
| Base domain | e.g. `acme.com` — required |
| Organisation name | Shown in service UIs |
| Admin email | Used for ACME/Let's Encrypt and initial admin accounts |
| Graylog host IP | IP reachable from Docker containers for GELF UDP |
| Server public IPv4 | Used to populate DNS A records |
| DMARC policy | `none`, `quarantine`, or `reject` |
| DKIM selector | Defaults to `default` |
All passwords, secret keys, database credentials, and signing tokens are generated automatically.
### Manual setup
If you prefer to configure the file by hand, work on a copy of `inventories/production/group_vars/all.yml`: set `base_domain` and replace every `changeme_*` placeholder with a secure value. The following commands generate specific values:
```bash
# Generic random secret
openssl rand -base64 32
# Graylog password_secret (min 16 chars)
openssl rand -base64 48
# Graylog root password hash
echo -n 'yourpassword' | sha256sum | awk '{print $1}'
# Traefik dashboard password (htpasswd format)
htpasswd -nb admin yourpassword
# Authentik secret key (exactly 50 characters)
openssl rand -base64 37 | head -c 50
# Roundcube DES key (exactly 24 characters)
openssl rand -base64 18 | head -c 24
# Forgejo tokens (run 3× for secret_key, internal_token, lfs_jwt_secret)
openssl rand -hex 32
```
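Before deploying a hand-edited file, it can help to confirm that no placeholder was missed. The sketch below is a hypothetical helper (the name `check_placeholders` is not part of the repository) that searches a vars file for leftover `changeme_` strings:

```bash
# Hypothetical helper, not part of the repository.
# Prints every line that still contains a changeme_ placeholder.
# Returns 0 if the file is clean, 1 if placeholders remain, 2 if unreadable.
check_placeholders() {
  vars_file="$1"
  if [ ! -r "$vars_file" ]; then
    echo "Cannot read $vars_file" >&2
    return 2
  fi
  if grep -n 'changeme_' "$vars_file"; then
    echo "Placeholders remain in $vars_file" >&2
    return 1
  fi
  echo "No placeholders left in $vars_file"
}
```

For example, `check_placeholders inventories/production/group_vars/all.yml && ansible-playbook playbooks/site.yml` only starts the deployment when the file is clean.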
### Post-deployment steps
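These steps are required regardless of whether you used the configurator or manual setup. Most are completed after the first `ansible-playbook` run; the Wazuh certificates are the exception and must exist before the `wazuh` role runs for the first time.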
#### DNS — nameserver delegation
The `dns` role runs BIND9 as an authoritative nameserver for your domain. After deployment:
1. Register a **glue record** at your domain registrar: `ns1.<domain>` → your server's public IP.
2. Set your domain's **nameservers** to `ns1.<domain>`.
Once delegation propagates (typically minutes to hours), all service subdomains will resolve via BIND9 without needing individual A records at your registrar.
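To see whether delegation has propagated, you can compare the NS answer from a public resolver with the expected nameserver. The helper below is a hypothetical sketch (not part of the repository); it only does the comparison, so the lookup itself is left to `dig`:

```bash
# Hypothetical helper, not part of the repository.
# $1: expected nameserver, e.g. ns1.example.com
# $2: NS answer, one name per line, as printed by `dig +short NS <domain>`
# Returns 0 if the expected nameserver appears in the answer.
# Trailing dots and letter case are ignored.
ns_delegated() {
  expected="${1%.}"
  printf '%s\n' "$2" | sed 's/\.$//' | grep -qixF -- "$expected"
}
```

With `example.com` standing in for your domain: `ns_delegated "ns1.example.com" "$(dig +short NS example.com)" && echo "delegation is live"`.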
#### DKIM — email signing key
Stalwart generates its DKIM signing key on first start. After Stalwart is running:
1. Log in to the Stalwart admin UI at `https://mail.<domain>`.
2. Navigate to **Settings → DKIM keys** and copy the public key.
3. Add it to `all.yml`:
```yaml
stalwart_dkim_public_key: "MIGfMA0GCSqGSIb3DQEB..."
```
4. Re-run the DNS role to publish the TXT record:
```bash
ansible-playbook playbooks/site.yml --tags dns
```
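The `dns` role splits long keys into TXT-sized chunks for you, so nothing below is needed for a deployment. Purely as an illustration of what that chunking looks like, here is a minimal shell sketch; the function name `split_txt` is hypothetical and the role's actual template may work differently:

```bash
# Illustration only; the dns role handles this automatically.
# Splits a TXT value into quoted strings of at most 255 characters,
# one per line, in zone-file style.
split_txt() {
  printf '%s\n' "$1" | fold -w 255 | sed 's/.*/"&"/'
}
```

A value shorter than 255 characters comes back as a single quoted string; a longer one is emitted as several quoted strings that a verifier concatenates back together.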
#### Authentik OIDC applications
Log into Authentik at `https://auth.<domain>` and create an OAuth2/OIDC provider and application for each service that integrates with SSO. Then fill in the `changeme_*_oidc_secret` placeholders in the relevant compose templates under `roles/<service>/templates/`.
Services that use native OIDC — create an OAuth2/OIDC provider in Authentik for each, then set the corresponding variable in `all.yml`:
| Service | `all.yml` variable |
|---------|-------------------|
| MinIO | `changeme_minio_oidc_secret` |
| Headscale | `changeme_headscale_oidc_secret` |
| Vaultwarden | `changeme_vaultwarden_oidc_secret` |
| Forgejo | `changeme_forgejo_oidc_secret` |
| Twenty CRM | `twenty_oidc_client_secret` |
For Twenty CRM, after setting the variable and redeploying the role, also configure the provider inside the app: **Settings → Security → SSO → Add provider**, using the discovery URL `https://auth.<domain>/application/o/twenty/.well-known/openid-configuration`.
Services that use Authentik **forward auth** (no native OIDC) — create a **Proxy Provider** in Forward Auth mode for each, create an Application bound to it, and add it to the embedded outpost:
| Service | External host |
|---------|--------------|
| Uptime Kuma | `https://status.<domain>` |
| Automatisch | `https://automate.<domain>` |
With the embedded outpost running, Traefik will redirect unauthenticated requests to the Authentik login page automatically — no further role changes are needed.
#### Wazuh TLS certificates
Wazuh requires TLS certificates between its manager, indexer, and dashboard components before the first run. Generate them using the Wazuh certificate tool:
```bash
# Download the Wazuh certs generation tool
curl -sO https://packages.wazuh.com/4.9/wazuh-certs-tool.sh
curl -sO https://packages.wazuh.com/4.9/config.yml
# Edit config.yml with your node hostnames, then run:
bash wazuh-certs-tool.sh -A
# Copy the resulting certs into the wazuh data directory on the target host
# before running the wazuh role for the first time.
```
Refer to the [Wazuh Docker documentation](https://documentation.wazuh.com/current/deployment-options/docker/wazuh-container.html) for full details.
#### Static website content
Place your static HTML/CSS/JS files in `/opt/sovereign/website/html/` on the target host. Nginx serves this directory at `https://<domain>`. The directory is created by the `website` role on first deployment — you can populate it before or after running the playbook.
---
## Configuration Reference
All variables live in `inventories/production/group_vars/all.yml`.
### Global
| Variable | Default | Description |
|----------|---------|-------------|
| `base_domain` | `example.com` | Root domain. All subdomains are derived from this. |
| `sovereign_base_dir` | `/opt/sovereign` | Base path on the target host for all service data. |
### Branding
These variables apply consistent tenant branding across all services that support it. Services apply branding via environment variables, config file templates, or post-deploy API calls (e.g. Nextcloud `occ`, Authentik blueprints).
| Variable | Default | Description |
|----------|---------|-------------|
| `tenant_name` | `Example Corp` | Display name shown in service UIs, email subjects, and page titles. |
| `tenant_logo_local_path` | `""` | Path to a logo image on the Ansible control machine (PNG recommended). Leave empty to use each service's default logo. Example: `files/logo.png`. |
| `tenant_primary_color` | `#2563eb` | Primary brand colour (hex). Used for backgrounds, buttons, and highlights. |
| `tenant_accent_color` | `#1e40af` | Secondary/accent colour (hex). |
Services with branding support: Authentik (title, colour, logo via blueprint), Element/Matrix (brand name, theme), Forgejo (app name, logo), Nextcloud (name, colour, logo via `occ`), Jitsi (app name, watermark), Roundcube (product name), Wazuh dashboard (title).
### Traefik (`common` role)
| Variable | Default | Description |
|----------|---------|-------------|
| `traefik_acme_email` | `admin@<domain>` | Email used for Let's Encrypt certificate registration. |
| `traefik_domain` | `traefik.<domain>` | Traefik dashboard URL. |
| `traefik_dashboard_password` | — | htpasswd-formatted credential for dashboard basic auth. |
| `traefik_version` | `v3.1` | Traefik image tag. |
### DNS / BIND9
| Variable | Default | Description |
|----------|---------|-------------|
| `dns_server_ip` | — | Public IPv4 address of this server. Used for all service A records and the `ns1` glue record. |
| `dns_ns_hostname` | `ns1.<domain>` | Fully-qualified hostname of the nameserver. |
| `dns_ttl` | `3600` | Default TTL for zone records (seconds). |
| `bind_version` | `9.18-22.04_beta` | `ubuntu/bind9` image tag. |
| `stalwart_dkim_selector` | `default` | DKIM selector name. Must match the selector configured in Stalwart. |
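| `stalwart_dkim_public_key` | `""` | RSA public key that receiving servers use to verify DKIM signatures. Retrieve from the Stalwart admin UI after first deployment. Leave empty to skip the DKIM TXT record. Long keys are automatically split into chunks of at most 255 bytes, the maximum length of a single TXT character-string (RFC 1035); verifiers concatenate the chunks (RFC 6376). |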
| `dmarc_policy` | `quarantine` | DMARC enforcement policy: `none`, `quarantine`, or `reject`. |
| `dmarc_rua` | `mailto:dmarc-reports@<domain>` | Address to receive aggregate DMARC reports. |
| `dmarc_ruf` | `mailto:dmarc-forensics@<domain>` | Address to receive forensic DMARC reports. |
The DNS role publishes the following records for `<domain>`:
| Type | Name | Value |
|------|------|-------|
| A | `ns1` | `dns_server_ip` |
| A | `@`, all service subdomains | `dns_server_ip` |
| MX | `@` | `mail.<domain>` (priority 10) |
| TXT | `@` | SPF: `v=spf1 mx ~all` |
| TXT | `_dmarc` | DMARC policy record |
| TXT | `<selector>._domainkey` | DKIM public key (when `stalwart_dkim_public_key` is set) |
### Graylog
| Variable | Default | Description |
|----------|---------|-------------|
| `graylog_domain` | `logs.<domain>` | Graylog web UI URL. |
| `graylog_version` | `6.0` | Graylog image tag. |
| `graylog_password_secret` | — | Random secret, minimum 16 characters. |
| `graylog_root_password_sha2` | — | SHA-256 hash of the root (admin) password. |
| `graylog_host` | `127.0.0.1` | IP address reachable from Docker containers for GELF ingestion. Usually the host's Docker bridge IP or `127.0.0.1` when using host networking. |
| `graylog_gelf_port` | `12201` | UDP port for GELF log ingestion. |
### Authentik
| Variable | Default | Description |
|----------|---------|-------------|
| `authentik_domain` | `auth.<domain>` | Authentik URL. |
| `authentik_version` | `2024.10.5` | Authentik image tag. |
| `authentik_secret_key` | — | 50-character random string used for signing. |
| `authentik_db_password` | — | PostgreSQL password for Authentik's database. |
| `authentik_admin_email` | `admin@<domain>` | Initial admin account email. |
| `authentik_admin_password` | — | Initial admin account password. |
### Stalwart Mail
| Variable | Default | Description |
|----------|---------|-------------|
| `stalwart_domain` | `mail.<domain>` | Stalwart web admin URL. |
| `stalwart_version` | `latest` | Stalwart image tag. |
| `stalwart_admin_password` | — | Stalwart admin password. |
### Roundcube
| Variable | Default | Description |
|----------|---------|-------------|
| `roundcube_domain` | `webmail.<domain>` | Roundcube URL. |
| `roundcube_version` | `latest` | Roundcube image tag. |
| `roundcube_db_password` | — | MariaDB password for Roundcube's database. |
| `roundcube_des_key` | — | Exactly 24-character key for session encryption. |
### Wazuh
| Variable | Default | Description |
|----------|---------|-------------|
| `wazuh_domain` | `wazuh.<domain>` | Wazuh dashboard URL. |
| `wazuh_version` | `4.9.0` | Wazuh image tag. |
| `wazuh_admin_password` | — | Wazuh dashboard admin password. |
| `wazuh_api_password` | — | Wazuh REST API password. |
### Headscale / WireGuard
| Variable | Default | Description |
|----------|---------|-------------|
| `headscale_domain` | `headscale.<domain>` | Headscale control plane URL. |
| `headscale_version` | `0.23.0` | Headscale image tag. |
| `wireguard_port` | `51820` | UDP port for WireGuard traffic. Must be open in the firewall. |
| `headscale_noise_private_key` | `""` | Leave blank; generated automatically on first run. |
### Matrix / Element
| Variable | Default | Description |
|----------|---------|-------------|
| `matrix_domain` | `matrix.<domain>` | Synapse homeserver URL. |
| `element_domain` | `chat.<domain>` | Element web client URL. |
| `matrix_version` | `v1.118.0` | Synapse image tag. |
| `matrix_registration_secret` | — | Shared secret for server-side user registration. |
| `matrix_db_password` | — | PostgreSQL password for Synapse's database. |
### Jitsi
| Variable | Default | Description |
|----------|---------|-------------|
| `jitsi_domain` | `meet.<domain>` | Jitsi Meet URL. |
| `jitsi_version` | `stable-9753` | Jitsi image tag. |
| `jitsi_jicofo_auth_password` | — | Internal XMPP password for Jicofo. |
| `jitsi_jvb_auth_password` | — | Internal XMPP password for the video bridge. |
| `jitsi_jibri_recorder_password` | — | Internal XMPP password for Jibri (recording). |
| `jitsi_jibri_xmpp_password` | — | Internal XMPP password for Jibri XMPP. |
| `jitsi_turn_secret` | — | Shared secret for TURN server authentication. |
### MinIO
| Variable | Default | Description |
|----------|---------|-------------|
| `minio_domain` | `s3.<domain>` | MinIO S3 API endpoint. |
| `minio_console_domain` | `minio.<domain>` | MinIO web console URL. |
| `minio_version` | `latest` | MinIO image tag. |
| `minio_root_user` | `minioadmin` | MinIO root username. |
| `minio_root_password` | — | MinIO root password. |
| `minio_nextcloud_bucket` | `nextcloud` | Bucket name for Nextcloud primary storage. |
| `minio_nextcloud_access_key` | `nextcloud` | Access key for Nextcloud's MinIO credentials. |
| `minio_nextcloud_secret_key` | — | Secret key for Nextcloud's MinIO credentials. |
### Nextcloud
| Variable | Default | Description |
|----------|---------|-------------|
| `nextcloud_domain` | `cloud.<domain>` | Nextcloud URL. |
| `nextcloud_version` | `29` | Nextcloud image tag. |
| `nextcloud_admin_user` | `admin` | Initial Nextcloud admin username. |
| `nextcloud_admin_password` | — | Initial Nextcloud admin password. |
| `nextcloud_db_password` | — | MariaDB password for Nextcloud's database. |
| `nextcloud_db_root_password` | — | MariaDB root password. |
### Vaultwarden
| Variable | Default | Description |
|----------|---------|-------------|
| `vaultwarden_domain` | `vault.<domain>` | Vaultwarden URL. |
| `vaultwarden_version` | `latest` | Vaultwarden image tag. |
| `vaultwarden_admin_token` | — | Token for the `/admin` panel. |
| `vaultwarden_db_password` | — | PostgreSQL password for Vaultwarden's database. |
### Forgejo
| Variable | Default | Description |
|----------|---------|-------------|
| `forgejo_domain` | `git.<domain>` | Forgejo URL. |
| `forgejo_version` | `latest` | Forgejo image tag. |
| `forgejo_db_password` | — | PostgreSQL password for Forgejo's database. |
| `forgejo_secret_key` | — | Random secret for internal signing. |
| `forgejo_internal_token` | — | Random token for internal API calls. |
| `forgejo_lfs_jwt_secret` | — | Random secret for Git LFS JWT tokens. |
| `forgejo_admin_user` | `admin` | Initial admin username. |
| `forgejo_admin_password` | — | Initial admin password. |
| `forgejo_admin_email` | `admin@<domain>` | Initial admin email. |
| `forgejo_ssh_port` | `2222` | Host port for Forgejo SSH access. Must be open in the firewall. |
### Uptime Kuma
| Variable | Default | Description |
|----------|---------|-------------|
| `uptimekuma_domain` | `status.<domain>` | Uptime Kuma dashboard URL. |
| `uptimekuma_version` | `1` | Uptime Kuma image tag (`1` tracks the latest v1 release). |
Access is controlled entirely by Authentik forward auth — Uptime Kuma's own account system is not used. After deployment, add monitors for each service subdomain via the web UI.
### Automatisch
| Variable | Default | Description |
|----------|---------|-------------|
| `automatisch_domain` | `automate.<domain>` | Automatisch URL. |
| `automatisch_version` | `latest` | Automatisch image tag. |
| `automatisch_db_password` | — | PostgreSQL password for Automatisch's database. |
| `automatisch_encryption_key` | — | Encrypts stored integration credentials. Generate with `openssl rand -base64 36`. **Never rotate after first deployment** — doing so breaks all existing connections. |
| `automatisch_webhook_secret_key` | — | Verifies incoming webhook requests. Same rotation warning applies. |
| `automatisch_app_secret_key` | — | Used for user session signing. Same rotation warning applies. |
Access is controlled by Authentik forward auth.
### Twenty CRM
| Variable | Default | Description |
|----------|---------|-------------|
| `twenty_domain` | `crm.<domain>` | Twenty CRM URL. |
| `twenty_version` | `latest` | Twenty image tag. |
| `twenty_app_secret` | — | Random secret for JWT signing. Generate with `openssl rand -base64 36`. |
| `twenty_db_password` | — | PostgreSQL password for Twenty's database. |
| `twenty_oidc_client_secret` | — | OIDC client secret from the Authentik OAuth2 application. |
### Website
| Variable | Default | Description |
|----------|---------|-------------|
| `website_nginx_version` | `alpine` | Nginx image tag used to serve the static site. |
### SMTP (shared)
These variables are consumed by every service that sends email.
| Variable | Default | Description |
|----------|---------|-------------|
| `smtp_host` | `stalwart` | SMTP relay hostname. Default routes through the bundled Stalwart container. |
| `smtp_port` | `587` | SMTP submission port. |
| `smtp_from` | `noreply@<domain>` | Default sender address. |
| `smtp_user` | `noreply@<domain>` | SMTP authentication username. |
| `smtp_password` | — | SMTP authentication password. |
| `smtp_tls` | `starttls` | TLS mode: `starttls`, `tls`, or `none`. |
---
## Deployment
### Environment variables
The inventory reads connection details from environment variables:
```bash
export SOVEREIGN_HOST=203.0.113.10 # target host IP or hostname
export SOVEREIGN_USER=ubuntu # SSH user with sudo privileges
export SOVEREIGN_SSH_KEY=~/.ssh/id_rsa
```
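A missing variable tends to surface only as an SSH or inventory error, so a small pre-flight check can save a failed run. The following is a sketch; `require_env` is a hypothetical helper, not something the repository ships:

```bash
# Hypothetical helper, not part of the repository.
# Takes variable names as arguments and reports every one that is
# unset or empty. Returns 0 only if all of them have a value.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `require_env SOVEREIGN_HOST SOVEREIGN_USER SOVEREIGN_SSH_KEY && ansible-playbook playbooks/site.yml`.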
### Full deployment
```bash
ansible-playbook playbooks/site.yml
```
Services are deployed in dependency order: common (Docker + Traefik) → DNS → Graylog (logging) → Authentik (auth) → all other services.
### Deploy a single service
Use the role's tag to deploy only that service:
```bash
ansible-playbook playbooks/site.yml --tags dns
ansible-playbook playbooks/site.yml --tags authentik
ansible-playbook playbooks/site.yml --tags nextcloud
ansible-playbook playbooks/site.yml --tags website
```
Available tags: `common`, `dns`, `graylog`, `authentik`, `minio`, `nextcloud`, `stalwart`, `roundcube`, `matrix`, `jitsi`, `headscale`, `wazuh`, `vaultwarden`, `forgejo`, `uptimekuma`, `automatisch`, `twenty`, `website`.
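`--tags` also accepts a comma-separated list, so several services can be deployed in one run. The sketch below builds that list and rejects names that are not in the tag list above; `join_tags` and `SOVEREIGN_TAGS` are hypothetical names introduced for this example:

```bash
# Tag list copied from this README.
SOVEREIGN_TAGS="common dns graylog authentik minio nextcloud stalwart roundcube matrix jitsi headscale wazuh vaultwarden forgejo uptimekuma automatisch twenty website"

# Hypothetical helper: joins its arguments into a comma-separated --tags
# value and fails on any name that is not listed in SOVEREIGN_TAGS.
join_tags() {
  joined=""
  for tag in "$@"; do
    # Reject empty names and anything that is not plain lowercase letters.
    case "$tag" in
      ''|*[!a-z]*) echo "Unknown tag: $tag" >&2; return 1 ;;
    esac
    case " $SOVEREIGN_TAGS " in
      *" $tag "*) joined="${joined:+$joined,}$tag" ;;
      *) echo "Unknown tag: $tag" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$joined"
}
```

Usage: `tags=$(join_tags dns authentik) && ansible-playbook playbooks/site.yml --tags "$tags"`.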
### Dry run
Preview changes without applying them:
```bash
ansible-playbook playbooks/site.yml --check --diff
```
### Syntax check / lint
```bash
ansible-playbook playbooks/site.yml --syntax-check
ansible-lint
```
---
## Testing
Each role has a [Molecule](https://ansible.readthedocs.io/projects/molecule/) test scenario under `roles/<role>/molecule/default/`. Tests run entirely on the local machine — no target host or Docker daemon required.
### What the tests cover
- **Directory creation** — all expected data directories are created with correct permissions.
- **Template rendering** — every Jinja2 template renders without errors and with all variables substituted (no unresolved `{{ }}` in output files).
- **Config file content** — role-specific config files (Element `config.json`, Headscale `config.yaml`, Authentik branding blueprint, Roundcube `custom.inc.php`, Jitsi interface config, Wazuh dashboard YAML, BIND9 `named.conf` and zone file) contain the expected values.
- **Docker Compose structure** — `docker-compose.yml` references the correct image, Traefik routing labels, GELF logging address, and external network declaration.
- **Idempotency** — Molecule re-runs each role after converge and asserts zero changed tasks.
Docker/OS tasks (container start, `apt`, `systemd`, `sysctl`, health checks) are skipped during tests via the `molecule_test_mode` variable, which defaults to `false` and has no effect on real deployments.
### Install test dependencies
```bash
pip install -r requirements.txt
ansible-galaxy collection install -r requirements.yml
```
### Run tests for a single role
```bash
cd roles/authentik
molecule test
```
Or using the Justfile shorthand:
```bash
just test-role authentik
just test-role dns
```
`molecule test` runs the full lifecycle: dependency → converge → idempotency check → verify → cleanup.
For a faster iteration loop during development:
```bash
# Apply the role and run assertions (skip create/destroy lifecycle)
molecule converge && molecule verify
# Clean up temp files when done
molecule destroy
```
### Run tests for all roles
```bash
just test
```
This iterates over all roles and reports any failures at the end.
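The Justfile recipe itself is not reproduced here. As a rough sketch of how such a loop could work (an assumption, not the repository's actual implementation), the hypothetical function below runs a command inside every role that has a `molecule/default` scenario and lists the failures at the end:

```bash
# Hypothetical sketch; the real `just test` recipe may differ.
# $1: roles directory; remaining arguments: command to run inside each role.
test_all_roles() {
  roles_dir="$1"
  shift
  failed=""
  for scenario in "$roles_dir"/*/molecule/default; do
    [ -d "$scenario" ] || continue
    role_dir="${scenario%/molecule/default}"
    # Run in a subshell so the caller's working directory is unchanged.
    if ! (cd "$role_dir" && "$@"); then
      failed="$failed ${role_dir##*/}"
    fi
  done
  if [ -n "$failed" ]; then
    echo "Failed roles:$failed" >&2
    return 1
  fi
  echo "All roles passed"
}
```

Usage from the repo root: `test_all_roles roles molecule test`.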
### Lint
```bash
ansible-lint # Ansible best-practice checks across all roles
yamllint . # YAML formatting checks
```
Both tools are configured via `.ansible-lint` and `.yamllint` at the repo root. The ansible-lint config mocks Docker and system modules so linting works without a live environment.
### Adding tests for a new role
1. Create `roles/<service>/molecule/default/` with `molecule.yml`, `converge.yml`, and `verify.yml` following the pattern of an existing simple role (e.g. `roles/dns/molecule/default/`).
2. Add the new role's variables to `roles/molecule/shared/vars.yml`.
3. Add `when: not (molecule_test_mode | default(false))` to any tasks that call `community.docker.docker_compose_v2`, `ansible.builtin.uri` (health checks), or `ansible.builtin.command` (docker exec).
4. Add the same guard to the role's restart handler in `handlers/main.yml`.
---
## Maintenance
### Updating a service
Change the version variable in `all.yml` (e.g., `nextcloud_version: "30"`) and re-run the relevant tag:
```bash
ansible-playbook playbooks/site.yml --tags nextcloud
```
The handler will recreate the container with the new image.
### Restarting a service
SSH into the host and use Docker Compose directly:
```bash
cd /opt/sovereign/nextcloud
docker compose restart
```
Or pull and recreate:
```bash
docker compose pull
docker compose up -d --force-recreate
```
### Viewing logs
All containers ship logs to Graylog via GELF UDP. Use the Graylog web UI at `https://logs.<domain>` to search and filter.
To tail logs directly on the host:
```bash
docker logs -f nextcloud
docker logs -f authentik-server
```
### Backing up data
All persistent data is stored under `/opt/sovereign/` on the target host. A minimal backup strategy:
```bash
# Stop services, snapshot, restart
cd /opt/sovereign/vaultwarden && docker compose stop
tar czf /backup/vaultwarden-$(date +%F).tar.gz /opt/sovereign/vaultwarden
cd /opt/sovereign/vaultwarden && docker compose start
```
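The same stop, snapshot, restart sequence can be wrapped in a function so that the service is restarted even when `tar` fails. This is a sketch under the assumption that the backup directory already exists; `backup_service` is a hypothetical name:

```bash
# Hypothetical helper, not part of the repository.
# $1: service name; $2: base dir (default /opt/sovereign);
# $3: backup dir (default /backup, assumed to exist).
backup_service() {
  service="$1"
  base_dir="${2:-/opt/sovereign}"
  backup_dir="${3:-/backup}"
  archive="$backup_dir/$service-$(date +%F).tar.gz"

  # Stop the stack so files are not modified mid-archive.
  (cd "$base_dir/$service" && docker compose stop) || return 1

  # Archive relative to base_dir so the tarball contains "<service>/...".
  status=0
  tar czf "$archive" -C "$base_dir" "$service" || status=$?

  # Always restart, even if tar failed.
  (cd "$base_dir/$service" && docker compose start) || status=1
  return "$status"
}
```

Usage on the target host: `backup_service vaultwarden`.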
For databases, prefer native dumps over filesystem snapshots taken while the container is running:
```bash
# PostgreSQL (Vaultwarden, Forgejo, Matrix, Authentik, Automatisch, Twenty)
docker exec vaultwarden-db pg_dump -U vaultwarden vaultwarden > vaultwarden-$(date +%F).sql
# MariaDB (Nextcloud, Roundcube)
docker exec nextcloud-db mysqldump -u root -p"$NEXTCLOUD_DB_ROOT_PASSWORD" nextcloud > nextcloud-$(date +%F).sql
```
### Rotating secrets
1. Update the value in `all.yml`.
2. Re-run the affected role: `ansible-playbook playbooks/site.yml --tags <service>`.
3. Some services (Authentik, Graylog) require a container restart to pick up new environment variables — this happens automatically via the role's handler.
### Adding a new service role
Follow the pattern used by existing roles:
1. Create `roles/<service>/{defaults,handlers,tasks}/main.yml` and `roles/<service>/templates/docker-compose.yml.j2`.
2. Add service variables to `inventories/production/group_vars/all.yml`.
3. Add the role to `playbooks/site.yml` with an appropriate tag.
4. Attach the container to the `sovereign` Docker network and add Traefik labels for routing.
5. Under the Compose service's `logging:` key, set `driver: gelf` and, under `options:`, `gelf-address: "udp://{{ graylog_host }}:{{ graylog_gelf_port }}"` to ship logs.
6. Add a Molecule scenario — see [Adding tests for a new role](#adding-tests-for-a-new-role).
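The directory layout from step 1 can be scaffolded with a few shell commands. The sketch below assumes it is run from the repo root and that the Compose template lives under `templates/`; `scaffold_role` is a hypothetical helper, and the generated files are empty stubs for you to fill in:

```bash
# Hypothetical helper, not part of the repository.
# Creates the skeleton for roles/<service> without overwriting existing files.
scaffold_role() {
  role_dir="roles/$1"
  mkdir -p "$role_dir/defaults" "$role_dir/handlers" \
           "$role_dir/tasks" "$role_dir/templates" || return 1
  for part in defaults handlers tasks; do
    # Start each YAML file with a document marker.
    [ -e "$role_dir/$part/main.yml" ] || printf -- '---\n' > "$role_dir/$part/main.yml"
  done
  # Empty Compose template, to be written by hand.
  [ -e "$role_dir/templates/docker-compose.yml.j2" ] || : > "$role_dir/templates/docker-compose.yml.j2"
}
```

Usage: `scaffold_role myservice`, then continue with steps 2 to 6.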
---
## Architecture Notes
- **Reverse proxy**: Traefik handles all inbound HTTPS traffic, terminates TLS with Let's Encrypt certificates (obtained via the TLS-ALPN-01 challenge, which Traefik calls the TLS challenge), and routes to containers via Docker labels.
- **Authentication**: The `authentik` Traefik forward-auth middleware is defined in the `common` role and can be applied to any router label: `traefik.http.routers.<name>.middlewares=authentik`.
- **DNS**: BIND9 runs as an authoritative-only nameserver (recursion disabled) on port 53/TCP+UDP. It publishes A records for every service subdomain, MX records pointing to Stalwart, and email authentication records (SPF, DMARC, DKIM). Users must register a glue record at their domain registrar and delegate the domain's nameservers to `ns1.<domain>` after deployment.
- **Email authentication**: SPF (`v=spf1 mx ~all`) authorises only the domain's MX host to send mail and soft-fails all other senders. DMARC policy is configurable (`none`/`quarantine`/`reject`). DKIM requires retrieving the public key from Stalwart after first deployment and re-running the `dns` role to publish it.
- **Networking**: All containers that need Traefik routing join the external `sovereign` Docker network. Services with databases also have a private `internal` network for backend isolation.
- **Logging**: Every container uses the `gelf` log driver pointed at `udp://{{ graylog_host }}:{{ graylog_gelf_port }}` (port `12201` by default). `graylog_host` should be an IP reachable from inside Docker containers (typically the host's IP on the Docker bridge, not `localhost`).
- **Data persistence**: Each service stores data under `{{ sovereign_base_dir }}/<service>/` (default `/opt/sovereign/<service>/`). This path is defined in each role's `defaults/main.yml` as `<service>_data_dir`.