[services.*]
A service is an always-on process that RunWisp keeps alive. It exits, it gets restarted. It crashes, it gets restarted. The only way a service stops permanently is by you stopping it.
The TOML key (api-worker in [services.api-worker]) is the service
name. It shares one namespace with [tasks.*], so a name must be unique
across both kinds. Each replica’s run appears in the same history view
as task runs, with its own ULID, log file, and lifecycle.
Minimum example
```toml
[services.metrics-collector]
run = "/usr/local/bin/metrics-agent"
```
That’s a complete service: one replica, restarted forever with exponential backoff capped at 60 seconds.
Identity & metadata
| Key | Type | Default | What it does |
|---|---|---|---|
| table key | string | required | The service name. Used in CLI, API, log paths. |
| run | string | required | Shell command. Multi-line OK with TOML triple-quotes. |
| description | string | (empty) | Human-readable description shown in the UI and TUI. |
| group | string | "Services" | UI grouping label. |
| api_trigger | bool | true | Allow manual trigger from CLI / API / UI. (Restart is the usual interaction for services.) |
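As a sketch of how the identity keys fit together, the fragment below extends the metrics-collector service from the minimum example. The description text and the "Monitoring" group label are hypothetical. exec is a standard shell builtin that replaces the shell with the agent, so the agent takes over the shell’s process instead of running as its child.

```toml
[services.metrics-collector]
description = "Ships host metrics to the monitoring backend"  # hypothetical text
group = "Monitoring"      # hypothetical label; the default is "Services"
api_trigger = false       # disallow manual triggers from CLI / API / UI
run = """
echo "starting metrics agent"
exec /usr/local/bin/metrics-agent
"""
```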
Replicas
```toml
[services.api-worker]
instances = 3
run = "/usr/local/bin/worker"
```
| Key | Type | Default | What it does |
|---|---|---|---|
| instances | int | 1 | Number of concurrent replicas. Bounded 1 ≤ instances ≤ 64. |
Each replica is its own visible run with its own replica_index
(0, 1, 2, …). Replicas share the service’s configuration and their logs
are unified per service, but each replica is restarted independently
when its own process exits.
Restart behaviour
| Key | Type | Default | What it does |
|---|---|---|---|
| restart_delay | duration | 1s | Base delay between restarts. Go duration string. |
| restart_backoff | enum | "exponential" | Curve applied to restart_delay: constant, linear, or exponential (shared with task retry_backoff). |
Backoff is capped at 60 seconds, so even after a long run of crashes a replica never waits more than a minute before its next restart. A replica that stays up for at least 60 seconds resets its backoff counter, so transient flapping doesn’t permanently slow restarts on a service that eventually stabilises.
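To make the cap and reset concrete, here is one plausible reading of the three curves, where d is restart_delay and n counts consecutive restarts since the replica last stayed up for 60 seconds. The doubling factor for exponential and the n·d form for linear are assumptions: this page only names the curves, so treat the formulas as an illustration rather than RunWisp’s exact definition.

```latex
\mathrm{delay}_n = \min\bigl(f(n),\ 60\,\mathrm{s}\bigr),
\qquad
f(n) =
\begin{cases}
  d & \text{constant} \\
  n\,d & \text{linear} \\
  d \cdot 2^{\,n-1} & \text{exponential}
\end{cases}
\qquad n = 1, 2, 3, \dots

d = 1\,\mathrm{s},\ \text{exponential}:\quad
1,\ 2,\ 4,\ 8,\ 16,\ 32,\ \min(64, 60) = 60,\ 60,\ \dots\ \mathrm{s}
```

Under these assumptions, a crash-looping replica with the default 1s delay reaches the 60-second ceiling on its seventh consecutive restart; with restart_delay = "2s" (as in the worked example further down) it reaches it on the sixth.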
restart = "always" is implicit and cannot be overridden — that’s
the contract. If you want “run once and exit,” use a task.
Concurrency
| Key | Type | Default | What it does |
|---|---|---|---|
| parallelism | int | 1 | Maximum concurrent runs per replica. Must be > 0. |
| on_overlap | enum | "skip" | What happens when a new run is requested while one is already running. |
Services default to on_overlap = "skip" because the supervisor already
keeps the replica count steady, so overlap is unusual. Manually
triggering a service that is already running is therefore rejected
rather than starting a second copy.
Logs & retention
Logging works exactly as it does for tasks, with the same fields and the same defaults:
| Key | Type | Default |
|---|---|---|
| log_max_size | byte size | 100MB |
| log_on_full | enum | "drop_old" |
| keep_runs | int | inherited |
| keep_for | duration | inherited |
A service’s run history can grow much faster than a task’s because each
crash is a new run row. Set keep_runs defensively — 200 is a
reasonable starting point for a flap-prone service.
See Logs & retention for the underlying behaviour.
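For a flap-prone service, the retention advice above might look like the fragment below. The keep_for value is illustrative and assumes keep_for parses as a Go duration string like restart_delay does, which is why seven days is written in hours.

```toml
[services.api-worker]
run = "/usr/local/bin/worker"
keep_runs = 200           # bound the run history; every crash adds a run row
keep_for = "168h"         # hypothetical value: 7 days in Go duration units
log_on_full = "drop_old"  # the default, shown explicitly
```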
Notifications
| Key | Type | Default | What it does |
|---|---|---|---|
| notify_on_failure | string[] | (none) | Notifier IDs to alert when a replica’s run ends as failed or crashed. |
| notify_on_success | string[] | (none) | Notifier IDs to alert on run.succeeded (a clean replica shutdown). |
Same shape as tasks. See Per-task notification sugar.
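A minimal sketch of the two keys on a service: slack-ops is the notifier ID used in the worked example further down, while ops-audit is a hypothetical ID. Both are assumed to refer to notifiers defined elsewhere in your configuration.

```toml
[services.api-worker]
run = "/usr/local/bin/worker"
notify_on_failure = ["slack-ops"]  # a replica's run ended as failed or crashed
notify_on_success = ["ops-audit"]  # hypothetical ID; fires on a clean replica shutdown
```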
Graceful shutdown
When the daemon shuts down (systemctl stop, docker stop, Ctrl-C),
the supervisor sends SIGTERM to each replica and waits for it to exit,
then sends SIGKILL if it’s still running after the grace period.
In practice: trap SIGTERM in your run command and exit cleanly.
The example file’s pattern is a good starting point:
```sh
trap 'echo "SIGTERM — shutting down"; exit 0' TERM INT
while true; do
  # do work
  sleep 1  # placeholder: a loop body containing only a comment is a shell syntax error
done
```
A replica that exits cleanly after receiving SIGTERM records end_reason = stopped.
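One caveat with a plain trap: POSIX shells run a trap only after the current foreground command finishes, so if a single unit of work outlasts the grace period, SIGKILL can arrive before the trap ever runs. A common workaround, sketched below, runs the work in the background and blocks in wait, which a trapped signal interrupts immediately. This sketch assumes run is executed by a POSIX-compatible shell; this page does not say whether RunWisp signals only that shell or its whole process group, so the trap forwards SIGTERM to the current job explicitly. consume-job is the command from the worked example further down.

```toml
[services.api-worker]
run = """
job=""
# On SIGTERM/SIGINT: forward SIGTERM to the in-flight job, wait for it, then exit 0.
trap 'echo "SIGTERM: finishing current job"; if [ -n "$job" ]; then kill -TERM "$job" 2>/dev/null; wait "$job" 2>/dev/null; fi; exit 0' TERM INT
while true; do
  /usr/local/bin/consume-job &   # background the work so the shell can react to signals
  job=$!
  wait "$job"                    # returns early (status > 128) when a trapped signal arrives
done
"""
```

The trap forwards SIGTERM rather than SIGINT because non-interactive shells start background commands with SIGINT ignored.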
What’s rejected on services
- cron, catch_up: services aren’t cron-driven.
- retry_attempts, retry_delay, retry_backoff: services restart instead of retrying. Use restart_delay / restart_backoff.
- A name shared with a [tasks.*] entry.
- Empty or missing run.
- instances outside [1, 64].
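For illustration, each line marked "rejected" in the fragment below would trigger one of the rules listed above. The names sync, nightly-report, and empty and the cron expression are hypothetical, and the exact error messages RunWisp prints are not covered here.

```toml
[tasks.sync]
run = "/usr/local/bin/sync"

[services.sync]              # rejected: name already used by [tasks.sync]
run = "/usr/local/bin/sync-daemon"

[services.nightly-report]
cron = "0 3 * * *"           # rejected: services aren't cron-driven; use a task
retry_attempts = 3           # rejected: use restart_delay / restart_backoff
instances = 0                # rejected: instances must be within [1, 64]
run = "/usr/local/bin/report"

[services.empty]
run = ""                     # rejected: run must not be empty
```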
Worked example: 3 queue workers
```toml
[services.api-worker]
description = "Three always-on workers consuming the same job queue"
instances = 3
restart_delay = "2s"
restart_backoff = "exponential"
keep_runs = 500
notify_on_failure = ["slack-ops"]
run = """
trap 'echo "SIGTERM — draining and exiting"; exit 0' TERM INT
echo "[$(date -Iseconds)] worker starting up..."
while true; do
  /usr/local/bin/consume-job
done
"""
```
Where to next
- [tasks.*] reference: the run-and-exit counterpart.
- Tasks vs Services: picking the right kind.
- Retries & timeouts: why retry and restart are different things.