# Nightly backup
A nightly database backup is the canonical RunWisp task. It runs on a schedule, writes a timestamped artefact, takes long enough that you care about overlap, and you absolutely want to know when it fails.
This recipe covers Postgres; the shape is the same for MySQL, SQLite, MongoDB, or any external service you can drive from a shell script.
## The task

```toml
[tasks.backup-postgres]
group = "Backups"
description = "Nightly logical dump of the production database"
cron = "30 2 * * *"      # 02:30 every day, daemon-local time
on_overlap = "skip"      # never two dumps at once
timeout = "45m"          # die before the next firing
retry_attempts = 1       # one quick retry on transient failure
retry_delay = "5m"
keep_runs = 30
keep_for = "60d"
notify_on_failure = ["slack-ops"]
```
```toml
run = """
set -euo pipefail
TS=$(date -u +%Y%m%dT%H%M%SZ)
DEST=/srv/backups/postgres
mkdir -p "$DEST"

PGPASSWORD="$BACKUP_DB_PASSWORD" pg_dump \\
  --host=db.internal \\
  --username=backup \\
  --format=custom \\
  --no-owner --no-privileges \\
  app_production \\
| gzip --best > "$DEST/app_production-$TS.dump.gz"

# Verify the archive is at least readable end-to-end.
gzip -t "$DEST/app_production-$TS.dump.gz"
echo "Wrote $DEST/app_production-$TS.dump.gz ($(du -h "$DEST/app_production-$TS.dump.gz" | cut -f1))"
"""
```

The matching `[[notifier]]` block — described on the Slack provider page — listens for `run.failed`, `run.timeout`, and `run.crashed`, because that's what `notify_on_failure` desugars to.
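As a sketch only, the route might look like the following — the field names (`type`, `webhook_url`, `events`) are assumptions here, so check the Slack provider page for the real schema:

```toml
[[notifier]]
name = "slack-ops"
type = "slack"                                   # assumed discriminator field
webhook_url = "https://hooks.slack.com/services/…"  # placeholder — use your webhook
events = ["run.failed", "run.timeout", "run.crashed"]
```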
## Why each knob

### `cron = "30 2 * * *"`

Off-peak, and off the top of the hour — a host running many cron-scheduled jobs at exactly `0 2 * * *` and `0 3 * * *` will serialise its own writes and cause backup contention. `30 2` dodges the rush.
### `on_overlap = "skip"`

If a previous dump is still running at the next firing, don't start a second one. The default of `"queue"` would line up overlapping firings; for a nightly task that doesn't help, and it can cause a backup pile-up if a slow night extends past 02:30 the next morning.
### `timeout = "45m"`

A backup that takes 45 minutes is broken — kill it. Firings are 24 hours apart, so this is a safe ceiling. See Retries & timeouts for what `timeout` actually does (per-attempt, hard kill, no grace period).
### `retry_attempts = 1` + `retry_delay = "5m"`

A transient network blip (database briefly unreachable, an intermittent DNS hiccup) shouldn't trigger an alert. One retry five minutes later forgives that. Two retries is rarely the right call here — if the real cause persists, you want the alert to fire so a human looks at it.
### `keep_runs = 30` + `keep_for = "60d"`

Two months of nightly dumps is enough for both forensics ("when did the schema change?") and to outlast a long incident ("we discovered the data corruption a month later"). The retention sweeper deletes the oldest completed runs (rows + log files) when either limit is exceeded — see Logs & retention.
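Note that the sweeper prunes RunWisp's own run records and logs — the dump files your script writes under `/srv/backups` are yours to clean up. A minimal sketch, appended to the `run` script or scheduled as its own task (the 60-day window mirrors `keep_for`):

```shell
# Delete local dumps older than 60 days; off-host copies are unaffected.
find /srv/backups/postgres -name 'app_production-*.dump.gz' -mtime +60 -delete
```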
### `notify_on_failure`

Sugar that desugars to a route firing on `run.failed` / `run.timeout` / `run.crashed`. Per-task notification sugar is documented separately. The in-app notifier is implicitly added — even without Slack, the bell in the Web UI lights up.
### `set -euo pipefail` inside `run`

Bash's `-e` exits on the first failed command, `-u` errors on unset variables, and `-o pipefail` makes a pipeline's exit status the last non-zero status from any stage, rather than the status of the final command alone. Without these, a `pg_dump` that fails mid-stream will still produce a "successful" gzipped file (`gzip` exits 0 after compressing truncated input) and your backup task will quietly return success.

This is the pattern for every non-trivial `run` block. RunWisp itself has no opinion on shell flags — the burden is on your script.
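A quick demonstration of why `pipefail` matters, with `false` standing in for a failing `pg_dump`:

```shell
false | gzip > /dev/null
echo "without pipefail: exit $?"   # prints 0 — gzip succeeded, the failure is hidden

set -o pipefail
false | gzip > /dev/null
echo "with pipefail: exit $?"      # prints 1 — the failure surfaces
```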
## Off-host copy

Local backups die with the host. Append a sync to S3, B2, or your NAS:

```toml
aws s3 cp "$DEST/app_production-$TS.dump.gz" \\
  "s3://my-backups/postgres/$(hostname)/app_production-$TS.dump.gz" \\
  --storage-class GLACIER_IR
```

Or split it into a second task that depends on the first having landed something on disk — one cron-fired backup task plus a separate cron-fired sync task is simpler than wiring up a multi-step DAG (which RunWisp deliberately doesn't do).
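A sketch of that two-task shape, reusing the knobs introduced above — the task name, schedule, and bucket are illustrative:

```toml
[tasks.backup-sync-s3]
group = "Backups"
description = "Ship the newest dump off-host"
cron = "0 4 * * *"   # well after the 02:30 dump normally finishes
on_overlap = "skip"
timeout = "30m"
notify_on_failure = ["slack-ops"]

run = """
set -euo pipefail
LATEST=$(ls -1t /srv/backups/postgres/app_production-*.dump.gz | head -n1)
test -n "$LATEST" || { echo "no dump found"; exit 1; }
aws s3 cp "$LATEST" \\
  "s3://my-backups/postgres/$(hostname)/$(basename "$LATEST")" \\
  --storage-class GLACIER_IR
"""
```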
## Verifying the dump

A backup you've never restored is a hopeful filename, not a backup. Run a periodic restore-test as its own task:

```toml
[tasks.backup-restore-test]
group = "Backups"
description = "Restore last night's dump into a scratch DB and run a smoke query"
cron = "0 5 * * *"   # 02:30 dump → 05:00 restore-test
on_overlap = "skip"
timeout = "1h"
notify_on_failure = ["slack-ops"]
```
```toml
run = """
set -euo pipefail
LATEST=$(ls -1t /srv/backups/postgres/app_production-*.dump.gz | head -n1)
test -n "$LATEST" || { echo "no dump found"; exit 1; }

# Restore into a scratch database the daemon can drop and recreate.
psql -h db.internal -U backup -d postgres -c 'DROP DATABASE IF EXISTS app_restore_test'
psql -h db.internal -U backup -d postgres -c 'CREATE DATABASE app_restore_test'

gunzip -c "$LATEST" | pg_restore --no-owner --no-privileges --dbname=app_restore_test

# Smoke query — adjust to something cheap that proves the schema is real.
psql -h db.internal -U backup -d app_restore_test -c 'SELECT count(*) FROM users LIMIT 1'
"""
```

Two cron rows in the daemon, two log streams, two failure paths. You'll know within 24 hours if a backup file isn't restorable — which is the only failure mode that actually matters.
## Where to next

- Slack provider — wiring up the `slack-ops` notifier this recipe references.
- Concepts: retries & timeouts — what `retry_attempts` and `timeout` actually do, and which end reasons trigger retries.
- `[storage]` — the daemon-wide cap that sits above `keep_runs`. Don't let on-disk dumps fill the data dir.