# Troubleshooting
A symptom-driven map of the failure modes operators actually hit, with the messages the daemon emits when each one happens. If you’re reading this with a problem in front of you, find the symptom that matches and work from there.
For everything else, the daemon’s own logs are the next stop:

```sh
sudo journalctl -u runwisp -n 200   # systemd
docker logs --tail 200 runwisp      # docker
tail -n 200 data/daemon.log         # self-spawned via the TUI
```

## Daemon won’t start
### port 9477 on 127.0.0.1 is already in use by another process

Some other service holds the port. The daemon’s error tells you which diagnostic command to run:
```sh
ss -ltnp 'sport = :9477'
# or
lsof -iTCP:9477 -sTCP:LISTEN
```

Either stop the offending process, or pass `--port 9478` (or any free
port) to RunWisp. If you’ve changed the port, remember to update any
`HEALTHCHECK`, reverse proxy, or `runwisp status` invocation that
hardcodes it.
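If you script this check, a bind attempt answers the same question the daemon asks at startup. A minimal Python sketch using plain sockets (no RunWisp internals involved):

```python
import socket

def port_in_use(host: str, port: int) -> bool:
    """Report whether something is already listening on host:port.

    A failed bind with EADDRINUSE is the same condition the daemon
    detects before printing the "already in use" error.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
        except OSError:
            return True
        return False
```
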
### another RunWisp daemon is already running on port 9477 with a different password

You have a second copy of `runwisp` running — likely from a different
working directory with its own `data/` and `data/password`. The CLI you
just started can’t authenticate to that daemon because they hold
different secrets.
Three ways out:
- Stop the other daemon: `pkill runwisp`, or kill the PID listed in the other dir’s `daemon.pid`.
- Use the matching password: `RUNWISP_PASSWORD=$(cat /other-data/password) runwisp tui`.
- Pick a different port: `runwisp --port 9478 --data ./local-data`.
### daemon failed to start: health check timed out

The process started but didn’t pass `/health` within 10 seconds. Look at
`data/daemon.log` (when self-spawned) or your service manager’s logs
for the underlying cause — usually a config validation error printed
before the listener bound.
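The self-spawn handshake is essentially a poll loop with a deadline. A conceptual sketch (the 10-second window comes from the message above; the probe function and retry interval are placeholders, not RunWisp’s code):

```python
import time

def wait_healthy(probe, timeout=10.0, interval=0.25,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll `probe` until it returns True or `timeout` seconds elapse.

    `probe` stands in for an HTTP GET against /health; the CLI gives
    the daemon a fixed window to answer before declaring failure.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if probe():
            return True
        sleep(interval)
    return False
```
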
### refusing to write data/password: path is a symlink

A defensive check. The daemon refuses to follow symlinks when writing secrets, to prevent a TOCTOU attack against the data directory. Replace the symlink with a real file (or a real directory) and restart.
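If you build similar tooling, the guard is cheap to reproduce: open with `O_NOFOLLOW` so the write fails instead of following a link that could be swapped in between check and use. A hypothetical helper, not RunWisp’s actual code:

```python
import os

def write_secret(path: str, data: bytes) -> None:
    """Write a secret, refusing to follow a symlink at `path`.

    O_NOFOLLOW makes the open fail (ELOOP) if `path` is a symlink,
    closing the swap-a-symlink-in TOCTOU window the text describes.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_NOFOLLOW, 0o600)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
```
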
### “permission denied” on the data directory

The user the daemon runs as can’t `mkdir` or write under `--data`.
Common when systemd’s `User=` was changed but the data dir’s owner
wasn’t updated:
```sh
sudo chown -R runwisp:runwisp /var/lib/runwisp
sudo chmod 0700 /var/lib/runwisp
```

## Can’t log in
### 401 Invalid password

Either the password is wrong, or you typed it against the wrong daemon
(see “different password” above).
The canonical password is whatever’s in `<data-dir>/password`.
### 401 Invalid or expired challenge

The CHAP nonce expired (5-minute lifetime) or was already consumed by a
parallel login attempt. Refresh the page or retry — the client fetches
a new challenge automatically. If the error keeps repeating after a
fresh tab, the host clock is likely skewed; check `date -u` against the
daemon host.
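Conceptually, the server side is a store of single-use nonces with a fixed lifetime. A sketch assuming only those two properties (the 5-minute TTL comes from the text; names and structure are illustrative):

```python
import secrets
import time

class ChallengeStore:
    """Single-use challenges with a fixed lifetime (300 s here,
    matching the 5-minute nonce lifetime described above)."""

    def __init__(self, ttl=300.0, clock=time.monotonic):
        self.ttl, self.clock = ttl, clock
        self._live = {}  # nonce -> expiry time

    def issue(self) -> str:
        nonce = secrets.token_hex(16)
        self._live[nonce] = self.clock() + self.ttl
        return nonce

    def consume(self, nonce: str) -> bool:
        """True exactly once per fresh nonce; False if unknown,
        expired, or already consumed by a parallel attempt."""
        expires = self._live.pop(nonce, None)
        return expires is not None and self.clock() < expires
```
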
### too many authentication attempts against the daemon on port 9477

You hit the rate limit of 5 attempts per IP per 5 minutes. Wait it out (the window slides automatically) or restart the daemon to clear the in-memory limiter. See Auth: rate limiting.
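The behaviour described (5 attempts per IP, a 5-minute window that slides) is a classic sliding-window limiter. An illustrative sketch, not RunWisp’s implementation:

```python
import time
from collections import defaultdict, deque

class AttemptLimiter:
    """At most `limit` attempts per `window` seconds per key
    (the daemon keys on client IP)."""

    def __init__(self, limit=5, window=300.0, clock=time.monotonic):
        self.limit, self.window, self.clock = limit, window, clock
        self._hits = defaultdict(deque)  # key -> attempt timestamps

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        while hits and hits[0] <= now - self.window:
            hits.popleft()           # the window slides: old attempts expire
        if len(hits) >= self.limit:
            return False             # rate limited
        hits.append(now)
        return True
```

Restarting the daemon clears `_hits` entirely, which is why a restart resets the limiter.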
### “I lost the password”

It’s in `data/password`. If the file is also lost, delete the file
and restart the daemon. RunWisp will generate a new one and print it
on stdout. Existing JWTs become invalid (the daemon also rotates the
JWT secret when a previously-explicit password disappears), so
everyone has to log in again.
### Web UI shows 401 after restart, then works after a re-login

Your JWT expired (24-hour lifetime) or you changed
`RUNWISP_PASSWORD` between restarts. The latter intentionally rotates
the JWT secret to invalidate stale sessions. Log in again.
## Tasks aren’t running

### A scheduled task never fires

Check, in order:
- Did the daemon load this task? `runwisp list` prints every task the config loader saw. If yours isn’t there, the config didn’t parse or the task was renamed.
- Is the cron expression what you expect? `runwisp list` shows the raw expression from the file. Compare against an external evaluator like crontab.guru.
- Is the daemon’s clock right? Cron runs in the daemon’s local timezone; a host with a clock skewed by hours will fire at unexpected times. Check with `date -u` on the host.
- Is `parallelism` saturated with `on_overlap = "skip"`? Each firing decides whether to start based on how many runs are currently in flight. A long-running task with `parallelism = 1` and `on_overlap = "skip"` skips every firing while the first run is still going. The Web UI’s task detail shows this clearly — every skipped firing is recorded as a `failed` row with exit code `-1` and the message “task already running, skipping (policy: skip)”.
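The last check above reduces to a small decision function. A sketch of the described skip policy (names are illustrative; other `on_overlap` values are not covered here):

```python
def firing_decision(in_flight: int, parallelism: int, on_overlap: str) -> str:
    """Decide what a cron firing does, per the behaviour described above.

    Returns "start" when a parallelism slot is free, "skip" when the
    task is saturated and on_overlap = "skip" (the daemon records the
    skip as a failed row with exit code -1).
    """
    if in_flight < parallelism:
        return "start"
    if on_overlap == "skip":
        return "skip"
    raise NotImplementedError(f"policy {on_overlap!r} not sketched here")
```
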
### A manual runwisp exec exits 0 but the daemon shows nothing

`runwisp exec` runs the task inline in the CLI process, not
against the running daemon — this is intentional. (Since 0.5, `exec`
also refuses to run when a daemon is up, so this only happens in setups where
the daemon is on a different host or `--data` directory.) To trigger
via the daemon, use `runwisp run-task <name>`, the Web UI,
the TUI, or `POST /api/tasks/{name}/trigger`.
### A service replica keeps restarting

The supervisor restarts crashed replicas with exponential backoff
capped at 60 seconds. If your run exits within 60 seconds it never
“stabilises” and the backoff keeps growing — open the run history
and read the most recent log to see why the process is exiting.
A common cause: the binary in `run` doesn’t exist on the target
filesystem, or the user RunWisp runs as can’t exec it. The log
records `exec: <path>: no such file or directory` immediately.
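To build intuition for why a crash-looping replica never settles, here is the delay sequence a doubling backoff produces. Only the 60-second cap comes from the text; the base delay and doubling factor are assumptions:

```python
def restart_delays(base=1.0, cap=60.0, n=8):
    """Successive restart delays: doubling from `base`, capped at `cap`.

    Illustrates the shape of the supervisor's backoff; the real base
    and growth factor may differ.
    """
    delay, out = base, []
    for _ in range(n):
        out.append(min(delay, cap))
        delay *= 2
    return out
```

With these assumed parameters the sequence climbs 1, 2, 4, … and then sits at the cap, so a replica that keeps crashing is retried at most once per minute.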
### Task fires but exits with code -2

`-2` means crashed. The daemon was killed (SIGKILL, OOM, host
reboot) while this run was in flight. On startup the reconciler marks
all `running` rows as `crashed` and assigns exit `-2`. This is
expected behaviour, not a bug — the run was real but not observed to
completion.
## Logs are missing

### Log output stopped: disk space critically low

Your `[storage]` `min_free_space` threshold has been crossed mid-run.
The log writer silently drops further lines until disk pressure
recovers. The task itself is not killed — see the
`[storage]` reference. Free up disk and the next run will
log normally.
### Task logs cut off at log_max_size

The per-task `log_max_size` cap was reached. With `log_on_full = "drop_old"`
(the default) the head of the log is rotated to `*.log.prev` and new lines
overwrite the file; with `drop_new` further lines are dropped; with
`kill_task` the task is terminated. Increase the cap or reduce the
volume.
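The three policies amount to a small dispatch on what to do once the cap is hit. An illustrative sketch (RunWisp’s log writer is not shown; only the policy names come from the text):

```python
import os

def handle_full_log(path: str, policy: str) -> str:
    """Apply a log_on_full policy once the size cap is reached."""
    if policy == "drop_old":
        os.replace(path, path + ".prev")   # head of the log rotated aside
        open(path, "w").close()            # fresh file; new lines overwrite
        return "rotated"
    if policy == "drop_new":
        return "dropping"                  # further lines are discarded
    if policy == "kill_task":
        return "terminate"                 # caller terminates the task
    raise ValueError(f"unknown policy: {policy!r}")
```
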
### A run row exists in history but its log pane is blank

The retention sweeper deleted the underlying log file (the SQLite row
outlives the on-disk file by design — metadata is cheap, log bytes
are not). Check `[storage]` `max_size` and the per-task `keep_runs` /
`keep_for` — the sweeper deletes the oldest completed runs (rows + log
files) to enforce the cap.
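The sweeper’s job can be stated in a few lines: walk runs oldest-first and delete completed ones until the total fits under the cap. A conceptual sketch with illustrative data shapes (not RunWisp’s schema):

```python
def sweep(runs, max_total_bytes):
    """Pick runs whose log files must go so total size fits the cap.

    `runs` is a list of (started_at, log_bytes, completed) tuples.
    Oldest completed runs are swept first; in-flight runs are never
    touched. Returns the started_at values selected for deletion.
    """
    total = sum(size for _, size, _ in runs)
    doomed = []
    for started_at, size, completed in sorted(runs):
        if total <= max_total_bytes:
            break
        if completed:                  # never sweep a running task
            doomed.append(started_at)
            total -= size
    return doomed
```
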
## Notifications are silent

### A failure happened but Slack didn’t fire

Walk down the chain:
- In-app got it? Check the bell in the Web UI. If yes, the event reached the notification subsystem; the problem is the outbound channel.
- Is there a route that matches? Either an explicit `[[notification_route]]` with `match.kind = ["run.failed", …]`, or a `notify_on_failure = ["slack-ops"]` on the task itself.
- Is the notifier ID spelled right? The route refers to a `[[notifier]]` block by `id`. Typos surface at config load — but only if the route uses an unknown id; an unused notifier is fine.
- Did the channel itself fail? Look for an in-app row of kind `notify.delivery_failed` with the reason. Slack outages, expired webhooks, and revoked tokens all show up here.
### notify.delivery_failed events keep arriving

The outbound channel is broken. The daemon retries with exponential backoff inside a 5-minute total budget. If the channel stays down, each event ends up as one in-app delivery-failure row. Fix the channel; the synthetic events stop on their own.
You can route `notify.delivery_failed` to an alternate channel — see
the route reference.
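To see why a dead channel yields exactly one synthetic row per event, consider the attempt times a doubling backoff produces inside a fixed budget. Only the 5-minute budget comes from the text; the base delay and factor are assumptions:

```python
def retry_schedule(base=1.0, factor=2.0, budget=300.0):
    """Elapsed seconds at each delivery attempt inside a total budget.

    Attempt 0 is immediate; each further attempt waits `delay`, which
    grows by `factor`, until the next wait would exceed the budget.
    """
    elapsed, delay, attempts = 0.0, base, [0.0]
    while elapsed + delay <= budget:
        elapsed += delay
        attempts.append(elapsed)
        delay *= factor
    return attempts
```

With the assumed base and factor this yields nine attempts before the budget runs out; the daemon then gives up and emits the single delivery-failure row for that event.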
## Web UI / TUI quirks

### “Connection lost” overlay

The daemon is unreachable. The UI auto-reconnects with backoff; click Retry for an immediate attempt. If the daemon is genuinely down, your service manager’s logs will tell you why.
### cannot reach daemon at <url> — is it running?

The TUI prints this when its API base URL can’t reach a daemon. Almost
always a `--host` / `--port` mismatch. The TUI inherits the flags you
passed; double-check you’re pointing at the same address as the daemon.
### runwisp openapi differs from apps/runwisp/openapi.json

That’s fine — `runwisp openapi` reflects the binary you have
installed; the file in the repo reflects the head of `main`. Use the
binary’s output as the source of truth for whatever client you’re
generating.
## Diagnostics that always help

A short checklist for any reported issue:
- Version: `runwisp --version` (or check `runwisp status`).
- Status: `runwisp status` — exit code, version, uptime.
- Config: `runwisp validate` — confirms the TOML parses against the installed binary.
- Logs: the daemon’s stderr or `data/daemon.log`.
- Recent runs: `data/logs/<task>/` for the last few `.log` files.
If you’re filing an issue, include all five and a redacted
`runwisp.toml`. That’s enough for someone to reproduce.
## Where to next

- Operations: auth (CHAP + JWT) — login failures and rate limiting in detail.
- Operations: data directory — where to look on disk.
- Operations: upgrading — when an upgrade starts going sideways.