# Logs & retention
Every run’s stdout and stderr are captured to a per-run log file under the data directory. SQLite stores run metadata — exit code, duration, timestamps — but not the log bodies. The daemon is not a log aggregator. It’s a log archive for “what did this task print on 2026-04-12 at 03:15?”.
## Where logs live on disk

```
{data_dir}/logs/{task-name}/{YYYYMMDD}_{HHMMSS}_{run-id-suffix}.log
{data_dir}/logs/{task-name}/{YYYYMMDD}_{HHMMSS}_{run-id-suffix}.log.idx
{data_dir}/logs/{task-name}/{YYYYMMDD}_{HHMMSS}_{run-id-suffix}.log.tidx
{data_dir}/logs/{task-name}/{YYYYMMDD}_{HHMMSS}_{run-id-suffix}.log.meta
```

- One file per run, not one per task. Each run gets its own isolated log, named after the run’s start time and ULID suffix so files are sortable and unique even at sub-second cadences.
- The `.idx` and `.tidx` sidecars are byte and timestamp indices used by the Web UI to scrub long logs efficiently.
- The `.meta` sidecar tracks rotation accounting and a `finalized` flag written when the run cleanly ends.
- When a run is deleted by retention, the daemon removes the log, its sidecars, rotation artifacts (`.prev`, `.idx.prev`), and any newly empty parent directory.
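The naming scheme can be sketched as a small helper. This is a hypothetical Python illustration, not the daemon’s actual code — the suffix length and the `data_dir` value are assumptions:

```python
from datetime import datetime, timezone
from pathlib import Path

def run_log_path(data_dir: str, task: str, started: datetime, run_id: str) -> Path:
    # {YYYYMMDD}_{HHMMSS}_{run-id-suffix}.log — sortable by start time,
    # unique via the ULID suffix even for sub-second start cadences.
    stamp = started.strftime("%Y%m%d_%H%M%S")
    suffix = run_id[-6:]  # hypothetical suffix length
    return Path(data_dir) / "logs" / task / f"{stamp}_{suffix}.log"

p = run_log_path("/var/lib/runwisp", "bulky-job",
                 datetime(2026, 4, 12, 3, 15, 0, tzinfo=timezone.utc),
                 "01HZX3Q8K9V2N5T7W0ABCXYZ12")
print(p)  # /var/lib/runwisp/logs/bulky-job/20260412_031500_CXYZ12.log
```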
## log_max_size

```toml
[tasks.bulky-job]
cron = "0 4 * * *"
run = "/usr/local/bin/bulky.sh"
log_max_size = "50MB"
```

- Default: `100MB`.
- Scope: per run. A task that runs hourly accumulates one fresh capped file per run, not one shared file growing forever.
- Units: `b`, `kb`, `mb`, `gb`, `tb` (case-insensitive). Bare numbers are parsed as bytes. `"100MB"`, `"1.5gb"`, `"4096"` are all valid.
- `log_max_size = 0` means unbounded: no cap is enforced and no lines are dropped. Use this only when you’ve thought about your disk.
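A parser matching those rules might look like this sketch. It assumes binary units (`1kb` = 1024 bytes) — whether the daemon uses binary or decimal multipliers is not stated here:

```python
import re

_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

def parse_size(value) -> int:
    # Bare numbers are bytes; unit suffixes are case-insensitive;
    # fractional values like "1.5gb" are allowed.
    if isinstance(value, (int, float)):
        return int(value)
    m = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z]*)\s*", value)
    if not m:
        raise ValueError(f"bad size: {value!r}")
    num, unit = float(m.group(1)), m.group(2).lower() or "b"
    if unit not in _UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(num * _UNITS[unit])

print(parse_size("100MB"))   # 104857600
print(parse_size("1.5gb"))   # 1610612736
print(parse_size("4096"))    # 4096
```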
## log_on_full

What happens when a run’s output hits `log_max_size`:
| Value | Behaviour |
|---|---|
| `drop_old` | Default. Rotates the current log to `.prev`, keeps writing fresh output. Earlier output is lost. |
| `drop_new` | Stops accepting new lines; the process keeps running. A system line records the truncation. |
| `kill_task` | Cancels the run’s context, terminating the process. The run records as `log_overflow` (exit code from the kill) — a dedicated end reason that’s still treated as a failure for retries and notifications, so the cause is visible without inspecting logs. |
Whichever policy fires, the daemon writes a synthetic line at the truncation point, so a reader scrubbing the log can see exactly when the limit was hit. There is no silent drop.
`log_on_full` also controls what happens when `[storage] min_free_space`
trips during a run: `kill_task` cancels the run on disk pressure;
`drop_new` and `drop_old` silently stop accepting lines (but the daemon
always raises a `log.disk_pressure` notification so the operator
discovers the silenced output). See
storage configuration.
`drop_old` is the default because it preserves the end of the log —
which is usually where the failure is. `drop_new` is right when the
start is the interesting part (a daemon’s startup banner, a long
batch’s preamble) and the rest is repetitive noise.
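The two buffer-affecting policies can be sketched with in-memory lists standing in for the `.log` and `.log.prev` files. This is an illustrative model only — the marker text is invented, since the real synthetic line’s wording isn’t specified here:

```python
class CappedLog:
    """Sketch of log_max_size + log_on_full for drop_old / drop_new."""
    def __init__(self, max_size: int, on_full: str = "drop_old"):
        self.max_size, self.on_full = max_size, on_full
        self.current, self.prev = [], []   # stand-ins for .log / .log.prev
        self.size, self.truncated = 0, False

    def write(self, line: str) -> None:
        if self.truncated and self.on_full == "drop_new":
            return  # new lines dropped after the marker; process keeps running
        if self.size + len(line) > self.max_size:
            if self.on_full == "drop_old":
                # rotate: current becomes .prev, start fresh, note the rotation
                self.prev, self.current, self.size = self.current, [], 0
                self.current.append("[runwisp] log rotated: size limit reached")
            elif self.on_full == "drop_new":
                self.current.append("[runwisp] output truncated: size limit reached")
                self.truncated = True
                return
        self.current.append(line)
        self.size += len(line)

log = CappedLog(12, "drop_old")
log.write("abcdef")
log.write("ghijklm")   # overflows: "abcdef" rotates to .prev, tail preserved
```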
## Retention: keep_runs and keep_for

Retention controls how long old run rows and their log files stick around.

```toml
[tasks.metrics]
cron = "* * * * *"
run = "/usr/local/bin/metrics.sh"
keep_runs = 500
keep_for = "7d"
```

- `keep_runs` — keep the N most recent runs for this task. Older runs are deleted (row + log files).
- `keep_for` — delete runs whose `created_at` is older than the given duration. Accepts extended units, including days and weeks: `"7d"`, `"2w"`, `"36h"`, `"30m"`.
- Both at once: both criteria contribute to deletion. A run is removed if either rule says so. The stricter rule wins in practice — set both if you want a hard floor and a hard ceiling.
- Per task: each task’s retention is evaluated independently using that task’s own settings (or the inherited `[defaults]`).
Cleanup runs in the background — by default once an hour — so you may see slightly more than `keep_runs` rows briefly between sweeps. That’s fine. In-flight runs are never deleted.
When retention triggers, both the SQLite row and the log file (with all its sidecars) are removed. There’s no orphaning: a row never points at a missing log, and a log file never lingers without a row.
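The either-rule-wins semantics amount to a union of two doomed sets. A rough sketch — the tuple shape and field names are hypothetical, not the real schema:

```python
from datetime import datetime, timedelta, timezone

def runs_to_delete(runs, keep_runs=None, keep_for=None, now=None):
    """A run is removed if EITHER rule says so.
    `runs` is (run_id, created_at, state), newest first."""
    now = now or datetime.now(timezone.utc)
    doomed = set()
    for i, (run_id, created_at, state) in enumerate(runs):
        if state in ("running", "pending"):
            continue  # in-flight runs are never deleted
        if keep_runs is not None and i >= keep_runs:
            doomed.add(run_id)  # beyond the N most recent
        if keep_for is not None and now - created_at > keep_for:
            doomed.add(run_id)  # older than the cutoff
    return doomed

now = datetime(2026, 4, 12, tzinfo=timezone.utc)
runs = [
    ("r3", now - timedelta(hours=1), "success"),
    ("r2", now - timedelta(days=3), "running"),   # in flight: always kept
    ("r1", now - timedelta(days=10), "failed"),   # too old AND beyond keep_runs
]
doomed = runs_to_delete(runs, keep_runs=2, keep_for=timedelta(days=7), now=now)
print(doomed)  # {'r1'}
```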
## Live streaming

The Web UI’s run page tails logs in real time. The endpoint:

```
GET /api/tasks/{task-name}/runs/{run-id}/log/stream
```

It’s a Server-Sent Events stream pushed from the daemon’s in-memory event bus, not a polling tail. New lines arrive within milliseconds of being written.
A few details worth knowing:
- The stream first replays the on-disk history, then transitions to live events. You don’t miss the start.
- The event-bus buffer is bounded (4096 events). If a producer outpaces a slow client, the oldest queued lines are dropped and a `LogDroppedEvent` is sent so the client can show “N lines dropped”.
- Connections idle out after 10 minutes. The browser auto-resumes via the `Last-Event-ID` header.
- The bus is in-process and best-effort. It is not a durability mechanism — the source of truth for “what did this run print” is always the on-disk file.
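On the wire this is ordinary Server-Sent Events framing; a minimal parser sketch follows. The `log_dropped` event name and the payloads are invented for illustration — only `LogDroppedEvent` and `Last-Event-ID` are named above:

```python
def parse_sse(stream_text):
    """Yield (event, id, data) per blank-line-terminated SSE block.
    Field names (event:, id:, data:) follow the SSE spec."""
    event, eid, data = "message", None, []
    for line in stream_text.splitlines():
        if not line:  # blank line ends an event block
            if data:
                yield event, eid, "\n".join(data)
            event, eid, data = "message", None, []
        elif line.startswith("event:"):
            event = line[6:].strip()
        elif line.startswith("id:"):
            eid = line[3:].strip()  # client echoes this back as Last-Event-ID
        elif line.startswith("data:"):
            data.append(line[5:].strip())

raw = ("id: 41\ndata: building index...\n\n"
       "event: log_dropped\nid: 42\ndata: 17\n\n")
events = list(parse_sse(raw))
print(events)  # [('message', '41', 'building index...'), ('log_dropped', '42', '17')]
```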
## Crash safety

Log writes flush and close cleanly on `Close()`, which fires when the run ends or on graceful daemon shutdown. The `.meta` sidecar’s `finalized = true` flag is written at that point.
If the daemon is killed mid-write (SIGKILL, power loss):
- The log file is not truncated — partial writes survive on disk. Readers tolerate a partial last line.
- On next startup, any run that was in `running` or `pending` state in SQLite is reconciled to `crashed` with exit code `-2`. The log file for that run is left as-is — it’s the last thing that process said before the daemon disappeared, and it’s the most useful artifact for debugging.
Combined with the boot semantics for in-flight runs, this means: every run row in your history reaches a terminal state, and every terminal run has a log file you can read.
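The boot-time reconciliation is essentially one UPDATE over non-terminal rows. A sketch against a throwaway schema — the table and column names here are assumptions, not the daemon’s real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (id TEXT, state TEXT, exit_code INTEGER)")
conn.executemany("INSERT INTO runs VALUES (?, ?, ?)", [
    ("r1", "success", 0),
    ("r2", "running", None),   # daemon died while this was in flight
    ("r3", "pending", None),
])
# Anything still running/pending at boot becomes crashed with exit code -2.
conn.execute(
    "UPDATE runs SET state = 'crashed', exit_code = -2 "
    "WHERE state IN ('running', 'pending')"
)
rows = conn.execute("SELECT id, state, exit_code FROM runs ORDER BY id").fetchall()
print(rows)  # [('r1', 'success', 0), ('r2', 'crashed', -2), ('r3', 'crashed', -2)]
```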
## What the daemon deliberately doesn’t do

- No log aggregation. RunWisp captures logs for the runs it supervises. It does not collect, index, or search across hosts. If you need that, ship the per-run files (or your task’s stdout) to Loki / ELK / CloudWatch / wherever — that’s their job.
- No remote sinks built in. No Fluent Bit, no syslog forwarder, no S3 upload. The TOML schema doesn’t have a place to configure them.
- No log-content notifications. Notifications fire on run lifecycle events (failed, timeout, crashed, etc.), not on patterns matched against captured output. If you want “alert when stderr contains ERROR”, grep in your script and exit non-zero.
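The grep-and-exit-non-zero pattern, sketched as a wrapper script — a hypothetical example, not a RunWisp feature:

```python
import subprocess
import sys

def run_and_flag(cmd) -> int:
    """Run cmd, pass its output through, and return non-zero if it
    failed OR printed ERROR to stderr — so the daemon's normal
    failure notification fires on the pattern."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    sys.stdout.write(proc.stdout)
    sys.stderr.write(proc.stderr)
    if proc.returncode == 0 and "ERROR" in proc.stderr:
        return 1
    return proc.returncode

# Child exits 0 but writes ERROR to stderr, so the wrapper flags it.
code = run_and_flag([sys.executable, "-c",
                     "import sys; print('ok'); print('ERROR: oops', file=sys.stderr)"])
print(code)  # 1
```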
## Where to next

- Notifications model — how a failed run ends up in your inbox or chat.
- How scheduling works — what creates the run rows that retention later trims.
- `[tasks.*]` reference — `log_max_size`, `log_on_full`, `keep_runs`, `keep_for`.