Authio docs

The auto kill-switch

Authio runs a tiny in-process health monitor on every Action. A persistently unhealthy endpoint gets disabled automatically; a human re-enables it after fixing the root cause.

Why we have one

Actions run synchronously on the auth-event hot path. A single slow customer endpoint slows down every sign-in for that project. Auto-disabling a misbehaving Action keeps the customer from accidentally taking themselves offline.

Trip conditions (locked at GA)

Authio kills an Action when any of these is true:

Error rate > 50% over the trailing 5 minutes. Errors are timeouts, network failures, invalid signatures, invalid JSON, HTTP 4xx/5xx responses, and SSRF blocks. A deny verdict (decision: deny) is NOT an error.
Timeout rate > 50% over the trailing 5 minutes. Specifically, the customer endpoint takes longer than the configured timeout_ms on more than half of the calls in that window.
100 consecutive 5xx responses. No window: this rule trips as soon as the 100th consecutive 5xx arrives, so a hard-down endpoint is disabled before it piles up an hour of error logs.
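The three trip conditions above can be expressed as a single predicate over the trailing window. This is a minimal sketch, not Authio's implementation: the `Outcome` type, the outcome-kind strings, and the name `trip_reason` are hypothetical. It assumes each invocation is recorded with one kind, and that a deny verdict is recorded as `"ok"` because the endpoint answered correctly. Checking the timeout rule before the error rule is a choice made here so the more specific reason is reported.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical outcome kinds; every one of these counts as an error.
# A successful call (allow OR deny verdict) is recorded as "ok".
ERROR_KINDS = frozenset({
    "timeout", "network", "bad_signature", "bad_json",
    "http_4xx", "http_5xx", "ssrf_block",
})

CONSECUTIVE_5XX_LIMIT = 100


@dataclass(frozen=True)
class Outcome:
    kind: str  # "ok" or one of ERROR_KINDS


def trip_reason(window: Sequence[Outcome], consecutive_5xx: int) -> Optional[str]:
    """Return why the Action should be killed, or None if it is healthy.

    `window` holds the outcomes from the trailing 5 minutes;
    `consecutive_5xx` is the length of the current unbroken run of 5xx responses.
    """
    # No window for this rule: the 100th 5xx in a row trips immediately.
    if consecutive_5xx >= CONSECUTIVE_5XX_LIMIT:
        return "consecutive_5xx"
    if not window:
        return None
    total = len(window)
    timeouts = sum(1 for o in window if o.kind == "timeout")
    errors = sum(1 for o in window if o.kind in ERROR_KINDS)
    # Timeouts also count as errors, so test the narrower rule first.
    if timeouts / total > 0.5:
        return "timeout_rate"
    if errors / total > 0.5:
        return "error_rate"
    return None
```

Note the strict `>`: a window that is exactly half errors does not trip.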

The window evaluation runs lazily: the trailing window is computed only on every 100th invocation. Worst case, the trip fires after a full 5-minute window of failures plus up to 100 further invocations until the next evaluation. At typical sign-in rates, those 100 invocations take about 30 seconds.
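The lazy schedule can be sketched with a deque of timestamped results that is only pruned and scanned on every 100th call. This is a hypothetical illustration: `LazyWindow`, `record`, and caller-supplied timestamps in seconds are assumptions, and only the error-rate rule is shown.

```python
from collections import deque
from typing import Deque, Tuple

WINDOW_S = 300.0  # trailing 5 minutes
EVAL_EVERY = 100  # evaluate the window on every 100th invocation


class LazyWindow:
    """Hypothetical per-Action error-rate tracker with lazy evaluation."""

    def __init__(self) -> None:
        self._events: Deque[Tuple[float, bool]] = deque()  # (timestamp_s, is_error)
        self._calls = 0

    def record(self, now_s: float, is_error: bool) -> bool:
        """Record one invocation; return True if the error-rate rule trips now."""
        self._events.append((now_s, is_error))
        self._calls += 1
        if self._calls % EVAL_EVERY != 0:
            return False  # 99 of every 100 calls skip the window math entirely
        # Drop results that have aged out of the trailing window.
        cutoff = now_s - WINDOW_S
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        errors = sum(1 for _, failed in self._events if failed)
        return errors / len(self._events) > 0.5
```

Between evaluations, up to 99 failures can arrive without any check, which is where the extra 100-invocation delay in the worst case comes from.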

What happens on trip

The action’s row gets killed_at = now() and killed_reason = <reason>.
Every subsequent auth event treats the action as if it didn’t exist. fail_mode does NOT apply here: the action is gone, not failing.
An action.killed audit event is recorded with actor=system.
An email goes out to every project owner and admin with the action name, the reason, and a deep link to the dashboard.
The kill-switch state is held in memory per auth-core instance. A multi-replica auth-core deployment may have a brief window where one replica has killed the action and another hasn’t. Both honor the DB’s killed_at column, so the window in which a lagging replica can still call the killed action is bounded by how long the slowest replica takes to catch up.
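The trip side effects listed above can be sketched as one function. Everything here is hypothetical: the `ActionRow` and `Member` types, the audit-event dictionary shape, and the role strings are assumptions for illustration. The real writes go to Authio's database, audit log, and mailer, and the email body (with its dashboard deep link) is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass
class ActionRow:
    name: str
    killed_at: Optional[datetime] = None
    killed_reason: Optional[str] = None


@dataclass(frozen=True)
class Member:
    email: str
    role: str  # hypothetical roles: "owner", "admin", "developer", ...


def kill_action(action: ActionRow, reason: str,
                members: List[Member]) -> Tuple[dict, List[str]]:
    """Mark the Action killed; return the audit event and the email recipients."""
    action.killed_at = datetime.now(timezone.utc)
    action.killed_reason = reason
    audit_event = {
        "type": "action.killed",
        "actor": "system",
        "action": action.name,
        "reason": reason,
    }
    # Only project owners and admins are notified.
    recipients = [m.email for m in members if m.role in ("owner", "admin")]
    return audit_event, recipients


def is_live(action: ActionRow) -> bool:
    # Replicas decide from the shared killed_at column, not only local memory.
    return action.killed_at is None
```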

What this means for sign-ins

When an action is killed:

pre_authenticate / post_authenticate actions: sign-in proceeds as if no Action were configured.
pre_token_mint actions: the JWT is signed with Authio-side roles and permissions only. Pattern 3 customers should be aware that their override layer is offline until they fix the endpoint and revive the action.
pre_register actions: signups proceed normally.

This has the same effect as deleting the action: the goal of the kill-switch is to keep the customer from being locked out of their own product. Even if your fail_mode is closed, a killed action does NOT deny.
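The rule that a killed Action behaves as if deleted, regardless of fail_mode, can be sketched as a skip applied before any Action runs. This is hypothetical: `Action`, `run_hook`, the `call` callable, and the verdict strings are assumptions, and `"open"` is an assumed counterpart to the `closed` fail_mode mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    name: str
    fail_mode: str  # assumed values: "open" or "closed"
    killed: bool = False


def run_hook(actions: List[Action], call: Callable[[Action], Optional[str]]) -> str:
    """Run the live Actions for one auth event and return "allow" or "deny".

    `call` returns the Action's verdict ("allow" or "deny"), or None if the
    call failed (timeout, bad signature, 5xx, ...).
    """
    for action in actions:
        if action.killed:
            continue  # killed == deleted: no call, no fail_mode, no deny
        verdict = call(action)
        if verdict is None:
            # fail_mode only governs a live Action whose call failed.
            if action.fail_mode == "closed":
                return "deny"
            continue
        if verdict == "deny":
            return "deny"
    return "allow"
```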

Recovery

Fix the underlying issue. Check your endpoint’s logs. Common culprits: cold-start latency on Lambda, exhausted DB connection pool, an expired TLS cert, a deploy that bumped the response size past 64 KiB.
From the dashboard, go to Actions → [your action]. The red "Auto-killed" banner has a Revive button (owner/admin only).
Clicking Revive clears killed_at, enables the action, AND resets the in-process health window on every auth-core instance. The next invocation gets a clean slate.
Click Send test event immediately afterwards to confirm the endpoint is back.
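What Revive resets can be sketched as follows. The `ActionState` and `Instance` types, the `health_window` list, and the `revive` function are hypothetical; the sketch only shows that the kill marker, the enabled flag, and every instance's health data are cleared together, so each replica starts the revived Action from an empty window.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ActionState:
    killed_at: Optional[datetime]
    enabled: bool


@dataclass
class Instance:
    """One auth-core replica's in-process health data for a single Action."""
    health_window: List[bool] = field(default_factory=list)  # True = error
    consecutive_5xx: int = 0


def revive(action: ActionState, instances: List[Instance]) -> None:
    action.killed_at = None
    action.enabled = True
    # Clean slate everywhere: old failures must not count against the revived Action.
    for inst in instances:
        inst.health_window.clear()
        inst.consecutive_5xx = 0
```

In a real multi-replica deployment the reset has to reach the other replicas through some mechanism (for example a broadcast or a shared generation counter); the `instances` list here is only a stand-in for that.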

Reviving doesn’t bypass the trip rules

If you click Revive while the endpoint is still broken, Authio will re-trip the kill-switch within ~30 seconds (the next 100 invocations). The Revive endpoint deliberately does NOT take a force=true flag: there’s no operational scenario where you want to keep a broken Action in the hot path.
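Because Revive starts from an empty window, the re-trip point for a still-broken endpoint is easy to work out: the first evaluation happens on invocation 100, and every call since Revive is inside the window. The simulation below is a hypothetical sketch that assumes all post-Revive calls fall within the 5-minute window and models only the error-rate rule; `invocations_until_retrip` is an illustrative name.

```python
from typing import Callable, Optional

EVAL_EVERY = 100  # the window is evaluated on every 100th invocation


def invocations_until_retrip(fails: Callable[[int], bool],
                             max_calls: int = 10_000) -> Optional[int]:
    """Return the invocation number at which the error-rate rule re-trips,
    or None if it never trips within max_calls.

    fails(i) says whether the i-th call after Revive (1-based) errors.
    """
    errors = 0
    for i in range(1, max_calls + 1):
        if fails(i):
            errors += 1
        # Every call since Revive is in the window, so the rate is errors / i.
        if i % EVAL_EVERY == 0 and errors / i > 0.5:
            return i
    return None
```

An endpoint that still fails every call, or even 60% of calls, re-trips on invocation 100; one failing 40% of calls never trips the error-rate rule. At the roughly 3 to 4 sign-ins per second that the ~30-second figure implies, 100 calls take about 30 seconds, and a busier project re-trips sooner.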
