Actions
The auto kill-switch
Authio runs a tiny in-process health monitor on every Action. A persistently unhealthy endpoint gets disabled automatically; a human re-enables it after fixing the root cause.
Why we have one
Actions run synchronously on the auth-event hot path. A single slow customer endpoint slows down every sign-in for that project. Auto-disabling a misbehaving Action keeps the customer from accidentally taking themselves offline.
Trip conditions (locked at GA)
Authio kills an Action when any of these is true:
- Error rate > 50% over the trailing 5 minutes. Errors = timeouts, network failures, invalid signatures, invalid JSON, HTTP 4xx/5xx, SSRF blocks. A decision: deny verdict is NOT an error.
- Timeout rate > 50% over the trailing 5 minutes. Specifically, when the customer endpoint takes longer than the configured timeout_ms on more than half of recent calls.
- 100 consecutive 5xx responses. No window: this trips immediately, so a hard-down endpoint is disabled before it amasses an hour of error logs.
The window evaluation runs lazily: we compute the trailing window on every 100th invocation. Worst case, the trip fires only after a full 5-minute window of failures plus the next 100 invocations. At typical sign-in rates, those 100 invocations take about 30 seconds.
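The three trip conditions and the lazy evaluation can be sketched as a small per-Action monitor. This is a minimal illustration, not Authio's implementation: the class name, the reason strings, and the choice to reset the 5xx streak on any non-5xx outcome are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional, Tuple

WINDOW_SECONDS = 300        # trailing 5 minutes
EVALUATE_EVERY = 100        # the window is only evaluated on every 100th invocation
MAX_CONSECUTIVE_5XX = 100   # immediate trip, no window involved
RATE_THRESHOLD = 0.5        # rates must be strictly greater than 50%


@dataclass
class HealthMonitor:
    """Hypothetical per-Action monitor; names are illustrative only."""
    samples: Deque[Tuple[float, bool, bool]] = field(default_factory=deque)
    invocations: int = 0
    consecutive_5xx: int = 0

    def record(self, now: float, *, error: bool = False, timeout: bool = False,
               status: Optional[int] = None) -> Optional[str]:
        """Record one call outcome; return a kill reason if a rule trips."""
        is_5xx = status is not None and 500 <= status <= 599
        # Timeouts and HTTP 4xx/5xx count as errors; a deny verdict does not,
        # so callers record it as a plain successful call.
        is_error = error or timeout or (status is not None and status >= 400)
        self.samples.append((now, is_error, timeout))
        self.invocations += 1

        # Rule 3: consecutive 5xx responses trip without waiting for a window.
        # Assumption: any non-5xx outcome resets the streak.
        self.consecutive_5xx = self.consecutive_5xx + 1 if is_5xx else 0
        if self.consecutive_5xx >= MAX_CONSECUTIVE_5XX:
            return "consecutive_5xx"

        # Lazy evaluation: skip the window check except on every 100th call.
        if self.invocations % EVALUATE_EVERY != 0:
            return None

        # Drop samples that fell out of the trailing window.
        cutoff = now - WINDOW_SECONDS
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

        total = len(self.samples)  # at least 1: the sample just appended
        timeouts = sum(1 for _, _, t in self.samples if t)
        errors = sum(1 for _, e, _ in self.samples if e)
        # Timeouts are also errors, so the more specific reason is checked first.
        if timeouts / total > RATE_THRESHOLD:
            return "timeout_rate"
        if errors / total > RATE_THRESHOLD:
            return "error_rate"
        return None
```

Each call to record() returns None until a rule trips. Because the window rules are only checked on every 100th call, a failing endpoint can serve up to 99 more invocations after crossing 50% before the trip fires.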
What happens on trip
- The action’s row gets killed_at = now() and killed_reason = <reason>.
- Every subsequent auth event treats the action as if it didn’t exist. fail_mode does NOT apply here: the action is gone, not failing.
- An action.killed audit event is recorded with actor=system.
- An email goes out to every project owner and admin with the action name, the reason, and a deep link to the dashboard.
- The kill-switch state is held in memory per auth-core instance. A multi-replica auth-core may have a brief window where one replica has killed the action and another hasn’t. Both honor the DB’s killed_at column, so the at-most-one-call window is bounded by the slowest replica catching up.
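The trip side effects listed above can be sketched against an in-memory stand-in for the database, audit log, and mailer. FakeBackend, kill_action, and the dictionary shapes are hypothetical. Skipping the side effects when killed_at is already set is an assumption about how replicas avoid duplicate notifications. The dashboard deep link is omitted because its format is not given here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class ActionRow:
    name: str
    killed_at: Optional[datetime] = None
    killed_reason: Optional[str] = None


@dataclass
class FakeBackend:
    """Stand-in for the DB, audit log, and mailer (all hypothetical)."""
    actions: Dict[str, ActionRow] = field(default_factory=dict)
    audit_log: List[dict] = field(default_factory=list)
    outbox: List[dict] = field(default_factory=list)


def kill_action(backend: FakeBackend, name: str, reason: str,
                recipients: List[str]) -> bool:
    """Apply the trip side effects once; return False if already killed."""
    row = backend.actions[name]
    if row.killed_at is not None:
        # Assumption: the DB column is the source of truth, so a replica
        # that trips late does not repeat the audit event or the emails.
        return False

    row.killed_at = datetime.now(timezone.utc)
    row.killed_reason = reason
    backend.audit_log.append(
        {"event": "action.killed", "actor": "system", "action": name, "reason": reason}
    )
    # One email per project owner and admin.
    for recipient in recipients:
        backend.outbox.append({"to": recipient, "action": name, "reason": reason})
    return True
```

In a real deployment the row update would need to be an atomic conditional write so that two replicas tripping at the same moment cannot both pass the check. The in-memory version above does not model that race.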
What this means for sign-ins. When an action is killed:
- pre_authenticate / post_authenticate actions: sign-in proceeds as if no Action were configured.
- pre_token_mint actions: the JWT is signed with Authio-side roles + permissions. Pattern 3 customers should be aware that their override layer is offline until they fix the endpoint.
- pre_register actions: signups proceed normally.
This is the same effect as deleting the action — the goal of the kill-switch is to keep the customer from being locked out of their own product. Whether or not your fail_mode was closed, a killed action does NOT deny.
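One way to picture "killed means gone, not failing" is a dispatch step that filters killed actions out before fail_mode is ever consulted. The Action type, the runnable_actions helper, and the "open" value for fail_mode are assumptions made for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Action:
    name: str
    trigger: str                         # e.g. "pre_authenticate", "pre_token_mint"
    fail_mode: str                       # "closed", or "open" (assumed counterpart)
    killed_at: Optional[datetime] = None


def runnable_actions(actions: List[Action], trigger: str) -> List[Action]:
    """Return the actions to invoke for a trigger, skipping killed ones.

    fail_mode is deliberately not inspected: it only matters when an
    action is invoked and fails, and a killed action is never invoked.
    """
    return [a for a in actions if a.trigger == trigger and a.killed_at is None]


actions = [
    Action("risk-check", "pre_authenticate", "closed",
           killed_at=datetime.now(timezone.utc)),
    Action("enrich-claims", "pre_token_mint", "open"),
]
# The killed fail-closed action is skipped, so sign-in proceeds unblocked.
to_run = runnable_actions(actions, "pre_authenticate")
```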
Recovery
- Fix the underlying issue. Check your endpoint’s logs. Common culprits: cold-start latency on Lambda, exhausted DB connection pool, an expired TLS cert, a deploy that bumped the response size past 64 KiB.
- From the dashboard, go to Actions → [your action]. The red "Auto-killed" banner has a Revive button (owner/admin only).
- Clicking Revive clears killed_at, enables the action, AND resets the in-process health window on every auth-core instance. The next invocation gets a clean slate.
- Click Send test event immediately afterwards to confirm the endpoint is back.
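The three effects of Revive can be sketched as follows. This is an illustration under assumptions: Replica, reset_window, and the enabled field are hypothetical names, and how the reset actually reaches each auth-core instance is not described here, so it is modeled as a direct loop.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class ActionRow:
    name: str
    enabled: bool = False
    killed_at: Optional[datetime] = None


@dataclass
class Replica:
    """One auth-core instance with in-memory health samples per action."""
    windows: Dict[str, List[bool]] = field(default_factory=dict)

    def reset_window(self, action_name: str) -> None:
        # Forget all samples so the next invocation starts from a clean slate.
        self.windows.pop(action_name, None)


def revive(row: ActionRow, replicas: List[Replica], role: str) -> None:
    """Clear the kill, enable the action, and reset every replica's window."""
    if role not in ("owner", "admin"):
        raise PermissionError("only owners and admins can revive an action")
    row.killed_at = None
    row.enabled = True
    for replica in replicas:
        replica.reset_window(row.name)
```

Resetting the window matters because the old failures would otherwise still dominate the trailing 5 minutes and could re-trip a healthy endpoint at the next evaluation.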
Reviving doesn’t bypass the trip rules
If you click Revive while the endpoint is still broken, Authio will re-trip the kill-switch within ~30 seconds (the next 100 invocations). The Revive endpoint deliberately does NOT take a force=true flag — there’s no operational scenario where you want to keep a broken Action in the hot path.
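The "~30 seconds" figure follows from the 100-invocation evaluation cadence, so it scales with traffic. The helper below is a hypothetical back-of-the-envelope estimate. It assumes the invocation counter restarts at zero after a revive, and that roughly 3.3 invocations per second is what "typical" means here (100 invocations in 30 seconds).

```python
EVALUATE_EVERY = 100  # window evaluation cadence, in invocations


def seconds_until_next_evaluation(invocations_per_second: float,
                                  already_seen: int = 0) -> float:
    """Estimate how long until the next window evaluation can fire."""
    if invocations_per_second <= 0:
        raise ValueError("invocations_per_second must be positive")
    remaining = EVALUATE_EVERY - (already_seen % EVALUATE_EVERY)
    return remaining / invocations_per_second


# About 30 seconds at the assumed typical rate of 100 invocations per 30 s.
typical = seconds_until_next_evaluation(100 / 30)
# A low-traffic project at one sign-in every 10 seconds waits far longer.
quiet = seconds_until_next_evaluation(0.1)
```

On a quiet project, a still-broken endpoint can therefore stay in the hot path well beyond 30 seconds after a revive, unless it returns 5xx responses and hits the consecutive-5xx rule first.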
