Plan a Platform Rollout
Use this runbook for version upgrades, configuration changes, chart changes, secret rotations, ingress changes, and policy rollout waves.
Rollout Gates
| Gate | Pass condition |
|---|---|
| Source verification | Values, env vars, ports, and secrets match current chart and service config. |
| Render validation | `helm lint` and `helm template` succeed with environment values. |
| Migration safety | Migrations are forward-only and complete before service rollout. |
| Readiness | API, STS, Gateway, Audit, and Coordinator `/ready` endpoints pass. |
| Event health | Redis streams, outbox dispatch, audit ingestion, revocation propagation, and replay backlog are healthy. |
| Rollback plan | Previous image tag, chart revision, values, and compatible schema plan are documented. |
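
The render validation gate can be scripted so that it fails fast before any upgrade is attempted. The following is a minimal sketch, assuming the chart path, release name, namespace, and values file used in the Execution commands; the function name `render_gate` is hypothetical and not part of the chart or any CLI.

```sh
#!/bin/sh
# Sketch of the "Render validation" gate.
# render_gate is a hypothetical helper; chart path and values file default
# to the ones used in the Execution commands.

render_gate() {
  chart="${1:-infra/helm/caracal}"
  values="${2:-values.production.yaml}"

  # helm lint checks the chart structure and templates with the given values.
  helm lint "$chart" -f "$values" \
    || { echo "render gate: lint failed" >&2; return 1; }

  # helm template renders the manifests locally; the output is discarded
  # and only the exit status is used.
  helm -n caracal template caracal "$chart" -f "$values" >/dev/null \
    || { echo "render gate: template failed" >&2; return 1; }

  echo "render gate: passed"
}
```

Calling `render_gate` with no arguments checks the production values; pass a different chart path or values file as the first and second argument to check another environment.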
Rollout Sequence
```mermaid
sequenceDiagram
    participant Owner as Release owner
    participant Helm as Helm
    participant DB as Postgres
    participant Pods as Caracal pods
    participant Obs as Observability
    Owner->>Helm: render and diff
    Helm->>DB: run migration Job
    Helm->>Pods: roll services
    Pods->>Obs: expose health, readiness, metrics
    Owner->>Obs: confirm gates
```
Execution
```sh
helm -n caracal diff upgrade caracal infra/helm/caracal -f values.production.yaml
helm -n caracal upgrade caracal infra/helm/caracal -f values.production.yaml
kubectl -n caracal rollout status deploy/caracal-api
kubectl -n caracal rollout status deploy/caracal-audit
kubectl -n caracal rollout status deploy/caracal-coordinator
```

When replay persistence is enabled, STS and Gateway render as StatefulSets:

```sh
kubectl -n caracal rollout status statefulset/caracal-sts
kubectl -n caracal rollout status statefulset/caracal-gateway
```

Stop Conditions
Stop the rollout when any of these occur:
- Migration Job fails.
- STS or Gateway cannot prove readiness.
- Audit DLQ grows or replay backlog ages.
- Gateway revocation snapshot becomes stale.
- API outbox dead messages appear.
- Postgres pool saturation or Redis memory pressure persists.
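
The readiness stop condition can be enforced mechanically by bounding each rollout wait and halting at the first failure. The sketch below covers only that condition; the DLQ, backlog, snapshot, outbox, Postgres, and Redis signals depend on your observability stack and are not modeled here. The function name `wait_rollouts`, the `ROLLOUT_TIMEOUT` variable, and the 5-minute default are illustrative assumptions.

```sh
#!/bin/sh
# Sketch: halt at the first workload that fails to become ready.
# wait_rollouts, ROLLOUT_TIMEOUT, and the 5m default are assumptions.

wait_rollouts() {
  timeout="${ROLLOUT_TIMEOUT:-5m}"
  for target in "$@"; do
    # rollout status exits non-zero when the rollout fails or the
    # timeout expires before the workload is fully rolled out.
    if ! kubectl -n caracal rollout status "$target" --timeout="$timeout"; then
      echo "STOP: $target did not become ready within $timeout" >&2
      return 1
    fi
  done
  echo "all rollouts ready"
}
```

For example, `wait_rollouts deploy/caracal-api deploy/caracal-audit deploy/caracal-coordinator` checks the three Deployments in order. When replay persistence is enabled, add `statefulset/caracal-sts` and `statefulset/caracal-gateway` to the argument list.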
Rollback
Use `helm rollback` only after confirming the previous app version is compatible with the current database schema. If schema compatibility is uncertain, roll forward with a corrected app image instead of applying down migrations.
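
One way to make the compatibility check hard to skip is to wrap the rollback in a guard that refuses to run until the operator has explicitly confirmed it. This is a minimal sketch, not the project's tooling: `guarded_rollback` and the `SCHEMA_COMPATIBLE` variable are hypothetical names, and the confirmation itself remains a manual judgment.

```sh
#!/bin/sh
# Sketch: refuse to roll back unless schema compatibility was confirmed.
# guarded_rollback and SCHEMA_COMPATIBLE are hypothetical names.

guarded_rollback() {
  revision="$1"
  if [ -z "$revision" ]; then
    echo "usage: guarded_rollback <revision>" >&2
    return 2
  fi
  if [ "${SCHEMA_COMPATIBLE:-no}" != "yes" ]; then
    echo "refusing rollback: schema compatibility not confirmed; roll forward instead" >&2
    return 1
  fi
  # Print the release history for the record, then roll back to the
  # chosen chart revision.
  helm -n caracal history caracal \
    && helm -n caracal rollback caracal "$revision"
}
```

With this in place, `guarded_rollback 41` fails unless `SCHEMA_COMPATIBLE=yes` is set in the environment; the revision number here is only an example and should come from the documented rollback plan.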
Next Step
Use Deploy Policy Changes for policy-specific rollout gates, simulation, activation, and rollback.

