
Plan a Platform Rollout

Use this runbook for version upgrades, configuration changes, chart changes, secret rotations, ingress changes, and policy rollout waves.

Gate | Pass condition
Source verification | Values, env vars, ports, and secrets match current chart and service config.
Render validation | helm lint and helm template succeed with environment values.
Migration safety | Migrations are forward-only and complete before service rollout.
Readiness | API, STS, Gateway, Audit, and Coordinator /ready endpoints pass.
Event health | Redis streams, outbox dispatch, audit ingestion, revocation propagation, and replay backlog are healthy.
Rollback plan | Previous image tag, chart revision, values, and compatible schema plan are documented.
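
The render validation gate can be scripted so it fails fast before any upgrade is attempted. The following is a minimal sketch, not the project's own tooling: the chart path and values file are taken from the upgrade commands in this runbook, while the `render_gate` helper name and the `rendered.yaml` output file are assumptions made for illustration.

```bash
# Chart path and values file as used by the upgrade commands in this runbook.
CHART="${CHART:-infra/helm/caracal}"
VALUES="${VALUES:-values.production.yaml}"

# render_gate (hypothetical helper): pass only if both lint and template succeed.
# The rendered manifests are written to a file so they can be reviewed or diffed.
render_gate() {
  local out="${1:-rendered.yaml}"
  helm lint "$CHART" -f "$VALUES" || return 1
  helm -n caracal template caracal "$CHART" -f "$VALUES" > "$out" || return 1
  echo "render gate passed: manifests written to $out"
}
```

Calling `render_gate` with no arguments writes to `rendered.yaml` in the current directory; a non-zero exit status means the gate failed and the rollout should not proceed.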
sequenceDiagram
  participant Owner as Release owner
  participant Helm as Helm
  participant DB as Postgres
  participant Pods as Caracal pods
  participant Obs as Observability
  Owner->>Helm: render and diff
  Helm->>DB: run migration Job
  Helm->>Pods: roll services
  Pods->>Obs: expose health, readiness, metrics
  Owner->>Obs: confirm gates
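
Review the diff (helm diff is provided by the helm-diff plugin), apply the upgrade, then watch each Deployment roll out:
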
helm -n caracal diff upgrade caracal infra/helm/caracal -f values.production.yaml
helm -n caracal upgrade caracal infra/helm/caracal -f values.production.yaml
kubectl -n caracal rollout status deploy/caracal-api
kubectl -n caracal rollout status deploy/caracal-audit
kubectl -n caracal rollout status deploy/caracal-coordinator
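
Without a timeout, `kubectl rollout status` keeps waiting for as long as the rollout is progressing. One possible way to bound the wait is sketched below. The Deployment names come from the commands above; the `wait_for_deployments` helper name and the 300s default are assumptions, not values from the chart.

```bash
# wait_for_deployments (hypothetical helper): check each Deployment in turn and
# stop at the first one that does not finish rolling out within the timeout.
wait_for_deployments() {
  local timeout="${1:-300s}"   # assumed default; tune per environment
  local name
  for name in caracal-api caracal-audit caracal-coordinator; do
    if ! kubectl -n caracal rollout status "deploy/${name}" --timeout="${timeout}"; then
      echo "rollout stalled: deploy/${name}" >&2
      return 1
    fi
  done
}
```

For example, `wait_for_deployments 120s` returns non-zero as soon as one Deployment misses the two-minute window, which gives the release owner a clear point at which to evaluate the stop conditions.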

When replay persistence is enabled, STS and Gateway render as StatefulSets:

kubectl -n caracal rollout status statefulset/caracal-sts
kubectl -n caracal rollout status statefulset/caracal-gateway
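
A script that has to work in both modes can detect the workload kind at run time. This sketch assumes that STS and Gateway render as Deployments when replay persistence is disabled, which this page implies but does not state; the `rollout_target` helper name is also an assumption.

```bash
# rollout_target (hypothetical helper): print the resource to watch for a workload.
# Assumption: if no StatefulSet with this name exists, the workload is a Deployment.
rollout_target() {
  local name="$1"
  if kubectl -n caracal get "statefulset/${name}" >/dev/null 2>&1; then
    echo "statefulset/${name}"
  else
    echo "deploy/${name}"
  fi
}

# wait_for_sts_and_gateway (hypothetical helper): watch both workloads in turn.
wait_for_sts_and_gateway() {
  local name
  for name in caracal-sts caracal-gateway; do
    kubectl -n caracal rollout status "$(rollout_target "$name")" || return 1
  done
}
```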

Stop the rollout when any of these occur:

  • Migration Job fails.
  • STS or Gateway cannot prove readiness.
  • Audit DLQ grows or replay backlog ages.
  • Gateway revocation snapshot becomes stale.
  • API outbox dead messages appear.
  • Postgres pool saturation or Redis memory pressure persists.
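
The stop conditions above can be folded into a single go/no-go check once each one has been reduced to a number. The sketch below only shows the aggregation step. How each observation is collected (dashboards, queries, kubectl) depends on the environment and is not covered here, and the function names and observation names are hypothetical.

```bash
# Each argument is name=value, where 0 means healthy and anything else means tripped.
# stop_reasons (hypothetical helper): print the name of every tripped condition.
stop_reasons() {
  local obs name value
  for obs in "$@"; do
    name="${obs%%=*}"    # text before the first "="
    value="${obs#*=}"    # text after the first "="
    if [ "$value" != "0" ]; then
      echo "$name"
    fi
  done
}

# should_stop (hypothetical helper): succeed if at least one condition is tripped.
should_stop() {
  [ -n "$(stop_reasons "$@")" ]
}
```

For example, `should_stop migration_job_failed=0 audit_dlq_growth=3 outbox_dead_messages=0` succeeds, and `stop_reasons` with the same arguments prints `audit_dlq_growth`, so the release owner can see which condition halted the rollout.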

Use helm rollback only after confirming the previous app version is compatible with the current database schema. If schema compatibility is uncertain, roll forward with a corrected app image instead of applying down migrations.
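
One way to enforce that check is to make the rollback command refuse to run until compatibility has been explicitly confirmed. This is a sketch under assumptions: the `rollback_release` helper and the `SCHEMA_COMPAT_CONFIRMED` variable are hypothetical, and the confirmation itself remains a manual review step.

```bash
# rollback_release (hypothetical helper): roll the caracal release back to a given
# chart revision, but only after the operator has confirmed schema compatibility.
rollback_release() {
  local revision="${1:-}"
  if [ -z "$revision" ]; then
    echo "usage: rollback_release <revision>" >&2
    return 2
  fi
  # SCHEMA_COMPAT_CONFIRMED is an assumed, operator-set flag, not a chart setting.
  if [ "${SCHEMA_COMPAT_CONFIRMED:-no}" != "yes" ]; then
    echo "refusing rollback: schema compatibility not confirmed; prefer rolling forward" >&2
    return 1
  fi
  helm -n caracal rollback caracal "$revision" --wait
}
```

`helm -n caracal history caracal` lists the available revisions. After confirming compatibility, the operator would run `SCHEMA_COMPAT_CONFIRMED=yes rollback_release <revision>` with the documented previous revision.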

See Deploy Policy Changes for policy-specific rollout gates, simulation, activation, and rollback.