Harden Production

Production Caracal should run in published mode (CARACAL_MODE=stable, or rc for staging), use TLS at external boundaries, and keep stateful dependencies private. rc and stable enforce the same fail-closed trust boundary; rc is a release-candidate build that may be functionally unstable, so use stable for production.

| Area | Required stance |
| --- | --- |
| External TLS | Terminate TLS at ingress, load balancer, service mesh, or Gateway TLS files. The web console public edge must be HTTPS. |
| Internal services | Prefer private ClusterIP or bridge-network traffic. |
| Secrets | Use mounted secret files, not inline env vars. |
| Containers | Drop Linux capabilities, set no-new-privileges, use read-only filesystems where supported. |
| NetworkPolicy | Admit only required ingress and egress, including explicit ingress-controller traffic to the web tier. |
| Gateway upstreams | Keep private upstreams disabled unless explicitly allowed and allowlisted. |
| Runtime CLI | Do not expose product management as top-level runtime CLI commands. |

The production web tier is a same-origin deployment: the caracal-web image serves the built SPA and the backend-for-frontend from the same HTTPS origin. The BFF proxies /api/console/* to the internal API and Coordinator, so browser traffic does not reach those services directly and no cross-origin CORS or CSRF surface is needed for the standard deployment.

Use HTTPS at the public edge and set CARACAL_AUTH_URL to that origin. The BFF pins Secure, HttpOnly, and SameSite session cookies, sets hardened response headers (Content-Security-Policy with frame-ancestors 'none', X-Frame-Options: DENY, X-Content-Type-Options: nosniff, Referrer-Policy: no-referrer, and HSTS when HTTPS is in use), enforces the request Origin on state-changing proxied calls, rate-limits credential endpoints, and validates proxied paths.
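To spot-check the hardened response headers listed above from outside the deployment, a small audit helper can compare a captured header set against the expected values. This is a minimal sketch, not part of Caracal: the function name `audit_headers` and the way headers are captured (for example from a `curl -I` run against the public origin) are assumptions.

```python
# Hypothetical audit helper: compares captured response headers against the
# values this page says the BFF sets. Not a Caracal API.
EXPECTED_EXACT = {
    "x-frame-options": "DENY",
    "x-content-type-options": "nosniff",
    "referrer-policy": "no-referrer",
}


def audit_headers(headers: dict, https: bool = True) -> list:
    """Return a list of problems; an empty list means every check passed."""
    # Header names are case-insensitive, so normalise them first.
    seen = {name.lower(): value.strip() for name, value in headers.items()}
    problems = []

    for name, want in EXPECTED_EXACT.items():
        got = seen.get(name)
        if got is None:
            problems.append(f"missing {name}")
        elif got.lower() != want.lower():
            problems.append(f"{name}: expected {want!r}, got {got!r}")

    # CSP is a ';'-separated directive list; look for frame-ancestors 'none'.
    csp = seen.get("content-security-policy", "")
    directives = [" ".join(d.split()).lower() for d in csp.split(";")]
    if "frame-ancestors 'none'" not in directives:
        problems.append("content-security-policy lacks frame-ancestors 'none'")

    # HSTS is only expected when the origin is served over HTTPS.
    if https and "strict-transport-security" not in seen:
        problems.append("missing strict-transport-security")

    return problems
```

An empty result only confirms that the headers are present with the expected values; it does not exercise the Origin enforcement, rate limiting, or path validation.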

For a split deployment where the browser origin differs from the BFF origin, set CARACAL_WEB_ORIGIN to the trusted browser origins and use SameSite=None; Secure cookies. Keep this shape explicit; the packaged Helm and Compose paths serve the SPA same-origin.

With Helm NetworkPolicy enabled, add a networkPolicy.extraIngress rule for the ingress controller to reach the web service on port 3002. API, Coordinator, and Gateway should remain private unless a deployment requires their endpoints externally.

Upstream hosts are operator-provisioned through the Control API, never client-supplied, so the Gateway routes to private and on-prem upstreams (internal tools, local MCPs) by default. Infrastructure ranges that are never a legitimate upstream are always blocked, including under DNS rebinding: link-local addresses (169.254.0.0/16, which contains the cloud metadata endpoint 169.254.169.254), loopback, carrier-grade NAT, and multicast.
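The always-blocked ranges can be illustrated with Python's standard `ipaddress` module. This is a sketch of the kind of check described above, not the Gateway's implementation. Only 169.254.0.0/16 is given as a CIDR in this page; the other CIDRs are the standard assignments for loopback, carrier-grade NAT (RFC 6598), and multicast, and the IPv6 entries are an assumption. To hold under DNS rebinding, a check like this has to run on the address actually connected to, not on an earlier lookup.

```python
import ipaddress

# Illustrative list only; the Gateway's real blocklist may differ.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "169.254.0.0/16",  # link-local, contains cloud metadata 169.254.169.254
        "127.0.0.0/8",     # IPv4 loopback
        "::1/128",         # IPv6 loopback (assumption: IPv6 is covered too)
        "100.64.0.0/10",   # carrier-grade NAT shared address space (RFC 6598)
        "224.0.0.0/4",     # IPv4 multicast
        "ff00::/8",        # IPv6 multicast (assumption)
    )
]


def is_blocked(address: str) -> bool:
    """Return True if a resolved IP address falls in an always-blocked range."""
    ip = ipaddress.ip_address(address)
    return any(
        ip.version == net.version and ip in net for net in BLOCKED_NETWORKS
    )
```

Private RFC 1918 addresses such as 10.0.0.5 are deliberately not in this list, which matches the statement that private and on-prem upstreams are routable by default.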

To pin egress to an explicit set of hosts as a hardening control, set:

UPSTREAM_HOST_ALLOWLIST=internal-api.example.svc,provider.example.com
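The value is a comma-separated host list. The sketch below shows one way such a list could be parsed and applied. The matching rules here (exact, case-insensitive host match, trailing dot ignored, empty list meaning egress is not pinned) are assumptions for illustration and are not confirmed Gateway behavior; `parse_allowlist` and `host_allowed` are hypothetical names.

```python
def _normalise(host: str) -> str:
    # Hostnames are case-insensitive; a trailing dot marks a fully
    # qualified name and is dropped for comparison.
    return host.strip().lower().rstrip(".")


def parse_allowlist(raw: str) -> frozenset:
    """Split a comma-separated UPSTREAM_HOST_ALLOWLIST value into hosts."""
    return frozenset(_normalise(h) for h in raw.split(",") if h.strip())


def host_allowed(host: str, allowlist: frozenset) -> bool:
    """Exact-match check. Assumption: an empty allowlist means no pinning."""
    if not allowlist:
        return True
    return _normalise(host) in allowlist


allowlist = parse_allowlist("internal-api.example.svc,provider.example.com")
print(host_allowed("provider.example.com", allowlist))  # True
print(host_allowed("other.example.net", allowlist))     # False
```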

Published modes require production-grade keys:

| Key | Used for |
| --- | --- |
| AUDIT_HMAC_KEY | Audit event integrity. |
| STREAMS_HMAC_KEY | Redis stream message signatures. |
| GATEWAY_STS_HMAC_KEY | Gateway-to-STS service exchange. |
| ZONE_KEK | Zone secret/key encryption. |

HMAC keys must be hex-encoded and at least 32 bytes where validated.
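A key that meets this rule can be generated and pre-checked with the Python standard library. This is a minimal sketch under one stated assumption: the 32-byte minimum is read as the decoded key length, which corresponds to 64 hex characters. The function names are hypothetical, and the check is a local sanity test rather than the validation the services run at startup.

```python
import secrets
import string

MIN_KEY_BYTES = 32  # assumption: measured on the decoded key, not the hex text


def generate_hmac_key(num_bytes: int = MIN_KEY_BYTES) -> str:
    """Return a hex-encoded key from the OS cryptographic random source."""
    return secrets.token_hex(num_bytes)


def validate_hmac_key(value: str) -> bool:
    """Check that a value is strict hex and decodes to at least 32 bytes."""
    if not value or len(value) % 2 != 0:
        return False
    if any(ch not in string.hexdigits for ch in value):
        return False
    return len(value) // 2 >= MIN_KEY_BYTES


key = generate_hmac_key()
print(len(key), validate_hmac_key(key))  # 64 True
```

Write the generated value to a mounted secret file rather than an inline environment variable, in line with the secrets guidance on this page.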

The Helm chart sets non-root pods, RuntimeDefault seccomp, dropped capabilities, read-only root filesystem for service containers, optional NetworkPolicy, PDBs, HPAs, and optional Ingress TLS.

| Symptom | Check |
| --- | --- |
| Published-mode startup fails | Missing or short HMAC key, unsafe URL, forbidden fail-open setting, or wrong service port. |
| Gateway cannot reach upstream | NetworkPolicy egress, DNS egress, HTTPS egress, and upstream allowlist. |
| Metrics endpoint denied | Confirm METRICS_BEARER and scraper configuration. |
| TLS works externally but clients fail token validation | Confirm STS ISSUER_URL matches the URL clients use for JWKS and token issuer checks. |
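For the token validation symptom, it can help to read the issuer claim out of a failing token and compare it with the URL clients are configured to trust. The sketch below assumes the STS issues JWTs, which the JWKS reference suggests but this page does not state outright. It decodes the payload without verifying the signature, so it is suitable for diagnostics only; the function names are hypothetical.

```python
import base64
import json
from typing import Optional


def unverified_issuer(token: str) -> Optional[str]:
    """Read the `iss` claim from a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a three-part JWT (header.payload.signature)")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims.get("iss")


def issuer_matches(token: str, expected_issuer: str) -> bool:
    """Exact string comparison, so scheme, host, port, and trailing slash count."""
    return unverified_issuer(token) == expected_issuer
```

A mismatch such as `https://sts.example.com` against `https://sts.example.com/` is enough to fail a strict issuer check, which is why the comparison is kept exact here.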

Use Rotate Keys and Secrets to plan secret storage, key overlap, and rotation evidence.