TLS and Production Hardening
The Gateway is the only Caracal service exposed to external traffic. It enforces TLS, validates upstream URLs, and applies several production-mode constraints that are deliberately disabled in development. This page covers all hardening controls and how to configure them correctly.
TLS configuration
The Gateway requires TLS when CARACAL_ENV=production (the default). Provide a certificate and private key:

    TLS_CERT_FILE=/etc/caracal/tls/server.crt
    TLS_KEY_FILE=/etc/caracal/tls/server.key

Both variables must be set together. Setting one without the other causes the Gateway to panic at startup with:

    TLS_CERT_FILE and TLS_KEY_FILE must both be set

If neither is set and INSECURE_HTTP is not true, the Gateway panics:

    TLS_CERT_FILE/TLS_KEY_FILE required; set INSECURE_HTTP=true to run plaintext

Minimum TLS version: TLS 1.2, enforced by the Go runtime (tls.Config{MinVersion: tls.VersionTLS12}). Cipher suite selection uses Go’s defaults for TLS 1.2 and TLS 1.3.
Certificate format: Standard PEM. Provide a full certificate chain (leaf + intermediates) in TLS_CERT_FILE if your CA requires chain presentation.
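The startup rules above can be sketched in a few lines of Go. This is a minimal illustration, not the Gateway's actual code: the function name resolveTLS and its signature are hypothetical, and only the two error messages and the MinVersion setting come from this page.

```go
package main

import (
	"crypto/tls"
	"errors"
)

// resolveTLS applies the startup rules described above (hypothetical helper).
// It returns a *tls.Config when TLS should be served, nil when plaintext has
// been explicitly allowed, and an error for the two misconfigurations.
func resolveTLS(certFile, keyFile string, insecureHTTP bool) (*tls.Config, error) {
	switch {
	case (certFile == "") != (keyFile == ""):
		// Exactly one of the two variables is set.
		return nil, errors.New("TLS_CERT_FILE and TLS_KEY_FILE must both be set")
	case certFile == "" && !insecureHTTP:
		return nil, errors.New("TLS_CERT_FILE/TLS_KEY_FILE required; set INSECURE_HTTP=true to run plaintext")
	case certFile == "":
		// Plaintext was requested explicitly; development only.
		return nil, nil
	}
	// The certificate itself would be loaded later, for example by
	// http.Server.ListenAndServeTLS(certFile, keyFile).
	return &tls.Config{MinVersion: tls.VersionTLS12}, nil
}
```

A caller would treat a non-nil error as fatal at startup and a nil config as permission to serve plain HTTP.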
Development escape hatches
Two flags relax the TLS requirements for local development:
| Variable | Effect | Forbidden when |
|---|---|---|
| INSECURE_HTTP=true | Run the Gateway without TLS | CARACAL_ENV=production |
| INSECURE_STS=true | Allow the Gateway to connect to STS over HTTP | CARACAL_ENV=production |
Both variables are explicitly rejected in production mode. The Gateway panics at startup if either is set with CARACAL_ENV=production.
Do not set these flags in any environment that handles real credentials or tokens.
Production mode enforcement
CARACAL_ENV=production activates the full set of production safety rules. The Gateway validates all of the following at startup and panics if any constraint is violated:
- INSECURE_HTTP must not be true
- INSECURE_STS must not be true
- JTI_FAIL_OPEN must not be true
- TLS_CERT_FILE and TLS_KEY_FILE must both be set
- PORT must be 8081
CARACAL_ENV=dev disables these checks and allows the escape hatches. Any value other than production or dev is rejected.
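To make the production rules concrete, here is a hedged Go sketch that collects every violation from a map of environment values. The function validateEnv is hypothetical, and two details are assumptions rather than documented behavior: an empty CARACAL_ENV is treated as production (the stated default), and an unset PORT is reported as a violation.

```go
package main

import "fmt"

// validateEnv returns one message per violated production rule, or nil when
// the configuration is acceptable. Hypothetical helper for illustration.
func validateEnv(env map[string]string) []string {
	switch env["CARACAL_ENV"] {
	case "dev":
		return nil // development: checks disabled, escape hatches allowed
	case "", "production":
		// Fall out of the switch and run the production checks below.
	default:
		return []string{fmt.Sprintf("CARACAL_ENV must be production or dev, got %q", env["CARACAL_ENV"])}
	}

	var problems []string
	for _, flag := range []string{"INSECURE_HTTP", "INSECURE_STS", "JTI_FAIL_OPEN"} {
		if env[flag] == "true" {
			problems = append(problems, flag+" must not be true in production")
		}
	}
	if env["TLS_CERT_FILE"] == "" || env["TLS_KEY_FILE"] == "" {
		problems = append(problems, "TLS_CERT_FILE and TLS_KEY_FILE must both be set")
	}
	// Assumption: an unset PORT counts as a violation in this sketch.
	if env["PORT"] != "8081" {
		problems = append(problems, "PORT must be 8081")
	}
	return problems
}
```

Collecting all violations before failing, rather than stopping at the first, tends to shorten the fix-and-restart loop during deployment. Whether the Gateway reports one or all of them is not stated here.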
SSRF defenses
The Gateway proxies requests to upstream resource URLs fetched from Postgres. To prevent server-side request forgery:
Private upstream blocking (default):
By default, ALLOW_PRIVATE_UPSTREAMS=false prevents the Gateway from forwarding requests to private IP ranges (RFC 1918: 10.x, 172.16–31.x, 192.168.x) and loopback addresses. Any resource with a private upstream URL is rejected at proxy time.
Upstream allowlist (production with private upstreams):
If your architecture requires the Gateway to reach private upstreams (e.g., internal microservices), set:

    ALLOW_PRIVATE_UPSTREAMS=true
    UPSTREAM_HOST_ALLOWLIST=api.internal.company.com,payments.internal.company.com

UPSTREAM_HOST_ALLOWLIST is a comma-separated list of permitted hostnames. In production with ALLOW_PRIVATE_UPSTREAMS=true, the allowlist is required. Requests to hosts not on the list are rejected.
Path traversal protection:
The Gateway rejects upstream URLs containing path traversal sequences (..). This check runs before any upstream DNS resolution.
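The three checks above can be combined into a single validation step. The following Go sketch is an illustration under stated assumptions, not the Gateway's implementation: checkUpstream is a hypothetical name, resolved addresses are passed in as strings so the example needs no DNS lookup, and the allowlist branch models the production behavior with ALLOW_PRIVATE_UPSTREAMS=true. Note that net.IP.IsPrivate (Go 1.17+) also covers the IPv6 unique local range in addition to RFC 1918.

```go
package main

import (
	"errors"
	"net"
	"net/url"
	"strings"
)

// checkUpstream validates an upstream URL before proxying (hypothetical helper).
// resolvedIPs holds the addresses the hostname resolved to.
func checkUpstream(rawURL string, resolvedIPs []string, allowPrivate bool, allowlist []string) error {
	u, err := url.Parse(rawURL)
	if err != nil {
		return err
	}
	// Traversal check runs first, before any DNS-derived data is consulted.
	// u.Path is already percent-decoded, so "%2e%2e" is caught too.
	for _, seg := range strings.Split(u.Path, "/") {
		if seg == ".." {
			return errors.New("upstream URL contains a path traversal sequence")
		}
	}
	if allowPrivate {
		// Private upstreams are allowed only for allowlisted hostnames.
		for _, h := range allowlist {
			if strings.EqualFold(strings.TrimSpace(h), u.Hostname()) {
				return nil
			}
		}
		return errors.New("upstream host is not on UPSTREAM_HOST_ALLOWLIST")
	}
	for _, raw := range resolvedIPs {
		ip := net.ParseIP(raw)
		if ip == nil {
			return errors.New("unparseable upstream address: " + raw)
		}
		if ip.IsPrivate() || ip.IsLoopback() {
			return errors.New("upstream resolves to a private or loopback address")
		}
	}
	return nil
}
```

In a real proxy the address check is most reliable when applied to the IP that is actually dialed, since a hostname can resolve differently between validation and connection.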
JTI replay protection
The Gateway tracks JWT IDs (jti claims) in Redis to prevent replay of intercepted per-call mandates. This check is always active. If Redis is unavailable:
- JTI_FAIL_OPEN=false (the default): The Gateway rejects requests. This is safer but reduces availability during Redis outages.
- JTI_FAIL_OPEN=true: The Gateway allows requests through without the JTI check. This setting is forbidden in production.
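The fail-open versus fail-closed decision can be sketched as follows. Everything here is illustrative: the jtiStore interface, checkJTI, and the in-memory memStore are hypothetical stand-ins for the Redis-backed check, and the Redis command the Gateway actually uses is not stated on this page (a common choice is SET with NX and a TTL).

```go
package main

import "errors"

// jtiStore abstracts the replay cache. firstSeen reports true when the jti
// had not been recorded before and records it as a side effect.
type jtiStore interface {
	firstSeen(jti string) (bool, error)
}

var (
	errReplay    = errors.New("jti already used")
	errStoreDown = errors.New("jti store unavailable")
)

// checkJTI returns nil when the request may proceed.
func checkJTI(store jtiStore, jti string, failOpen bool) error {
	fresh, err := store.firstSeen(jti)
	switch {
	case err != nil && failOpen:
		return nil // store outage, JTI_FAIL_OPEN=true: skip the check
	case err != nil:
		return errStoreDown // store outage, default: reject the request
	case !fresh:
		return errReplay
	}
	return nil
}

// memStore is an in-memory stand-in for Redis, for demonstration only.
type memStore struct {
	seen map[string]bool
	down bool
}

func (m *memStore) firstSeen(jti string) (bool, error) {
	if m.down {
		return false, errors.New("connection refused")
	}
	if m.seen[jti] {
		return false, nil
	}
	m.seen[jti] = true
	return true, nil
}
```

With this shape, the only behavioral difference between the two settings is how a store error is handled; a detected replay is rejected in both.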
Request size limits
| Variable | Default | Description |
|---|---|---|
| MAX_REQUEST_BYTES | 10485760 | Maximum request body (10 MB). Larger requests are rejected with 413. |
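One common way to enforce a body limit like this in Go is http.MaxBytesReader. The sketch below is an assumption about the general technique, not a description of the Gateway's code: limitBody, drain, and statusFor are hypothetical names, and http.MaxBytesError requires Go 1.19 or later.

```go
package main

import (
	"errors"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// limitBody caps the request body at limit bytes for the wrapped handler.
func limitBody(limit int64, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		r.Body = http.MaxBytesReader(w, r.Body, limit)
		next.ServeHTTP(w, r)
	})
}

// drain stands in for the proxy: it consumes the body and maps the
// MaxBytesReader error to 413 Request Entity Too Large.
func drain(w http.ResponseWriter, r *http.Request) {
	_, err := io.Copy(io.Discard, r.Body)
	var tooLarge *http.MaxBytesError
	if errors.As(err, &tooLarge) {
		http.Error(w, "request body too large", http.StatusRequestEntityTooLarge)
		return
	}
	if err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// statusFor sends an n-byte body through the limiter and returns the
// resulting status code, using an in-process recorder instead of a socket.
func statusFor(n int, limit int64) int {
	body := strings.NewReader(strings.Repeat("x", n))
	req := httptest.NewRequest(http.MethodPost, "/", body)
	rec := httptest.NewRecorder()
	limitBody(limit, http.HandlerFunc(drain)).ServeHTTP(rec, req)
	return rec.Code
}
```

In this sketch a body of exactly the limit is accepted and one byte more is rejected, so statusFor(16, 16) yields 200 and statusFor(17, 16) yields 413.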
Upstream timeouts
| Variable | Default | Description |
|---|---|---|
| STS_TIMEOUT | 5s | Timeout for STS token exchange calls |
| UPSTREAM_TIMEOUT | 30s | Timeout for proxied upstream requests |
| READ_HEADER_TIMEOUT | 5s | Time allowed to read HTTP request headers |
| READ_TIMEOUT | 30s | Time allowed to read the full request |
| WRITE_TIMEOUT | 60s | Time allowed to write the full response |
| IDLE_TIMEOUT | 120s | Keep-alive idle connection timeout |
Tune UPSTREAM_TIMEOUT based on your upstream services’ P99 response times. Setting it lower than your upstreams’ worst-case latency causes the Gateway to time out requests that would otherwise have succeeded.
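The server-side timeouts in the table correspond to fields of Go's http.Server with the same names. The sketch below shows that mapping with the documented defaults. The helper names are hypothetical, and mapping UPSTREAM_TIMEOUT onto http.Client.Timeout is an assumption about how such a setting is typically applied.

```go
package main

import (
	"net/http"
	"time"
)

// parseTimeout parses a value such as "30s" and falls back to def when the
// value is empty, malformed, or not positive.
func parseTimeout(raw string, def time.Duration) time.Duration {
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return def
	}
	return d
}

// newServer builds an http.Server using the default values from the table.
func newServer(addr string, h http.Handler) *http.Server {
	return &http.Server{
		Addr:              addr,
		Handler:           h,
		ReadHeaderTimeout: 5 * time.Second,   // READ_HEADER_TIMEOUT
		ReadTimeout:       30 * time.Second,  // READ_TIMEOUT
		WriteTimeout:      60 * time.Second,  // WRITE_TIMEOUT
		IdleTimeout:       120 * time.Second, // IDLE_TIMEOUT
	}
}

// newUpstreamClient bounds each proxied request, including reading the
// response body, by the given timeout (assumed to carry UPSTREAM_TIMEOUT).
func newUpstreamClient(timeout time.Duration) *http.Client {
	return &http.Client{Timeout: timeout}
}
```

Setting ReadHeaderTimeout separately from ReadTimeout lets a server drop slow-header connections quickly while still allowing larger request bodies time to arrive.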
Hardening checklist for production
- CARACAL_ENV=production
- TLS_CERT_FILE and TLS_KEY_FILE set to a valid, non-expired certificate and key
- INSECURE_HTTP not set (or explicitly false)
- INSECURE_STS not set (or explicitly false)
- JTI_FAIL_OPEN not set (or explicitly false)
- STREAMS_HMAC_KEY set to a 32-byte hex secret shared with all publishers
- ALLOW_PRIVATE_UPSTREAMS=false unless an internal allowlist is configured
- UPSTREAM_HOST_ALLOWLIST set if ALLOW_PRIVATE_UPSTREAMS=true
- Gateway not directly reachable on its internal Postgres or Redis ports
- Only port 8081 exposed externally (all other services on a private network)
- TLS certificate renewed before expiry (set up automated renewal with Let’s Encrypt or your PKI)
Inter-service communication
In the Docker Compose stack, services communicate over Docker’s internal network using container names (http://sts:8080, http://api:3000, etc.). No inter-service traffic passes through the Gateway. The Gateway connects to STS via STS_URL and to Postgres/Redis directly.
In production container orchestration, inter-service traffic should use mTLS or be constrained to a private network segment. The current stack does not provide mTLS for inter-service communication, so enforce isolation at the network layer (Kubernetes NetworkPolicy, AWS VPC security groups, etc.).