
Redis Streams

Caracal uses Redis as the message bus between services via a transactional outbox pattern. All inter-service events transit through Redis streams before being persisted or acted on. Redis also holds rate-limit state and step-up challenge records.

The custom image at infra/redis/ starts Redis 8 with redis-server /etc/caracal/redis.conf. Key configuration:

```conf
bind 0.0.0.0
port 6379
protected-mode yes
# Persistence
appendonly yes
appendfsync everysec
save 3600 1
# Eviction: noeviction protects stream data from silent drops
maxmemory-policy noeviction
tcp-keepalive 60
timeout 0
```

maxmemory-policy noeviction is required. Any other policy risks silently dropping stream messages under memory pressure, which would cause audit events, revocation signals, or agent lifecycle events to be lost permanently.

appendonly yes with appendfsync everysec fsyncs the append-only file once per second, so a crash typically loses only about the last second of writes.
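A service or init container could refuse to start if the running server does not match these settings. The sketch below assumes a client shaped like redis-py's `redis.Redis`, whose `config_get(pattern)` returns a dict of setting name to string value; the function and constant names are hypothetical, and managed Redis offerings that disable CONFIG would need a different check.

```python
from typing import Any

# Settings this page treats as required for stream safety.
REQUIRED_SETTINGS = {
    "maxmemory-policy": "noeviction",
    "appendonly": "yes",
    "appendfsync": "everysec",
}


def check_stream_safety(client: Any) -> list[str]:
    """Return a list of mismatches; an empty list means every checked setting matches."""
    problems = []
    for name, expected in REQUIRED_SETTINGS.items():
        # redis-py's config_get returns e.g. {"maxmemory-policy": "noeviction"}.
        actual = client.config_get(name).get(name)
        if actual != expected:
            problems.append(f"{name} is {actual!r}, expected {expected!r}")
    return problems
```

Called with a real client, a non-empty result would be a reason to exit before publishing or consuming anything.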


Redis uses two logical databases:

| Database | Contents |
| --- | --- |
| db 0 | All Redis streams and outbox messages |
| db 1 | Rate-limit keys (`rl:{zone_id}:{provider_id}:{user_id}`) and step-up challenge records (`chal:{id}`) |

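Rate-limit and challenge data can tolerate eviction, but Redis applies maxmemory-policy to the whole server, not to individual logical databases. Giving that data a more permissive eviction policy would therefore mean moving db 1 to its own Redis instance; keeping it in a separate database makes such a move straightforward.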
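A Redis connection works against one logical database at a time (redis-py selects it with the `db` argument of `redis.Redis`), so a service that uses both streams and rate-limit keys needs one client per database. The key formats below come from the database table on this page; the constant and function names, and the `redis` host name in the comment, are hypothetical.

```python
# Logical databases (constant names are hypothetical).
STREAMS_DB = 0    # streams and outbox-published messages
EPHEMERAL_DB = 1  # rate-limit keys and step-up challenge records


def rate_limit_key(zone_id: str, provider_id: str, user_id: str) -> str:
    """db 1 key for one user's rate-limit state against one provider in one zone."""
    return f"rl:{zone_id}:{provider_id}:{user_id}"


def challenge_key(challenge_id: str) -> str:
    """db 1 key for a step-up challenge record."""
    return f"chal:{challenge_id}"

# With redis-py installed, the two clients might be created as:
#   streams = redis.Redis(host="redis", db=STREAMS_DB)
#   ephemeral = redis.Redis(host="redis", db=EPHEMERAL_DB)
```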


infra/redis/provision-streams.sh creates all streams and consumer groups idempotently using XGROUP CREATE ... MKSTREAM. Run it (or let the init container run it) before starting any service.
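provision-streams.sh is a shell script; the idempotency rule it relies on can be sketched in Python with redis-py's `xgroup_create(name, groupname, id, mkstream)`. Redis answers with a `BUSYGROUP` error when a group already exists, and treating that reply as success is what makes reruns safe. The stream-to-group mapping is copied from the stream table on this page (caracal.providers.ratelimit lists no group, so it is skipped); starting new groups at ID `0` rather than `$` and the function name are assumptions.

```python
from typing import Any

# Consumer groups per stream, copied from the stream table on this page.
CONSUMER_GROUPS = {
    "caracal.audit.events": ["audit-ingestor", "siem-export"],
    "caracal.audit.events.dlq": ["audit-dlq-observer"],
    "caracal.policy.invalidate": ["opa-engine"],
    "caracal.sessions.revoke": ["sts-revocation", "resource-revocation"],
    "caracal.keys.invalidate": ["sts-keys"],
    "caracal.agents.lifecycle": ["coordinator-relay"],
    "caracal.invocations.lifecycle": ["invocations-observer"],
    "caracal.delegations.invalidate": ["delegations-observer"],
}


def provision(client: Any) -> list[tuple[str, str]]:
    """Create every stream/group pair; return only the pairs created on this run."""
    created = []
    for stream, groups in CONSUMER_GROUPS.items():
        for group in groups:
            try:
                # mkstream=True creates the stream if it does not exist yet.
                client.xgroup_create(stream, group, id="0", mkstream=True)
                created.append((stream, group))
            except Exception as exc:
                # An existing group is fine; any other error (auth, network) is not.
                if "BUSYGROUP" not in str(exc):
                    raise
    return created
```

Running it twice against the same server should create everything on the first run and nothing on the second.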

| Stream | MAXLEN | Consumer group(s) | Published by |
| --- | --- | --- | --- |
| caracal.audit.events | 1,000,000 | audit-ingestor, siem-export | API outbox |
| caracal.audit.events.dlq | 100,000 | audit-dlq-observer | Audit service (on delivery failure) |
| caracal.policy.invalidate | 10,000 | opa-engine | API outbox |
| caracal.sessions.revoke | 10,000 | sts-revocation, resource-revocation | Coordinator outbox |
| caracal.keys.invalidate | 10,000 | sts-keys | API outbox |
| caracal.agents.lifecycle | | coordinator-relay | Coordinator outbox |
| caracal.invocations.lifecycle | | invocations-observer | Coordinator outbox |
| caracal.delegations.invalidate | | delegations-observer | Coordinator outbox |
| caracal.providers.ratelimit | ~1 (approximate) | | API (direct XADD) |

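MAXLEN ~ uses approximate trimming: Redis removes entries only in whole internal nodes, so a stream can temporarily hold somewhat more than MAXLEN entries. Trimming ignores consumer group state. It is safe only while every consumer group keeps up, because old entries are no longer needed once every group has received and acknowledged them; an entry trimmed before that point is lost.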
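Whether a group is keeping up can be read from XINFO GROUPS, which reports per group a `pending` count (delivered but not acknowledged) and, since Redis 7.0, a `lag` count (entries not yet delivered, or nil when Redis cannot compute it). A minimal sketch, assuming redis-py's `xinfo_groups(name)` returning a list of dicts; the function name and the 50% threshold are hypothetical.

```python
from typing import Any


def groups_at_risk(client: Any, stream: str, maxlen: int,
                   threshold: float = 0.5) -> list[str]:
    """Return consumer groups whose backlog is a large fraction of MAXLEN.

    Backlog = pending (delivered, not yet acknowledged) + lag (not yet delivered);
    trimming loses both kinds of entry. Groups whose lag is unknown are flagged too.
    """
    flagged = []
    for group in client.xinfo_groups(stream):
        name = group["name"]
        if isinstance(name, bytes):  # without decode_responses, names are bytes
            name = name.decode()
        lag = group.get("lag")  # None when Redis cannot determine it
        if lag is None or group["pending"] + lag > threshold * maxlen:
            flagged.append(name)
    return flagged
```

For caracal.audit.events this would be called with `maxlen=1_000_000`.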


Neither the API nor the Coordinator writes to Redis streams directly from request handlers. All stream publishes go through a transactional outbox:

  1. The handler writes an outbox row inside the same database transaction as the business operation.
  2. A background dispatcher (OutboxDispatcher in the API, OutboxPublisher in the Coordinator) polls for pending rows, publishes to Redis, and marks rows as published.
  3. If Redis is temporarily unavailable, rows accumulate in the outbox and are delivered on recovery.
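The three steps can be sketched end to end with Python's built-in sqlite3 standing in for the service database and a plain callback standing in for the Redis publish. Table names, columns, and function names are hypothetical. The batch and attempt defaults mirror CARACAL_OUTBOX_BATCH and CARACAL_OUTBOX_MAX_ATTEMPTS from the tuning table on this page; row locking (CARACAL_OUTBOX_LOCK_SEC) is left out because the sketch assumes a single dispatcher.

```python
import json
import sqlite3
from typing import Callable

# Hypothetical schema: one business table plus the outbox.
SCHEMA = """
CREATE TABLE agents (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    stream TEXT NOT NULL,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',  -- pending | published | dead
    attempts INTEGER NOT NULL DEFAULT 0
);
"""


def create_agent(conn: sqlite3.Connection, agent_id: str, name: str) -> None:
    """Step 1: the business row and the outbox row commit or roll back together."""
    with conn:  # one transaction: commit on success, rollback on exception
        conn.execute("INSERT INTO agents (id, name) VALUES (?, ?)", (agent_id, name))
        conn.execute(
            "INSERT INTO outbox (stream, payload) VALUES (?, ?)",
            ("caracal.agents.lifecycle",
             json.dumps({"event": "created", "agent_id": agent_id})),
        )


def dispatch_once(conn: sqlite3.Connection, publish: Callable[[str, dict], None],
                  batch: int = 32, max_attempts: int = 100) -> int:
    """Step 2: publish one batch of pending rows; return how many succeeded."""
    rows = conn.execute(
        "SELECT id, stream, payload, attempts FROM outbox "
        "WHERE status = 'pending' ORDER BY id LIMIT ?", (batch,)).fetchall()
    published = 0
    for row_id, stream, payload, attempts in rows:
        try:
            publish(stream, json.loads(payload))  # XADD in a real dispatcher
        except Exception:
            # Step 3: keep the row pending for a later poll until attempts run out.
            status = "dead" if attempts + 1 >= max_attempts else "pending"
            with conn:
                conn.execute(
                    "UPDATE outbox SET attempts = attempts + 1, status = ? WHERE id = ?",
                    (status, row_id))
            continue
        # A crash between publish() and this UPDATE republishes the row later,
        # which is why delivery is at-least-once.
        with conn:
            conn.execute("UPDATE outbox SET status = 'published' WHERE id = ?", (row_id,))
        published += 1
    return published
```

Because the outbox insert shares the business transaction, a failed business write never leaves an orphan event behind.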

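This guarantees at-least-once delivery without distributed transaction coordination. Because a dispatcher can publish a row and then fail before marking it published, the same event can be delivered more than once, so consumers must handle duplicate messages.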
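Handling duplicates usually means deduplicating on an identifier that stays the same when a row is republished. The Redis entry ID does not qualify, since every XADD assigns a new one. This sketch assumes each message carries a hypothetical `event_id` field written with the outbox row, and keeps seen IDs in memory only for illustration; a real consumer would need a durable, bounded store.

```python
from typing import Callable


class IdempotentHandler:
    """Wrap a handler so each event_id is processed at most once per process."""

    def __init__(self, handle: Callable[[dict], None]) -> None:
        self._handle = handle
        self._seen: set[str] = set()  # illustration only: unbounded and not durable

    def __call__(self, fields: dict) -> bool:
        """Process `fields`; return False when it is a duplicate and was skipped."""
        event_id = fields["event_id"]  # hypothetical field set by the publisher
        if event_id in self._seen:
            return False
        self._handle(fields)
        self._seen.add(event_id)  # record only after the handler succeeds
        return True
```

Recording the ID after the handler succeeds means a handler that fails midway is retried rather than silently marked done.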

API outbox dispatcher tuning:

| Variable | Default | Effect |
| --- | --- | --- |
| CARACAL_OUTBOX_POLL_MS | 250 | How often, in milliseconds, the dispatcher checks for pending rows |
| CARACAL_OUTBOX_BATCH | 32 | Rows acquired per poll cycle |
| CARACAL_OUTBOX_LOCK_SEC | 30 | How long, in seconds, a row stays locked before another dispatcher can claim it |
| CARACAL_OUTBOX_MAX_ATTEMPTS | 100 | Failed deliveries after which the row is marked dead |
| CARACAL_OUTBOX_STREAM_MAXLEN | 100000 | XADD MAXLEN applied to all outbox-published streams |

Coordinator outbox publisher tuning:

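| Variable | Default | Effect |
| --- | --- | --- |
| OUTBOX_INTERVAL_MS | 1000 | Poll interval, in milliseconds |
| OUTBOX_BATCH_SIZE | 50 | Rows per poll cycle |
| OUTBOX_MAX_ATTEMPTS | 10 | Failed attempts after which the row is marked dead |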
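The tuning variables can be read with their documented defaults as below, together with a rough ceiling on rows published per second. The dataclass and function names are hypothetical, CARACAL_OUTBOX_LOCK_SEC and CARACAL_OUTBOX_STREAM_MAXLEN are omitted for brevity, and the throughput figure assumes the dispatcher waits a full poll interval after every batch, which this page does not state.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _env_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Read an integer setting, falling back to the documented default."""
    raw = env.get(name, "")
    return int(raw) if raw.strip() else default


@dataclass(frozen=True)
class OutboxTuning:
    poll_ms: int
    batch: int
    max_attempts: int

    def max_rows_per_sec(self) -> float:
        # ASSUMPTION: one full batch per poll interval, no immediate re-poll.
        return self.batch * 1000 / self.poll_ms


def api_tuning(env: Mapping[str, str] = os.environ) -> OutboxTuning:
    return OutboxTuning(
        poll_ms=_env_int(env, "CARACAL_OUTBOX_POLL_MS", 250),
        batch=_env_int(env, "CARACAL_OUTBOX_BATCH", 32),
        max_attempts=_env_int(env, "CARACAL_OUTBOX_MAX_ATTEMPTS", 100),
    )


def coordinator_tuning(env: Mapping[str, str] = os.environ) -> OutboxTuning:
    return OutboxTuning(
        poll_ms=_env_int(env, "OUTBOX_INTERVAL_MS", 1000),
        batch=_env_int(env, "OUTBOX_BATCH_SIZE", 50),
        max_attempts=_env_int(env, "OUTBOX_MAX_ATTEMPTS", 10),
    )
```

Under that assumption the defaults give 32 × 1000 / 250 = 128 rows per second for one API dispatcher and 50 rows per second for the Coordinator publisher; a larger batch or shorter interval raises the ceiling at the cost of more database polling.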

When STREAMS_HMAC_KEY is set, every outbox message written to Redis is signed with an HMAC-SHA256 signature stored in the _sig field. Consumers that have streamHmacKey configured verify this signature using timingSafeEqual before processing. Messages with invalid or missing signatures are skipped and acknowledged.

Set STREAMS_HMAC_KEY on all services that publish or consume streams in production. A missing key does not prevent operation but removes origin verification for stream messages.
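Signing and verification can be sketched with Python's hmac module, where `hmac.compare_digest` plays the constant-time role that timingSafeEqual plays in the services. The exact bytes Caracal signs are not documented on this page, so the canonical form here (every field except `_sig` as sorted, compact JSON, with a hex digest) is a hypothetical choice; publishers and consumers only need to agree on one.

```python
import hashlib
import hmac
import json


def _canonical(fields: dict[str, str]) -> bytes:
    # HYPOTHETICAL signing input: all fields except _sig, sorted, compact JSON.
    body = {k: v for k, v in fields.items() if k != "_sig"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign(fields: dict[str, str], key: bytes) -> dict[str, str]:
    """Return a copy of `fields` with an HMAC-SHA256 hex digest in `_sig`."""
    sig = hmac.new(key, _canonical(fields), hashlib.sha256).hexdigest()
    return {**fields, "_sig": sig}


def verify(fields: dict[str, str], key: bytes) -> bool:
    """True only if `_sig` is present and matches, compared in constant time."""
    sig = fields.get("_sig")
    if sig is None:
        return False
    expected = hmac.new(key, _canonical(fields), hashlib.sha256).hexdigest()
    # Compare as bytes so a non-ASCII forged value cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), sig.encode())
```

A consumer following the rule above would call `verify` on every entry and, when it returns False, acknowledge the entry without processing it.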


caracal.sessions.revoke carries revocation events from the Coordinator (via outbox) to all services that verify mandates. The STS has a built-in consumer group (sts-revocation). External services using @caracalai/revocation-redis or caracalai-revocation-redis use the resource-revocation group.

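The revocation stream must be provisioned before the STS starts. Each consumer instance must use a unique consumer name within its group: Redis tracks pending entries per consumer name, so two instances sharing a name would share one pending-entries list and could re-read or acknowledge each other's in-flight messages.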
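For illustration only, a raw redis-py consumer on the resource-revocation group might derive its name from hostname and PID (distinct across pods and across processes in one container) and acknowledge each entry after handling it. This is not the API of @caracalai/revocation-redis or caracalai-revocation-redis; the function names and the count and block values are hypothetical, and the reply shape assumed is redis-py's RESP2 form.

```python
import os
import socket
from typing import Any, Callable

STREAM = "caracal.sessions.revoke"
GROUP = "resource-revocation"


def consumer_name() -> str:
    """Hostname plus PID, so no two running instances share a consumer name."""
    return f"{socket.gethostname()}-{os.getpid()}"


def poll_once(client: Any, consumer: str, handle: Callable[[dict], None],
              count: int = 10, block_ms: int = 5000) -> int:
    """Read up to `count` new entries for this consumer, handle each, then XACK it.

    Expects a redis.Redis-like client: xreadgroup returns a list of
    [stream, [(entry_id, fields), ...]] pairs, or None when the block times out.
    """
    reply = client.xreadgroup(GROUP, consumer, {STREAM: ">"}, count=count, block=block_ms)
    handled = 0
    for _stream, entries in reply or []:
        for entry_id, fields in entries:
            handle(fields)  # if this raises, the entry stays in this consumer's PEL
            client.xack(STREAM, GROUP, entry_id)
            handled += 1
    return handled
```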


caracal.audit.events.dlq receives audit events that exceeded AUDIT_MAX_DELIVERIES delivery attempts. Monitor this stream for sustained delivery failures. High DLQ volume indicates a database write problem in the Audit service.

Check DLQ depth:

```shell
redis-cli -a $REDIS_PASSWORD XLEN caracal.audit.events.dlq
```

Each consumer group maintains a pending entries list (PEL) of delivered-but-unacknowledged messages. A growing PEL indicates that a consumer is receiving messages but not acknowledging them, or has crashed mid-processing.

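Check the PEL for the audit ingestor. The summary form returns the total pending count, the lowest and highest pending IDs, and a per-consumer breakdown; the extended form lists up to 10 individual entries with their idle time and delivery count:

```shell
redis-cli -a $REDIS_PASSWORD XPENDING caracal.audit.events audit-ingestor
redis-cli -a $REDIS_PASSWORD XPENDING caracal.audit.events audit-ingestor - + 10
```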

The Audit service’s AUDIT_CLAIM_IDLE_SECS (default 30) controls how long a PEL entry sits before another consumer can reclaim and retry it.
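The Audit service's reclaim logic is not shown on this page. As a hedged sketch, the decision it has to make for each pending entry can be written as a pure function over XPENDING's extended-form output: entries idle for at least AUDIT_CLAIM_IDLE_SECS are either retried or, once delivered AUDIT_MAX_DELIVERIES times, dead-lettered. Using "delivered that many times" as the cutoff is an assumption, the function name is hypothetical, and `max_deliveries` has no default because this page does not state one.

```python
from typing import Any


def triage_pending(entries: list[dict[str, Any]], max_deliveries: int,
                   claim_idle_secs: int = 30) -> tuple[list[str], list[str]]:
    """Split pending entries into (retry, dead_letter) lists of message IDs.

    `entries` follows redis-py's xpending_range() shape: dicts with
    'message_id', 'time_since_delivered' (milliseconds) and 'times_delivered'.
    """
    retry, dead = [], []
    idle_ms = claim_idle_secs * 1000
    for entry in entries:
        if entry["time_since_delivered"] < idle_ms:
            continue  # the current owner may still be processing it
        if entry["times_delivered"] >= max_deliveries:
            dead.append(entry["message_id"])
        else:
            retry.append(entry["message_id"])
    return retry, dead
```

The input could come from `client.xpending_range("caracal.audit.events", "audit-ingestor", min="-", max="+", count=100)`; retry IDs would then be claimed with XCLAIM, and dead-letter IDs copied to caracal.audit.events.dlq with XADD and acknowledged with XACK.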


caracal.audit.events with MAXLEN 1,000,000 can consume significant memory depending on event payload size. At ~1 KB per message, 1M entries is roughly 1 GB of payload alone, before Redis's own per-entry overhead. Size your Redis instance to hold the largest backlog you expect before the Audit service catches up, plus headroom for all other streams.
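A rough estimate is MAXLEN times average entry size, summed over streams, multiplied by a headroom factor for Redis overhead and the other keys. In this sketch the function name, the 1.5 headroom factor, and any per-stream average sizes you pass in are placeholders, not measurements.

```python
def estimate_stream_bytes(streams: dict[str, tuple[int, int]],
                          headroom: float = 1.5) -> int:
    """Sum MAXLEN x average entry bytes per stream, then apply a headroom factor.

    `streams` maps stream name -> (maxlen, avg_entry_bytes).
    """
    payload = sum(maxlen * avg for maxlen, avg in streams.values())
    return round(payload * headroom)
```

With the table's MAXLEN values for the five bounded streams (1,130,000 entries in total) and a placeholder 1,024 bytes per entry, the payload is 1,157,120,000 bytes, or about 1.7 GB after the 1.5 headroom factor.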

Monitor Redis memory:

```shell
redis-cli -a $REDIS_PASSWORD INFO memory
```

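If used_memory approaches maxmemory, investigate consumer lag. With noeviction, Redis rejects writes with an OOM error instead of dropping data, which surfaces as errors in the outbox dispatcher. This protection depends on maxmemory being set: with the default value of 0, Redis on a 64-bit system enforces no memory limit at all.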
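INFO memory exposes both figures, and redis-py's `info("memory")` returns them as integers in a dict. A minimal alerting check might look like this; the function names and the 80% warning threshold are hypothetical.

```python
from typing import Any, Mapping, Optional


def memory_usage_ratio(info: Mapping[str, Any]) -> Optional[float]:
    """used_memory / maxmemory, or None when maxmemory is 0 (no limit)."""
    maxmemory = int(info.get("maxmemory", 0))
    if maxmemory == 0:
        return None
    return int(info["used_memory"]) / maxmemory


def memory_alert(info: Mapping[str, Any], warn_at: float = 0.8) -> Optional[str]:
    """Return a warning message, or None when memory use looks healthy."""
    ratio = memory_usage_ratio(info)
    if ratio is None:
        return "maxmemory is not set; noeviction cannot reject writes before the host runs out of memory"
    if ratio >= warn_at:
        return f"used_memory at {ratio:.0%} of maxmemory; check consumer lag"
    return None
```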