Skip to content

PostgreSQL

All five Caracal services store durable state in a single Postgres database, each under a dedicated role with minimum necessary permissions. The database uses table partitioning for the audit log, immutability triggers for policy versions, advisory locks for distributed coordination, and row-level security on zone tables.

Migrations are at infra/postgres/migrations/ and numbered 0001 through 0016. The API service runs all pending migrations on startup using runMigrations(). Migration execution is idempotent — re-running the API against an up-to-date schema is safe.

Migrations run in sequence. Do not skip or reorder them. To apply migrations in a context without running the full API, use the API container with a flag or invoke the migration runner directly from the codebase.

Current migration range: 0001_init0016_rls_zones.


Each service connects with a dedicated Postgres role scoped to its minimum required permissions:

RolePermissions
caracalStsSELECT on zones, applications, providers, resources, policies, policy versions/sets; SELECT+INSERT+UPDATE on sessions, delegated_grants, secrets, step_up_challenges
caracalApiSELECT+INSERT+UPDATE+DELETE on zones, applications, providers, resources, delegated_grants, secrets, invitations, teams; SELECT+INSERT on policy_versions, policy_set_versions; full access to policy_set_bindings; SELECT on sessions
caracalAuditSELECT+INSERT on audit_events and audit_ingest_alerts; SELECT+INSERT+UPDATE on audit_export_watermark
caracalCoordinatorSELECT+INSERT+UPDATE on agent_sessions, agent_topology, delegation_graph_epochs, caracal_outbox, delegation_edges, agent_services, agent_invocations; SELECT on zones, applications
caracalGatewaySELECT on zones, applications, resources, providers

Create these roles before starting any service. The 0002 migration creates them and grants permissions.


Tracks all active user and application sessions. Indexed on (zone_id, subject_id, status) and (expires_at) WHERE status='active' for efficient expiry sweeps.

One row per directed delegation authority edge. Includes edge_version for optimistic locking and constraints_json for caveat enforcement. Indexed on source, target, resource, and expiry.

Immutable. The policy_versions_immutable trigger prevents UPDATE and DELETE on this table. Policy content can never be changed after creation — only new versions can be added.

Range-partitioned by occurred_at. The Audit service pre-creates monthly partitions for the current month and three future months. Partitions older than AUDIT_RETENTION_DAYS are dropped by the retention rotator.

Append-only in practice. Each event includes:

  • chain_seq: monotonic sequence per zone for ordering
  • content_sha256, prev_content_sha256: payload hashes for chain linkage
  • chain_hmac: HMAC of the chain linkage (keyed with AUDIT_HMAC_KEY)
  • ingest_signature: signature added at ingest time

Transactional outbox tables. Rows are written atomically with the operation they represent and dispatched to Redis streams by background workers. caracal_outbox is used by the Coordinator; event_outbox by the API.

Both use FOR UPDATE SKIP LOCKED for concurrent-safe dispatch without contention.

Tracks step-up authentication challenges. Indexed on (session_id) and (id) WHERE consumed_at IS NOT NULL (unique). A UNIQUE partial index prevents the same challenge being satisfied twice.

Stores encrypted credential material (tokens, passwords). Ciphertext encrypted with ChaCha20 keyed from ZONE_KEK. Nonce stored alongside ciphertext. The dek_id field tracks which key version was used for rotation tracking.


All TypeScript services (API, Coordinator) manage a pg.Pool or pgx pool. Default pool size is 20 connections per service replica. Adjust CARACAL_DB_POOL_MAX (API) and DB_POOL_MAX (Coordinator) based on max_connections in your Postgres instance.

Rule of thumb: (replicas × pool_max) + overhead < max_connections. With 3 replicas of API (pool 20), 2 of Coordinator (pool 20), and STS + Gateway + Audit connecting without pools, target at least 100–120 Postgres connections.

Statement timeouts prevent runaway queries:

VariableDefaultPurpose
CARACAL_DB_STATEMENT_TIMEOUT_MS15000Kills queries running longer than 15 s
CARACAL_DB_IDLE_IN_TX_TIMEOUT_MS30000Kills idle-in-transaction connections after 30 s
CARACAL_DB_CONNECTION_TIMEOUT_MS5000Fails connection acquisition after 5 s
CARACAL_DB_IDLE_TIMEOUT_MS30000Returns idle pool connections after 30 s

Two distributed coordination mechanisms use Postgres advisory locks:

Audit chain head locks (transactional): pg_advisory_xact_lock() serializes writes to the per-zone audit chain. These locks are released automatically when the transaction commits or rolls back. High audit event volume on a single zone can cause contention here — spread load across zones to reduce it.

Audit leader election (session-level):

The Audit service uses session-level advisory locks to elect a single export leader and a single retention leader across replicas:

Lock keyHolderFunction
0x4341524130303031Export leaderRuns hourly Parquet export to S3
0x4341524130303032Retention leaderManages partition creation and drop

pg_try_advisory_lock() returns false immediately if another replica holds the lock. The losing replica skips the job. If the leader crashes, the lock is released when the connection closes and another replica acquires it on the next cycle.


The audit_events table uses RANGE partitioning on occurred_at with one partition per calendar month plus a default partition for overflow. The Audit service’s retention rotator:

  • Pre-creates partitions for the current month and 3 future months on every run.
  • Drops partitions whose entire range falls before the retention cutoff (now() - AUDIT_RETENTION_DAYS).
  • Runs every 6 hours, leader-elected.

Check current partitions:

SELECT relname, pg_size_pretty(pg_relation_size(oid))
FROM pg_class
WHERE relname LIKE 'audit_events_%'
ORDER BY relname;

Caracal does not manage Postgres replication. Use standard Postgres tooling:

  • Enable WAL archiving (wal_level=replica) for point-in-time recovery.
  • Take daily base backups with pg_basebackup or a managed backup tool.
  • Test restores to a separate instance before relying on them.

The audit_events table is the most write-intensive table and the most critical to retain. Ensure its partition data is included in any backup.