
Services

Caracal is composed of five services. Three are written in Go (sts, gateway, audit); two are written in TypeScript (api, coordinator). All five share a PostgreSQL database and a Redis instance but own distinct schema responsibilities and domain boundaries.


| Service | Language | Port | Plane | Role |
|---|---|---|---|---|
| Control-Plane API | TypeScript (Fastify) | 3000 | Control | Manages zones, applications, resources, providers, policies, grants, and sessions |
| STS | Go (net/http) | 8080 | Data | Issues ES256 JWTs, evaluates OPA policies, emits audit events |
| Gateway | Go (net/http) | 8081 | Data | Reverse-proxies MCP requests; exchanges a fresh mandate on every request |
| Coordinator | TypeScript (Fastify) | 4000 | Data | Manages agent lifecycle, invocations, and the delegation graph |
| Audit | Go (net/http) | 9090 | Data | Consumes the audit stream and persists events with HMAC chain integrity |

Although every service connects to the same PostgreSQL database and Redis instance, schema ownership is strict: each service writes only to its own tables and reads other services’ tables only when needed:

PostgreSQL — primary durable store for all control and data-plane state. The Control-Plane API runs migrations on startup under an advisory lock; no other service runs migrations.

Redis — used for three distinct purposes:

  • Streams — asynchronous event bus between services (transactional outbox pattern).
  • Distributed locks — advisory locks for spawn constraints (Coordinator) and leader election (Audit).
  • Short-lived keys — JTI deduplication (STS, Gateway), step-up challenge state (STS).

           ┌──────────────────────────────────┐
           │ Control-Plane API :3000          │
           │ TypeScript / Fastify             │
           │ Owns: zones, apps, policies,     │
           │ grants, sessions, secrets        │
           └─────────────────┬────────────────┘
                             │ reads (admin API)
        ┌────────────────────▼────────────────────┐
        │ STS :8080                               │
        │ Go / net/http                           │
        │ Issues ES256 JWTs, runs OPA,            │
        │ publishes audit events to Redis         │
        └───┬─────────────────────────────────┬───┘
            │ called by                       │ called by
  ┌─────────▼──────────┐         ┌────────────▼───────────┐
  │ Gateway :8081      │         │ Coordinator :4000      │
  │ Go / net/http      │         │ TypeScript / Fastify   │
  │ Reverse proxy;     │         │ Agent lifecycle,       │
  │ exchanges mandate  │         │ invocations,           │
  │ on every request   │         │ delegation graph       │
  └────────────────────┘         └────────────────────────┘

           ┌──────────────────────────────────┐
           │ Audit :9090                      │
           │ Go / net/http                    │
           │ Consumes Redis stream,           │
           │ persists with HMAC chain,        │
           │ exports to S3 (optional)         │
           └──────────────────────────────────┘

Redis stream topology:

| Stream | Producer | Consumer(s) | Purpose |
|---|---|---|---|
| caracal.audit.events | STS | Audit | Audit event ingestion |
| caracal.sessions.revoke | API (via outbox) | Gateway, STS | Session revocation propagation |
| caracal.policy.invalidate | API (via outbox) | STS, Gateway | Policy cache invalidation |

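The API and Coordinator publish to Redis through a transactional outbox: each event is written to the PostgreSQL event_outbox table inside the same transaction as the mutation that produced it, and a background dispatcher later forwards it to Redis. Because the event row commits or rolls back together with the mutation, no event is emitted for a rolled-back change, and events committed while Redis is temporarily unavailable are delivered once it recovers.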


Token exchange (SDK or agent calling STS):

  1. SDK presents an ambient token to POST /oauth/2/token on the STS.
  2. STS fetches the zone’s application and resource records from PostgreSQL.
  3. STS evaluates the zone’s compiled OPA policy.
  4. If allowed, STS signs a per-call JWT (ES256) with the zone’s private key.
  5. STS publishes an audit event to caracal.audit.events.
  6. SDK presents the per-call JWT to the target MCP resource.

Gateway-proxied request:

  1. Client sends an MCP request to Gateway with a bearer token in Authorization.
  2. Gateway verifies the JWT signature using the zone’s JWKS (cached from STS).
  3. Gateway calls POST /oauth/2/token on STS to exchange a fresh per-call mandate.
  4. Gateway forwards the request to the upstream resource with the new mandate.
  5. Gateway streams the upstream response back to the client.

Agent spawn:

  1. SDK calls Coordinator POST /zones/{z}/agents with a bearer JWT (STS-issued, scope agent:lifecycle).
  2. Coordinator enforces spawn limits (max depth 10, max children 10, max 50/zone, max 200/app).
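  3. Coordinator persists the agent session record and, in the same transaction, writes the corresponding event to the event_outbox table.
  4. Coordinator’s background dispatcher forwards that event from the outbox to Redis.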

Every service exposes:

  • GET /health — always 200, no dependency checks. Use for liveness probes.
  • GET /ready — checks PostgreSQL and Redis connectivity. Returns 503 if any dependency is unhealthy. Use for readiness probes.

All five services are stateless with respect to request handling. Scale any service horizontally by adding replicas. Caveats:

  • STS: each instance caches decrypted zone signing keys (15 min TTL) and compiled OPA bundles. On a cache miss these are rebuilt from PostgreSQL and Redis, so no shared memory is required, but the first evaluation after a cold start is slower.
  • Gateway: each instance caches JWKS (5 min TTL) and binding tables (30 s refresh). Revocation state is synchronized via the caracal.sessions.revoke Redis stream.
  • Audit: uses a Redis consumer group (audit-ingestor). Add replicas to increase throughput; Redis distributes stream entries across the group’s consumers, delivering each entry to only one of them. Leader election via a Redis lock determines which replica runs the S3 exporter and retention rotator.