# Caracal

> Pre-execution authority enforcement for AI agents. Policies, mandates, and audit for production-grade autonomous systems.

Caracal is an open-source system built by Garudex Labs. It issues short-lived signed mandates that bind agents and workloads to policy before protected-resource access.

---

---

# Overview

# URL: https://docs.caracal.run/get-started/
# Markdown: https://docs.caracal.run/markdown/get-started.md
# Type: landing
# Concepts:
# Requires:

---

Caracal gives AI agents and automated workflows short-lived, policy-approved authority instead of long-lived provider credentials in code or environment files. An agent asks for scoped authority when it acts, policy decides before the action reaches a resource, the Gateway or a verified service enforces the result, and audit records what happened.

Use this section when you want the fastest path from a clean machine to a real protected call.

## Should You Use Caracal?

Caracal fits when agents, services, or automation need to call APIs, tools, SaaS providers, or data systems without directly holding the upstream credential.

| Question | Use Caracal when |
| --- | --- |
| Do agents need credentials? | Agents need scoped access to tools, APIs, providers, or data. |
| Do you need policy before execution? | Access must be allowed or denied before the protected action happens. |
| Do you need revocation? | Active authority must end centrally without restarting every workload. |
| Do you need audit evidence? | You need to explain which app, run, policy, resource, and result were involved. |

Caracal is not an LLM framework, prompt router, agent scheduler, static config store, general API gateway, or identity provider. If you only have human users behind a normal login, start with an IdP instead.

:::note[Managed option]
This section covers the self-hosted open-source edition. If you want managed multi-tenancy, hosted management, SSO, or a supported enterprise deployment, see [Compare Editions](/enterprise/).
:::

## First Success Path

```mermaid
flowchart LR
  Install[Install Caracal]
  Stack[Start local stack]
  Setup[Create protected resource]
  Run[Run workload]
  Gateway[Call Gateway]
  Audit[Inspect audit]
  Install --> Stack --> Setup --> Run --> Gateway --> Audit
```

The onboarding path keeps runtime lifecycle in the `caracal` CLI and product management in the Console:

1. Install `caracal` and `caracal-console`.
2. Start the local stack with `caracal up`.
3. Open Console guided setup with `caracal console`.
4. Create one zone, agent app, resource, policy, and runtime profile.
5. Run a workload with `caracal run --` or an SDK.
6. Call the protected resource through the Gateway.
7. Inspect **audit** and **explain** in the Console.

## Core Terms

| Term | Meaning |
| --- | --- |
| Agent app | The registered workload identity that asks Caracal for authority. |
| Agent session | One tracked execution of an agent app. |
| Resource | The protected API, tool, provider, service, or data target. |
| Policy | The rules that allow or deny requested resource scopes. |
| Mandate | The short-lived signed token Caracal issues after policy allows access. |
| Gateway | The default boundary that verifies mandates, routes requests, brokers provider credentials when needed, and records action-result audit. |
| Audit | The decision and result trail for authorization, execution, revocation, and diagnostics. |

You can finish Get Started with only these terms. Use [Concepts](/concepts/) when you need the deeper authority, delegation, revocation, and audit model.

## Choose Your Next Path

| Goal | Start here | You will have |
| --- | --- | --- |
| Evaluate Caracal locally | [Install Caracal](./install-caracal/) | A verified CLI, Console, and local Docker prerequisite. |
| Protect one resource end to end | [First Protected Call](./first-protected-call/) | A Gateway-routed resource, active policy, runtime profile, and audit explanation. |
| Add Caracal to app code | [Add SDK to Your App](./add-sdk-to-your-app/) | TypeScript, Python, or Go code that opens an agent session and calls the Gateway. |
| Fix a blocked first run | [First-Run Troubleshooting](./first-run-troubleshooting/) | A focused checklist for readiness, profile, STS, Gateway, upstream, and audit issues. |
| Learn the full model | [Caracal Mental Model](/concepts/model-overview/) | The canonical concept path after first success. |
| Develop Caracal itself | [Set Up Locally](/contributing/setup/) | A source-tree development stack and contributor workflow. |

## Next Step

Continue with [Install Caracal](./install-caracal/).

---

# Install Caracal

# URL: https://docs.caracal.run/get-started/install-caracal/
# Markdown: https://docs.caracal.run/markdown/get-started/install-caracal.md
# Type: workflow
# Concepts:
# Requires:

---

Install the released tools when you want to run Caracal as a product. The `caracal` CLI owns local runtime lifecycle and workload execution. The Console owns product management: zones, applications, resources, providers, policies, sessions, audit, diagnostics, and Control workflows.

## Install CLI and Console

The installer downloads release archives from GitHub, verifies each archive against `SHA256SUMS`, verifies GitHub provenance attestations with `gh`, and installs:

- `caracal`: local runtime CLI for `up`, `down`, `status`, `purge`, `run`, and `console`.
- `caracal-console`: interactive Console for product-management workflows.

To verify a download yourself, or to verify archives and container images fetched directly instead of through the installer, follow [Verify a Release](/security/verify-releases/).
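The checksum step the installer performs can be illustrated with a small Python sketch. It is not the installer's code and does not replace the procedure in Verify a Release. It assumes `SHA256SUMS` uses the common `sha256sum` line format (`<hex digest>  <file name>`), which the source does not state, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sums(text: str) -> dict[str, str]:
    """Parse sha256sum-style lines into {file name: hex digest}.

    Assumption: each line is "<hex digest>  <file name>"; binary-mode
    entries may prefix the file name with "*".
    """
    entries: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2:
            digest, name = parts
            entries[name.lstrip("*").strip()] = digest.lower()
    return entries


def verify_archive(archive: Path, sums_file: Path) -> bool:
    """True only if the archive is listed and its digest matches."""
    expected = parse_sums(sums_file.read_text()).get(archive.name)
    return expected is not None and expected == sha256_of(archive)
```

Calling `verify_archive(Path("release.tar.gz"), Path("SHA256SUMS"))` with your own downloaded file names returns `False` for both a mismatched digest and an archive that is not listed. A checksum match says nothing about provenance, which the installer checks separately with `gh`.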
### Linux and macOS

```sh
curl -fsSL https://raw.githubusercontent.com/Garudex-Labs/caracal/main/install-console.sh | sh
```

Pin a release:

```sh
curl -fsSL https://raw.githubusercontent.com/Garudex-Labs/caracal/main/install-console.sh | \
  sh -s -- --version vYYYY.MM.DD
```

By default the installer writes to `~/.local/bin` and requires provenance verification. Override the install path with `--install-dir /path/to/bin` or `CARACAL_INSTALL_DIR=/path/to/bin`. Packaging flows can use `--prefix`, `CARACAL_PREFIX`, `PREFIX`, and `DESTDIR`; for example, `PREFIX=/usr DESTDIR=/tmp/caracal-package sh install-console.sh` stages files under `/tmp/caracal-package/usr/bin`. Use `--no-verify-provenance` only when you explicitly need to skip provenance verification.

Uninstall the Unix binaries from the same install directory:

```sh
install-console.sh --install-dir ~/.local/bin --uninstall
```

### Windows

```powershell
$installer = "$env:TEMP\install-console.ps1"
iwr -UseBasicParsing https://raw.githubusercontent.com/Garudex-Labs/caracal/main/install-console.ps1 -OutFile $installer
powershell -ExecutionPolicy Bypass -File $installer
```

Pin a release:

```powershell
powershell -ExecutionPolicy Bypass -File $installer -Version vYYYY.MM.DD
```

The Windows installer writes to `%LOCALAPPDATA%\Programs\caracal` by default, requires provenance verification, and adds the install directory to the user `Path`. Open a new shell after installation. Use `-NoVerifyProvenance` only when you explicitly need to skip provenance verification.

Uninstall the Windows binaries:

```powershell
powershell -ExecutionPolicy Bypass -File $installer -Uninstall
```

The Windows uninstall path also removes the installer-managed user `Path` entry.

## Verify Installation

```sh
caracal --version
caracal-console --version
caracal status --help
```

On Windows PowerShell:

```powershell
caracal --version
caracal-console --version
caracal status --help
```

Use `caracal` for local lifecycle and workload execution only. Use `caracal console` or `caracal-console` for product management.

## Verify Docker

Running the open-source stack locally requires Docker 24+ with the Compose v2 plugin:

```sh
docker compose version
```

On Windows PowerShell:

```powershell
docker compose version
```

The local stack includes PostgreSQL and Redis containers. You do not need to install those databases separately for Get Started.

## Platform Notes

| Platform | Architectures | Notes |
| --- | --- | --- |
| Linux | `amd64`, `arm64` | Requires `curl` or `wget`, `tar`, `sha256sum` or `shasum`, and GitHub CLI `gh` for provenance verification. |
| macOS | `amd64`, `arm64` | Requires GitHub CLI `gh`; if Gatekeeper quarantines the binary, remove quarantine from the installed file. |
| Windows | `amd64` | Requires GitHub CLI `gh`; use the PowerShell installer and open a new shell after installation. |

## Next Step

Continue with [First Protected Call](/get-started/first-protected-call/).

---

# First Protected Call

# URL: https://docs.caracal.run/get-started/first-protected-call/
# Markdown: https://docs.caracal.run/markdown/get-started/first-protected-call.md
# Type: workflow
# Concepts:
# Requires:

---

This walkthrough creates the baseline every later tutorial assumes: a running local stack, one zone, one confidential agent app, one protected resource, one active policy, one runtime profile, and one audited Gateway call.

Complete [Install Caracal](/get-started/install-caracal/) first.

## Start Caracal

```sh
caracal up
caracal status --ready
```

On Windows PowerShell:

```powershell
caracal up
caracal status --ready
```

Create product objects only after readiness succeeds. If readiness fails, run:

```sh
caracal status --ready --json
```

The JSON output shows which service is not ready.
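If you script the first run, waiting for readiness before creating product objects might look like the sketch below. It rests on one unverified assumption: that `caracal status --ready` exits with status 0 only when every service is ready. The source does not document exit codes, and the function names are hypothetical.

```python
import subprocess
import time
from typing import Callable


def caracal_ready() -> bool:
    """Run the readiness check once.

    Assumption: `caracal status --ready` exits with status 0 only when
    the stack is ready. Exit codes are not documented in this guide.
    """
    result = subprocess.run(
        ["caracal", "status", "--ready"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0


def wait_until_ready(
    check: Callable[[], bool] = caracal_ready,
    timeout_s: float = 120.0,
    interval_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

When `wait_until_ready()` returns `False`, run `caracal status --ready --json` by hand and read which service is not ready. The sketch does not parse that JSON because its shape is not described here.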
## Start Demo Upstream

Start a local HTTP upstream on the same Docker network as Caracal:

```sh
docker run --rm -d --name caracalDemoUpstream --network caracalData nginx:stable-alpine
```

On Windows PowerShell:

```powershell
docker run --rm -d --name caracalDemoUpstream --network caracalData nginx:stable-alpine
```

Use `http://caracalDemoUpstream:80` as the resource upstream URL in guided setup. If you already have an HTTP service that the Gateway container can reach, you can use that URL instead.

## Create a Protected Resource

Open the Console:

```sh
caracal console
```

On Windows PowerShell:

```powershell
caracal console
```

Choose **guided setup** and use these first-run choices:

| Step | First-run choice |
| --- | --- |
| Scenario | **Internal HTTP API** |
| Zone | Create a local evaluation zone. |
| Agent app | Create a confidential app for the workload. |
| Resource | Use a stable resource ID, required scopes, and the upstream URL you chose. |
| Provider | Skipped for the internal HTTP API scenario. |
| Access policy | Let the Console create and activate the starter policy. |
| Review | Enable **write profile files** when this machine should receive the runtime profile and secret file. |

The result view shows the zone ID, application ID, resource ID, app client secret, active policy set, profile path, secret-file path, generated profile, and first-success commands.

## Save the Runtime Profile

Guided setup can write the runtime profile and secret file for you. Enable **write profile files** when you want the Console to create them on the current machine.

If file writing is disabled, copy the generated TOML profile and one-time app client secret from the result view into the paths the Console shows. Keep the secret file readable only by the owner.

The generated profile is the source of runtime configuration for:

- `CARACAL_CONFIG=... caracal run -- `
- `new Caracal()` in TypeScript and `Caracal()` in Python
- `caracal.New()` in Go

Always generate a profile from the objects you just created instead of hand-authoring runtime configuration.

## Run Your Workload

Use the generated profile path with the command that starts your workload:

```sh
CARACAL_CONFIG=/path/to/caracal.toml caracal run -- ./start-agent.sh
```

On Windows PowerShell:

```powershell
$env:CARACAL_CONFIG = "C:\path\to\caracal.toml"
caracal run -- .\start-agent.ps1
```

`caracal run` exchanges the configured application secret for resource-specific Caracal access tokens and injects them into the environment variables named in the profile. The workload reads those variables and presents the token to the Gateway or protected service.

For SDK-based workloads, the same generated profile is read by the SDK constructors (`new Caracal()`, `Caracal()`, `caracal.New()`).

## Call the Gateway

For a Gateway-routed HTTP resource, send the request to the Gateway URL from the result view and include:

| Header | Value |
| --- | --- |
| `Authorization` | `Bearer ` |
| `X-Caracal-Resource` | The resource ID from guided setup. |

Run the exact `curl` or PowerShell command from the Console result view first. It uses the resource ID, protected path, and profile values you just created.

## Inspect Audit

Return to the Console:

1. Open **audit**.
2. Find the request you just made.
3. Press Enter on the event, or open **explain** and paste its request ID.

The explanation should show the application, resource, requested scopes, policy version, policy set version, Gateway result, and final decision. If the request failed, the explanation identifies whether to fix runtime config, policy, resource routing, or the upstream service.
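The two headers listed under Call the Gateway can be sketched in Python as follows. The header names come from this page. Everything else is an assumption for illustration: the helper names are hypothetical, the protected path is assumed to be appended to the Gateway URL, and the token is whatever `caracal run` injected into the environment variable your profile names. Prefer the exact command from the Console result view for a first call.

```python
import urllib.request


def gateway_headers(token: str, resource_id: str) -> dict[str, str]:
    """Headers this guide lists for a Gateway-routed HTTP resource."""
    return {
        "Authorization": f"Bearer {token}",
        "X-Caracal-Resource": resource_id,
    }


def gateway_url(base_url: str, path: str) -> str:
    """Join the Gateway URL and protected path with exactly one slash.

    Assumption: the protected path is appended to the Gateway URL.
    """
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def call_gateway(
    base_url: str, path: str, token: str, resource_id: str
) -> tuple[int, str]:
    """Send one GET through the Gateway and return (status, body).

    urllib raises urllib.error.HTTPError for non-2xx responses, so a
    Gateway 403 surfaces as an exception rather than a return value.
    """
    request = urllib.request.Request(
        gateway_url(base_url, path),
        headers=gateway_headers(token, resource_id),
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8", errors="replace")
        return response.status, body
```

After a call made this way, the request should appear in Console **audit** like any other Gateway-routed request.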
## Clean Up

```sh
docker rm -f caracalDemoUpstream
caracal down
```

On Windows PowerShell:

```powershell
docker rm -f caracalDemoUpstream
caracal down
```

Use `caracal purge` only when you intentionally want to remove local containers, volumes, config, runtime state, examples, and caches.

## Next Step

Continue with [Add SDK to Your App](/get-started/add-sdk-to-your-app/).

---

# Add SDK to Your App

# URL: https://docs.caracal.run/get-started/add-sdk-to-your-app/
# Markdown: https://docs.caracal.run/markdown/get-started/add-sdk-to-your-app.md
# Type: workflow
# Concepts:
# Requires:

---

This page shows the smallest SDK integration in TypeScript, Python, and Go. Each example opens an agent session, calls a Gateway-protected resource, and leaves authorization and action-result audit behind.

Complete [First Protected Call](/get-started/first-protected-call/) first so you have a generated runtime profile, a resource ID, and a protected path.

## Use the Runtime Profile

All SDKs can read the runtime profile generated by Console guided setup.

```sh
export CARACAL_CONFIG=/path/to/caracal.toml
export CARACAL_RESOURCE_ID=resource://example-api
export CARACAL_RESOURCE_PATH=/
```

On Windows PowerShell:

```powershell
$env:CARACAL_CONFIG = "C:\path\to\caracal.toml"
$env:CARACAL_RESOURCE_ID = "resource://example-api"
$env:CARACAL_RESOURCE_PATH = "/"
```

The profile supplies the zone, application, STS URL, Coordinator URL, app secret file, Gateway URL, and resource bindings. The resource variables choose which configured resource and path these examples call.

## Install the SDK

Install the package for your application language.

| Language | Install command | Runtime requirement |
| --- | --- | --- |
| TypeScript | `npm install @caracalai/sdk` or `pnpm add @caracalai/sdk` | Node.js 22+ |
| Python | `pip install caracalai-sdk` | Python 3.12+ |
| Go | `go get github.com/garudex-labs/caracal/packages/sdk/go` | Go 1.26+ |

## TypeScript

```typescript
import { Caracal } from "@caracalai/sdk";

const caracal = new Caracal();
const resourceId = process.env.CARACAL_RESOURCE_ID;
const resourcePath = process.env.CARACAL_RESOURCE_PATH;
if (!resourceId) throw new Error("CARACAL_RESOURCE_ID is required");
if (!resourcePath) throw new Error("CARACAL_RESOURCE_PATH is required");

await caracal.spawn(async () => {
  const ctx = caracal.current();
  console.log("Agent session:", ctx?.agentSessionId);

  const response = await caracal.fetch(resourceId, resourcePath, {
    method: "GET",
  });
  console.log(await response.text());
});
```

## Python

```python
import asyncio
import os

from caracalai import Caracal

caracal = Caracal()
resource_id = os.environ["CARACAL_RESOURCE_ID"]
resource_path = os.environ["CARACAL_RESOURCE_PATH"]


async def main():
    async with caracal.spawn() as ctx:
        print("Agent session:", ctx.agent_session_id)
        response = await caracal.fetch(
            resource_id,
            resource_path,
            method="GET",
        )
        print(response.text)


asyncio.run(main())
```

## Go

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"os"

	caracal "github.com/garudex-labs/caracal/packages/sdk/go"
)

func main() {
	c, err := caracal.New()
	if err != nil {
		panic(err)
	}

	resourceID := os.Getenv("CARACAL_RESOURCE_ID")
	resourcePath := os.Getenv("CARACAL_RESOURCE_PATH")
	if resourceID == "" || resourcePath == "" {
		panic("CARACAL_RESOURCE_ID and CARACAL_RESOURCE_PATH are required")
	}

	err = c.Spawn(context.Background(), func(ctx context.Context) error {
		cc, _ := c.Current(ctx)
		fmt.Println("Agent session:", cc.AgentSessionID)

		resp, err := c.Fetch(ctx, http.MethodGet, resourceID, resourcePath)
		if err != nil {
			return err
		}
		defer resp.Body.Close()

		body, _ := io.ReadAll(resp.Body)
		fmt.Println(string(body))
		return nil
	})
	if err != nil {
		panic(err)
	}
}
```

## SDK Calls

| Call | Role |
| --- | --- |
| `new Caracal()` / `Caracal()` / `New()` | Loads the generated profile or environment configuration and prepares token exchange. |
| `spawn()` / `Spawn()` | Creates an agent session, binds context for the work, and terminates the session when the block exits. |
| `fetch()` / `Fetch()` | Sends one request to a resource through the Gateway with context and authority injected. |
| `gatewayRequest()` / `GatewayRequest()` | Builds the Gateway URL and `X-Caracal-Resource` header for explicit resource routing. |
| `transport()` / `Transport()` | Injects Caracal context, authorization, trace, and baggage headers on outbound calls. |

`spawn()` creates an agent session under the application from your runtime profile. You reuse that one application for every agent session your workload spawns; you do not register an application per agent. See [Should I create one application per agent?](/reference/faq/#faq-006).

If policy denies the exchange, STS returns HTTP 403 and the SDK surfaces an error. Open Console **audit** or **explain** with the request ID to see the policy and diagnostics that determined the result.

## Next Step

Use [First-Run Troubleshooting](/get-started/first-run-troubleshooting/) if the SDK cannot read the profile, exchange authority, call the Gateway, or find audit evidence. Otherwise continue to [Tutorials](/tutorials/).

---

# First-Run Troubleshooting

# URL: https://docs.caracal.run/get-started/first-run-troubleshooting/
# Markdown: https://docs.caracal.run/markdown/get-started/first-run-troubleshooting.md
# Type: workflow
# Concepts:
# Requires:

---

Use this page when the Get Started path fails before your first protected call or first SDK integration succeeds. For production incidents and deeper operational diagnosis, use [Troubleshoot by Symptom](/operations/troubleshooting/).
## Readiness Failures

Run:

```sh
caracal status --ready --json
```

On Windows PowerShell:

```powershell
caracal status --ready --json
```

| Symptom | Check |
| --- | --- |
| Docker command fails | Confirm Docker Desktop or Docker Engine is running and `docker compose version` succeeds. |
| A service is not ready | Wait for the dependency named in the JSON output, then rerun readiness. |
| Ports are already in use | Stop the conflicting local process or change the runtime port configuration before starting Caracal. |
| Stack state looks stale | Run `caracal down`, then `caracal up`. Use `caracal purge` only when you intentionally want to remove local state. |

## Profile Issues

| Symptom | Check |
| --- | --- |
| `CARACAL_CONFIG` is missing | Set it to the profile path shown by guided setup. |
| Profile path is wrong | Open the generated TOML file and confirm it matches the current zone, application, Gateway, STS, and secret-file path. |
| Secret file is rejected | Ensure the secret file exists and is readable only by the current user. |
| SDK cannot load configuration | Start the process from a shell that has `CARACAL_CONFIG` set, or pass the profile path using the SDK-supported configuration mechanism. |

Linux and macOS:

```sh
export CARACAL_CONFIG=/path/to/caracal.toml
```

Windows PowerShell:

```powershell
$env:CARACAL_CONFIG = "C:\path\to\caracal.toml"
```

## Console File Writes

If guided setup does not write files:

1. Copy the generated TOML profile from the result view.
2. Save it at the profile path shown by the Console.
3. Copy the one-time app client secret into the secret-file path shown by the Console.
4. Restrict the secret file so only the current user can read it.
5. Rerun the first-success command from the result view.

The app client secret is shown once for newly created apps. If you lose it, create a new secret from the Console and update the profile.

## STS 403

An STS 403 means the workload authenticated but did not receive a mandate.

| Check | Fix |
| --- | --- |
| Active policy set | Activate the starter policy set created by guided setup. |
| Resource ID | Use the resource ID from the guided setup result view. |
| Scopes | Request only scopes covered by the starter policy. |
| App secret | Regenerate the app secret if the configured secret file no longer matches the app. |
| Audit request ID | Open Console **audit** or **explain** with the request ID to see the policy diagnostic. |

## Gateway 403

A Gateway 403 means the Gateway rejected the request before forwarding it upstream.

| Check | Fix |
| --- | --- |
| Authorization header | Send `Authorization: Bearer `. |
| Resource header | Send `X-Caracal-Resource` with the resource ID from guided setup. |
| Token freshness | Rerun the workload to obtain a fresh short-lived mandate. |
| Revocation | Confirm the session, app, or delegation edge was not revoked. |
| Route binding | Confirm the resource has the Gateway route and upstream URL you intended. |

## Upstream Unreachable

| Symptom | Check |
| --- | --- |
| Connection refused | The upstream service is not listening, or the Gateway cannot reach that host and port. |
| DNS failure | Use a hostname visible from the Gateway container, not only from the host shell. |
| Demo upstream missing | Start `caracalDemoUpstream` on the `caracalData` Docker network or use your own reachable service. |
| Wrong path | Use the first request path from the Console result view before trying custom paths. |

## Missing Audit Events

| Check | Fix |
| --- | --- |
| Wrong request ID | Copy the request ID from the STS, Gateway, SDK, or Console output. |
| Wrong zone | Select the zone used by guided setup. |
| Request never reached STS or Gateway | Confirm the workload used the generated profile and the Gateway URL from that profile. |
| Audit ingestion lag | Wait briefly and refresh the Console audit view. |

## Next Step

After the first run succeeds, continue with [Tutorials](/tutorials/) or the language-specific [SDK guides](/guides/).

---

# Tutorials

# URL: https://docs.caracal.run/tutorials/
# Markdown: https://docs.caracal.run/markdown/tutorials.md
# Type: workflow
# Concepts:
# Requires:

---

Tutorials are the guided path after [Get Started](/get-started/). They turn your first local success into the next developer tasks: protect a real API, call it from your app, trace one request, then choose the production integration guide that matches your stack.

## Tutorial Path

```mermaid
flowchart LR
  API[Protect real API]
  SDK[Connect app]
  Trace[Trace request]
  Path[Choose production path]
  API --> SDK --> Trace --> Path
```

| Outcome | Tutorial | What you use |
| --- | --- | --- |
| Put an enforced boundary in front of one real HTTP service or provider route. | [Protect Your First Real API](./protect-an-api/) | Console **resource**, **provider**, **policy**, Gateway, and audit. |
| Wire application code through Caracal with the generated profile. | [Connect Your App with the SDK](./connect-an-agent/) | TypeScript, Python, or Go SDK; agent session; Gateway. |
| Prove why one protected request succeeded or failed. | [Trace One Protected Request](./inspect-a-run/) | Console **audit**, **request trace**, sessions, policy diagnostics, and delegation context. |
| Choose the durable guide for your production boundary. | [Choose Your Production Integration Path](./choose-production-path/) | Gateway, SDK, connector, runtime, delegation, audit, and step-up guides. |

## Before You Begin

You should have completed [First Protected Call](/get-started/first-protected-call/) or equivalent setup:

- a running Caracal stack;
- one zone;
- one confidential agent app;
- one protected resource;
- one active policy set;
- a generated runtime profile.
## After Tutorials

Use [Guides](/guides/) for task-specific implementation details, [SDKs](/sdks/) for package APIs, and [Concepts](/concepts/) when you need the reference model behind policy, delegation, revocation, or audit.

---

# Protect Your First Real API

# URL: https://docs.caracal.run/tutorials/protect-an-api/
# Markdown: https://docs.caracal.run/markdown/tutorials/protect-an-api.md
# Type: workflow
# Concepts:
# Requires:

---

This tutorial moves from the demo resource in Get Started to one API-style resource that belongs to your stack.

The recommended first boundary is Gateway-routed HTTP because it centralizes mandate verification, replay and revocation checks, provider credential brokering, routing, and action-result audit. Use connector-verified services only when you own the service boundary and can preserve the same enforcement contract inside that service.

## 1. Choose What Caracal Should Protect

| Boundary | Use when | What writes action-result audit |
| --- | --- | --- |
| Gateway-routed HTTP | The resource is HTTP, REST, MCP-over-HTTP, or another Gateway-routable upstream. | Gateway. |
| In-process connector | You own the service process and can verify mandates there. | Your service or adapter after connector verification. |

Start with Gateway-routed HTTP unless your service must enforce directly in process.

## 2. Create or Select the Real Resource

Open the Console with:

```sh
caracal console
```

Use **guided setup** for a first resource or open **resource** when you are adding one to an existing zone.

Define:

| Field | Guidance |
| --- | --- |
| Resource identifier | Use a stable ID, for example `resource://billing-api`. |
| Scopes | Use short action names such as `billing:read` or `billing:submit`. |
| Upstream URL | Use a URL reachable from the Gateway. |
| Provider | Select one for the route. Use a `None` provider only when Gateway should enforce access but the upstream needs no brokered credential. |

Keep the resource ID. SDK code, policy, Gateway headers, and audit events all refer to it.

## 3. Allow This App to Use This Resource

Caracal denies by default. Use Console **guided setup** to create the starter policy, or open **policy** to register and activate a Rego policy set for the zone.

A complete first policy should allow only:

- the expected agent app;
- the expected resource;
- the expected scopes;
- the expected zone.

See [Author Policy Data](/guides/author-policy/) and [Activate a Policy Set](/guides/activate-policy-set/) for the policy contract.

## 4. Choose Gateway or In-Process Enforcement

### Gateway-routed HTTP

For Gateway-routed calls, the resource route needs an upstream URL and, when required, a provider credential source. The Gateway receives:

| Request part | Meaning |
| --- | --- |
| `Authorization: Bearer ` | The Caracal access token issued by STS. |
| `X-Caracal-Resource: ` | The stable resource identifier. |

Use [Protect a Gateway-Routed HTTP API](/guides/protect-gateway-http/) when you need the durable implementation guide for routes, provider auth modes, and validation.

### In-process connector

For services you own, mount the connector at the request boundary:

| Runtime | Guide |
| --- | --- |
| Express | [Protect an Express App](/guides/protect-express/) |
| FastMCP | [Protect a FastMCP App](/guides/protect-fastmcp/) |
| Go `net/http` | [Protect a Go net/http Service](/guides/protect-nethttp/) |
| MCP server | [Protect an MCP Server](/guides/protect-mcp/) |

Connector verification is not enough by itself for a complete production boundary. The protected service must also handle replay protection where applicable, revocation checks, and result audit for accepted work.

## 5. Verify Allowed, Denied, and Audited Calls

Call the resource from an app configured with the generated runtime profile.

| Outcome | What to expect |
| --- | --- |
| Allowed | The call reaches the upstream and audit shows authorization plus action-result evidence. |
| Denied | STS returns a denial and request trace shows which policy or missing state caused it. |

Open Console **audit** and **request trace** with the request ID. If an accepted Gateway-routed call has authorization audit but no action-result audit, check the Gateway route and resource header. If a connector-verified call lacks result audit, the service-side enforcement path is incomplete.

## Next Step

Continue with [Connect Your App with the SDK](../connect-an-agent/) to wire application code through the resource you just protected.

---

# Connect Your App with the SDK

# URL: https://docs.caracal.run/tutorials/connect-an-agent/
# Markdown: https://docs.caracal.run/markdown/tutorials/connect-an-agent.md
# Type: workflow
# Concepts:
# Requires:

---

This tutorial connects application code to Caracal. The examples use a generated runtime profile, open one agent session, and call a Gateway-protected resource.

## 1. Confirm the App and Profile

In the Console, confirm that the selected zone has:

- one confidential agent app;
- a client secret stored in the secret file named by the generated runtime profile;
- one protected resource with an active policy;
- a Gateway URL and resource binding when using Gateway-routed HTTP.

Set the profile path and the resource target for the examples:

```sh
export CARACAL_CONFIG=/path/to/caracal.toml
export CARACAL_RESOURCE_ID=resource://billing-api
export CARACAL_RESOURCE_PATH=/charge
```

## 2. Connect with the SDK

The client constructor is the standard entry point. It loads the generated profile when available, then falls back to environment configuration. See [Configuration Order](/reference/config-precedence/) for the exact resolution order.
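Before constructing the client, a workload can fail fast when the variables exported in step 1 are missing. The check below is a hypothetical helper rather than part of the SDK, and it applies only to the examples on this page. The constructor itself can fall back to environment configuration, so `CARACAL_CONFIG` is not required in general.

```python
import os
from collections.abc import Mapping
from pathlib import Path

# Variables the examples on this page export in step 1.
REQUIRED = ("CARACAL_CONFIG", "CARACAL_RESOURCE_ID", "CARACAL_RESOURCE_PATH")


def preflight(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of human-readable problems; empty means ready."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    config = env.get("CARACAL_CONFIG")
    # Only check the file when a path was given at all.
    if config and not Path(config).is_file():
        problems.append(f"CARACAL_CONFIG points to a missing file: {config}")
    return problems
```

Call `preflight()` at startup and exit with the returned messages if the list is not empty. This reports a missing profile directly, before any token exchange is attempted.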
### TypeScript

```sh
npm install @caracalai/sdk
```

```typescript
import { Caracal } from "@caracalai/sdk";

const caracal = new Caracal();
const resourceId = process.env.CARACAL_RESOURCE_ID!;
const resourcePath = process.env.CARACAL_RESOURCE_PATH!;

await caracal.spawn(async () => {
  const request = caracal.gatewayRequest(resourceId, resourcePath);
  const response = await caracal.transport()(request.url, {
    method: "POST",
    headers: { ...request.headers, "Content-Type": "application/json" },
    body: JSON.stringify({ amount: 1200 }),
  });
  console.log(await response.text());
});
```

### Python

```sh
pip install caracalai-sdk
```

```python
import asyncio
import os

from caracalai import Caracal

caracal = Caracal()
resource_id = os.environ["CARACAL_RESOURCE_ID"]
resource_path = os.environ["CARACAL_RESOURCE_PATH"]


async def main():
    async with caracal.spawn():
        request = caracal.gateway_request(resource_id, resource_path)
        async with caracal.transport() as client:
            response = await client.post(
                request.url,
                headers=request.headers,
                json={"amount": 1200},
            )
            print(response.text)


asyncio.run(main())
```

### Go

```sh
go get github.com/garudex-labs/caracal/packages/sdk/go
```

```go
package main

import (
	"context"
	"net/http"
	"os"
	"strings"

	caracal "github.com/garudex-labs/caracal/packages/sdk/go"
)

func main() {
	c, err := caracal.New()
	if err != nil {
		panic(err)
	}

	resourceID := os.Getenv("CARACAL_RESOURCE_ID")
	resourcePath := os.Getenv("CARACAL_RESOURCE_PATH")

	err = c.Spawn(context.Background(), func(ctx context.Context) error {
		// Build the Gateway URL and routing header for this resource.
		target, err := c.GatewayRequest(resourceID, resourcePath)
		if err != nil {
			return err
		}

		req, err := http.NewRequestWithContext(ctx, http.MethodPost, target.URL, strings.NewReader(`{"amount":1200}`))
		if err != nil {
			return err
		}
		req.Header = target.Header.Clone()
		req.Header.Set("Content-Type", "application/json")

		// Send the request through the Caracal transport.
		resp, err := c.Transport(nil).Do(req)
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		return nil
	})
	if err != nil {
		panic(err)
	}
}
```

## 3. Understand the Runtime Behavior

| SDK call | What happens |
| --- | --- |
| `new Caracal()` / `Caracal()` / `New()` | Loads configuration and prepares app-secret token exchange. |
| `spawn()` / `Spawn()` | Creates an agent session, binds context, and tears it down after the block returns. |
| `gatewayRequest()` / `GatewayRequest()` | Builds the Gateway URL and `X-Caracal-Resource` header. |
| `transport()` / `Transport()` | Injects Authorization, trace, baggage, and Gateway routing behavior. |

`spawn()` is the session boundary. If the function exits or raises, the SDK terminates the agent session and active authority for that session is revoked.

One application credential backs every agent session you spawn. A workload registers one application, then opens an agent session per unit of work; you do not register a new application per agent. See [Should I create one application per agent?](/reference/faq/#faq-006).

If your app spawns child agents or delegates work, continue to [Implement Multi-Agent Delegation](/guides/delegation/) after this tutorial path.

## 4. Verify Audit

Open Console **audit** and find the request. A complete Gateway-routed call produces:

1. an authorization event from STS;
2. an action-result event from Gateway.

Open **explain** with the request ID to confirm the application, resource, scopes, policy, policy set version, and decision.

## Next Step

Continue with [Trace One Protected Request](../inspect-a-run/) to investigate the request you just made.

---

# Trace One Protected Request

# URL: https://docs.caracal.run/tutorials/inspect-a-run/
# Markdown: https://docs.caracal.run/markdown/tutorials/inspect-a-run.md
# Type: workflow
# Concepts:
# Requires:

---

This tutorial shows how to prove what happened during one protected request. You will find the request, inspect the session context, review audit events, and trace the final decision.
The flow assumes you already completed [Protect Your First Real API](../protect-an-api/) and [Connect Your App with the SDK](../connect-an-agent/). ## 1. Find the Request Open the Console: ```sh caracal console ``` Use these Console entries: | Entry | Use it for | | --- | --- | | **audit** | Find recent authorization and action-result events. | | **request trace** | Explain one request ID. | | **agent session** | Find the tracked agent session for the selected zone. | | **authority session** | Inspect underlying authority sessions and parent relationships. | Record the request ID from the SDK, Gateway, STS error, audit event, or Console trace. ## 2. Read the Audit Timeline Open **audit**. The live audit view opens recent events immediately and supports request details, request tracing, advanced filters, reload, and pause/resume. A Gateway-routed protected call usually produces: | Event class | What it proves | | --- | --- | | Authorization event | STS evaluated policy for the app, resource, scopes, agent session state, and delegation context. | | Action-result event | Gateway verified and routed the request, then recorded the upstream result. | If you see authorization events without matching action-result events, check whether the call reached the Gateway. For connector-verified services, confirm the service writes its own result audit after mandate verification. ## 3. Trace the Decision Pick an audit event with a request ID. Press the trace action on the selected event in **audit**, or open **request trace** and paste the request ID. The request trace should show: - application and execution session; - resource and requested scopes; - policy set version and policy rules involved; - allow, deny, or partial decision; - diagnostics from policy evaluation; - delegation edge constraints when present. Use the request trace as the source of truth for why a request got its result. 
If an app behaves differently than expected, compare the trace against the active policy set, resource configuration, and runtime profile. ## 4. Check Delegation Context If the request used delegated authority, open **delegation** and inspect the edge: | Field | Why it matters | | --- | --- | | Source session | The session that granted authority. | | Target session | The session allowed to use the delegated permission. | | Scopes | The narrowed actions. | | TTL and hop limits | The lifetime and graph depth of the permission. | | Resource constraints | The targets the delegated permission can affect. | Use [Implement Multi-Agent Delegation](/guides/delegation/) when you need to design or debug delegation flows. ## 5. Check Revocation Behavior Revoking a session, grant, delegation edge, or zone/application emergency state invalidates authority anchored to it. Gateway-routed calls check revocation before accepting a request and while streaming responses. Audit records revocation events and marks interrupted action results. ## Wrap Up You can now trace one protected request from application identity to policy decision, Gateway action, delegation context, and revocation state. ## Next Step Continue with [Choose Your Production Integration Path](../choose-production-path/). --- # Choose Your Production Integration Path # URL: https://docs.caracal.run/tutorials/choose-production-path/ # Markdown: https://docs.caracal.run/markdown/tutorials/choose-production-path.md # Type: workflow # Concepts: # Requires: --- Use this bridge after the tutorials. Pick the guide that matches the boundary you will own in production. ## Choose by Boundary | If your production path is... 
| Continue with | | --- | --- | | Route HTTP traffic through Caracal Gateway | [Protect a Gateway-Routed HTTP API](/guides/protect-gateway-http/) | | Add Caracal session and Gateway calls to app code | [TypeScript SDK](/guides/sdk-typescript/), [Python SDK](/guides/sdk-python/), or [Go SDK](/guides/sdk-go/) | | Protect an Express resource server | [Protect an Express App](/guides/protect-express/) | | Protect a FastMCP resource server | [Protect a FastMCP App](/guides/protect-fastmcp/) | | Protect a Go `net/http` resource server | [Protect a Go net/http Service](/guides/protect-nethttp/) | | Protect an MCP server without a dedicated connector | [Protect an MCP Server](/guides/protect-mcp/) | | Run an existing CLI or worker under injected runtime tokens | [Run an Agent with caracal run](/guides/runtime-run/) | | Model zones, apps, resources, and customer boundaries | [Model Your Application in Caracal](/guides/modeling-recipes/) | | Configure provider credentials or OAuth | [Define Resources and Providers](/guides/resources-providers/) and [Provider Recipes](/guides/provider-recipes/) | | Debug denies or unexpected allows | [Debug Authorization Decisions](/guides/authorize-access/) | | Delegate work between agents | [Implement Multi-Agent Delegation](/guides/delegation/) | | Export or query audit evidence | [Tail and Query the Audit Stream](/guides/audit-stream/) | | Require fresh proof for sensitive actions | [Step-Up Re-Authentication](/guides/step-up/) | ## Choose by Team Role | Role | Start with | | --- | --- | | App engineer | SDK guide for your language, then Gateway or connector guide. | | Platform engineer | [Production Integration Patterns](/guides/enterprise-runtime-patterns/), then operations pages. | | Security reviewer | [Debug Authorization Decisions](/guides/authorize-access/), [Audit and Request Traces](/concepts/audit-ledger/), and [Review the Threat Model](/security/threat-model/). 
| | Policy owner | [Author Policy Data](/guides/author-policy/) and [Activate a Policy Set](/guides/activate-policy-set/). | ## Next Step Open [Guides](/guides/) and follow the path that matches your boundary. --- # Guides # URL: https://docs.caracal.run/guides/ # Markdown: https://docs.caracal.run/markdown/guides.md # Type: page # Concepts: # Requires: --- Use Guides after [Get Started](/get-started/) or the [Tutorials](/tutorials/) when you know the task you need to complete. You do not need to read every concept first; each guide links to the canonical concept page when the model matters. ## Choose by Task | Task | Start with | | --- | --- | | Map your architecture onto Caracal | [Model Your Application in Caracal](/guides/modeling-recipes/) | | Serve many of your own customers from one deployment | [Serve Your Own Customers](/guides/serve-customers/) | | Define protected targets and upstream credentials | [Define Resources and Providers](/guides/resources-providers/) and [Provider Recipes](/guides/provider-recipes/) | | Write and activate authorization logic | [Author Policy Data](/guides/author-policy/) and [Activate a Policy Set](/guides/activate-policy-set/) | | Debug an authorization result | [Debug Authorization Decisions](/guides/authorize-access/) | | Add Caracal to app code | [TypeScript SDK](/guides/sdk-typescript/), [Python SDK](/guides/sdk-python/), or [Go SDK](/guides/sdk-go/) | | Run an existing process with Caracal tokens | [Run an Agent with caracal run](/guides/runtime-run/) | | Protect a Gateway-routed HTTP upstream | [Protect a Gateway-Routed HTTP API](/guides/protect-gateway-http/) | | Protect a resource server in process | [Express](/guides/protect-express/), [FastAPI](/guides/protect-fastapi/), [FastMCP](/guides/protect-fastmcp/), [Go net/http](/guides/protect-nethttp/), or [MCP server](/guides/protect-mcp/) | | Add delegation, audit export, or step-up | [Delegation](/guides/delegation/), [Audit Stream](/guides/audit-stream/), or [Step-Up 
Re-Authentication](/guides/step-up/) | | Plan a production integration | [Production Integration Patterns](/guides/enterprise-runtime-patterns/) | ## Recommended Order ```mermaid flowchart LR Model["Model app"] Resource["Define resources"] Policy["Author policy"] Activate["Activate policy"] App["Integrate app"] Protect["Protect boundary"] Debug["Trace and debug"] Model --> Resource --> Policy --> Activate --> App --> Protect --> Debug ``` ## Surface Boundaries Use the right surface for each task: | Surface | Use for | | --- | --- | | `caracal up`, `down`, `status`, `purge`, `run`, `console` | Local runtime lifecycle and subprocess injection. | | Console | Human-facing zone, application, provider, resource, policy, session, audit, explanation, delegation, and diagnostic workflows. | | Admin API and `@caracalai/admin` | Automation for the same control-plane objects. | | SDKs and connectors | Application integration, context propagation, mandate exchange, and mandate verification. | ## Before You Start You need a running Caracal runtime, a zone, an application, at least one resource, and an active policy set. [First Protected Call](/get-started/first-protected-call/) creates that baseline. --- # Model Your Application in Caracal # URL: https://docs.caracal.run/guides/modeling-recipes/ # Markdown: https://docs.caracal.run/markdown/guides/modeling-recipes.md # Type: page # Concepts: # Requires: --- The [concept pages](/concepts/) define each noun, and [Provider Recipes](/guides/provider-recipes/) give concrete upstream auth setups. This page helps you decide how to map your architecture onto Caracal before you create production objects. Every recipe states the modeling decision, what maps to each noun, and the trade-off. Use the [Caracal Mental Model](/concepts/model-overview/) as the vocabulary reference while you read. 
## Choose the Zone Boundary The most common question is “what is a zone — an environment, a customer, a team, or a product area?” A zone is none of those by default. It is the isolation boundary that owns signing keys, policy sets, sessions, audit, and authority data. You decide what trust boundary it represents. Use a **separate zone** whenever two workloads must not share signing keys, policy activation, or audit trails. Use a **shared zone with separate resources** when workloads belong to the same trust boundary but target different upstreams. :::note[FAQ] [What should a zone represent?](/reference/faq/#faq-003) and [does the open-source edition include managed multi-tenancy?](/reference/faq/#faq-004) ::: | Question | If yes, lean toward | | --- | --- | | Must these workloads have independent signing keys and JWKS? | Separate zones | | Must a policy change for one never affect the other? | Separate zones | | Must audit and explain traces never mix? | Separate zones | | Do they share keys, policy owners, and audit, but call different upstreams? | One zone, separate resources | | Do they share an upstream but need different actions? | One resource, separate scopes | Keep [resource identifiers](/concepts/resource-grant/) stable across zones so policy, grants, and audit refer to the same target even when upstream URLs differ. ## Single Application, Single Environment The simplest deployment: one team, one runtime, a handful of upstreams. | Noun | Mapping | | --- | --- | | Zone | One zone for the whole deployment. | | Application | One managed application per durable workload; one DCR application per isolated, externally-launched identity (per tenant, job, or integration). | | Resource | One resource per protected upstream, with action-oriented scopes. | | Grant | One grant per application and subject that may request a resource's scopes. | Trade-off: lowest operational overhead. Add zones only when you need key, policy, or audit isolation. 
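
As a reading aid, the decision table under Choose the Zone Boundary can be written as a first-match rule list. The sketch below is illustrative only: `choose_boundary`, its parameter names, and the final fallback value are hypothetical and are not part of any Caracal SDK or API.

```python
def choose_boundary(
    independent_keys: bool = False,
    isolated_policy: bool = False,
    isolated_audit: bool = False,
    different_upstreams: bool = False,
    different_actions: bool = False,
) -> str:
    """Return the modeling choice the decision table leans toward."""
    # Any hard isolation need (keys, policy, or audit) points at separate zones.
    if independent_keys or isolated_policy or isolated_audit:
        return "separate zones"
    # Same trust boundary, different upstream targets.
    if different_upstreams:
        return "one zone, separate resources"
    # Same upstream, different actions.
    if different_actions:
        return "one resource, separate scopes"
    # Assumed fallback: no row of the table applies, so nothing needs splitting.
    return "one zone, one resource"


print(choose_boundary(isolated_audit=True, different_upstreams=True))
print(choose_boundary(different_upstreams=True))
```

The isolation questions are checked first on purpose: a need for separate keys, policy, or audit outweighs any resource-level difference.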
## Per-Environment Zones Separate production, staging, and development so a policy or key change in one cannot affect another. | Noun | Mapping | | --- | --- | | Zone | One zone per environment: `prod`, `staging`, `dev`. | | Resource | The same stable identifier in each zone, such as `resource://pipernet`, pointing at that environment's upstream URL. | | Policy set | Authored and activated independently per zone. | | Keys | Each zone has its own signing key and JWKS. | Trade-off: clean blast-radius isolation and independent key rotation, at the cost of registering resources and activating policy in each zone. Automate this with the Admin API so environments stay consistent. ## Multiple Customers or Workspaces A platform with many customer workspaces and one shared agent service must choose how customer isolation maps onto zones. The open-source edition gives you the **zone** as the isolation primitive; you provision and automate customer onboarding yourself through the Admin API. Managed tenant, team, and SSO lifecycle features are covered in [Compare Editions](/enterprise/). | Model | When to use | Trade-off | | --- | --- | --- | | Zone per customer | Customers require isolated signing keys, isolated audit, and policy that one customer's change can never affect another's. | Strongest isolation; you automate per-zone provisioning and key rotation, and one shared agent service authenticates separately into each zone. | | Shared zone, resource per customer | Customers share a trust boundary and policy owner but target distinct upstreams or data sets. | Lower overhead; isolation is enforced by policy and grants, not by keys or audit separation. | | Shared zone, customer in policy input | Customer is a runtime attribute of the same resource, carried in the request and checked by policy. | Lowest overhead; relies entirely on policy correctness, so audit and keys are shared. 
| Decide on the strongest isolation a customer actually requires, then pick the least complex model that satisfies it. Do not encode customer identity into [scope names](/concepts/resource-grant/); keep it in the zone, principal, or policy input. For the end-to-end pattern of serving many customers from one shared zone, see [Serve Your Own Customers](/guides/serve-customers/). ## High-Sensitivity Resources For resources whose compromise is unacceptable — payouts, key material, production data deletion. | Option | Mapping | | --- | --- | | Dedicated zone | Put the sensitive resource in its own zone with distinct signing keys and a separate audit trail. | | Tight scopes | Split actions into the smallest scopes, such as `payments:read` and `payments:refund`, so grants stay minimal. | | Step-up | Require a fresh proof before the sensitive action through [Step-Up Re-Authentication](/guides/step-up/). | Trade-off: a dedicated zone gives the cleanest audit and key separation; in-zone tight scopes plus step-up are lighter and often enough. Combine them for the highest-risk targets. ## App-Only Agents and User-Delegated Agents The principal type decides who the authority belongs to and which provider flow fits. | Dimension | App-only service agent | User-delegated agent | | --- | --- | --- | | Principal | Service identity bound to an application. | User subject session the agent acts on behalf of. | | Session | Application-bound exchange subject. | Subject session, then spawned agent and delegation sessions. | | Upstream credential | Shared broker credential the agent never sees: `oauth2_client_credentials`, `api_key`, or `bearer_token`. | Per-user delegated consent: `oauth2_authorization_code`. | | Audit attribution | Application and agent session. | User subject plus the agent and delegation chain. 
| Policy can tell these apart with `input.principal.registration_method`, `input.principal.agent_session_id`, and `input.principal.labels`, so you author one policy set that handles both without SDK-specific branching. See [Identities and Applications](/concepts/principal/). ## Per-User OAuth or Shared Credential Choose how the upstream credential is held. | Choose | When | Provider kind | | --- | --- | --- | | Per-user OAuth grant | Each end user must consent, and the upstream call must act as that user. | `oauth2_authorization_code` | | Shared service credential | The agent acts as the application, not a specific user. | `oauth2_client_credentials`, `api_key`, or `bearer_token` | For per-user grants, Caracal owns `client_id`, `redirect_uri`, `state`, and PKCE; use the provider `connect` action to mint a consent URL for a user and resource, and `disconnect` to revoke it. Reconnecting the same user, resource, and provider replaces the active grant rather than duplicating it. The concrete field tables are in [Provider Recipes](/guides/provider-recipes/). ## Validate the Model After you map your architecture, confirm it end to end before relying on it: - Author and activate a policy set that allows the intended application, subject, resource, and scopes. - Run [Check Provider Readiness](/examples/provider-preflight/) to verify resource-to-provider binding and that the active policy set returns `allow`. - Send a successful request, a denied request, and a revoked-session request, then confirm each has a clear audit trail in the Console. 
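
One more invariant worth confirming alongside those requests is the reconnect rule from Per-User OAuth or Shared Credential: at most one active grant per user, resource, and provider. The toy model below shows that invariant. `GrantStore`, its `connect` and `disconnect` methods, and the sample identifiers are hypothetical in-memory stand-ins, not Caracal's provider actions or storage.

```python
from dataclasses import dataclass, field


@dataclass
class GrantStore:
    """In-memory stand-in: one active grant per (user, resource, provider)."""

    _active: dict = field(default_factory=dict)

    def connect(self, user: str, resource: str, provider: str, token: str) -> None:
        # Keying on the triple means a reconnect overwrites the previous grant.
        self._active[(user, resource, provider)] = token

    def disconnect(self, user: str, resource: str, provider: str) -> bool:
        # Returns True if an active grant existed and was removed.
        return self._active.pop((user, resource, provider), None) is not None

    def active_count(self) -> int:
        return len(self._active)


store = GrantStore()
store.connect("user-1", "resource://pipernet", "provider-a", "token-old")
store.connect("user-1", "resource://pipernet", "provider-a", "token-new")
print(store.active_count())
```

If a live reconnect ever leaves two active grants for the same triple, treat that as a modeling or configuration problem to investigate.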
## Related - [Zones](/concepts/zone/) - [Identities and Applications](/concepts/principal/) - [Resources and Grants](/concepts/resource-grant/) - [Define Resources and Providers](/guides/resources-providers/) - [Provider Recipes](/guides/provider-recipes/) - [Production Integration Patterns](/guides/enterprise-runtime-patterns/) - [Compare Editions](/enterprise/) --- # Serve Your Own Customers # URL: https://docs.caracal.run/guides/serve-customers/ # Markdown: https://docs.caracal.run/markdown/guides/serve-customers.md # Type: page # Concepts: # Requires: --- [Model Your Application in Caracal](/guides/modeling-recipes/) compares the isolation models at a high level. This page is the end-to-end pattern for the most common case: your application is a single product that serves many of *its own* customers, and you want clean per-customer authority without standing up infrastructure for each customer. The short answer: run **one zone for your deployment** and represent **each customer as a subject**. Customer separation lives below the zone — in subjects, sessions, agents, and delegation — which is exactly where Caracal models per-actor authority. You do not create a zone per customer. :::note[FAQ] [What should a zone represent?](/reference/faq/#faq-003) and [does the open-source edition include managed multi-tenancy?](/reference/faq/#faq-004) ::: ## Where Each Customer Lives in the Model | Layer | Owns | Per-customer? 
| | --- | --- | --- | | Zone | Signing keys, policy set, resources, providers, audit trail | No — one per deployment | | Application | Your product's service identity | No — shared by all customers | | Subject session | The authenticated customer the work is for (`sub`) | Yes — one per customer | | Agent session | One agent run, bound to a subject session, carrying a customer correlation key in metadata | Yes — one per customer task | | Delegation edge | The scoped authority that agent holds for one resource | Yes — least privilege per task | A zone is the trust boundary that owns keys, policy, and audit. A customer is not a trust boundary of its own here; it is an *attribute* of the subject and the work, checked by policy and recorded in audit. Keep customer identity in the subject and metadata — never in [scope names](/concepts/resource-grant/). ## Step 1: One Zone, One Application Provision a single zone for the deployment and one managed application for your product, as in [Model Your Application in Caracal](/guides/modeling-recipes/). Set the application identity once in the environment; it is shared across all customers. ```bash CARACAL_ZONE_ID="" CARACAL_APPLICATION_ID="" CARACAL_APP_CLIENT_SECRET="" ``` The zone owns one policy set and one audit trail. Every customer's authority is decided and recorded inside it. ## Step 2: Mint a Subject per Customer When a customer signs in, mint a subject session whose `sub` is your stable customer identifier. The subject is the customer's identity inside Caracal: STS records the session as a user session, and the `sub` flows into every downstream agent, delegation, and audit event. Use a customer id that never changes for the life of the account, so audit history stays attributable even after profile changes. Do not reuse one `sub` across two customers — separation is only as strong as your subject issuance. 
## Step 3: Spawn a per-Customer Agent with a Correlation Key

Spawn the agent for a customer's request inside the one zone, bound to that customer's subject session, and stamp the customer id into the agent's metadata. The metadata travels with the agent session and makes per-customer attribution a direct lookup instead of a guess.

```python
async with caracal.spawn(
    metadata={"customer_id": customer_id},
) as ctx:
    # Every gateway and provider call in this block carries a scoped,
    # non-root mandate for this customer's subject.
    await do_work(ctx)
```

`subject_session_id` references the STS session, so it is the authority anchor; the customer id in `metadata` is the business correlation key. Use a single, consistent metadata key (`customer_id` is the recommended convention) so audit queries and dashboards can filter by customer without per-app special cases.

:::caution[Attribution is not enforcement]
Agent metadata travels with the session and lands in audit, but it is **not** part of policy input. Policy sees the subject's claims (`input.context.subject_claims`), the agent's labels (`input.principal.labels`), and the delegation edge, but never the metadata map. Use metadata to *find* a customer's activity; use subject claims or labels when policy must *restrict* it.
:::

For fan-out work, give each child agent the least authority it needs with [delegation](/guides/delegation/), keeping one customer's blast radius contained even though all customers share the zone.

### Customer Labels for App-Only Workloads

When the work is app-initiated (a billing sweep, a dunning cycle, a scheduled job acting on a customer's records without that customer signing in), there is no subject session to anchor the customer. Carry the customer on the agent session itself with a label:

```python
async with caracal.spawn(
    grant=Grant.narrow(["billing:collect"]),
    labels=[role, f"customer:{customer_id}"],
    metadata={"customer_id": customer_id},
) as ctx:
    await collect_overdue(ctx)
```

Labels drive confinement in the platform decision contract. A `confinement` data document caps every customer-labeled agent to the customer-record surface, whatever its role would otherwise allow:

```text
# caracal:data-document
package caracal.authz

import rego.v1

confinement := [{
    "label_prefix": "customer:",
    "scopes": ["billing:collect", "billing:read"],
}]
```

Publish this `confinement` document and a worker spawned for one customer can never mint authority outside that surface, even by accident. The [Lynx Capital example](/examples/lynx-capital/) ships this confinement data with tests.

## Step 4: Differentiate Authority per Customer

All customers share one zone policy set, but authority is per-request and label-aware. The agent's role labels and the delegation edge are in the platform decision contract's input, and your `grants` map roles to scopes per resource. Model plan or tier differences as roles: give the scale-plan role the resource scope, withhold it from the others, and label each customer's session with the role it earns.

```text
# caracal:data-document
package caracal.authz

import rego.v1

grants := {
    "resource://payouts": {
        "application": "payments",
        "roles": {"scale-plan": ["payouts:run"]},
    },
}
```

A session spawned for a scale-plan customer carries the `scale-plan` label and mints `payouts:run`; a starter-plan session never holds that role, so the platform contract denies it.

Express customer differences as roles in `grants` and prefixes in `confinement`, not as a separate policy set per customer. There is one active policy set per zone. See [Author Policy Data](/guides/author-policy/).
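
To reason about how role labels, `grants`, and `confinement` combine, here is a simplified Python model of the two data documents from this page. It is an illustration under stated assumptions, not the platform decision contract: `mintable_scopes` is a hypothetical helper, and the rule that confinement intersects the granted scopes is inferred from the description of the `confinement` document.

```python
GRANTS = {
    "resource://payouts": {
        "application": "payments",
        "roles": {"scale-plan": ["payouts:run"]},
    },
}

CONFINEMENT = [
    {"label_prefix": "customer:", "scopes": ["billing:collect", "billing:read"]},
]


def mintable_scopes(application, resource, labels, grants=GRANTS, confinement=CONFINEMENT):
    """Scopes a session could hold for a resource under this simplified model."""
    entry = grants.get(resource)
    if entry is None or entry["application"] != application:
        return set()
    # Union the scopes of every role the session carries as a label.
    allowed = set()
    for label in labels:
        allowed.update(entry["roles"].get(label, []))
    # Assumed: a matching label prefix caps the result to the confined surface.
    for rule in confinement:
        if any(label.startswith(rule["label_prefix"]) for label in labels):
            allowed &= set(rule["scopes"])
    return allowed


print(mintable_scopes("payments", "resource://payouts", ["scale-plan"]))
print(mintable_scopes("payments", "resource://payouts", ["starter-plan"]))
```

In this model a session that carries both `scale-plan` and a `customer:` label ends up with no payout scope, because the confinement cap wins over the role. Check the real behavior with your own policy tests before depending on it.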
## Step 5: Attribute and Revoke per Customer Every decision is written to the zone audit ledger with the application, session, agent session, and delegation chain. The customer id rides in the agent metadata you set in Step 3, and the subject is reachable through the session, so you can answer "what did this customer do, with what authority" from the audit trail. See [Audit and Request Traces](/concepts/audit-ledger/). To build a per-customer read-only view — for an internal support console or a customer-facing activity page — query the shared trail by the convention you already established: filter agent sessions on the `customer_id` metadata key (or the `customer:` label), then collect each session's decisions, delegations, and Gateway events. Because every session carries the key from spawn, the view is a filter over one audit trail, not a separate per-customer store. The same filter scopes Console searches and SIEM exports. To cut a customer off, revoke that customer's subject session and its agents and delegation edges; cascade revocation tears down the chain beneath them. There is no single "purge everything for a customer" call, so iterate the customer's sessions and agents. See [Sessions and Revocation](/concepts/sessions-revocation/). ## Limits to Plan For One shared zone serves many customers well, with two ceilings to design around. The agent concurrency caps and the rate-limit scope are in [Defaults and Limits](/reference/defaults-and-limits/#agent-and-delegation-limits). - **Concurrent agents per zone** are capped, so a single zone bounds how many customer agents can run at once. If you need more simultaneous customer agents than one zone allows, add applications within the zone, or move the busiest customers to their own zone. - **STS rate limiting is per zone, resource, and acting application** — not per customer. Because customers share your application identity, one heavy customer draws on the shared budget. 
Keep this in mind for fairness, and isolate a customer into its own zone if it must have a guaranteed independent budget. ## When to Use a Zone per Customer Instead Stay with one shared zone unless a customer genuinely requires hard isolation: independent signing keys, an audit trail that can never mix with others, or a policy change that can never affect another customer. Those needs are the **zone per customer** model in [Model Your Application in Caracal](/guides/modeling-recipes/#multiple-customers-or-workspaces). In the open-source edition you provision and automate those zones yourself through the Admin API, and your one application authenticates separately into each. A scoped Control key cannot create zones — it is bound to the single zone it was issued in. Managed per-tenant lifecycle, where customer onboarding mints and operates a zone automatically, is an Enterprise Edition feature; see [Compare Editions](/enterprise/). ## Validate the Pattern - Sign in as two different customers, run the same workflow, and confirm each produces a distinct subject and agent session in the Console. - Author grant data that gives one customer's role a resource and withholds it from another, then confirm both decisions in the audit trail. - Revoke one customer's session and confirm its in-flight agents lose authority while the other customer is unaffected. 
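
The revocation check can be rehearsed against a toy model before you run it live. The sketch below is not the Caracal revocation engine: `Session`, `revoke`, `revoke_customer`, and the sample ids are hypothetical structures that mimic a subject session with agent sessions anchored beneath it.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Session:
    session_id: str
    customer_id: str
    parent: Optional["Session"] = None
    revoked: bool = False
    children: list = field(default_factory=list)

    def __post_init__(self):
        # Register under the parent so revocation can cascade downward.
        if self.parent is not None:
            self.parent.children.append(self)


def revoke(session: Session) -> None:
    """Revoke a session and everything anchored beneath it."""
    session.revoked = True
    for child in session.children:
        revoke(child)


def revoke_customer(roots, customer_id: str) -> int:
    """Iterate one customer's root sessions, since no single purge call exists."""
    count = 0
    for root in roots:
        if root.customer_id == customer_id and not root.revoked:
            revoke(root)
            count += 1
    return count


acme = Session("subj-acme", "acme")
acme_agent = Session("agent-acme-1", "acme", parent=acme)
acme_child = Session("agent-acme-1a", "acme", parent=acme_agent)
globex = Session("subj-globex", "globex")
globex_agent = Session("agent-globex-1", "globex", parent=globex)

revoked_roots = revoke_customer([acme, globex], "acme")
```

After the call, every session under `acme` is revoked and both `globex` sessions are untouched, which is the outcome the live check should reproduce in the Console.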
## Related - [Model Your Application in Caracal](/guides/modeling-recipes/) - [Zones](/concepts/zone/) - [Identities and Applications](/concepts/principal/) - [Implement Multi-Agent Delegation](/guides/delegation/) - [Author Policy Data](/guides/author-policy/) - [Sessions and Revocation](/concepts/sessions-revocation/) - [Audit and Request Traces](/concepts/audit-ledger/) --- # Define Resources and Providers # URL: https://docs.caracal.run/guides/resources-providers/ # Markdown: https://docs.caracal.run/markdown/guides/resources-providers.md # Type: page # Concepts: # Requires: --- Resources tell Caracal what protected target is being accessed. Providers describe how Gateway authenticates to that target, including no-credential and Caracal mandate flows. ## When to create each object | Object | Create it when | | --- | --- | | Resource | You need a stable protected target, Caracal resource scopes, an upstream URL, a Gateway application, and one upstream credential provider binding. | | Provider | A resource needs an explicit upstream auth mode: none, Caracal mandate, OAuth 2.0 authorization code, OAuth 2.0 client credentials, API key, or bearer-token upstream auth. | | Gateway application | You want Gateway-originated upstream calls to use a specific Caracal application identity. | Resource fields should answer “what is being protected and where does Gateway send traffic?” Provider fields should answer “what credential or identity does Gateway attach upstream?” Keep credential details such as client secrets, token endpoints, API-key placement, bearer tokens, identity forwarding, and runtime injection on the provider. Keep target details such as resource identifier, Caracal resource scopes, upstream URL, Gateway application, and the selected upstream credential provider on the resource. 
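
The field split can be pictured as two records. This is a sketch only: the class names, attribute names, and sample values are hypothetical and do not reflect the Admin API schema. It shows credential details living on the provider, target details living on the resource, and a single provider bound to each resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """How Gateway authenticates upstream (credential side)."""

    name: str
    kind: str  # for example "api_key" or "bearer_token"
    header: str = "Authorization"


@dataclass(frozen=True)
class Resource:
    """What is protected and where Gateway sends traffic (target side)."""

    identifier: str
    scopes: tuple
    upstream_url: str
    gateway_application: str
    provider: Provider  # exactly one upstream credential provider


shared_key = Provider(name="pipernet-key", kind="api_key", header="X-API-Key")

payments = Resource(
    identifier="resource://pipernet",
    scopes=("payments:read", "payments:refund"),
    upstream_url="https://upstream.example/payments",
    gateway_application="gateway-app",
    provider=shared_key,
)

# A second resource can reuse the same provider record.
reports = Resource(
    identifier="resource://pipernet-reports",
    scopes=("reports:read",),
    upstream_url="https://upstream.example/reports",
    gateway_application="gateway-app",
    provider=shared_key,
)
```

Keeping the secret-bearing fields on `Provider` means a credential rotation touches one record, while resource identifiers and scopes stay stable for policy and audit.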
:::note[FAQ] [What is the difference between a resource and a provider?](/reference/faq/#faq-009) and [is an application secret the same as a provider credential?](/reference/faq/#faq-013) ::: ## Console workflow 1. Start the local runtime with `caracal up`. 2. Open `caracal console`. 3. Select `resource`. 4. Create a resource with a stable identifier, such as `resource://pipernet`. 5. Add action-oriented scopes, such as `payments:read` and `payments:refund`. 6. Set the upstream URL, Gateway application, and upstream credential provider for the protected service. 7. Select a `None` provider when Gateway should enforce Caracal access but the upstream expects no credential. 8. Select `Caracal mandate` for resources that verify Caracal tokens directly, or choose a provider-native type when the upstream needs external credentials. 9. Attach exactly one upstream credential provider to the resource. 10. For a path-addressed REST upstream, declare the operations the Gateway may invoke and keep operation authority `enforced`; choose `transport_uniform` for single-surface transports such as MCP. ## Provider auth modes | Provider type | Required main fields | Runtime behavior | | --- | --- | --- | | None | No credential fields. | Gateway strips caller auth and forwards no upstream credential; use only when Gateway is the enforcement point and the upstream expects no credential. | | Caracal mandate | No credential fields. | Gateway forwards the Caracal resource mandate in `Authorization: Bearer ...`; the resource verifies issuer, audience, scopes, target, expiry, and revocation. | | OAuth 2.0 authorization code | Authorization endpoint, token endpoint, redirect URI, client ID, client secret, optional upstream OAuth scopes. | Creates user/resource provider grants through a browser consent callback, then refreshes delegated provider tokens inside STS. 
| | OAuth 2.0 client credentials | Token endpoint, client ID, client secret unless OAuth client authentication is `none`, optional upstream OAuth scopes. | STS obtains and caches service-to-service provider tokens for Gateway upstream calls. | | API key | Header name and API key. | Gateway forwards the sealed provider key in the configured header, with an optional auth scheme prefix. | | Bearer token | Bearer token. | Gateway forwards the sealed provider token as `Authorization: Bearer ...` unless Advanced routing configures another header or scheme. | OAuth 2.0 authorization-code providers are for delegated user consent. Register the exact Caracal callback URI as the provider redirect URI, such as `https://api.hooli.example/v1/zones/z1/provider-grants/oauth/callback`, then use the provider `connect` action to create an authorization URL for a user, resource, and Caracal resource scope set. Upstream OAuth scopes are main-form optional fields because they are a common part of consent. OAuth authorization parameters are Advanced key/value entries for provider-specific browser parameters, such as `access_type=offline` and `prompt=consent`; Caracal always owns reserved OAuth fields such as `client_id`, `redirect_uri`, `state`, and PKCE challenge values. OAuth token parameters are Advanced key/value entries for provider-specific token endpoint parameters; Caracal owns credentials, grants, refresh tokens, and scope parameters. OAuth token endpoint hosts constrain the code exchange and later refresh calls, and token endpoints must resolve to routable public addresses. The callback endpoint is intentionally public so external providers can redirect browsers back to Caracal. Security comes from short-lived Redis-backed state, one-time state consumption, PKCE, resource/upstream credential provider binding checks, scope checks, sealed token storage, HTTPS-only token exchange, redirect blocking, host allow-listing, and private-address rejection. 
In cloud deployments, all API replicas must share Redis so a callback can validate state created by any replica. Reconnecting the same user, resource, and provider replaces the active provider grant instead of creating duplicate active credentials. Use the provider `disconnect` action to revoke the active delegated provider grant for a user and resource. OAuth 2.0 client credentials keeps common machine-to-machine setup in the main form and moves provider-specific controls to Advanced. Use upstream OAuth scopes when the token endpoint expects OAuth scopes, OAuth token audience when the provider requires an `audience` parameter, OAuth resource indicator when the provider expects a resource indicator, and OAuth token parameters only for documented provider-specific token endpoint parameters. OAuth token endpoint hosts constrain outbound token acquisition; Console infers the host from the token endpoint when the Advanced field is blank. API key providers use one required routing field: the exact header the upstream expects. Use `X-API-Key` or another vendor header when the API wants the raw key value, use `Authorization` plus the Advanced auth scheme `Bearer` for bearer-style API keys, and use a vendor scheme such as `Token` only when the provider documents that format. `auth_header` is not part of API-key provider config; OAuth and bearer-token providers use that field because their default upstream header is `Authorization`. Bearer token providers are for static, pre-issued access tokens that Caracal does not mint or refresh. The main form only needs the token; Advanced options are for uncommon upstreams that expect a non-default header or a scheme other than `Bearer`. The token is sealed at creation time, list and detail APIs expose only `secret_config_keys`, and Gateway replaces caller-supplied auth before forwarding the provider credential upstream. Provider secrets are sealed at creation time and are not returned by list or detail APIs. 
Every Gateway resource binds one upstream credential provider, and one provider can be reused by many resources. None providers have no secret and send no credential. Caracal mandate providers have no provider-native secret and no `forward_caracal_identity` setting because the mandate is already the upstream credential. Use Caracal mandate for internal services, partner services, external integrations, or network-level resources that choose to trust Caracal mandates through a verifier. ## Operation authority A resource declares which upstream operations the Gateway may invoke and how strictly that surface is enforced. This authority is native to the platform: the Gateway and STS enforce it directly, so adopters configure data, not policy control flow. | Field | Meaning | | --- | --- | | `operations` | The operations the Gateway may forward, each a `{method, path, scope}` triple. `method` and `path` match the upstream call; `scope` is the Caracal resource scope the mandate must carry for that operation, and must be one of the resource's declared scopes. | | `operation_enforcement` | `enforced` denies any Gateway operation not listed in `operations` and requires its scope on the mandate. `transport_uniform` treats the upstream as a single surface and relies on the mint-time scope check, for protocols such as MCP that address every call through one transport path. | When `operation_enforcement` is `enforced`, the Gateway authorizes only declared operations and denies everything else with `operation_not_permitted` before any upstream call. A resource that is `enforced` with an empty `operations` list is fully closed: every Gateway operation is denied until operations are declared. 
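The enforcement rule described here can be modeled in a few lines. The sketch below is an illustration under assumptions, not the Gateway's code: `authorizeOperation`, the `Resource` shape, exact-match path comparison, and the `scope_missing` label are hypothetical. Only the field names `operations` and `operation_enforcement`, the two enforcement values, and the `operation_not_permitted` denial come from the text.

```typescript
// Illustrative model only; names and matching rules are assumptions.
type Operation = { method: string; path: string; scope: string };
type Resource = {
  scopes: string[];
  operations: Operation[];
  operation_enforcement: "enforced" | "transport_uniform";
};
type Decision = { allow: true } | { allow: false; reason: string };

function authorizeOperation(
  resource: Resource,
  mandateScopes: string[],
  method: string,
  path: string,
): Decision {
  // transport_uniform: one surface, relies on the mint-time scope check.
  if (resource.operation_enforcement === "transport_uniform") {
    return { allow: true };
  }
  // enforced: only declared operations pass (exact match assumed here).
  const op = resource.operations.find(
    (o) => o.method.toUpperCase() === method.toUpperCase() && o.path === path,
  );
  if (op === undefined) {
    return { allow: false, reason: "operation_not_permitted" };
  }
  // The mandate must carry the operation's scope ("scope_missing" is a made-up label).
  if (!mandateScopes.includes(op.scope)) {
    return { allow: false, reason: "scope_missing" };
  }
  return { allow: true };
}

const pipernet: Resource = {
  scopes: ["pipernet:read", "pipernet:payout"],
  operations: [{ method: "POST", path: "/api/create_payout", scope: "pipernet:payout" }],
  operation_enforcement: "enforced",
};
```

In this model, a `POST /api/create_payout` call with a mandate carrying `pipernet:payout` is allowed, any undeclared call is denied, and setting `operations` to an empty list denies everything, which matches the fully closed case.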
New resources created through the Control API default to `enforced`, so authority stays closed until you describe it; guided Console setup opens resources as `transport_uniform` because it does not collect per-operation detail, and you tighten them from the resource editor once the operation set is known. Because the floor is enforced in the platform rather than in adopter policy, a policy can further restrict an operation but can never widen authority beyond the declared operations and their scopes. ## Identifier and scope guidance | Field | Good pattern | | --- | --- | | Resource identifier | Stable audience URI, such as `resource://pipernet`; do not use the `provider://` namespace. | | Caracal resource scope | `domain:action`, such as `pipernet:read`. | | Upstream URL | The network URL Gateway can reach; it may change without changing policy audiences. | | Gateway application | The managed Caracal application identity Gateway exchanges as to obtain this resource's upstream credential. It does not control which agents may call through the Gateway — callers are authorized by policy on their own mandate. DCR applications cannot be bound here — they are short-lived single-session credentials, not durable Gateway identities. | | Upstream credential provider | The provider record Gateway uses to attach no credential, a Caracal mandate, OAuth tokens, API keys, or bearer tokens. | | Authorized operation | `{method, path, scope}`, such as `{ "method": "POST", "path": "/api/create_payout", "scope": "pipernet:payout" }`; the scope must be one of the resource's Caracal resource scopes. | | Operation enforcement | `enforced` for path-addressed REST upstreams; `transport_uniform` for single-surface transports such as MCP. | | Provider identifier | `provider://lowercase-slug` stable name for the upstream credential system. | Do not put provider-specific credential details on a resource. 
Do not use mutable deployment hostnames as policy identifiers unless they are the actual authority boundary. Keep the resource identifier stable even if the upstream URL or upstream credential provider binding changes. :::note[FAQ] [Why must the resource identifier stay stable if the upstream URL can change?](/reference/faq/#faq-010) ::: ## Validate the setup - Open the resource in the Console and confirm scopes are present. - Author and activate a policy that allows the application and subject to request the scopes. - Send a Gateway request and inspect the audit event. For Caracal mandate upstreams, also confirm the resource uses a Caracal verifier or connector. ## Related guides - [Provider Recipes](/guides/provider-recipes/) - [Author Policy Data](/guides/author-policy/) - [Debug Authorization Decisions](/guides/authorize-access/) - [Protect a Gateway-Routed HTTP API](/guides/protect-gateway-http/) - [Activate a Policy Set](/guides/activate-policy-set/) --- # Provider Recipes # URL: https://docs.caracal.run/guides/provider-recipes/ # Markdown: https://docs.caracal.run/markdown/guides/provider-recipes.md # Type: page # Concepts: # Requires: --- This page turns the abstract [provider auth modes](/guides/resources-providers/) into concrete recipes. Each recipe states the enforcement boundary, the provider kind, the exact fields, and the resource binding so you can reproduce a working setup without guessing. Provider recipes live in the documentation, not the Console. The Console stays focused on the fields a given setup needs; this page is the reference you keep open beside it. ## Choose the enforcement boundary first The most common mistake is treating these four boundaries as interchangeable. They are not — they differ in where enforcement happens, who writes the action-result audit, and who holds the upstream credential. 
| Boundary | Enforcement point | Credential holder | Action-result audit | Use when | | --- | --- | --- | --- | --- | | Gateway-mediated | Caracal Gateway | Gateway (sealed provider secret) | Gateway | The upstream is HTTP-routable and you want Caracal to broker the credential and enforce every call. | | Connector-verified | Your service process | Your service | Your service after connector verification | You own the service and must enforce mandates in process. | | Application-managed call | Your application code | Your application | Your application | You call the provider directly and only attach Caracal context for attribution. Not Caracal-enforced. | | Runtime token injection | Your runtime, eligibility gated by Caracal | Your runtime process | Your application | An existing client or CLI must receive a brokered token at runtime, and the provider sets `allow_runtime_injection=true`. | Prefer **Gateway-mediated** unless you have a specific reason to enforce elsewhere. The recipes below note which boundary they use. ## OpenAI API key (Gateway-mediated) Broker an OpenAI key so the agent never holds it. | Field | Value | | --- | --- | | Provider kind | `api_key` | | `header_name` | `Authorization` | | `auth_scheme` | `Bearer` | | `api_key` | Your OpenAI secret key (sealed at creation). | | Resource identifier | `resource://openai` | | Resource upstream URL | `https://api.openai.com` | | Resource scopes | `openai:chat`, `openai:embeddings` | The Gateway forwards `Authorization: Bearer ` upstream and strips caller auth. The agent calls the Caracal resource, not OpenAI directly. ## Google Workspace (OAuth authorization code) Delegated user consent for a Google API, refreshed inside STS. | Field | Value | | --- | --- | | Provider kind | `oauth2_authorization_code` | | `authorization_endpoint` | `https://accounts.google.com/o/oauth2/v2/auth` | | `token_endpoint` | `https://oauth2.googleapis.com/token` | | `redirect_uri` | Your exact Caracal callback, e.g. 
`https://api.example.com/v1/zones//provider-grants/oauth/callback` | | `client_id` / `client_secret` | From the Google Cloud OAuth client. | | `scopes` | e.g. `https://www.googleapis.com/auth/drive.readonly` | | `allowed_token_hosts` | `oauth2.googleapis.com` | | `authorization_params` (Advanced) | `access_type=offline`, `prompt=consent` for refresh tokens. | | Resource upstream URL | `https://www.googleapis.com` | Caracal owns `client_id`, `redirect_uri`, `state`, and PKCE. Use the provider `connect` action to mint the consent URL for a user and resource. :::note[FAQ] [When should I use per-user OAuth instead of a shared provider credential?](/reference/faq/#faq-014) ::: ## GitHub (bearer token, with OAuth note) For automation, the simplest path is a pre-issued token. | Field | Value | | --- | --- | | Provider kind | `bearer_token` | | `bearer_token` | A GitHub fine-grained personal access token (sealed). | | `allowed_token_hosts` | `api.github.com` | | Resource identifier | `resource://github` | | Resource upstream URL | `https://api.github.com` | | Resource scopes | `github:repo-read`, `github:issues-write` | For delegated per-user consent instead of a shared token, use `oauth2_authorization_code` with `authorization_endpoint=https://github.com/login/oauth/authorize` and `token_endpoint=https://github.com/login/oauth/access_token`. ## Slack (OAuth client credentials or bearer) For a Slack bot token that does not expire, use a bearer provider. | Field | Value | | --- | --- | | Provider kind | `bearer_token` | | `bearer_token` | The Slack bot/user OAuth token (sealed). | | `allowed_token_hosts` | `slack.com` | | Resource identifier | `resource://slack` | | Resource upstream URL | `https://slack.com/api` | | Resource scopes | `slack:chat-write`, `slack:channels-read` | For rotating machine tokens, use `oauth2_client_credentials` with Slack's token endpoint and `scopes`. 
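To make the static-credential recipes concrete, the sketch below models how a provider kind translates into the header the upstream receives. It is a simplified illustration, not the Gateway's implementation: `upstreamAuthHeader` and the `Provider` union are hypothetical, the OAuth kinds are omitted because their tokens are obtained inside STS, and Advanced header or scheme overrides for bearer tokens are not modeled.

```typescript
// Illustrative only: a hypothetical model of the documented header behavior.
type Provider =
  | { kind: "none" }
  | { kind: "caracal_mandate" }
  | { kind: "api_key"; header_name: string; api_key: string; auth_scheme?: string }
  | { kind: "bearer_token"; bearer_token: string };

function upstreamAuthHeader(provider: Provider, mandate: string): Record<string, string> {
  switch (provider.kind) {
    case "none":
      // No upstream credential is forwarded.
      return {};
    case "caracal_mandate":
      // The resource mandate itself is the upstream credential.
      return { Authorization: `Bearer ${mandate}` };
    case "api_key": {
      // Raw key by default; an optional scheme prefix such as "Bearer" or "Token".
      const value = provider.auth_scheme
        ? `${provider.auth_scheme} ${provider.api_key}`
        : provider.api_key;
      return { [provider.header_name]: value };
    }
    case "bearer_token":
      return { Authorization: `Bearer ${provider.bearer_token}` };
  }
}

// The OpenAI recipe: api_key provider, Authorization header, Bearer scheme.
const openaiHeader = upstreamAuthHeader(
  { kind: "api_key", header_name: "Authorization", auth_scheme: "Bearer", api_key: "sk-example" },
  "unused-mandate",
);
```

Here `openaiHeader` is `{ Authorization: "Bearer sk-example" }`. In every case the caller's own auth is assumed to have been stripped before this header is attached.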
## Internal API (Caracal mandate or none) For services you own, you usually do not need an upstream credential. | Setup | Provider kind | Use when | | --- | --- | --- | | Mandate-aware | `caracal_mandate` | The internal service verifies Caracal tokens directly (issuer, audience, scopes, expiry, revocation). The Gateway forwards the resource mandate as `Authorization: Bearer ...`. | | No upstream credential | `none` | The Gateway is the enforcement point and the upstream expects no credential. | | Field | Value | | --- | --- | | Resource identifier | `resource://internal-billing` | | Resource upstream URL | `https://billing.internal.example` | | Resource scopes | `billing:read`, `billing:submit` | ## Runtime token injection When an existing CLI or client cannot route through the Gateway but still needs a brokered credential, set `allow_runtime_injection=true` on the provider. This is a distinct security boundary: the token is injected into your runtime, and enforcement of the actual call is your application's responsibility, not the Gateway's. Use it deliberately and confirm eligibility with the preflight below. ## Validate before the first call After you create the provider, bind it to the resource, and activate a policy, run [Check Provider Readiness](/examples/provider-preflight/) (`examples/providerPreflight`). It validates control-plane and Gateway readiness, the resource-to-provider binding, application validity, provider configuration completeness, scope coverage, token endpoint and upstream reachability, runtime-injection eligibility, and that the active policy set actually returns `allow` for your application, resource, and scopes — with a concrete remediation for every failure. 
## Related

- [Define Resources and Providers](/guides/resources-providers/)
- [Author Policy Data](/guides/author-policy/)
- [Debug Authorization Decisions](/guides/authorize-access/)
- [Protect a Gateway-Routed HTTP API](/guides/protect-gateway-http/)

---

# Author Policy Data
# URL: https://docs.caracal.run/guides/author-policy/
# Markdown: https://docs.caracal.run/markdown/guides/author-policy.md
# Type: page
# Concepts:
# Requires:

---

The platform decision contract owns every authorization decision inside the STS. You do not write `result`; you author **policy data documents** that tell the contract which application owns a resource, which roles hold which scopes, and how to confine or restrict authority. This guide shows how to write those documents and validate them before activation.

## Prerequisites

- A zone, application, and resource.
- Access to the Console or Admin API.
- The resource identifier and scopes you want to grant.

## Start from a grant

The core document is `grants`: it names the application that owns a resource view and the scopes each role may hold. Pair it with `app_ids`, which binds the application key you use in `grants` to the control-plane id the STS sees as `input.principal.id`.

```text
# caracal:data-document
package caracal.authz

import rego.v1

app_ids := {
  "pipernet": "app-pipernet",
}

grants := {
  "resource://pipernet": {
    "application": "pipernet",
    "roles": {"reader": ["pipernet:read", "pipernet:write"]},
  },
}
```

This is the canonical "application A may call resource B with scopes C" pattern, expressed as data. The platform contract allows a mint only when the acting application owns the view, the agent's role label grants the scope, and the delegation edge narrows to it. You declare the grant; the contract enforces the narrowing.

## Start from a template

You do not have to write each document by hand.
Caracal ships a built-in catalog of data-document starters, served at `/v1/policy-templates` and through the Admin SDK: ```ts import { AdminClient } from "@caracalai/admin"; const admin = new AdminClient({ apiUrl: process.env.CARACAL_API_URL!, adminToken: process.env.CARACAL_ADMIN_TOKEN!, }); const templates = await admin.policyTemplates.list(); const starter = await admin.policyTemplates.get("resource-grants"); ``` | Template | Use it for | | --- | --- | | `application-bindings` | Map each application key used in `grants` to its control-plane application id. | | `resource-grants` | Declare the owning application and per-role scope sets for a resource view. | | `label-confinement` | Cap every session carrying a label prefix to a fixed scope set. | | `zone-restriction` | A deny overlay that freezes the zone while an entry is present. | For agent-assisted authoring, use the [policy authoring playbook](https://github.com/Garudex-Labs/caracal/tree/main/playbooks/policies). It guides a coding agent to model the use case as grant, binding, and confinement data inside the documented contract. ## How your data maps to the request The decision contract evaluates a fixed input contract and resolves it against your data. The acting application is the principal — there is no `input.application` or `input.grant` object. See the full [Policy Input Contract](/concepts/policy/#policy-input-contract) for every field. | Input the contract reads | Resolved against | | --- | --- | | `input.principal.id` | `app_ids` — to find the application key used in `grants`. | | `input.principal.labels` | `grants[...].roles` and `confinement` label prefixes. | | `input.resource.identifier` | the top-level key in `grants`. | | `input.context.requested_scopes` | the role's scope set and any matching `confinement` rule. | | `input.delegation_edge.scopes` | the narrowing floor every requested scope must sit inside. 
|

## Validate before versioning

Use the Console policy workflow to paste the document and run validation. For automation, validate through the Admin API or `@caracalai/admin`:

```ts
import { AdminClient } from "@caracalai/admin";

const admin = new AdminClient({
  apiUrl: process.env.CARACAL_API_URL!,
  adminToken: process.env.CARACAL_ADMIN_TOKEN!,
});

const validation = await admin.policies.validate(policySource);
if (!validation.valid) {
  throw new Error(validation.detail);
}
```

Validation enforces the data-document contract: the package must be `caracal.authz`, the first line must carry the `# caracal:data-document` directive, the document must define at least one data rule, and it must **not** define `result`. Validation also checks the schema version, balanced syntax, and forbidden built-ins. Because the platform contract owns the decision, a data document can never authorize on its own.

## Why data, not decisions

Authorization logic (delegation narrowing, role and grant checks, label confinement, bootstrap isolation) is identical for every adopter and is the part most dangerous to get wrong. A single typo in a hand-written rule (`if { false }` → `if { true }`) silently turns a deny into an allow-all. Caracal removes that footgun by owning the logic in a signed, versioned platform decision contract and letting you supply only data.

Every policy is a data document marked with `# caracal:data-document` on its first line. The document carries only `data` and is **forbidden from defining `result`**, so it can never decide an authorization:

```text
# caracal:data-document
package caracal.authz

import rego.v1

restrict := {}
```

A `restrict` entry can only subtract authority, and `confinement` can only narrow it. Neither can widen what the contract already allows, so a careless data change fails closed.
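The "only subtract, only narrow" property can be seen in a toy set model. This is not the platform decision contract, whose logic is owned by Caracal. `effectiveScopes` and its parameters are hypothetical, and treating `restrict` as a per-scope subtraction is an assumption made for illustration, since the real `confinement` and `restrict` document shapes are not shown here.

```typescript
// Toy model: authority as a list of scopes. Not the platform decision contract.
function intersect(a: string[], b: string[]): string[] {
  return a.filter((s) => b.includes(s));
}

function effectiveScopes(
  roleScopes: string[], // scopes the role holds in `grants`
  delegationScopes: string[], // scopes on the delegation edge
  confinement: string[] | null, // optional cap for a label prefix (assumed shape)
  restricted: string[], // scopes removed by a deny overlay (assumed shape)
): string[] {
  // A scope must be granted to the role and sit inside the delegation edge.
  let scopes = intersect(roleScopes, delegationScopes);
  if (confinement !== null) {
    // Confinement is an intersection, so it can only narrow.
    scopes = intersect(scopes, confinement);
  }
  // Restriction is a subtraction, so it can only remove.
  return scopes.filter((s) => !restricted.includes(s));
}

const granted = ["pipernet:read", "pipernet:write"];
// A confinement entry naming a scope the role never held adds nothing.
const confined = effectiveScopes(granted, granted, ["pipernet:read", "pipernet:admin"], []);
```

Because every step is an intersection or a subtraction, no value of `confinement` or `restricted` can make the result larger than the granted set. In this example `confined` contains only `pipernet:read`.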
### Preview how the document parses

A successful validation returns a `preview` describing exactly what the engine parsed, so you can confirm the backend reads your data the way you intend before activating it:

```ts
const { preview } = await admin.policies.validate(policySource);
// preview = {
//   package: "caracal.authz",
//   rules: ["app_ids", "grants"],  // the data documents you defined
//   default_result: false,         // data documents never define result
//   decisions: [],                 // the platform contract owns every decision
//   inputs_referenced: [],
//   data_referenced: [],
// }
```

Use `rules` to confirm the document defines the data tables you intended. The preview is a static read of the source; for an end-to-end decision, run a [simulation](/guides/activate-policy-set/) with representative input against the platform decision contract.

### Preview how the policy behaves

A successful validation returns a `preview` describing exactly what the engine parsed, so you can confirm the backend reads your policy the way you intend before activating it:

```ts
const { preview } = await admin.policies.validate(policySource);
// preview = {
//   package: "caracal.authz",
//   rules: ["result"],
//   default_result: true,          // a default deny fallback exists
//   decisions: ["allow", "deny"],  // outcomes the policy can emit
//   inputs_referenced: ["input.context.requested_scopes", "input.resource.identifier"],
//   data_referenced: ["data.allowed_scopes"],
// }
```

Use `inputs_referenced` to confirm you are gating on the right contract fields, and `data_referenced` to see which external `data` tables you must supply. The preview is a static read of the source; for an end-to-end decision, run a [simulation](/guides/activate-policy-set/) with representative input.

## Iterate from a denied request

When a real request is denied, you do not have to guess the input.
The audit explain endpoint reconstructs a redaction-safe policy input for every denied decision:

```ts
const trace = await admin.audit.explain(zoneId, requestId);
const input = trace.denied[0]?.policy_input;
```

Feed that `input` straight into [simulation](/guides/activate-policy-set/) against a candidate policy-set version to confirm your fix before activating it. The [Iterate Policy Safely](/examples/policy-iterate/) example automates this denial-to-simulation loop. Actor and subject claims are never written to audit, so add any claim-dependent fields to the input before simulating.

## Keep policies reviewable

- Default to deny.
- Return diagnostics for denies and step-up triggers.
- Keep resource identifiers stable and scopes action-oriented.
- Keep decision logic in policy rather than duplicating allow tables across rules.
- Split policies by ownership only when separate review or activation is useful.

Next, [activate the policy in a policy set](/guides/activate-policy-set/).

---

# Activate a Policy Set
# URL: https://docs.caracal.run/guides/activate-policy-set/
# Markdown: https://docs.caracal.run/markdown/guides/activate-policy-set.md
# Type: page
# Concepts:
# Requires:

---

The STS evaluates the active policy-set version for a zone. Activation is explicit so policy changes can be reviewed, simulated, and rolled out without mutating old versions.

## The mental model

Four layers separate authoring from what runs. Versions are immutable; only the policy name and the active pointer change in place:

| Layer | What it is | Mutable? |
| --- | --- | --- |
| Policy | A named Rego document. | The name; content is versioned. |
| Policy version | One immutable snapshot of policy content. | No. |
| Policy-set version | An immutable bundle of policy versions. | No. |
| Active policy-set version | The one bundle the STS evaluates for a zone. | The pointer moves on activation. |

You edit by creating a **new** version and moving the active pointer to it.
Nothing already evaluated is ever rewritten, so every audit decision ties back to the exact policy-set version and manifest hash that produced it. ## Activation flow ```mermaid flowchart LR Policy["Create policy"] --> Version["Create immutable policy version"] Version --> Set["Add policy-set version"] Set --> Simulate["Simulate"] Simulate --> Activate["Activate"] Activate --> Audit["Audit policy decisions"] ``` ## Console workflow 1. Open `caracal console`. 2. Select `policy` and create or update the Rego policy. 3. Select `policy set` and create a set for the zone. 4. Add the policy version to a new policy-set version. 5. Simulate the version with a representative input. 6. Activate the version when the simulated decision matches the intended behavior. 7. Use `audit` and `request trace` after the first real request. ## Automation workflow ```ts import { AdminClient } from "@caracalai/admin"; const admin = new AdminClient({ apiUrl: process.env.CARACAL_API_URL!, adminToken: process.env.CARACAL_ADMIN_TOKEN!, }); const policy = await admin.policies.create(process.env.CARACAL_ZONE_ID!, { name: "payments-read", content: policySource, }); const set = await admin.policySets.create(process.env.CARACAL_ZONE_ID!, "payments"); const version = await admin.policySets.addVersion(process.env.CARACAL_ZONE_ID!, set.id, [ { policy_version_id: policy.version.id }, ]); const simulation = await admin.policySets.simulate(process.env.CARACAL_ZONE_ID!, set.id, version.id, sampleInput); if (!simulation.would_activate) { throw new Error(simulation.explanation.reason); } await admin.policySets.activate(process.env.CARACAL_ZONE_ID!, set.id, version.id); ``` ## Validation checklist | Check | Expected result | | --- | --- | | Policy validation | `valid: true` and no blocking warnings. | | Simulation | Representative input returns the intended decision. | | Activation | Zone has the expected active policy-set version. | | First exchange | Audit shows the new policy as a determining policy. 
| If activation changes expected access, keep the old policy-set version ID in the rollout notes so you can promote it again if needed. ## Iterate from real denials Every denied decision links to the policy-set version that produced it, and the audit explain endpoint reconstructs the policy input for that denial. Stage the fix as a new policy-set version, then simulate the denied input against it: ```ts const trace = await admin.audit.explain(zoneId, requestId); const input = trace.denied[0]?.policy_input; const candidate = await admin.policySets.addVersion(zoneId, set.id, manifest); const check = await admin.policySets.simulate(zoneId, set.id, candidate.id, input); if (check.result?.decision === "allow") { await admin.policySets.activate(zoneId, set.id, candidate.id); } ``` The [Iterate Policy Safely example](/examples/policy-iterate/) wraps this loop as a runnable script. --- # Debug Authorization Decisions # URL: https://docs.caracal.run/guides/authorize-access/ # Markdown: https://docs.caracal.run/markdown/guides/authorize-access.md # Type: page # Concepts: # Requires: --- Use this guide when a request is denied, unexpectedly allowed, missing audit evidence, or routed to the wrong protected resource. For production incidents and dependency failures, use [Troubleshoot by Symptom](/operations/troubleshooting/). ## Debug Flow ```mermaid flowchart LR Request["Request ID"] --> Trace["Request trace"] Trace --> Resource["Resource and scopes"] Resource --> Policy["Active policy set"] Policy --> Session["Session and revocation"] Session --> Route["Gateway route"] Route --> Audit["Audit evidence"] ``` ## Start with the Request ID Find the request ID from the SDK error, STS response, Gateway response, Console audit event, or application log. Open Console **request trace** and paste it. 
The trace should show: - application and execution session; - requested resource and scopes; - policy set version; - determining policies; - diagnostics; - final decision; - Gateway result when the request reached Gateway. If you do not have a request ID, reproduce the request with the generated profile and capture the SDK, STS, or Gateway output. ## Common Authorization Failures | Symptom | Check | Fix | | --- | --- | --- | | Exchange is denied | Active policy does not allow the application, subject, resource, or scopes. | Update the Rego policy, simulate the denied input, and activate a new policy-set version. | | Scope is missing | Resource does not define the scope, or policy allowlist omits it. | Add the scope to the resource and policy deliberately. | | Wrong resource is evaluated | App used the wrong resource ID or Gateway header. | Use the resource ID from Console and `X-Caracal-Resource`. | | Policy change has no effect | New policy version is not in the active policy-set version. | Create and activate a new policy-set version. | | Access continues after revocation | Resource server is not consuming revocation state, or Gateway route is not used. | Confirm Gateway-mediated routing or shared revocation consumers for connectors. | | Gateway returns 403 | Mandate is expired, missing, revoked, or scoped for another resource. | Rerun the workload for a fresh mandate and confirm resource bindings. | | Upstream succeeds without audit | Request bypassed Gateway or connector result audit. | Route through Gateway or emit service-side action-result audit after connector verification. | | No audit event appears | Request never reached STS/Gateway, wrong zone is selected, or audit ingestion is delayed. | Confirm profile, selected zone, and request ID; refresh audit after a short wait. | ## Iterate from a Denial Every denied decision links to the policy-set version that produced it. 
The audit explain endpoint reconstructs a redaction-safe policy input for denied decisions: ```ts const trace = await admin.audit.explain(zoneId, requestId); const input = trace.denied[0]?.policy_input; ``` Feed that input into [simulation](/guides/activate-policy-set/) against a candidate policy-set version before activating the fix. Actor and subject claims are never written to audit, so add any claim-dependent fields before simulating. ## Authorization Design Rules - Allow only the scopes an application and subject need. - Keep policies resource-specific instead of allowing broad cross-resource access. - Express context-sensitive conditions in policy rather than widening scopes. - Revoke active sessions when a subject leaves a workflow or an application no longer needs a resource. - Use diagnostics for expected denial paths so request traces explain what to change. ## Related Pages - [Define Resources and Providers](/guides/resources-providers/) - [Author Policy Data](/guides/author-policy/) - [Activate a Policy Set](/guides/activate-policy-set/) - [Resources and Grants](/concepts/resource-grant/) - [Policies and Policy Sets](/concepts/policy/) - [Sessions and Revocation](/concepts/sessions-revocation/) --- # Integrate the TypeScript SDK # URL: https://docs.caracal.run/guides/sdk-typescript/ # Markdown: https://docs.caracal.run/markdown/guides/sdk-typescript.md # Type: page # Concepts: # Requires: --- Use `@caracalai/sdk` in Node applications that need agent sessions, delegation, Gateway routing, or Caracal context propagation. ## Install ```bash npm install @caracalai/sdk ``` ## Configure The SDK resolves configuration in this order: 1. `new Caracal({ clientSecret: ... })` explicit options. 2. `new Caracal({ configPath })`. 3. `CARACAL_CONFIG` or the default generated profile. 4. Environment variables. The Console guided setup can write the profile and secret file for you. 
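The resolution order above amounts to a first-match lookup. The sketch below illustrates that order only: `pickConfigSource`, its option shape, and the `defaultProfileExists` flag are hypothetical, `clientSecret` stands in for any explicit option, and the real SDK performs this resolution internally.

```typescript
// Illustrative only: models the documented precedence, not the SDK's internals.
type Options = { clientSecret?: string; configPath?: string };
type Source = "explicit-options" | "config-path" | "profile" | "environment";

function pickConfigSource(
  options: Options,
  env: Record<string, string | undefined>,
  defaultProfileExists: boolean,
): Source {
  // 1. Explicit constructor options win.
  if (options.clientSecret !== undefined) return "explicit-options";
  // 2. An explicit config path comes next.
  if (options.configPath !== undefined) return "config-path";
  // 3. CARACAL_CONFIG or the default generated profile.
  if (env["CARACAL_CONFIG"] !== undefined || defaultProfileExists) return "profile";
  // 4. Otherwise fall back to individual environment variables.
  return "environment";
}
```

One consequence of this order is that a generated profile on disk takes precedence over individual environment variables, so removing or overriding the profile is the way to test an environment-only configuration.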
## Connect and spawn ```ts import { Caracal } from "@caracalai/sdk"; const caracal = new Caracal(); await caracal.spawn(async () => { const headers = await caracal.headersAsync(); await fetch("https://api.example.com/tickets", { headers, }); }); ``` `spawn()` creates an agent session, binds the Caracal context while the callback runs, and terminates the session when the callback exits. To make agents distinguishable in policy and audit, pass `labels`. These become `input.principal.labels`, so several agents under one application stay separable without one application per agent. `labels` are descriptive, for policy and audit, not grants; authority always comes from scopes and delegation. The session lifecycle is handled for you: `spawn()` records a `task` session, while `spawnService()` records a heartbeat-leased `service`. ```ts await caracal.spawn(async () => { const headers = await caracal.headersAsync(); await fetch("https://api.example.com/tickets", { headers }); }, { labels: ["refund-agent"] }); ``` See [If many agents share one managed application, can policy and audit still tell them apart?](/reference/faq/#faq-008). ## Long-lived service agents Daemons and workers that outlive a single request use `spawnService()` instead of `spawn()`. It returns a handle you own: keep the session alive by calling `heartbeat()` on a timer and retire it with `close()`. Each `heartbeat()` **renews the lease** — it extends the deadline the coordinator measures liveness against; it does not merely check status. A service session is reaped by the coordinator only if it stops heartbeating before its lease expires. Unlike a `task`, a `service` is **not** subject to the wall-clock TTL sweeper; its lifetime is governed entirely by the lease, so a faithfully heartbeated service runs until you close it. 
```ts
// `running`, `doWork`, and `sleep` stand in for your application's own code.
const svc = await caracal.spawnService({ labels: ["billing-worker"] });
try {
  while (running) {
    await svc.heartbeat();
    await doWork(await caracal.headersAsync());
    await sleep(30_000);
  }
} finally {
  await svc.close();
}
```

When your code spends long stretches awaiting a provider (a streaming response, a slow tool), a heartbeat issued from that same loop can be starved. Pass `heartbeatIntervalMs` to renew the lease from an independent timer, so the lease stays current even while your code is blocked on a long `await`:

```ts
// `stream` and `handle` stand in for your application's own code.
const svc = await caracal.spawnService({ labels: ["voice-worker"], heartbeatIntervalMs: 30_000 });
try {
  for await (const chunk of stream(await caracal.headersAsync())) {
    // lease renews in the background
    await handle(chunk);
  }
} finally {
  await svc.close();
}
```

Auto-heartbeat is opt-in (omit `heartbeatIntervalMs` to renew manually). Transient renewal errors are logged and retried on the next tick rather than crashing the worker. A renewal cannot run while the event loop is blocked synchronously; in that case the lease correctly lapses, which is the liveness signal working as intended.

## Spawn a narrowed child

```ts
import { Grant } from "@caracalai/sdk";

await caracal.spawn(async () => {
  await caracal.spawn(async () => {
    const headers = await caracal.headersAsync();
    await fetch("https://api.example.com/tickets", { headers });
  }, {
    grant: Grant.narrow(["tickets:read"], {
      resourceId: "https://api.example.com/tickets",
      constraints: { maxHops: 1, budget: 5 },
      ttlSeconds: 600,
    }),
  });
});
```

A plain `spawn()` runs the child under its parent's effective authority: the application's authority for a root parent, or the parent's narrowed slice when the parent was itself narrowed (transitive least-privilege). Pass `grant: Grant.narrow(...)` only when the child should hold a smaller subset, or `Grant.none()` for a child with no inherited authority.
Use `delegate()` when you need to grant authority to an agent session that already exists, typically in another application.

## Route through the Gateway

```ts
const request = caracal.gatewayRequest("https://api.example.com/tickets", "/tickets");

await fetch(request.url, {
  headers: {
    ...request.headers,
    ...(await caracal.headersAsync({ allowRoot: true })),
  },
});
```

For provider SDKs that accept a custom `fetch`, pass `caracal.transport()` so Caracal headers and Gateway routing are applied automatically.

## Troubleshooting

| Symptom | Check |
| --- | --- |
| `Caracal.fromEnv: missing ...` | Confirm generated profile or required `CARACAL_*` variables are present. |
| Root headers rejected | Call `headersAsync({ allowRoot: true })` only when service-root identity is intentional. |
| Delegation fails | Ensure the call runs inside `spawn()` or another bound context. |
| Gateway request misses resource | Confirm `gateway_url`, resource bindings, and `X-Caracal-Resource`. |

Related pages: [Add SDK to Your App](/get-started/add-sdk-to-your-app/) and [Implement Multi-Agent Delegation](/guides/delegation/).

---

# Integrate the Python SDK
# URL: https://docs.caracal.run/guides/sdk-python/
# Markdown: https://docs.caracal.run/markdown/guides/sdk-python.md
# Type: page
# Concepts:
# Requires:

---

Use `caracalai-sdk` in Python services and agents that need async session lifecycle, delegation, Gateway routing, or ASGI context propagation.

## Install

```bash
pip install caracalai-sdk
```

## Connect

```python
from caracalai import Caracal

caracal = Caracal()
```

`Caracal()` loads `CARACAL_CONFIG`, the default generated profile, or environment variables. Generated profiles are created by Console guided setup.
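When the constructor reports missing configuration, it can help to see which inputs are visible to the process at all. The following is a minimal sketch that uses only the standard library and does not call the SDK. `describe_caracal_env` is a hypothetical helper name, and the sketch assumes only that `CARACAL_CONFIG` may point at a profile file and that related settings use a `CARACAL_` prefix; it does not reproduce the SDK's actual lookup order.

```python
import os
from pathlib import Path


def describe_caracal_env(environ=None):
    """Summarize Caracal-related environment inputs without exposing values."""
    env = os.environ if environ is None else environ
    config_path = env.get("CARACAL_CONFIG")
    return {
        "config_path": config_path,
        # True only when the variable is set and points at an existing file.
        "config_exists": bool(config_path) and Path(config_path).is_file(),
        # Report variable names only, never their (possibly secret) values.
        "variables": sorted(name for name in env if name.startswith("CARACAL_")),
    }


if __name__ == "__main__":
    for key, value in describe_caracal_env().items():
        print(f"{key}: {value}")
```

Running it under `caracal run --` and again in a plain shell shows what the runtime injects, without printing any secret.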
## Spawn and call a resource

```python
import asyncio

import httpx
from caracalai import Caracal

caracal = Caracal()


async def main() -> None:
    # `async with` is only valid inside a coroutine, so wrap the call in main().
    async with caracal.spawn():
        headers = caracal.headers()
        async with httpx.AsyncClient() as client:
            await client.get("https://api.example.com/tickets", headers=headers)


asyncio.run(main())
```

`spawn()` is an async context manager. It binds the agent context inside the block and releases the session when the block exits.

To make agents distinguishable in policy and audit, pass `labels` when you spawn. These become `input.principal.labels`, so several agents under one application stay separable without one application per agent. Labels are descriptive: policy and audit can read them, but they grant nothing; authority always comes from scopes and delegation. The session lifecycle is handled for you: `spawn()` retires the session when the block exits and records it as a `task`, while `spawn_service()` records a heartbeat-leased `service`.

```python
async with caracal.spawn(labels=["refund-agent"]):
    headers = caracal.headers()
```

See [If many agents share one managed application, can policy and audit still tell them apart?](/reference/faq/#faq-008).

## Long-lived service agents

Daemons and workers that outlive a single request use `spawn_service()` instead of `spawn()`. It returns a handle you own: keep the session alive by calling `heartbeat()` on a timer and retire it with `aclose()`. Each `heartbeat()` **renews the lease**: it extends the deadline the coordinator measures liveness against, rather than merely checking status. A service session is reaped by the coordinator only if it stops heartbeating before its lease expires. Unlike a `task`, a `service` is **not** subject to the wall-clock TTL sweeper; its lifetime is governed entirely by the lease, so a faithfully heartbeated service runs until you close it.
```python
import asyncio

# Runs inside a coroutine; `running` and `do_work` stand for your own code.
svc = await caracal.spawn_service(labels=["billing-worker"])
try:
    while running:
        await svc.heartbeat()
        await do_work(caracal.headers())
        await asyncio.sleep(30)
finally:
    await svc.aclose()
```

Service handles take the same `grant=` as `spawn()`: pass `Grant.narrow(...)` to issue a delegation edge that bounds the service's authority for its whole lifetime, exactly as for a narrowed task child. The handle's context carries the edge, and closing the handle retires the session.

```python
from caracalai import Grant

svc = await caracal.spawn_service(
    labels=["billing-worker"],
    grant=Grant.narrow(["ledger:read"], resource_id="resource://ledger"),
)
```

When the main coroutine spends long stretches awaiting a provider (a streaming response, a slow tool), a heartbeat issued from the same loop can be starved. Pass `heartbeat_interval` to renew the lease from an independent background task, so the lease stays current even while your code is blocked on a long `await`:

```python
# `stream` and `handle` stand for your own code.
svc = await caracal.spawn_service(labels=["voice-worker"], heartbeat_interval=30)
try:
    async for chunk in stream(caracal.headers()):
        # lease renews in the background
        await handle(chunk)
finally:
    await svc.aclose()
```

Auto-heartbeat is opt-in (omit `heartbeat_interval` to renew manually). It renews even while your code awaits, and transient renewal errors are logged and retried on the next tick rather than crashing the worker. A renewal cannot run while the event loop is blocked synchronously (CPU-bound work with no `await`); in that case the lease correctly lapses, which is the liveness signal working as intended.
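The difference between awaiting and blocking can be observed without the SDK. The following sketch uses only `asyncio`: the `ticker` task is a stand-in for a background renewal and is not Caracal's implementation. It counts how often the background task runs during a cooperative `await` and during a synchronous `time.sleep()`.

```python
import asyncio
import time


async def demo() -> tuple[int, int]:
    ticks = 0

    async def ticker() -> None:
        # Stand-in for a background lease renewal; not the SDK's code.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    task = asyncio.create_task(ticker())

    await asyncio.sleep(0.1)  # cooperative wait: the ticker keeps running
    during_await = ticks

    time.sleep(0.1)  # synchronous block: the event loop cannot run the ticker
    during_block = ticks - during_await

    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return during_await, during_block


if __name__ == "__main__":
    awaited, blocked = asyncio.run(demo())
    print(f"ticks while awaiting: {awaited}, ticks while blocked: {blocked}")
```

The second count stays at zero because nothing yields to the event loop during `time.sleep()`. If a worker must do CPU-bound or blocking work, moving it off the loop (for example with `asyncio.to_thread`) keeps background tasks running.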
## Spawn a narrowed child ```python from caracalai import Grant, DelegationConstraints async with caracal.spawn() as parent: async with caracal.spawn( parent_ctx=parent, grant=Grant.narrow( ["tickets:read"], resource_id="https://api.example.com/tickets", constraints=DelegationConstraints(max_hops=1, budget=5), ttl_seconds=600, ), ): headers = caracal.headers() ``` A plain `caracal.spawn()` runs the child under its parent's effective authority — the application's authority for a root parent, or the parent's narrowed slice when the parent was itself narrowed (transitive least-privilege). Pass `grant=Grant.narrow(...)` only when the child should hold a smaller subset, or `grant=Grant.none()` for a child with no inherited authority. Use `async with caracal.bind(ctx)` before handing a captured context to a background task. ## Use httpx transport injection ```python async with caracal.spawn(): async with caracal.transport() as client: await client.get("https://api.example.com/tickets") ``` The transport injects Caracal envelope headers and rewrites configured resource-bound URLs through the Gateway. ## ASGI propagation After a verifier boundary, use the SDK middleware to bind an inbound Caracal envelope into request context: ```python from fastapi import FastAPI from caracalai import Caracal caracal = Caracal() app = FastAPI() app.add_middleware(caracal.context_middleware()) ``` The middleware propagates context. It does not verify JWT signatures or revocation; use a connector or transport verifier at the trust boundary. ## Troubleshooting | Symptom | Check | | --- | --- | | `Caracal.from_env: missing ...` | Confirm profile or required environment variables. | | `headers()` refuses root identity | Bind a context or pass `allow_root=True` only for trusted service-root calls. | | Background task loses context | Capture the context and rebind with `async with caracal.bind(ctx)`. | | Gateway routing misses | Confirm resource bindings and `gateway_url`. 
|

Related pages: [Protect a FastMCP App](/guides/protect-fastmcp/) and [Run an Agent with caracal run](/guides/runtime-run/).

---

# Integrate the Go SDK
# URL: https://docs.caracal.run/guides/sdk-go/
# Markdown: https://docs.caracal.run/markdown/guides/sdk-go.md
# Type: page
# Concepts:
# Requires:

---

Use the Go SDK in services and agents that need Caracal context propagation, agent sessions, delegation, and Gateway-aware HTTP clients.

## Install

```bash
go get github.com/garudex-labs/caracal/packages/sdk/go
```

## Connect

```go
package main

import (
	"context"
	"log"

	caracal "github.com/garudex-labs/caracal/packages/sdk/go"
)

func main() {
	client, err := caracal.New()
	if err != nil {
		log.Fatal(err)
	}
	if err := client.Spawn(context.Background(), func(ctx context.Context) error {
		headers, err := client.Headers(ctx)
		if err != nil {
			return err
		}
		_ = headers
		return nil
	}); err != nil {
		log.Fatal(err)
	}
}
```

`New()` loads `CARACAL_CONFIG`, the default generated profile, or environment variables.

## Use an HTTP client

```go
httpClient := client.Transport(nil)
err := client.Spawn(context.Background(), func(ctx context.Context) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://api.example.com/tickets", nil)
	if err != nil {
		return err
	}
	resp, err := httpClient.Do(req)
	if err != nil {
		return err
	}
	// Close the body so the underlying connection can be reused.
	defer resp.Body.Close()
	return nil
})
```

The transport injects Caracal envelope headers from `context.Context`. For explicit Gateway routing, use `GatewayRequest()`.

## Long-lived service agents

Daemons and workers that outlive a single request use `SpawnService` instead of `Spawn`. It returns a handle you own: keep the session alive by calling `Heartbeat` on a timer and retire it with `Close`. A service session is reaped by the coordinator if it stops heartbeating before its lease expires.
```go svc, err := client.SpawnService(context.Background(), caracal.ServiceOptions{ Labels: []string{"billing-worker"}, }) if err != nil { return err } defer svc.Close(context.Background()) for running { if err := svc.Heartbeat(context.Background()); err != nil { return err } doWork(svc.Context) time.Sleep(30 * time.Second) } ``` ## Spawn a narrowed child ```go err := client.Spawn(context.Background(), func(ctx context.Context) error { return client.Spawn(ctx, func(child context.Context) error { _, _ = client.Current(child) return nil }, caracal.SpawnOptions{ Grant: caracal.Grant{ Mode: caracal.GrantModeNarrow, Scopes: []string{"tickets:read"}, Constraints: &caracal.DelegationConstraints{MaxHops: 1, Budget: 5}, TTLSeconds: 600, }, }) }) ``` A plain `client.Spawn` runs the child under the application's authority. Set `SpawnOptions.Grant` to `caracal.GrantNarrow(...)` only when the child should hold a least-privilege subset, or `caracal.GrantNone()` for a child with no inherited authority. Use `Current(ctx)` to inspect the bound Caracal context and `Headers(ctx)` to project it to outbound HTTP headers. ## Troubleshooting | Symptom | Check | | --- | --- | | Missing config error | Confirm generated profile or required environment variables. | | `Headers` without context fails | Call inside `Spawn`, `Delegate`, or pass `RootOptions{AllowRoot: true}` intentionally. | | Delegation fails | Ensure delegation runs from a context with an active agent session. | | Gateway URL error | Confirm the generated profile includes `gateway_url`. | Related pages: [Protect a Go net/http Service](/guides/protect-nethttp/) and [Agent Delegation](/concepts/delegation/). --- # Run an Agent with caracal run # URL: https://docs.caracal.run/guides/runtime-run/ # Markdown: https://docs.caracal.run/markdown/guides/runtime-run.md # Type: page # Concepts: # Requires: --- `caracal run` starts a local subprocess with Caracal runtime environment variables and profile-backed credentials. 
Use it for development, demos, and controlled local runs.

## Prepare a profile

1. Run `caracal up`.
2. Open `caracal console`.
3. Select `guided setup`.
4. Fill in the upstream URL, resource identifier, runtime profile, profile path, secret file, and first request path.
5. Enable **write profile files**.
6. Save the generated profile and secret file.

The generated profile is the source for SDK client constructors and `caracal run`.

## Run a command

```bash
caracal run -- npm start
```

The child process receives the Caracal variables needed to locate the profile, exchange credentials, and route configured resources through the Gateway.

## Validate the run

| Check | Command or surface |
| --- | --- |
| Runtime is ready | `caracal status --ready` |
| Profile is readable | `caracal run -- env \| grep CARACAL` |
| First request succeeds | Use the guided setup first request path. |
| Audit captured the request | Console `audit` or `request trace`. |

## Troubleshooting

| Symptom | Fix |
| --- | --- |
| SDK cannot find config | Confirm `CARACAL_CONFIG` points to the generated profile. |
| Secret file rejected | Restrict file permissions and avoid setting both inline and file secrets. |
| Gateway routing misses the resource | Confirm resource bindings and upstream prefixes in the profile. |
| Request denied | Check policy-set activation, scopes, and audit diagnostics. |

Related pages: [First Protected Call](/get-started/first-protected-call/), [Add SDK to Your App](/get-started/add-sdk-to-your-app/), and [Audit and Request Traces](/concepts/audit-ledger/).

---

# Protect a Gateway-Routed HTTP API
# URL: https://docs.caracal.run/guides/protect-gateway-http/
# Markdown: https://docs.caracal.run/markdown/guides/protect-gateway-http.md
# Type: page
# Concepts:
# Requires:

---

Use Gateway-routed protection when the upstream is HTTP-routable and you want Caracal to enforce every request before it reaches the target.
The agent receives a short-lived Caracal mandate; Gateway verifies it, resolves the resource route, attaches the configured upstream credential when needed, forwards the request, and writes action-result audit. ## When to Use Gateway | Use Gateway when | Use another guide when | | --- | --- | | The upstream is HTTP, REST, MCP-over-HTTP, or another Gateway-routable target. | The service must verify mandates inside its own process. | | You want provider credentials to stay inside the trusted Gateway/STS boundary. | The application must call the provider directly and only needs attribution. | | You want action-result audit from the central Gateway. | The resource server owns result audit after connector verification. | For in-process enforcement, use [Protect an Express App](/guides/protect-express/), [Protect a FastMCP App](/guides/protect-fastmcp/), [Protect a Go net/http Service](/guides/protect-nethttp/), or [Protect an MCP Server](/guides/protect-mcp/). ## Create or Select the Resource Open the Console: ```sh caracal console ``` Create or select a resource with: | Field | Guidance | | --- | --- | | Resource identifier | Stable audience URI, such as `resource://billing-api`. | | Scopes | Action-oriented scopes, such as `billing:read` and `billing:submit`. | | Upstream URL | Network URL the Gateway can reach. | | Gateway application | The Caracal application identity Gateway uses for upstream exchanges. | | Upstream credential provider | The provider record Gateway uses to attach no credential, a Caracal mandate, OAuth token, API key, or bearer token. | Keep the resource identifier stable even when upstream hosts or provider bindings change. ## Choose Provider Auth Mode | Provider type | Use when | | --- | --- | | None | Gateway enforces Caracal access and the upstream expects no credential. | | Caracal mandate | The upstream verifies Caracal mandates directly. | | OAuth client credentials | Gateway needs a machine OAuth token. 
| | OAuth authorization code | Gateway needs per-user delegated OAuth consent. | | API key | Gateway attaches a sealed static API key in a configured header. | | Bearer token | Gateway attaches a sealed pre-issued bearer token. | Use [Define Resources and Providers](/guides/resources-providers/) for field definitions and [Provider Recipes](/guides/provider-recipes/) for concrete OpenAI, Google, GitHub, Slack, and internal API examples. ## Route Calls Through Gateway Gateway-routed requests include: | Request part | Value | | --- | --- | | URL | Gateway URL from the generated profile or Console result view. | | `Authorization` | `Bearer `. | | `X-Caracal-Resource` | Resource identifier. | | Path | The protected path you configured for the upstream. | SDKs can build the Gateway request and inject headers for you: - TypeScript: `caracal.gatewayRequest()` and `caracal.transport()` - Python: `caracal.gateway_request()` and `caracal.transport()` - Go: `GatewayRequest()` and `Transport()` ## Verify Authorization and Audit Run these checks before treating the route as ready: | Check | Expected result | | --- | --- | | Allowed request | Gateway forwards to upstream and records action-result audit. | | Missing mandate | Gateway rejects the request. | | Missing scope | STS or Gateway rejects the request. | | Wrong resource header | Gateway rejects or routes to a different configured resource. | | Revoked session | Gateway rejects the old mandate. | Open Console **audit** and **request trace** with the request ID. A complete Gateway-routed call has both authorization evidence from STS and action-result evidence from Gateway. ## Troubleshooting | Symptom | Check | | --- | --- | | Gateway returns 403 | Token expiry, resource ID, scopes, revocation, and `X-Caracal-Resource`. | | Upstream is unreachable | Upstream URL must be reachable from the Gateway container or deployment. | | Provider credential not attached | Resource must bind exactly one upstream credential provider. 
| | Authorization audit exists but action-result audit is missing | The request did not reach Gateway or failed before route execution. | Related pages: [Debug Authorization Decisions](/guides/authorize-access/) and [Tail and Query the Audit Stream](/guides/audit-stream/). --- # Protect an Express App # URL: https://docs.caracal.run/guides/protect-express/ # Markdown: https://docs.caracal.run/markdown/guides/protect-express.md # Type: page # Concepts: # Requires: --- Use `@caracalai/mcp-express` when an Express app should verify Caracal mandates before route handlers run. ## Install ```bash npm install express @caracalai/mcp-express @caracalai/transport-mcp @caracalai/revocation ``` Use a Redis-backed revocation store in production. The in-memory store is only suitable for local development and tests. ## Add middleware ```ts import express from "express"; import { caracalAuth, type CaracalRequest } from "@caracalai/mcp-express"; import { createMandateVerifier } from "@caracalai/transport-mcp"; import { InMemoryRevocationStore } from "@caracalai/revocation"; const app = express(); const verifier = createMandateVerifier({ issuer: process.env.CARACAL_ISSUER!, audience: process.env.CARACAL_AUDIENCE!, zoneId: process.env.CARACAL_ZONE_ID!, revocations: new InMemoryRevocationStore(), }); await verifier.warmup(); app.use("/mcp", caracalAuth({ verifier }, { requiredScopes: ["mcp:tool:call"], requiredTargets: ["https://mcp.example.com"], })); app.post("/mcp/tools/list", (req: CaracalRequest, res) => { res.json({ principal: req.caracalClaims?.sub, tools: ["search", "summarize"], }); }); ``` The middleware attaches verified claims to `req.caracal` and `req.caracalClaims`, plus the propagation context at `req.caracalContext`. ## Enforce the right boundary | Option | Use it for | | --- | --- | | `requiredScopes` | Tool or route-level scope checks. | | `requiredTargets` | Resource-target checks. | | `requireAgent` | Reject non-agent mandates. 
| | `requireDelegation` | Require delegated authority. | | `maxHopCount` | Limit delegation depth. | ## Validate 1. Exchange for a mandate that targets the resource. 2. Call the protected route with `Authorization: Bearer `. 3. Remove a required scope and confirm the route returns `403`. 4. Revoke the session and confirm the route rejects the old mandate. Related pages: [Mandates](/concepts/mandate/) and [Sessions and Revocation](/concepts/sessions-revocation/). --- # Protect a FastMCP App # URL: https://docs.caracal.run/guides/protect-fastmcp/ # Markdown: https://docs.caracal.run/markdown/guides/protect-fastmcp.md # Type: page # Concepts: # Requires: --- Use `caracalai-mcp-fastmcp` when a Python FastMCP app should reject invalid, expired, insufficient, or revoked Caracal mandates. ## Install ```bash pip install "caracalai-mcp-fastmcp[fastmcp]" caracalai-revocation ``` Use `caracalai-revocation-redis` for shared production revocation. The in-memory store is only for local development and tests. ## Create an authenticator ```python from caracalai_mcp_fastmcp import CaracalAuth from caracalai_revocation import InMemoryRevocationStore auth = CaracalAuth( issuer="https://sts.example.com", audience="https://mcp.example.com", expected_zone_id="zone_prod", required_scopes=["mcp:tool:call"], required_targets=["https://mcp.example.com"], revocations=InMemoryRevocationStore(), require_agent=True, ) async def startup(): await auth.warmup() ``` ## Verify before running a tool ```python from caracalai_mcp_fastmcp import CaracalAuthError async def handle_tool_call(token: str, payload: dict): try: claims = await auth(token) except CaracalAuthError as err: return {"error": err.code, "error_description": err.description, "error_hint": err.hint} return { "subject": claims.sub, "result": await run_tool(payload), } ``` Wire this check into the FastMCP auth or request hook used by your server. The important boundary is that the mandate is verified before the tool handler performs work. 
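One way to make that boundary hard to bypass is to wrap tool handlers so the check always runs first. The sketch below is framework-neutral and built from stand-ins: `verify` is any async callable that returns claims or raises, and `AuthError`, `require_mandate`, `fake_verify`, and `echo_tool` are hypothetical names, not part of `caracalai-mcp-fastmcp`. In a real server, `verify` would be the `auth` object created earlier and the caught exception would be `CaracalAuthError`.

```python
import asyncio
import functools


class AuthError(Exception):
    """Stand-in for a verifier error that carries an error code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def require_mandate(verify):
    """Wrap an async tool handler so `verify(token)` always runs before it."""

    def decorator(handler):
        @functools.wraps(handler)
        async def wrapper(token: str, payload: dict) -> dict:
            try:
                claims = await verify(token)
            except AuthError as err:
                # The handler is never called when verification fails.
                return {"error": err.code}
            return await handler(claims, payload)

        return wrapper

    return decorator


async def fake_verify(token: str) -> dict:
    # Hypothetical verifier for the demo: accepts exactly one token.
    if token != "good":
        raise AuthError("invalid_token")
    return {"sub": "agent-1"}


calls = []  # records payloads that actually reached the handler


@require_mandate(fake_verify)
async def echo_tool(claims: dict, payload: dict) -> dict:
    calls.append(payload)
    return {"subject": claims["sub"], "result": payload}


if __name__ == "__main__":
    print(asyncio.run(echo_tool("bad", {"q": 1})))
    print(asyncio.run(echo_tool("good", {"q": 1})))
```

Because the handler receives verified claims rather than the raw token, it cannot run on an unverified request by accident.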
## Validate | Test | Expected result | | --- | --- | | Missing bearer token | `missing_token` or framework 401. | | Wrong audience | `invalid_token`. | | Missing scope | `insufficient_scope`. | | Revoked session | `session_revoked`. | | Non-agent token with `require_agent=True` | `agent_required`. | Related pages: [Protect an MCP Server](/guides/protect-mcp/) and [Step-Up Re-Authentication](/guides/step-up/). --- # Protect a Go net/http Service # URL: https://docs.caracal.run/guides/protect-nethttp/ # Markdown: https://docs.caracal.run/markdown/guides/protect-nethttp.md # Type: page # Concepts: # Requires: --- Use the Go net/http connector when a Go service should verify Caracal mandates at the handler boundary. ## Install ```bash go get github.com/garudex-labs/caracal/packages/connectors/nethttp/go go get github.com/garudex-labs/caracal/packages/revocation/go ``` ## Wrap a handler ```go package main import ( "encoding/json" "net/http" "time" mcpnethttp "github.com/garudex-labs/caracal/packages/connectors/nethttp/go" revocation "github.com/garudex-labs/caracal/packages/revocation/go" transportmcp "github.com/garudex-labs/caracal/packages/transport/mcp/go" ) func main() { revocations := revocation.NewInMemoryStore(24 * time.Hour) verifier := transportmcp.NewVerifier(transportmcp.Options{ Issuer: "https://sts.example.com", Audience: "https://api.example.com", ZoneID: "zone_prod", Revocations: revocations, }) protected := mcpnethttp.VerifierMiddleware(verifier.Require(transportmcp.Options{ RequiredScopes: []string{"tickets:read"}, RequiredTargets: []string{"https://api.example.com/tickets"}, }))(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) { claims, ok := mcpnethttp.ClaimsFromContext(r.Context()) if !ok { http.Error(w, "missing claims", http.StatusUnauthorized) return } _ = json.NewEncoder(w).Encode(map[string]string{"subject": claims.Sub}) })) http.Handle("/tickets", protected) _ = http.ListenAndServe(":8080", nil) } ``` ## Enforce constraints | 
Option | Use it for | | --- | --- | | `RequiredScopes` | Route or operation permission. | | `RequiredTargets` | Resource target matching. | | `RequireAgent` | Agent-only endpoints. | | `RequireDelegation` | Delegated-only endpoints. | | `RequireChainContains` | Application path requirements. | | `MaxHopCount` | Delegation depth limit. | ## Production revocation The in-memory store does not share revocations across instances. Use the Redis revocation connector and consume `caracal.sessions.revoke` for production resource servers. ## Validate 1. Call without a bearer token and expect `401`. 2. Call with a valid mandate and expect the handler response. 3. Remove a required scope and expect `403`. 4. Mark the session revoked and expect `session_revoked`. Related pages: [Mandates](/concepts/mandate/) and [Sessions and Revocation](/concepts/sessions-revocation/). --- # Protect an MCP Server # URL: https://docs.caracal.run/guides/protect-mcp/ # Markdown: https://docs.caracal.run/markdown/guides/protect-mcp.md # Type: page # Concepts: # Requires: --- Use the MCP transport packages when your MCP framework does not have a dedicated Caracal connector or when you want to build your own boundary. 
## TypeScript verifier ```bash npm install @caracalai/transport-mcp @caracalai/revocation ``` ```ts import { createMandateVerifier } from "@caracalai/transport-mcp"; import { InMemoryRevocationStore } from "@caracalai/revocation"; const verifier = createMandateVerifier({ issuer: "https://sts.example.com", audience: "https://mcp.example.com", zoneId: "zone_prod", revocations: new InMemoryRevocationStore(), }); export async function verifyToolRequest(authorization: string | undefined) { const result = await verifier.authorization(authorization, { requiredScopes: ["mcp:tool:call"], requiredTargets: ["https://mcp.example.com"], requireAgent: true, }); if (!result.ok) { throw new Error(`${result.error.code}: ${result.error.description}`); } return result.principal; } ``` ## Python verifier ```bash pip install caracalai-transport-mcp caracalai-revocation ``` ```python from caracalai_transport_mcp import authenticate, extract_bearer from caracalai_revocation import InMemoryRevocationStore revocations = InMemoryRevocationStore() async def verify_tool_request(authorization: str | None): token = extract_bearer(authorization) result = await authenticate( token or "", issuer="https://sts.example.com", audience="https://mcp.example.com", required_scopes=["mcp:tool:call"], expected_zone_id="zone_prod", revocations=revocations, require_agent=True, required_targets=["https://mcp.example.com"], ) if result.error is not None: raise RuntimeError(f"{result.error.code}: {result.error.description}") return result.principal ``` ## Verification checklist | Check | Why it matters | | --- | --- | | Issuer and audience | Prevents accepting mandates from the wrong zone or target. | | Required scopes | Enforces tool-level authority. | | Required targets | Prevents cross-resource token reuse. | | Agent or delegation requirements | Keeps user-root and delegated calls separate. | | Revocation store | Rejects revoked sessions and delegation edges. 
|

## Production revocation

Use Redis-backed revocation packages for multi-instance MCP servers:

- TypeScript: `@caracalai/revocation-redis`
- Python: `caracalai-revocation-redis`
- Go: `github.com/garudex-labs/caracal/packages/connectors/redis/go`

Run a consumer for the `caracal.sessions.revoke` stream so every resource server instance learns about revoked anchors.

Related pages: [Protect an Express App](/guides/protect-express/) and [Protect a FastMCP App](/guides/protect-fastmcp/).

---

# Tail and Query the Audit Stream
# URL: https://docs.caracal.run/guides/audit-stream/
# Markdown: https://docs.caracal.run/markdown/guides/audit-stream.md
# Type: page
# Concepts:
# Requires:

---

Use audit when a request is denied, step-up is triggered, delegation behaves unexpectedly, or an operator needs evidence for a run.

## Console workflow

1. Open `caracal console`.
2. Select `audit`.
3. Filter by zone, decision, event type, or request ID.
4. Open the event detail to inspect metadata, determining policies, and diagnostics.
5. Select `request trace` and provide the request ID for a decision trace.

## Automation workflow

```ts
import { AdminClient } from "@caracalai/admin";

const admin = new AdminClient({
  apiUrl: process.env.CARACAL_API_URL!,
  adminToken: process.env.CARACAL_ADMIN_TOKEN!,
});

const events = await admin.audit.list(process.env.CARACAL_ZONE_ID!, {
  decision: "deny",
  limit: 25,
});

// Guard against an empty result or an event without a request ID.
const requestId = events[0]?.request_id;
if (requestId) {
  const trace = await admin.audit.explain(process.env.CARACAL_ZONE_ID!, requestId);
  console.log(trace.final_decision, trace.denied);
}
```

## Useful filters

| Filter | Use it for |
| --- | --- |
| Request ID | Stitch together STS, Gateway, connector, and explain events. |
| Decision | Find denies, allows, or partial decisions. |
| Event type | Focus on policy, session, delegation, step-up, or admin changes. |
| Time window | Investigate a run, incident, or rollout.
| ## What to capture in incidents - request ID; - zone ID; - application ID; - resource identifier; - requested scopes; - final decision; - determining policies; - diagnostics; - session or delegation edge IDs when present. Related page: [Audit and Request Traces](/concepts/audit-ledger/). --- # Implement Multi-Agent Delegation # URL: https://docs.caracal.run/guides/delegation/ # Markdown: https://docs.caracal.run/markdown/guides/delegation.md # Type: page # Concepts: # Requires: --- Use delegation when one agent needs to hand a narrower slice of authority to another agent. The SDK creates the edge and carries the context; the Console shows the graph and impact. `spawn()` always creates the child under the **same application** as the parent — `spawn()` never moves a child into a different application. Use a narrowing grant only when the child should hold *less* than the parent. To hand authority across applications, use `delegate()` against a peer session that already exists under that other application. ## Implementation flow ```mermaid flowchart LR Parent["Parent agent session"] --> Spawn["Spawn child with narrowing grant"] Spawn --> Edge["Bounded delegation edge"] Edge --> Child["Run child with delegated context"] Child --> Exchange["Exchange for resource mandate"] Exchange --> Audit["Inspect audit and graph"] ``` ## TypeScript example ```ts import { Caracal, Grant } from "@caracalai/sdk"; const caracal = new Caracal(); await caracal.spawn(async () => { await caracal.spawn(async () => { const headers = await caracal.headersAsync(); await fetch("https://api.example.com/tickets", { headers }); }, { grant: Grant.narrow(["tickets:read"], { resourceId: "https://api.example.com/tickets", constraints: { maxHops: 1, budget: 5 }, ttlSeconds: 600, }), }); }); ``` ## Review the graph 1. Open `caracal console`. 2. Select `delegation`. 3. Inspect active edges, inbound edges, outbound edges, and traversal. 4. Use impact before revoking an edge. 5. 
Check `audit` for the delegated exchange and resource decision. ## Safe constraints | Constraint | Good default | | --- | --- | | Scopes | Small subset of parent authority. | | TTL | Minutes, not days. | | Hop count | `1` unless a deeper graph is intentional. | | Budget | Explicit request or work limit. | | Resource | Single resource whenever possible. | ## Troubleshooting | Symptom | Check | | --- | --- | | `Delegate requires an active agent session` | Call delegation inside `spawn()` or a bound Caracal context. | | Resource denies delegated request | Confirm policy accepts the edge constraints and required scopes. | | Chain validation fails | Confirm the expected application appears in the delegation path. | | Revocation did not affect child | Confirm cascade revocation and resource-server revocation consumers. | Related pages: [Agent Delegation](/concepts/delegation/) and [Delegation Constraints](/concepts/constraint/). --- # Step-Up Re-Authentication # URL: https://docs.caracal.run/guides/step-up/ # Markdown: https://docs.caracal.run/markdown/guides/step-up.md # Type: page # Concepts: # Requires: --- Step-up happens when policy denies an exchange with a diagnostic that asks for fresh proof. The STS returns `interaction_required` and a challenge ID instead of a mandate. 
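A client therefore has to treat `interaction_required` as a recoverable outcome rather than a final denial. The sketch below models that control flow with stand-ins only: `InteractionRequired`, `exchange_with_step_up`, `fake_exchange`, and `fake_wait` are hypothetical names, and passing the challenge ID back into the retry is an assumption made for illustration, since how a retry carries challenge proof depends on the SDK.

```python
class InteractionRequired(Exception):
    """Stand-in for the `interaction_required` outcome of an exchange."""

    def __init__(self, challenge_id: str) -> None:
        super().__init__("interaction_required")
        self.challenge_id = challenge_id


def exchange_with_step_up(exchange, wait_until_satisfied):
    """Try an exchange once; on step-up, wait for approval, then retry once.

    `exchange(challenge_id)` and `wait_until_satisfied(challenge_id)` are
    caller-supplied callables (assumptions, not SDK functions).
    """
    try:
        return exchange(None)
    except InteractionRequired as err:
        # Retry only after the challenge has been satisfied out of band.
        if not wait_until_satisfied(err.challenge_id):
            raise
        return exchange(err.challenge_id)


# Demo with an in-memory fake of the token service.
satisfied = set()


def fake_exchange(challenge_id):
    if challenge_id in satisfied:
        return "mandate-for-demo"
    raise InteractionRequired("chal-1")


def fake_wait(challenge_id):
    satisfied.add(challenge_id)  # pretend an approver satisfied the challenge
    return True


if __name__ == "__main__":
    print(exchange_with_step_up(fake_exchange, fake_wait))
```

Retrying exactly once keeps a client from looping against the step-up throttle when approval never arrives.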
## Flow ```mermaid flowchart LR Exchange["Exchange for sensitive resource"] --> Error["interaction_required"] Error --> Proof["Complete external proof"] Proof --> Satisfy["Satisfy challenge"] Satisfy --> Retry["Retry exchange with challenge proof"] Retry --> Mandate["Mandate issued"] ``` ## TypeScript handling ```ts import { OAuthClient, InteractionRequiredError } from "@caracalai/oauth"; const oauth = new OAuthClient( process.env.CARACAL_STS_URL!, process.env.CARACAL_ZONE_ID!, process.env.CARACAL_APPLICATION_ID!, ); try { await oauth.exchange(subjectToken, "resource://pipernet", { scopes: ["pipernet:refund"], }); } catch (error) { if (error instanceof InteractionRequiredError) { console.log(error.challengeId, error.acrValues); } else { throw error; } } ``` ## Satisfy the challenge Use the Console for human approval or the Admin API when an external proof system has already completed. The Admin API supports challenge inspection and satisfaction under the zone step-up challenge routes. ```bash curl -X POST \ "$CARACAL_API_URL/v1/zones/$CARACAL_ZONE_ID/step-up-challenges/$CHALLENGE_ID/satisfy" \ -H "Authorization: Bearer $CARACAL_ADMIN_TOKEN" ``` The approver is recorded as the authenticated admin actor (`admin:`); it is never supplied in the request body, so approval attribution cannot be forged. The challenge cannot be self-approved by its own session subject. ## Retry exchange Retry token exchange only after the challenge is satisfied. The STS validates the challenge proof and returns a mandate when policy and session checks pass. ## Troubleshooting | Symptom | Check | | --- | --- | | No `challenge_id` | Confirm the policy emits a `step_up_required` diagnostic. | | Challenge cannot be satisfied | Confirm it has not expired, was not consumed, and is not self-approved. | | Retry still denied | Inspect `request trace`; policy may require a different resource, scope, or challenge type. 
| | Too many attempts | Wait for the STS step-up throttle window or investigate repeated proof failures. | Related pages: [Step-Up Challenges](/concepts/step-up/) and [Author Policy Data](/guides/author-policy/). --- # Production Integration Patterns # URL: https://docs.caracal.run/guides/enterprise-runtime-patterns/ # Markdown: https://docs.caracal.run/markdown/guides/enterprise-runtime-patterns.md # Type: page # Concepts: # Requires: --- Production deployments usually combine the Gateway, connectors, SDKs, revocation consumers, audit export, and policy automation. Keep the same authority model, then place enforcement where it best fits the system. ## Pick an Enforcement Pattern | Pattern | Use when | Primary guide | | --- | --- | --- | | Gateway-routed HTTP | You can route traffic through Caracal before it reaches the upstream. | [Protect a Gateway-Routed HTTP API](/guides/protect-gateway-http/) | | In-process connector | The resource server owns verification inside its framework. | [Protect an Express App](/guides/protect-express/), [Protect a FastMCP App](/guides/protect-fastmcp/), [Protect a Go net/http Service](/guides/protect-nethttp/) | | MCP transport | You need framework-neutral MCP bearer verification. | [Protect an MCP Server](/guides/protect-mcp/) | | SDK transport injection | Provider SDK accepts custom fetch, transport, or HTTP client hooks. | [SDK guides](/guides/) | | Batch or queue worker | Work runs outside request/response but still needs scoped mandates. 
| [Run an Agent with caracal run](/guides/runtime-run/) | ## Reference Architecture ```mermaid flowchart LR Agent["Agent runtime"] --> SDK["SDK"] SDK --> STS["STS"] STS --> OPA["Policy set"] SDK --> Gateway["Gateway"] Gateway --> API["HTTP upstream"] SDK --> Connector["In-process connector"] Connector --> MCP["MCP tools"] STS --> Audit["Audit ledger"] Gateway --> Audit Connector --> Audit Audit --> SIEM["SIEM / warehouse"] Control["Console / Admin API"] --> STS Control --> Gateway ``` ## Production Checklist | Area | Requirement | | --- | --- | | Zones | Separate production, staging, and customer trust boundaries. | | Keys | Rotate zone signing keys and verify JWKS refresh behavior. | | Profiles | Generate runtime profiles through guided setup; keep secret files locked down. | | Revocation | Use shared revocation stores and stream consumers, not process-local memory. | | Policy | Validate, simulate, version, activate, and audit policy-set changes. | | Audit | Export decisions and diagnostics to operational monitoring or SIEM. | | Step-up | Route sensitive approvals through an external proof system or reviewer workflow. | ## Integration Notes - For FastAPI or ASGI services, use the Python SDK context middleware for propagation and the MCP transport primitives for mandate verification. - For gRPC, carry Caracal headers or metadata through the gateway/client interceptor layer and verify mandates at the service edge. - For service mesh deployments, keep Caracal authority at the application layer; mesh identity does not replace resource scopes or policy decisions. - For provider SDKs, use the SDK `transport()` or language equivalent when the provider accepts a custom HTTP transport. ## Validate After Rollout Run a successful request, a denied request, a revoked-session request, and a step-up request. Confirm each one has an audit trail and that operators can explain the final decision from the Console. 
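The four validation requests can be tracked as a small checklist. The sketch below is illustrative only: `Scenario`, `RolloutResult`, and `validateRollout` are hypothetical names, not part of any Caracal SDK, and the observations are assumed to be recorded by whoever runs the requests and reviews them in the Console.

```ts
// Hypothetical rollout checklist; none of these names come from a Caracal SDK.
type Scenario = "success" | "denied" | "revoked_session" | "step_up";

interface RolloutResult {
  scenario: Scenario;
  // An audit trail was found for this request.
  hasAuditTrail: boolean;
  // An operator could explain the final decision from the Console.
  explainable: boolean;
}

const REQUIRED: Scenario[] = ["success", "denied", "revoked_session", "step_up"];

// Returns one message per gap; an empty array means the rollout checks passed.
function validateRollout(results: RolloutResult[]): string[] {
  const problems: string[] = [];
  for (const scenario of REQUIRED) {
    const result = results.filter((r) => r.scenario === scenario)[0];
    if (result === undefined) {
      problems.push(`${scenario}: not exercised`);
      continue;
    }
    if (!result.hasAuditTrail) {
      problems.push(`${scenario}: missing audit trail`);
    }
    if (!result.explainable) {
      problems.push(`${scenario}: decision not explainable`);
    }
  }
  return problems;
}
```

Keeping the required scenarios in one list makes a skipped case show up as an explicit gap instead of passing silently.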
--- # Understand the Model # URL: https://docs.caracal.run/concepts/ # Markdown: https://docs.caracal.run/markdown/concepts.md # Type: page # Concepts: # Requires: --- Caracal gives agents short-lived, policy-approved authority instead of long-lived secrets. Read this section after your first integration when you need the vocabulary behind guides, SDKs, operations pages, and API reference material. ## Read This Section in Order | Start here | Use it to understand | | --- | --- | | [Caracal Mental Model](/concepts/model-overview/) | The smallest useful picture of Caracal. | | [Authority and Enforcement](/concepts/authority-model/) | Where decisions happen before requests reach a target. | | [Zones](/concepts/zone/) | The tenant boundary that owns keys, policies, resources, sessions, and audit. | | [Identities and Applications](/concepts/principal/) | The identities that authenticate, run, and delegate. | | [Resources and Grants](/concepts/resource-grant/) | What can be accessed and which scopes are granted. | | [Policies and Policy Sets](/concepts/policy/) | Rego rules evaluated by the STS during token exchange. | | [Step-Up Challenges](/concepts/step-up/) | How sensitive actions require fresh approval. | | [Mandates](/concepts/mandate/) | The short-lived JWT that carries approved authority. | | [Agent Delegation](/concepts/delegation/) | How one agent passes bounded authority to another. | | [Delegation Constraints](/concepts/constraint/) | The limits attached to delegated authority. | | [Sessions and Revocation](/concepts/sessions-revocation/) | How active authority is ended and propagated. | | [Audit and Request Traces](/concepts/audit-ledger/) | The event trail behind decisions and runs. 
| ## Core Flow ```mermaid flowchart LR Principal["Principal or agent"] --> SDK["SDK / Gateway request"] SDK --> STS["STS token exchange"] STS --> Policy["Active policy set"] Policy --> Mandate["Mandate JWT"] Mandate --> Gateway["Gateway or connector"] Gateway --> Resource["Protected resource"] STS --> Audit["Audit ledger"] Gateway --> Audit ``` The same model appears across the product: - Onboarding uses the Console guided setup to create the first zone, application, resource, policy, and runtime profile. - Guides use the SDKs, Console, Admin API, and connectors to build repeatable integrations. - Operations pages use the same terms when explaining keys, revocation, audit, and runtime health. ## Term Map | Term | Short definition | Canonical page | | --- | --- | --- | | Zone | Tenant boundary for authority data and signing keys. | [Zones](/concepts/zone/) | | Application | Registered client or agent workload. | [Identities and Applications](/concepts/principal/) | | Principal | User, service, or agent identity bound to a session. | [Identities and Applications](/concepts/principal/) | | Resource | Protected API, MCP server, tool group, or upstream target. | [Resources and Grants](/concepts/resource-grant/) | | Grant | Binding from an application and user to resource scopes. | [Resources and Grants](/concepts/resource-grant/) | | Policy | Versioned Rego logic evaluated at token exchange. | [Policies and Policy Sets](/concepts/policy/) | | Policy set | Versioned bundle of policies activated for a zone. | [Policies and Policy Sets](/concepts/policy/) | | Mandate | Short-lived JWT issued by the STS after policy approval. | [Mandates](/concepts/mandate/) | | Delegation edge | Bounded authority transfer between agent sessions. | [Agent Delegation](/concepts/delegation/) | | Revocation anchor | Session, root session, agent session, or delegation edge checked by resource servers. 
| [Sessions and Revocation](/concepts/sessions-revocation/) | ## What to Read Next After the concepts, use [Guides](/guides/) for task-focused procedures or [SDKs](/sdks/) for language-specific reference. --- # Caracal Mental Model # URL: https://docs.caracal.run/concepts/model-overview/ # Markdown: https://docs.caracal.run/markdown/concepts/model-overview.md # Type: page # Concepts: # Requires: --- Caracal answers one question: **should this principal or agent receive scoped authority for this resource right now?** It answers that question during token exchange, records the decision, and returns a mandate only when the active policy set allows the request. ```mermaid flowchart TD Zone["Zone"] --> App["Application"] Zone --> Resource["Resource"] Zone --> PolicySet["Active policy set"] App --> Session["Session"] Session --> Exchange["Token exchange"] Resource --> Exchange PolicySet --> Exchange Exchange -->|"allow"| Mandate["Mandate"] Exchange -->|"deny / step-up"| Audit["Audit event"] Mandate --> Audit ``` ## Six Core Nouns | Noun | What it means | | --- | --- | | Zone | The tenant boundary for configuration, signing keys, policy, sessions, and audit. | | Application | A registered client, service, or agent workload. | | Principal | The acting identity, such as a user, service, or agent. | | Resource | The protected target: API, MCP server, tool group, or upstream service. | | Policy | Rego logic that evaluates the requested exchange. | | Mandate | The short-lived JWT that proves authority to a Gateway or connector. | Most of these are configured directly. A *grant* is a permission binding for resource scopes that policy can read during evaluation; it is not the mandate a resource server verifies. ## One Application, Many Agents An application and an agent are different layers, and they scale differently: - An **application** is the credentialed security boundary — operator-provisioned or dynamically registered, holding the secret Caracal authenticates. 
It is created deliberately. - An **agent session** is the runtime unit — created the moment a workload acts, with no secret and no registration step. One application backs many agent sessions. A durable workload registers **one managed application**, then spawns, delegates, and fans out as many agent sessions as it needs under that single credential. You do not create an application per agent; per-agent attribution comes from the agent session, not a new application. See [Should I create one application per agent?](/reference/faq/#faq-006). ## Three Runtime Verbs | Verb | Meaning | | --- | --- | | Exchange | Ask the STS to convert existing identity into a resource mandate. | | Spawn | Open an agent session derived from a subject session. | | Delegate | Pass constrained authority from one agent session to another. | ## One Decision Point The STS is the decision point. It evaluates the active policy set with the principal, session, resource, scopes, grant, delegation edge, and step-up context. If policy allows the request, the STS signs a mandate. If policy denies the request, no mandate is issued. Resource servers still verify mandates locally through the Gateway or connectors. That keeps every request protected even after token exchange succeeds. ## Why This Model Matters - Long-lived provider secrets stay out of agents. - Authority expires quickly and can be revoked. - Delegation carries typed constraints instead of informal trust. - Every allow, deny, step-up, and revocation path is auditable. Next, read [Authority and Enforcement](/concepts/authority-model/) to see where each enforcement layer fits. --- # Authority and Enforcement # URL: https://docs.caracal.run/concepts/authority-model/ # Markdown: https://docs.caracal.run/markdown/concepts/authority-model.md # Type: page # Concepts: # Requires: --- Caracal separates **granting authority** from **using authority**. - The STS grants authority by issuing a short-lived mandate after policy evaluation. 
- The Gateway or connector uses authority by verifying the mandate before forwarding a request or running a tool. ## Enforcement Flow ```mermaid sequenceDiagram participant Client as Agent or app participant STS as STS participant OPA as Active policy set participant Guard as Gateway or connector participant Resource as Protected resource participant Audit as Audit ledger Client->>STS: Exchange subject token for resource scopes STS->>OPA: Evaluate principal, session, grant, resource, delegation OPA-->>STS: allow, deny, diagnostics, or step-up requirement STS->>Audit: Record decision alt allowed STS-->>Client: Mandate JWT Client->>Guard: Request with mandate Guard->>Guard: Verify signature, claims, expiry, scopes, revocation Guard->>Resource: Forward approved request Guard->>Audit: Record resource decision else denied or step-up required STS-->>Client: Error or interaction_required challenge end ``` ## Control Layers | Layer | Responsibility | Source of truth | | --- | --- | --- | | Zone | Owns keys, policies, resources, sessions, and audit data. | Console or Admin API | | Grant | Declares which application and user may request resource scopes. | Console or Admin API | | Policy set | Makes the final allow, deny, or step-up decision. | Rego policy versions | | Mandate | Carries approved authority as a short-lived signed JWT. | STS | | Gateway or connector | Verifies mandate claims and revocation before use. | Runtime and resource server config | | Audit ledger | Records decisions, diagnostics, and request correlation. | API, Console, and storage | ## Default Posture Caracal should be treated as deny-by-default: 1. A resource must be registered. 2. A principal must have an applicable grant or delegation path. 3. The active policy set must allow the requested resource and scopes. 4. The resource server must verify the mandate and revocation anchors. 
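The four conditions above can be read as a chain in which any missing or failed check denies the request. The following is a minimal sketch of that posture, not Caracal's implementation: the `AuthorityChecks` shape, the reason strings, and `decide` are hypothetical, and in a real deployment the first three checks run at the STS while the fourth runs at the Gateway or connector.

```ts
// Hypothetical model of the deny-by-default chain; names and reasons are illustrative.
interface AuthorityChecks {
  resourceRegistered: boolean;   // 1. the resource is registered
  hasGrantOrDelegation: boolean; // 2. an applicable grant or delegation path exists
  policyAllows: boolean;         // 3. the active policy set allows resource and scopes
  mandateVerified: boolean;      // 4. the resource server verified mandate and revocation anchors
}

type Outcome = { allowed: true } | { allowed: false; reason: string };

// Partial<> models missing configuration: an absent field is treated as a failure.
function decide(checks: Partial<AuthorityChecks>): Outcome {
  if (checks.resourceRegistered !== true) {
    return { allowed: false, reason: "resource_not_registered" };
  }
  if (checks.hasGrantOrDelegation !== true) {
    return { allowed: false, reason: "no_grant_or_delegation_path" };
  }
  if (checks.policyAllows !== true) {
    return { allowed: false, reason: "policy_denied" };
  }
  if (checks.mandateVerified !== true) {
    return { allowed: false, reason: "mandate_not_verified" };
  }
  return { allowed: true };
}
```

The only path to `allowed: true` is the final line, so adding a new check can only narrow access, never widen it.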
Missing configuration, invalid signatures, expired mandates, insufficient scopes, revoked sessions, and failed delegation checks all stop the request. ## Gateway and Connector Roles Use the Gateway when you want Caracal to front an HTTP upstream and mediate requests centrally. Use a connector when the resource server should verify mandates inside its own framework, such as Express, FastMCP, net/http, or MCP transport. Both patterns share the same authority model. The difference is only where mandate verification runs. ## Enforce, Propagate, or Attribute The SDK exposes several call paths. They are not interchangeable: each does a different job, and only some **enforce** authority. Pick by what you need at that boundary. | Call path | Role | Verifies the mandate? | Use when | | --- | --- | --- | --- | | `caracal.fetch` / `gateway_request` through the Gateway | **Enforce** | Yes — the Gateway verifies claims and revocation before forwarding | Caracal fronts an HTTP upstream and mediates centrally | | Direct Gateway call (`curl` to the Gateway URL) | **Enforce** | Yes — same Gateway verification | Non-SDK clients or debugging the enforced path | | Connector verify (`context_middleware(verifier=…)`, Express, FastMCP, net/http, MCP transport) | **Enforce** | Yes — the connector verifies inside the resource server | The resource server should enforce in its own framework | | `caracal.transport` / `context_middleware()` (no verifier) | **Propagate** | No — carries and binds the envelope only | A Gateway or connector already enforced upstream and you only need context to flow | | App-managed provider call (your code calls the provider SDK directly) | **Attribute** | No — Caracal records who acted, but does not gate the call | You want audit attribution without routing through the Gateway | Rule of thumb: **use the Gateway or a connector verifier to enforce; use transport/propagation to carry identity after enforcement; treat app-managed provider calls as attribution 
only.** If a path says "no" under *verifies the mandate*, something upstream must have already enforced it. ## Next Step Read [Zones](/concepts/zone/) to understand the tenant boundary that owns authority data. ## Related Pages - [Policies and Policy Sets](/concepts/policy/) explains the decision contract. - [Mandates](/concepts/mandate/) explains the issued token. - [Sessions and Revocation](/concepts/sessions-revocation/) explains how active authority ends. --- # Zones # URL: https://docs.caracal.run/concepts/zone/ # Markdown: https://docs.caracal.run/markdown/concepts/zone.md # Type: page # Concepts: # Requires: --- A zone is the main isolation boundary in Caracal. It groups the configuration and runtime state needed to decide, issue, verify, revoke, and audit authority. ## What a Zone Owns | Area | Zone-owned data | | --- | --- | | Identity | Applications, credentials, subject sessions, and agent sessions. | | Authorization | Resources, providers, grants, policies, policy sets, and step-up challenges. | | Cryptography | Zone signing keys and JWKS used to verify mandates. | | Delegation | Delegation edges, constraints, depth limits, and cascade revocation state. | | Audit | Decision events, diagnostics, request IDs, and explain traces. | ## Why Zones Exist Zones let teams run separate environments, tenants, or trust domains without mixing authority data. Common zone boundaries include: - production, staging, and development environments; - separate customers in a hosted deployment; - isolated product areas with different policy owners; - high-sensitivity resources that need distinct keys and audit trails. When several boundaries could apply at once, use [Model Your Application in Caracal](/guides/modeling-recipes/) to choose between a separate zone, a shared zone with separate resources, and a customer attribute in policy input. To serve many of your own customers from one zone, follow [Serve Your Own Customers](/guides/serve-customers/). 
:::note[FAQ] [What should a zone represent?](/reference/faq/#faq-003) ::: ## Zone Lifecycle ```mermaid flowchart LR Create["Create zone"] --> Register["Register applications and resources"] Register --> Policy["Author and activate policy set"] Policy --> Run["Issue mandates and run agents"] Run --> Audit["Review audit and explain traces"] Audit --> Tune["Update grants, policy, and resources"] Tune --> Run ``` Zone setup is normally managed through the Console. The Admin API exposes the same objects for automation. ## Key and Policy Isolation Each zone has its own signing-key and JWKS context. Resource servers verify mandates against the issuer, audience, and expected zone. Policy activation is also zone-scoped: activating a policy set in one zone does not affect another zone. ## Dynamic Client Registration A zone gates whether workloads may self-register short-lived [DCR applications](/concepts/principal/#managed-and-dcr-applications). DCR is **off by default**; an operator turns it on per zone (`dcr_enabled`). While enabled, a control-plane workload holding an admin token can mint auto-expiring application identities through the zone DCR endpoint — Console never creates them. Disabling DCR on a zone that still has live DCR applications requires an explicit decision, because turning the gate off does not by itself revoke identities already issued: | Choice | Effect | | --- | --- | | Keep live | Blocks new registrations; existing DCR applications stay valid until their own expiry. | | Revoke live | Blocks new registrations and immediately archives live DCR applications, revoking their sessions and terminating related agent access. | The Console prompts for this choice when you disable DCR with live applications, and the Admin API takes the same decision through the `dcr_shutdown` field on a zone patch (`zones.patch(id, { dcr_enabled: false, dcr_shutdown: 'keep_live' | 'revoke_live' })`). 
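A minimal sketch of building that patch body, assuming the live DCR application count has already been read. The field names `dcr_enabled` and `dcr_shutdown` come from the zone patch above; the helper `buildDcrDisablePatch`, its types, and the choice to omit `dcr_shutdown` when no live applications exist are assumptions made for illustration.

```ts
// Hypothetical helper; only the dcr_enabled and dcr_shutdown field names come from the docs.
type DcrShutdown = "keep_live" | "revoke_live";

interface ZoneDcrPatch {
  dcr_enabled: false;
  dcr_shutdown?: DcrShutdown;
}

function buildDcrDisablePatch(liveDcrApps: number, choice?: DcrShutdown): ZoneDcrPatch {
  if (liveDcrApps > 0) {
    // Live DCR applications require an explicit decision; refuse to guess one.
    if (choice === undefined) {
      throw new Error("live DCR applications exist: choose keep_live or revoke_live");
    }
    return { dcr_enabled: false, dcr_shutdown: choice };
  }
  // Assumption: with no live DCR applications there is nothing to decide.
  return { dcr_enabled: false };
}

// Usage with an admin client (client shape not shown here):
//   await zones.patch(zoneId, buildDcrDisablePatch(liveCount, "keep_live"));
```

Throwing when no choice is given keeps automation from silently defaulting to either keeping or revoking live identities.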
Use `zones.dcrStatus(id)` to read the live DCR application count before deciding. ## Operational Guidance - Keep production and non-production authority in separate zones. - Name zones after the trust boundary, not a single service. - Keep resource identifiers stable because policies, grants, and audit traces refer to them. - Rotate zone signing keys using the operations workflow, then confirm resource servers load the current JWKS. ## Next Step Read [Identities and Applications](/concepts/principal/) to understand who acts inside a zone. ## Related Pages - [Resources and Grants](/concepts/resource-grant/) - [Model Your Application in Caracal](/guides/modeling-recipes/) - [Audit and Request Traces](/concepts/audit-ledger/) --- # Identities and Applications # URL: https://docs.caracal.run/concepts/principal/ # Markdown: https://docs.caracal.run/markdown/concepts/principal.md # Type: page # Concepts: # Requires: --- A principal is the acting identity. An application is the registered client or workload that authenticates and requests authority for that principal. ## Principal Types | Principal | Typical source | How Caracal sees it | | --- | --- | --- | | User | Login, workforce identity, or upstream IdP. | Subject session with user-bound claims. | | Service | Workload credential or client secret. | Application-bound session or exchange subject. | | Agent | Spawned runtime session. | Agent session with parent and delegation context. | Principals are not enough on their own. A request also needs an application, session, resource, scopes, and policy approval. ## Application Roles Applications represent software that can participate in Caracal flows: - an agent runtime that spawns child agents; - a backend service that requests mandates; - a Gateway application that fronts protected upstreams; - a connector-protected resource server; - a managed or dynamically registered client. 
Applications have registration metadata, a server-owned token credential, and a registration method. Managed applications are operator-provisioned for known durable software, including the runtime that spawns and fans out child agents. Dynamically registered applications are short-lived, isolated identities created through the zone DCR endpoint when a separate auto-expiring credential boundary is needed — for example a per-tenant or per-integration identity — rather than for ordinary spawn fan-out. ## Applications Are the Credential Boundary; Agents Are the Runtime Unit ```mermaid flowchart TD App["Managed application
(credential boundary)"] --> S1["Agent session"] App --> S2["Agent session"] App --> S3["Agent session"] S2 --> C1["Spawned child session"] S3 --> C2["Delegated agent session"] ``` An application is registered, holds a server-owned secret, and is the identity Caracal authenticates. An agent session is created at runtime by the workload that already holds that secret; it costs nothing to create and carries the parent, subject, labels, and delegation context. One application backs many agent sessions, so a durable workload uses **one managed application** and spawns, delegates, and fans out as many agent sessions as it needs. You do not register an application per agent — see [Should I create one application per agent?](/reference/faq/#faq-006). ## Managed and DCR Applications | Method | Identity boundary | Operational rule | | --- | --- | --- | | Managed | One durable service, orchestrator, Gateway, or agent runtime — including the runtime that spawns and fans out child agents. | Created in the Console (or Admin API) and reused across many agent sessions for the same workload. Spawned children run as `agent` sessions under this same application. | | DCR | One short-lived, isolated application identity (per tenant, per integration, or per externally-issued ephemeral identity). | Created **programmatically only** through the zone DCR endpoint (Admin API / admin SDK `applications.dcr()`), never in the Console; always expires, binds to exactly one agent session, and is not a spawn parent for further sessions. The Console lists DCR applications read-only. | The coordinator records each agent session's `lifecycle`, which is either `task` (the default for every `spawn()`) or `service` (a durable, heartbeat-leased handle created by [`service()`](/reference/sdk-python/#service)). The value names the runtime lifecycle, not the actor — every runtime actor is an agent session; `lifecycle` only says whether it runs task-scoped or as a heartbeat-leased service. 
The credential boundary between durable managed identities and short-lived DCR identities is named by `registration_method` (managed vs DCR), not by lifecycle — a DCR application cannot host a `service` session, so its single session is always a `task` session. `lifecycle` carries one behavioral distinction: a `service` session holds a heartbeat lease and is swept when that lease lapses, while a `task` session is task-scoped — it is retired when its work completes (optionally bounded by a TTL). The two are governed by **different reapers**: a `task` is subject to the wall-clock TTL sweeper, and a `service` is **not** — its lifetime is governed solely by the heartbeat lease, so a service that keeps renewing its lease is never terminated on a timer. `lifecycle` does not change what an agent may do or which authority it carries. The spawn topology enforces two structural rules so a child can never outlast or out-scope what its position allows: - A `task` parent cannot spawn a `service` child (`task_agent_cannot_spawn_service`). A task is bounded — it ends with its work or its TTL — so it must not mint a durable, lease-renewable session that could outlive it. A `service` parent may spawn either lifecycle. - A DCR session is a **leaf**: it cannot be a spawn parent (`dcr_application_cannot_spawn`) and cannot be a spawned child (`dcr_application_cannot_be_child`). A DCR application is an independently launched, single-session workload identity, not a node in a spawn tree. A **short-lived worker is not a separate lifecycle** — it is an ordinary `task` session created with a TTL. 
The mainstream tree of a long-lived orchestrator, mid-tier managers, and disposable task workers is modelled under **one managed application**: - the orchestrator is the durable root session (or a [`service()`](/reference/sdk-python/#service) handle when it needs a heartbeat lease), - each manager is a plain `spawn()` that inherits the application's authority, - each task worker is `spawn(grant=Grant.narrow([...]), ttl_seconds=…)` — least-privilege and auto-terminated on block exit, with the TTL sweeper as a backstop. Every node in that tree is a `task` session under the one managed app, distinguished by its `agent_session_id`, `labels`, and delegation context. "Disposable worker" is expressed by the TTL and the narrowed grant, not by a separate lifecycle. DCR is therefore for credential isolation, not for spawn fan-out: it gives a unit of work its own independently revocable, auto-expiring **application** identity, not a shorter-lived agent. A DCR application is reached by **authenticating as it**, not by spawning. The control plane registers the DCR application through the zone DCR endpoint (an Admin API action that returns a one-time secret and a short expiry); the SDK never registers applications. An orchestrator injects those credentials into an independently launched workload, and that workload — configured with the DCR `application_id` and secret — authenticates with the `client_credentials` grant and creates its single root session. The application then expires and is archived. Use this when a unit of work needs its own isolated, independently revocable, auto-expiring identity, such as a per-tenant or per-integration runtime. :::note[FAQ] [What is the difference between an application, principal, and agent session?](/reference/faq/#faq-005) and [when should I use a managed application versus DCR?](/reference/faq/#faq-007) ::: Disabling DCR on a zone has two separate operational meanings. The zone flag always stops future dynamic registration immediately. 
If live DCR applications already exist, operators must choose whether those identities continue until expiry or are revoked immediately. Revocation archives the live DCR applications, revokes related authority sessions and delegation anchors, and terminates related DCR agent access so STS, Gateway, and Coordinator paths stop accepting those identities. Policies receive `input.principal.registration_method`, `input.principal.agent_session_id`, `input.principal.lifecycle`, and `input.principal.labels`, so policy authors can distinguish durable managed clients from short-lived DCR clients through `registration_method` without relying on SDK-specific behavior. Audit metadata records the application id, application name, registration method, agent session id, parent/delegation context, requested scopes, and resource for each token-exchange decision. ## Telling Agent Sessions Apart Every agent session has one canonical identity: the server-minted `agent_session_id`. It is unique, returned to the caller the moment `spawn()` (or `service()`) creates the session, and stamped onto every token exchange and audit event that session produces. That id — not the application, lifecycle, or labels — is the answer to "which agent did this?". `labels` are a descriptor, not an identity. Many sessions under one application can share the same labels on purpose: that is how fan-out works, where a hundred `["pricing-worker"]` sessions are intentionally interchangeable. When sessions must be told apart by meaning rather than by raw id, give them distinguishing `labels`, attach business correlation in `metadata`, or propagate a `trace_id`. Two sessions that share the same application, labels, and metadata are fungible by design — the unique `agent_session_id` still separates them in audit. The zone audit endpoint filters directly on these fields: query `agent_session_id` to follow one exact session, or `label` to scope to a role across a fleet of sessions. 
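The difference between following one session and scoping to a role can be sketched with an in-memory filter. This is an illustration under assumptions: the `AuditEvent` shape and the session ids are invented for the example, and only the `agent_session_id` and label filter concepts come from the text above.

```ts
// Hypothetical audit event shape; real audit events carry more fields.
interface AuditEvent {
  agent_session_id: string;
  labels: string[];
  decision: "allow" | "deny";
}

// Two fungible workers share a label; only the session id separates them.
const events: AuditEvent[] = [
  { agent_session_id: "as_001", labels: ["pricing-worker"], decision: "allow" },
  { agent_session_id: "as_002", labels: ["pricing-worker"], decision: "deny" },
  { agent_session_id: "as_003", labels: ["refund-worker"], decision: "allow" },
];

// Follow one exact session.
function bySession(all: AuditEvent[], id: string): AuditEvent[] {
  return all.filter((e) => e.agent_session_id === id);
}

// Scope to a role across a fleet of sessions.
function byLabel(all: AuditEvent[], label: string): AuditEvent[] {
  return all.filter((e) => e.labels.indexOf(label) !== -1);
}
```

Filtering by label returns both pricing workers, while filtering by `agent_session_id` isolates the single session that was denied.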
## Sessions Bind Identity to Time ```mermaid flowchart TD App["Application"] --> Session["Subject session"] Principal["Principal"] --> Session Session --> Spawn["Agent session"] Spawn --> Delegate["Delegated agent session"] Session --> Mandate["Mandate exchange"] Spawn --> Mandate Delegate --> Mandate ``` Sessions make authority revocable. A mandate contains session anchors, and resource servers check those anchors through the revocation layer. ## Naming Guidance - Use **application** for registered software. - Use **principal** for the acting identity. - Use **agent session** for a spawned agent execution context. - Use **subject session** for the original authenticated subject context. - Avoid using "client" unless you are describing OAuth protocol fields. ## Next Step Read [Resources and Grants](/concepts/resource-grant/) to understand what identities can request. ## Related Pages - [Sessions and Revocation](/concepts/sessions-revocation/) - [Agent Delegation](/concepts/delegation/) - [Integrate the TypeScript SDK](/guides/sdk-typescript/) --- # Resources and Grants # URL: https://docs.caracal.run/concepts/resource-grant/ # Markdown: https://docs.caracal.run/markdown/concepts/resource-grant.md # Type: page # Concepts: # Requires: --- A resource is something Caracal protects. A grant is a configured permission binding that says an application and user may request specific scopes for that resource. ## Resources Resources describe protected targets such as: - HTTP APIs behind the Gateway; - MCP servers and tool groups; - internal services protected by Express, FastMCP, or net/http connectors; - provider-backed targets that need credential mediation. | Resource field | Purpose | | --- | --- | | Identifier | Stable policy and token audience target. Always use the `resource://` convention, such as `resource://pipernet`; keep it stable even when the upstream URL changes. | | Upstream URL | Gateway forwarding target. 
| | Scopes | Named Caracal resource actions that policies and mandates can constrain. | | Gateway application | Managed application identity used by Gateway-mediated resources. | | Upstream credential provider | Resource binding to the provider record used when Gateway attaches no credential, a Caracal mandate, OAuth tokens, API keys, or bearer tokens. | :::note[FAQ] [What is the difference between a resource and a provider?](/reference/faq/#faq-009) and [why must the resource identifier stay stable?](/reference/faq/#faq-010) ::: ## Grants A grant binds: 1. a zone; 2. an application; 3. a user or subject; 4. a resource; 5. one or more scopes. Grants are not the final decision. They are one input to policy. The active policy set can still deny, require step-up, or constrain the exchange. ## Exchange Relationship ```mermaid flowchart LR App["Application"] --> Grant["Grant"] User["User / subject"] --> Grant Resource["Resource"] --> Grant Grant --> STS["STS policy input"] Policy["Active policy set"] --> STS STS -->|"allow"| Mandate["Mandate with resource and scopes"] ``` ## Scope Design Prefer small, action-oriented scopes: | Good | Avoid | | --- | --- | | `payments:read` | `admin` | | `tickets:comment` | `write_all` | | `mcp:tool:call` | `tools` | Use resource identifiers for targets and scopes for actions. Do not encode environment, tenant, or user identity into scope names when those belong in the zone, principal, or policy input. :::note[FAQ] [How should I design scopes?](/reference/faq/#faq-011) and [do I manage grants directly?](/reference/faq/#faq-012) ::: ## Next Step Read [Policies and Policy Sets](/concepts/policy/) to understand how grants become allow, deny, or step-up decisions. 
## Related Pages - [Define Resources and Providers](/guides/resources-providers/) - [Model Your Application in Caracal](/guides/modeling-recipes/) - [Debug Authorization Decisions](/guides/authorize-access/) --- # Policies and Policy Sets # URL: https://docs.caracal.run/concepts/policy/ # Markdown: https://docs.caracal.run/markdown/concepts/policy.md # Type: page # Concepts: # Requires: --- The platform decision contract decides whether the STS may issue a mandate. The contract is a signed, versioned Rego module that Caracal owns and embeds in the STS; it is the only thing that emits an authorization decision. You author **policy data documents** — grants, bindings, confinement, and restrictions — that the contract reads. Policies are versioned immutably, bundled into policy sets, and activated per zone. ## Policy Objects | Object | Purpose | | --- | --- | | Policy | Named policy data document with immutable versions. | | Policy version | A specific content hash and schema version. | | Policy set | A named bundle of policy versions. | | Policy set version | A specific manifest of policy versions. | | Active policy set version | The version the STS evaluates for a zone. | ## Decision Contract The platform decision contract owns the `result` object in `package caracal.authz`. It is deny by default and only allows an exchange that matches one of its vetted rules: a bootstrap mandate, a delegated mint narrowed to the delegation edge, or a mandate presented at the Gateway. You never write `result`; you supply the data the contract reads. 
```text
# caracal:data-document
package caracal.authz

import rego.v1

grants := {
  "resource://mercury-bank": {
    "application": "payments",
    "roles": {"payment-execution": ["payments:read", "payments:write"]},
  },
}
```

The contract reads four adopter documents from `package caracal.authz`:

| Document | Shape | Effect |
| --- | --- | --- |
| `app_ids` | `{binding_key: application_id}` | Binds the application key used in `grants` to the control-plane id the STS sees as `input.principal.id`. |
| `grants` | `{resource: {application, roles: {role: [scopes]}}}` | Declares which application owns a resource view and which scopes each role may hold. |
| `confinement` | `[{label_prefix, scopes}]` | Caps every agent session whose principal carries a matching label prefix to a fixed scope set. Optional; omitting it confines nothing. |
| `restrict` | set of reasons | Deny overlay. Any entry denies every exchange in the zone. Optional; keep it empty to authorize normally. |

A zone that supplies no data authorizes nothing: every allow rule collapses to the default deny.

Example input values the contract evaluates:

| Input | Example |
| --- | --- |
| `input.principal.id` | `app_lynx_control` (the acting application, user, or agent) |
| `input.principal.registration_method` | `managed` or `dcr` |
| `input.resource.identifier` | `resource://mercury-bank` |
| `input.context.requested_scopes` | `["payments:read", "payments:write"]` |
| `input.delegation_edge` | Present when authority was delegated from another agent session. |

## Policy Input Contract

Every evaluation receives this exact input shape (`schema_version` `2026-05-20`). There is no `input.application` or `input.grant` object; the acting application is the principal, and grant bounds are enforced by the STS before the contract runs. The platform decision contract evaluates the fields below; your data documents map applications, roles, and scopes onto them.
| Path | Type | Meaning | | --- | --- | --- | | `input.principal.type` | string | `Application`, user, service, or agent principal class. | | `input.principal.id` | string | Stable identifier of the acting application/user/agent. | | `input.principal.zone_id` | string | Zone the principal belongs to. | | `input.principal.registration_method` | string | `managed` or `dcr`; match `dcr` to scope dynamically registered workloads. | | `input.principal.lifecycle` | string | `task` or `service` for agent sessions. | | `input.principal.labels` | string[] | Role descriptors set at `spawn(labels=[...])`; drive least-privilege role checks. | | `input.principal.agent_session_id` | string | Agent session that initiated the exchange, when present. | | `input.resource.type` | string | Resource class, e.g. `Resource`. | | `input.resource.id` | string | Internal resource id. | | `input.resource.identifier` | string | Stable audience URI, e.g. `resource://mercury-bank`. | | `input.resource.scopes` | string[] | Scopes the resource defines (the full allowlist). | | `input.action.id` | string | The exchange action, currently `TokenExchange`. | | `input.action.method` | string | Upstream HTTP method (upper-cased) the gateway is authorizing. Present only on gateway-authenticated exchanges; absent on direct exchanges. Gate operation authority against this. | | `input.action.path` | string | Upstream request path the gateway is authorizing. Present only on gateway-authenticated exchanges; absent on direct exchanges. | | `input.session.id` | string | Present when the request is bound to a session. | | `input.delegation_edge` | object | Present only on delegated exchange (see below). | | `input.context.requested_scopes` | string[] | Scopes the caller is requesting now; gate these against the allowlist. | | `input.context.actor_claims` | object | Verified claims of the acting credential. | | `input.context.subject_claims` | object | Verified claims of the delegated subject, when present. 
| | `input.context.challenge_resolved` | bool | `true` after a step-up challenge is satisfied. | | `input.context.trace_id` | string | Request trace id for audit correlation. | When `input.delegation_edge` is present it carries: `id`, `source_session_id`, `target_session_id`, `issuer_application_id`, `receiver_application_id`, `resource_id`, `scopes` (the narrowed grant), `edge_version`, `path` (the full delegation chain), `graph_epoch`, and `constraints_json` (typed delegation constraints such as a cost budget). The contract enforces hop membership and the scope-subset narrowing against these fields. The contract reads your data documents through `data.caracal.authz.*` — `grants`, `app_ids`, `confinement`, and `restrict`. You supply that data; the platform contract reads it and decides. ## Policy Outcomes | Outcome | Effect | | --- | --- | | `allow` | STS signs a mandate if token and session checks also pass. | | `deny` | STS refuses the exchange and records diagnostics. | | step-up diagnostic | STS returns `interaction_required` with a challenge ID. | Step-up is expressed as a diagnostic such as `{"step_up_required": "mfa"}`. The STS creates a challenge and expects a later exchange with the challenge proof. ## Authoring Rules - Author data documents only; mark each with `# caracal:data-document`. The platform decision contract owns every `result`. - Validate through the Console or Admin API before activation — a data document must define data and must not define `result`. - Activate through a policy set version, not by editing active content in place. - Keep `grants`, `app_ids`, `confinement`, and `restrict` in separate documents so ownership and review stay clear. - Tighten authority through `confinement` and `restrict`; neither can widen what the contract already allows. ## Next Step Read [Step-Up Challenges](/concepts/step-up/) to understand how policies pause sensitive exchanges for fresh proof. 
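The way the four data documents combine can be easier to review with a toy model. The sketch below is a simplified mental model in Python, not the platform decision contract (which is Caracal's own Rego module). It makes several assumptions: the `decide` function and the flat `request` dictionary are hypothetical, the union of all role scopes is treated as the application's ceiling, role-to-label matching is ignored, and the `payments` to `app_lynx_control` binding and the `reader` label prefix are invented for illustration.

```python
def decide(data: dict, request: dict) -> str:
    """Toy model of the deny-by-default flow; not the real contract."""
    # Deny overlay: any entry in `restrict` denies every exchange in the zone.
    if data.get("restrict"):
        return "deny"

    grant = data.get("grants", {}).get(request["resource_identifier"])
    if grant is None:
        return "deny"  # no data for this resource, so nothing is authorized

    # `app_ids` binds the application key used in `grants` to the principal id.
    if data.get("app_ids", {}).get(grant["application"]) != request["principal_id"]:
        return "deny"

    # Assumption: the union of all role scopes is the application's ceiling.
    allowed = {scope for scopes in grant["roles"].values() for scope in scopes}

    # Confinement can only tighten: each matching label prefix caps the set.
    for rule in data.get("confinement", []):
        if any(label.startswith(rule["label_prefix"]) for label in request["labels"]):
            allowed &= set(rule["scopes"])

    requested = set(request["requested_scopes"])
    return "allow" if requested and requested <= allowed else "deny"


data = {
    "app_ids": {"payments": "app_lynx_control"},  # hypothetical binding
    "grants": {
        "resource://mercury-bank": {
            "application": "payments",
            "roles": {"payment-execution": ["payments:read", "payments:write"]},
        },
    },
    "confinement": [{"label_prefix": "reader", "scopes": ["payments:read"]}],
    "restrict": set(),
}
request = {
    "principal_id": "app_lynx_control",
    "resource_identifier": "resource://mercury-bank",
    "labels": [],
    "requested_scopes": ["payments:read", "payments:write"],
}
print(decide(data, request))
print(decide(data, {**request, "labels": ["reader-1"]}))
```

In this model the unlabeled request is allowed and the `reader-1` session is denied because it asks for `payments:write` beyond its confinement. Passing an empty `data` dictionary denies everything, which mirrors the rule that a zone with no data authorizes nothing.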
## Related Pages - [Author Policy Data](/guides/author-policy/) - [Activate a Policy Set](/guides/activate-policy-set/) --- # Step-Up Challenges # URL: https://docs.caracal.run/concepts/step-up/ # Markdown: https://docs.caracal.run/markdown/concepts/step-up.md # Type: page # Concepts: # Requires: --- Step-up is Caracal's way to pause a token exchange and require fresh proof before issuing a mandate for a sensitive resource. Policy triggers step-up by returning a diagnostic such as `{"step_up_required": "mfa"}`. The STS converts that diagnostic into an `interaction_required` error with a challenge ID. ## Step-Up Flow ```mermaid sequenceDiagram participant App as App or agent participant STS as STS participant Policy as Active policy set participant Control as Console or Admin API App->>STS: Exchange for sensitive resource STS->>Policy: Evaluate request Policy-->>STS: deny with step_up_required diagnostic STS-->>App: interaction_required with challenge_id Control->>Control: Complete external proof Control->>STS: Mark challenge satisfied App->>STS: Retry exchange with challenge proof STS-->>App: Mandate ``` ## Components | Component | Responsibility | | --- | --- | | Policy | Decides when step-up is required. | | STS | Creates the challenge, throttles failed attempts, and verifies challenge proof during retry. | | Console or Admin API | Lists, inspects, and satisfies challenges after an external proof step. | | SDK or OAuth client | Surfaces `interaction_required` so the application can guide the user or operator. | ## Challenge Lifecycle | State | Meaning | | --- | --- | | Created | STS issued a challenge for a specific zone, session, resource set, and challenge type. | | Satisfied | A different approver or external proof completed the requirement. | | Consumed | STS accepted the proof during retry and issued the mandate. | | Expired or invalid | The challenge can no longer be used. 
| ## Design Guidance - Use step-up for high-risk resources, sensitive scopes, or unusual context. - Keep the proof step outside policy; policy should decide that proof is needed, not perform the proof. - Prevent self-approval for sensitive challenges. - Include enough diagnostics for the Console and audit views to explain the requirement. - Retry token exchange only after the challenge is satisfied. ## Next Step Read [Mandates](/concepts/mandate/) to understand what the STS issues after an exchange is allowed or satisfied. ## Related Pages - [Step-Up Re-Authentication](/guides/step-up/) - [Policies and Policy Sets](/concepts/policy/) - [Audit and Request Traces](/concepts/audit-ledger/) --- # Mandates # URL: https://docs.caracal.run/concepts/mandate/ # Markdown: https://docs.caracal.run/markdown/concepts/mandate.md # Type: page # Concepts: # Requires: --- A mandate is the token Caracal issues after the STS approves an exchange. It is a short-lived JWT signed with the zone signing key and verified by the Gateway or resource connectors. ## What a Mandate Proves A valid mandate proves: - which zone issued it; - which application and principal are acting; - which session anchors are active; - which resource targets and scopes were approved; - whether authority came from an agent session or delegation edge; - when the authority expires. ## Issuance Path ```mermaid flowchart LR Subject["Subject token or agent context"] --> Exchange["OAuth token exchange"] Exchange --> Policy["Policy evaluation"] Policy -->|"allow"| Mandate["Mandate JWT"] Mandate --> Verify["Gateway or connector verification"] Verify --> Resource["Protected resource"] ``` ## Mandate Use | Use | Verification focus | | --- | --- | | Gateway request | Issuer, audience, zone, resource, scopes, expiry, revocation. | | MCP tool call | Bearer token, required scopes, required targets, agent/delegation constraints. | | SDK outbound call | Context propagation and mandate header injection. 
| | Delegated exchange | Agent session, delegation edge, scopes, hop count, and constraints. | ## Mandates Are Not API Keys Mandates are intentionally short lived and context bound. They should not be stored as durable credentials, copied into configuration files, or reused across unrelated resources. Resource servers should always verify a mandate at request time. Verification includes signature and claim checks plus revocation-anchor checks for the session, root session, agent session, and delegation edge when present. ## Failure Modes | Failure | Meaning | | --- | --- | | `invalid_token` | Signature, issuer, audience, required claim, or expiry validation failed. | | `scope_insufficient` | The mandate does not contain a required scope. | | `session_revoked` | One of the mandate revocation anchors has been revoked. | | `agent_identity_required` | The resource requires an agent mandate. | | `delegation_required` | The resource requires delegated authority. | | `chain_mismatch` | The delegation chain does not include the required application. | | `hop_count_exceeded` | The delegation path exceeds the configured hop limit. | ## Next Step Read [Agent Delegation](/concepts/delegation/) to understand how agent sessions pass bounded authority. ## Related Pages - [Sessions and Revocation](/concepts/sessions-revocation/) - [Protect an MCP Server](/guides/protect-mcp/) - [Run an Agent with caracal run](/guides/runtime-run/) --- # Agent Delegation # URL: https://docs.caracal.run/concepts/delegation/ # Markdown: https://docs.caracal.run/markdown/concepts/delegation.md # Type: page # Concepts: # Requires: --- Delegation lets one agent session pass a narrower, typed slice of authority to another agent session. Authority follows the **application**. Agents you spawn under the same application already act under that application's authority — no delegation edge is needed for ordinary fan-out. 
You create a delegation edge in exactly two cases: - To **narrow** authority, so a child holds only a subset of what its parent can do (least privilege). - To carry authority **across applications**, when the receiver is a different application that must consent. It is represented as a graph of directed edges. Each edge connects a source session to a target session, carries scopes and constraints, and can be revoked independently. ## Graph Model ```mermaid flowchart LR User["Subject session"] --> A["Agent session A"] A -->|"edge: read tickets"| B["Agent session B"] B -->|"edge: summarize only"| C["Agent session C"] A -. revoke .-> B B -. cascade .-> C ``` ## Delegation Edge Fields | Field | Purpose | | --- | --- | | Source session | The session that delegates authority. | | Target session | The child or receiving agent session. | | Issuer application | Application creating the delegation. | | Receiver application | Application receiving authority. | | Resource | Optional resource boundary for the edge. | | Scopes | Subset of authority being delegated. | | Constraints | Typed limits such as TTL, hop count, budget, and approval state. | | Status | Active or revoked lifecycle state. | ## Rules - Delegation should narrow authority, not expand it. - Delegation paths must not cycle. - Hop count should be bounded. - Revoking an upstream edge should invalidate downstream authority. - Resource servers should verify delegation claims when they require delegated access. ## What an Edge Bounds — and What It Does Not An edge bounds **the token issued for that edge** and **any further edge chained from it**. It does **not** raise the application's authority ceiling. This distinction is the whole security model, so make it explicit: - When a child is created with a **narrowing grant**, a `source → child` edge is recorded. 
A token exchange that presents this edge is bounded to the edge's scopes, and the coordinator enforces that any edge chained from the child is a subset of it (`scopes ⊆ parent`, plus TTL, hop, budget, and resource narrowing). - When a child is created with **inherit** (the default) under a **narrowed parent**, the coordinator **mirrors the parent's edge onto the child** in the same spawn transaction — a `parent → child` edge copying the parent's exact scopes, resource, constraints, and expiry. Least-privilege is therefore **transitive by default**: an inherit child cannot exceed its parent's narrowing. - When a child is created with **inherit** under a **root parent** (one that holds the application's full authority with no inbound edge), no edge is recorded and the child runs under the application's authority bounded by policy. There is nothing narrower to carry forward. Worked example: A spawns B with `grant=narrow([tickets:read])`, then B spawns C. - If B spawns C with **inherit**, the coordinator records a `B → C` edge mirroring B's `tickets:read` slice, so C is bounded by B's narrowing — not the application's full authority. - If B spawns C with a **narrowing grant**, a `B → C` edge is recorded and the coordinator rejects it unless `C ⊆ B` — so B can never pass on authority it does not hold. The consequence: the **application plus policy** is the hard, server-enforced boundary every session is contained by. Within one application, transitive inheritance keeps a narrowed subtree narrowed automatically. To place a subtree behind a real cross-application trust boundary, move it under a different application via `delegate()`, which requires receiver consent. All sessions of one application share a credential and a trust domain by design; transitive inheritance contains an *accidentally* un-narrowed descendant, while a separate application contains a *mutually distrusting* one. 
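The three spawn cases described above can be modeled in a few lines. This is a minimal sketch of the narrowing rule only, assuming a hypothetical `Edge` record and `spawn_edge` helper. It tracks scopes alone, whereas the coordinator also carries resource, constraints, and expiry, and it does not model the application and policy ceiling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """Hypothetical stand-in for a delegation edge; only scopes are modeled."""
    source: str
    target: str
    scopes: frozenset[str]


def spawn_edge(
    parent_id: str,
    child_id: str,
    parent_edge: Optional[Edge],
    narrow: Optional[set[str]] = None,
) -> Optional[Edge]:
    """Return the edge recorded for a spawn, or None when no edge is needed."""
    if narrow is not None:
        requested = frozenset(narrow)
        # A narrowed parent can only pass on a subset of its own slice.
        if parent_edge is not None and not requested <= parent_edge.scopes:
            raise PermissionError("child scopes must be a subset of the parent edge")
        return Edge(parent_id, child_id, requested)
    if parent_edge is not None:
        # Inherit under a narrowed parent: mirror the parent's slice.
        return Edge(parent_id, child_id, parent_edge.scopes)
    # Inherit under a root parent: nothing narrower to carry forward.
    return None


# Worked example from the text: A spawns B narrowed to tickets:read.
a_to_b = spawn_edge("A", "B", parent_edge=None, narrow={"tickets:read"})
# B spawns C with inherit, so C mirrors B's slice.
b_to_c = spawn_edge("B", "C", parent_edge=a_to_b)
print(b_to_c)
```

Calling `spawn_edge("B", "C", a_to_b, narrow={"tickets:comment"})` raises `PermissionError` in this model, which corresponds to the coordinator rejecting an edge that is not a subset of the parent's.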
## SDK Relationship The SDKs expose one primitive for creating children and one for granting a peer: | Language | Spawn a child | Grant to an existing peer | | --- | --- | --- | | TypeScript | `spawn()` / `spawn({ grant })` | `delegate()` | | Python | `spawn()` / `spawn(grant=…)` | `delegate()` | | Go | `Spawn()` / `Spawn` with `Grant` | `Delegate()` | `spawn()` returns a child running under the **same application's** authority — it never moves a child into a different application. Pass a **narrowing grant** (`Grant.narrow([...])`, `Grant.narrow(...)`, or `GrantNarrow(...)`) to bind a least-privilege delegation edge to the child, or `Grant.none()` for a child with no inherited authority. To hand authority across applications, use `delegate()`, which records an edge to an already-existing peer session under that other application. These helpers propagate session and delegation context so later token exchanges include the correct graph proof. ## Where You Interact With Delegation Delegation is authored **at runtime, from the SDK** — there is no Console form for creating edges, and it is not an internal-only concept. You create edges by calling `spawn(grant=…)` or `delegate(to=…)` in your agent code; the coordinator records and enforces them. You **observe** the resulting graph through audit: every token-exchange decision records the delegation edge id, the chain of hops, and the granted scopes, which the Console audit view and the Admin audit API expose for inspection and filtering. So the split is: **SDK to author, audit/Console to observe** — the mental model maps directly onto two SDK calls, not a separate management surface. ## Next Step Read [Delegation Constraints](/concepts/constraint/) to understand the limits carried by each edge. 
## Related Pages - [Implement Multi-Agent Delegation](/guides/delegation/) - [Audit and Request Traces](/concepts/audit-ledger/) --- # Delegation Constraints # URL: https://docs.caracal.run/concepts/constraint/ # Markdown: https://docs.caracal.run/markdown/concepts/constraint.md # Type: page # Concepts: # Requires: --- Typed constraints make delegated authority explicit. They are stored on delegation edges and evaluated by policy, SDK context, and resource-server verification. ## Constraint Types | Constraint | Use it to limit | | --- | --- | | Resource | Which protected target can be reached. | | Scopes | Which actions can be requested. | | TTL | How long the delegated edge remains useful. | | Hop count | How deep the delegation chain may become. | | Budget | How much work, calls, or scope count a child can consume. | | Approval | Whether policy or a reviewer has approved a sensitive edge. | | Chain membership | Which applications must appear in the delegation path. | ## Example Shape ```json { "resource": "https://api.example.com/tickets", "scopes": ["tickets:read", "tickets:comment"], "max_hops": 2, "budget": 5, "expires_at": "2026-06-01T12:00:00Z", "policy_approved": true } ``` The exact constraint object can vary by policy design, but it should stay typed, stable, and auditable. ## Where Constraints Are Enforced ```mermaid flowchart TD SDK["SDK creates edge"] --> Coordinator["Coordinator stores constraints"] Coordinator --> STS["STS includes edge in policy input"] STS --> Policy["Policy evaluates constraints"] Policy --> Mandate["Mandate carries delegation claims"] Mandate --> Connector["Gateway or connector checks claims"] ``` ## Design Guidance - Put durable business rules in policy. - Put per-edge runtime limits in constraints. - Prefer positive allowlists over open-ended deny lists. - Keep constraints small enough to review in audit traces. - Use consistent field names across agents so policies stay readable. 
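One way to keep constraints typed and reviewable is to parse them into a fixed structure before checking them. The sketch below reuses the field names from the Example Shape on this page. The `EdgeConstraints` class, the `violations` method, and the reading of `budget` as a call count are assumptions made for illustration, not Caracal's implementation.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EdgeConstraints:
    """Hypothetical typed view of the Example Shape constraint object."""
    resource: str
    scopes: frozenset[str]
    max_hops: int
    budget: int
    expires_at: datetime
    policy_approved: bool

    @classmethod
    def from_json(cls, raw: str) -> "EdgeConstraints":
        doc = json.loads(raw)
        return cls(
            resource=doc["resource"],
            scopes=frozenset(doc["scopes"]),
            max_hops=int(doc["max_hops"]),
            budget=int(doc["budget"]),
            # Normalize a trailing "Z" so older Python versions can parse it.
            expires_at=datetime.fromisoformat(doc["expires_at"].replace("Z", "+00:00")),
            policy_approved=bool(doc["policy_approved"]),
        )

    def violations(self, scope: str, hops: int, calls_used: int, now: datetime) -> list[str]:
        """List every limit the attempted use would break."""
        problems = []
        if not self.policy_approved:
            problems.append("edge not approved")
        if scope not in self.scopes:
            problems.append("scope outside edge")
        if hops > self.max_hops:
            problems.append("hop count exceeded")
        if calls_used >= self.budget:  # assumption: budget counts calls
            problems.append("budget exhausted")
        if now >= self.expires_at:
            problems.append("edge expired")
        return problems


raw = """{
  "resource": "https://api.example.com/tickets",
  "scopes": ["tickets:read", "tickets:comment"],
  "max_hops": 2,
  "budget": 5,
  "expires_at": "2026-06-01T12:00:00Z",
  "policy_approved": true
}"""
constraints = EdgeConstraints.from_json(raw)
check_time = datetime(2026, 5, 1, tzinfo=timezone.utc)
print(constraints.violations("tickets:read", hops=1, calls_used=0, now=check_time))
```

Returning a list of named violations rather than a single boolean keeps the result small enough to surface in an audit trace.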
## Next Step Read [Sessions and Revocation](/concepts/sessions-revocation/) to understand how active authority ends. ## Related Pages - [Agent Delegation](/concepts/delegation/) - [Policies and Policy Sets](/concepts/policy/) - [Implement Multi-Agent Delegation](/guides/delegation/) --- # Sessions and Revocation # URL: https://docs.caracal.run/concepts/sessions-revocation/ # Markdown: https://docs.caracal.run/markdown/concepts/sessions-revocation.md # Type: page # Concepts: # Requires: --- Sessions make Caracal authority temporary and revocable. A mandate includes session anchors, and resource servers check those anchors before accepting the mandate. ## Session Types | Session | Role | | --- | --- | | Subject session | Original authenticated user or service context. Also called the *authority session* in the Console. | | Agent session | Runtime context for a spawned agent. | | Delegated session | An agent session that holds an inbound delegation edge — a *state* of an agent session, not a separate object. | ## Revocation Anchors Resource servers check every relevant anchor: - session ID; - root session ID; - agent session ID; - delegation edge ID. If any anchor is revoked, the mandate should be rejected as `session_revoked`. ## Revocation Flow ```mermaid sequenceDiagram participant Control as Console or Admin API participant Store as Postgres participant Stream as Redis revocation stream participant Resource as Gateway or connector Control->>Store: Revoke grant, session, agent, or edge Store->>Stream: Publish revocation anchor Resource->>Stream: Consume caracal.sessions.revoke Resource->>Resource: Cache revoked anchor Resource-->>Resource: Reject matching mandates ``` The Redis stream name is `caracal.sessions.revoke`. Redis-backed connector packages can consume the stream and populate local revocation stores. 
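The anchor check a resource server performs can be sketched as a lookup against a local revocation store. This is an illustrative in-memory model only. The claim names in `ANCHOR_CLAIMS`, the `InMemoryRevocationStore` class, and the `check_anchors` helper are hypothetical, and in production the store would be populated by a consumer of `caracal.sessions.revoke` rather than by direct calls.

```python
from typing import Optional

# Hypothetical claim names for the four anchors listed under Revocation Anchors.
ANCHOR_CLAIMS = ("session_id", "root_session_id", "agent_session_id", "delegation_edge_id")


class InMemoryRevocationStore:
    """Development-style store; a shared store is needed across instances."""

    def __init__(self) -> None:
        self._revoked: set[str] = set()

    def revoke(self, anchor_id: str) -> None:
        self._revoked.add(anchor_id)

    def is_revoked(self, anchor_id: str) -> bool:
        return anchor_id in self._revoked


def check_anchors(claims: dict, store: InMemoryRevocationStore) -> Optional[str]:
    """Return 'session_revoked' if any anchor present in the claims is revoked."""
    for name in ANCHOR_CLAIMS:
        anchor = claims.get(name)
        if anchor is not None and store.is_revoked(anchor):
            return "session_revoked"
    return None


store = InMemoryRevocationStore()
claims = {
    "session_id": "sess-2",
    "root_session_id": "sess-1",
    "agent_session_id": "agent-7",
    # No delegation edge on this mandate, so that anchor is skipped.
}
print(check_anchors(claims, store))
store.revoke("sess-1")  # revoking the root session rejects derived mandates
print(check_anchors(claims, store))
```

Checking every anchor, not only the immediate session ID, is what lets a single upstream revocation reject mandates held further down the chain.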
## Cascade Behavior Revocation should follow authority: - revoking a subject session invalidates derived agent sessions; - revoking an agent session invalidates its child edges; - revoking a delegation edge invalidates downstream delegated authority; - revoking a grant prevents future exchange and can invalidate active sessions depending on workflow. ## Resource-Server Responsibility The Gateway and connectors must be configured with a revocation store. For development, an in-memory store can be useful. For production, use a shared store and stream consumer so revocations propagate across resource-server instances. ## Next Step Read [Audit and Request Traces](/concepts/audit-ledger/) to understand how decisions and requests are explained. ## Related Pages - [Mandates](/concepts/mandate/) - [Protect an MCP Server](/guides/protect-mcp/) - [Tail and Query the Audit Stream](/guides/audit-stream/) --- # Audit and Request Traces # URL: https://docs.caracal.run/concepts/audit-ledger/ # Markdown: https://docs.caracal.run/markdown/concepts/audit-ledger.md # Type: page # Concepts: # Requires: --- The audit ledger records what Caracal decided, why it decided it, and which request or run produced the event. ## What Gets Audited | Event area | Examples | | --- | --- | | Token exchange | Allow, deny, step-up required, policy diagnostics. | | Gateway and connector use | Resource decision, mandate verification failure, request correlation. | | Policy lifecycle | Policy creation, validation, policy-set activation, simulation. | | Delegation | Edge creation, traversal, impact, revocation cascade. | | Sessions | Spawn, terminate, revoke, expire. | | Administration | Zone, application, resource, provider, grant, and challenge changes. 
| ## Audit Flow ```mermaid flowchart LR STS["STS"] --> Redis["Audit stream"] Gateway["Gateway / connectors"] --> Redis Admin["Admin API"] --> Redis Redis --> Worker["Ingestion worker"] Worker --> Postgres["Audit tables"] Postgres --> Console["Console audit and explain views"] Postgres --> API["Admin API audit endpoints"] ``` ## How to Use Audit | Question | Where to look | | --- | --- | | Why was a request denied? | Console `request trace` or Admin API explain endpoint by request ID. | | Which policy caused the decision? | Determining policies and diagnostics. | | Did revocation propagate? | Session, delegation, and resource decision events. | | Which run made a request? | Request ID, session ID, agent session ID, and trace context. | | Was step-up required or satisfied? | Step-up challenge and exchange events. | ## Request IDs Request IDs tie multiple events together. Keep the request ID from an SDK, Gateway, STS error, or Console trace whenever debugging. The explain view uses it to collect related decision events and diagnostics. ## Integrity and Retention Audit events should be treated as operational evidence. Configure retention, export, and SIEM forwarding according to your deployment requirements. Do not rely on local process logs as the only authority trail. ## Next Step Use [Guides](/guides/) when you are ready to apply the model. ## Related Pages - [Tail and Query the Audit Stream](/guides/audit-stream/) - [Trace One Protected Request](/tutorials/inspect-a-run/) - [Operations](/operations/) --- # Operate Caracal # URL: https://docs.caracal.run/operations/ # Markdown: https://docs.caracal.run/markdown/operations.md # Type: landing # Concepts: # Requires: --- Use Operations when Caracal is running as infrastructure: Docker Compose for self-hosted installs, Helm for Kubernetes, managed Postgres and Redis, production secrets, rollout gates, observability, incident handling, upgrades, and platform handoff. 
## Operating Model ```mermaid flowchart LR Operator[Operator or platform team] --> Runtime[Runtime surface] Runtime --> Compose[Docker Compose] Runtime --> Helm[Helm release] Runtime --> Secrets[Runtime secrets] Runtime --> Checks[Health, readiness, smoke tests] Checks --> Services[API, STS, Gateway, Audit, Coordinator] Services --> Postgres[(PostgreSQL)] Services --> Redis[(Redis Streams)] ``` ## Start by Role | Role | Start with | | --- | --- | | Local or self-hosted operator | [Deploy with Docker Compose](/operations/docker-compose/) | | Kubernetes platform team | [Deploy with Helm](/operations/kubernetes-helm/) and [Deploy on Managed Kubernetes](/operations/cloud-reference-deployments/) | | Security reviewer | [Harden Production](/operations/tls-hardening/) and [Rotate Keys and Secrets](/operations/key-management/) | | SRE or on-call engineer | [Monitor Health and Metrics](/operations/observability/), [Configure Alerts](/operations/alerts/), [Recover from Failures](/operations/failure-modes/), and [Run Failure Drills](/operations/failure-drills/) | | Release owner | [Plan a Platform Rollout](/operations/platform-rollout-kit/), [Deploy Policy Changes](/operations/policy-deployment/), and [Upgrade Caracal](/operations/upgrade/) | ## Deployment Choices | Environment | Recommended path | Notes | | --- | --- | --- | | Local development | `caracal up` / `infra/docker/docker-compose.yml` | Builds local images, binds service ports to `127.0.0.1`, and writes local secrets. | | Self-hosted runtime | `infra/docker/runtime-compose.yml` | Uses versioned GHCR images and mounted secrets. | | Kubernetes | `infra/helm/caracal` | Uses Deployments/StatefulSets, pre-install/pre-upgrade migration Job, ClusterIP Services, optional Ingress, NetworkPolicy, PDBs, HPAs, ServiceMonitor, and PrometheusRule. | ## Core Operational Invariants - Postgres is the durable control-plane store. 
- Redis Streams move audit, policy invalidation, session revocation, key invalidation, agent, invocation, and delegation events. - STS and Gateway keep audit replay directories so audit emission can drain after Redis/Audit recovery. - Published modes are `rc` and `stable`; they require production-grade HMAC keys and reject unsafe fallbacks. - Product-management operations happen through Console, Admin SDK, or Control API; the top-level runtime CLI only manages local lifecycle and `caracal run`. ## Section Map | Need | Page | | --- | --- | | Compose deployment | [Deploy with Docker Compose](/operations/docker-compose/) | | Helm deployment | [Deploy with Helm](/operations/kubernetes-helm/) | | Runtime profiles | [Configure Service Environment](/operations/env-vars/) and [Choose a Cloud Profile](/operations/cloud-native-profiles/) | | Cloud deployment | [Deploy on Managed Kubernetes](/operations/cloud-reference-deployments/) | | Storage | [Operate PostgreSQL](/operations/postgres/) and [Operate Redis Streams](/operations/redis/) | | Hardening | [Harden Production](/operations/tls-hardening/) | | Rotation | [Rotate Keys and Secrets](/operations/key-management/) | | Scaling | [Scale Capacity](/operations/scale-capacity/) | | Observability | [Monitor Health and Metrics](/operations/observability/) and [Configure Alerts](/operations/alerts/) | | Recovery | [Recover from Failures](/operations/failure-modes/), [Run Failure Drills](/operations/failure-drills/), [Back Up and Retain Data](/operations/backup-retention/), and [Respond to Incidents](/operations/incident-response/) | --- # Deploy with Docker Compose # URL: https://docs.caracal.run/operations/docker-compose/ # Markdown: https://docs.caracal.run/markdown/operations/docker-compose.md # Type: workflow # Concepts: # Requires: --- Caracal ships two Compose shapes: | File | Purpose | | --- | --- | | `infra/docker/docker-compose.yml` | Local development stack that builds repository images. 
| | `infra/docker/runtime-compose.yml` | Self-hosted runtime stack using versioned GHCR images. |

Both stacks run Postgres, Redis, migrations, STS, API, Gateway, Audit, Coordinator, and optional Control.

## Topology

```mermaid
flowchart TB
  User[Operator or workload] --> API[API :3000]
  User --> STS[STS :8080]
  User --> Gateway[Gateway :8081]
  User --> Coordinator[Coordinator :4000]
  Console[Console] --> API
  Console --> Coordinator
  API --> Postgres[(Postgres :5432)]
  STS --> Postgres
  Gateway --> Postgres
  Audit[Audit :9090] --> Postgres
  API --> Redis[(Redis :6379)]
  STS --> Redis
  Gateway --> Redis
  Coordinator --> Redis
  Audit --> Redis
```

Local ports are bound to `127.0.0.1` so the OSS stack does not expose services externally by default.

## Prerequisites

- Docker with Compose support.
- Generated runtime secrets under the expected secrets directory.
- For the self-hosted runtime file, `CARACAL_VERSION` and optional `CARACAL_REGISTRY`.

## Local Development Flow

```bash
pnpm secrets:init
caracal up
caracal status --ready
bash infra/scripts/smokeTest.sh
```

`caracal up` builds local images in development mode. It writes operator secrets outside the repository under the managed dev secret directory, exports `CARACAL_SECRETS_DIR` to Compose, and keeps the host directory private so only the operator account can mount those files into services. `caracal status --ready` checks dependency readiness, and `smokeTest.sh` probes `/ready` for API, Gateway, STS, Audit, and Coordinator, plus the API `/health` liveness endpoint.

## Self-Hosted Runtime Flow

```bash
export CARACAL_VERSION=2026.06.22-rc.1
docker compose -f infra/docker/runtime-compose.yml up -d
docker compose -f infra/docker/runtime-compose.yml ps
bash infra/scripts/smokeTest.sh
```

The runtime compose file expects secrets under `./secrets/` relative to the compose file unless `CARACAL_SECRETS_DIR` points at a platform-managed secret mount.
Keep the secret directory private to the operator account and never place inline secrets in the compose file. To move an existing runtime onto a newer release, use `caracal upgrade`. It stages the pinned images, applies expand-phase migrations against the running database, then rolls the services and waits for readiness — no maintenance window. See [Upgrade Caracal](/operations/upgrade/). ## Operator Secret Boundary Admin, Coordinator, database, Redis, HMAC, and KEK material are operator secrets. Do not place those files in a source workspace, agent workspace, MCP tool workspace, or any directory mounted into untrusted agent containers. | Deployment | Secret location | | --- | --- | | Local development through `caracal up` | Managed dev secret directory outside the repository; passed to Compose as `CARACAL_SECRETS_DIR`. | | Released self-hosted runtime | `$CARACAL_HOME/secrets` or an explicit `CARACAL_SECRETS_DIR`. | | Cloud or orchestrated deployment | Cloud/Kubernetes secret mount or secret-manager projection passed as `CARACAL_SECRETS_DIR` or service-specific `*_FILE` paths. | Run agents with workload credentials only. If an agent must execute untrusted code, run it as a separate OS user, container, VM, or sandbox that cannot read `CARACAL_SECRETS_DIR`, `$CARACAL_HOME/secrets`, Docker socket credentials, or repository control-plane files. ## Control Plugin Control runs as an optional in-process plugin inside the API service rather than a separate container. Set `CARACAL_CONTROL_ENABLED=true` on API to mount the plugin, then toggle availability at runtime with the gate file (`CONTROL_GATE_FILE`). Enable it only when the Console or platform automation is ready to manage Control exposure and credentials. ## Rollback 1. Stop the stack with `caracal down` or `docker compose ... down`. 2. Restore the previous `CARACAL_VERSION`. 3. Start the stack and wait for readiness. 4. Confirm audit replay and outbox queues drain before reopening high-risk traffic. 
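
The rollback steps above can be sketched as a dry-run plan for the self-hosted runtime file. This is a minimal sketch, not a shipped script. The `rollback_plan` function name is hypothetical, and it only prints commands already shown on this page. The version string is the example value from the runtime flow. Step 4 stays a manual check because this page does not name a command for it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints (does not run) the rollback commands for the
# self-hosted runtime stack, pinned to a previous release.
rollback_plan() {
  local previous_version="${1:-}"
  local compose_file="infra/docker/runtime-compose.yml"

  if [ -z "$previous_version" ]; then
    echo "usage: rollback_plan <previous CARACAL_VERSION>" >&2
    return 1
  fi

  # 1. Stop the stack.
  echo "docker compose -f ${compose_file} down"
  # 2. Restore the previous version pin.
  echo "export CARACAL_VERSION=${previous_version}"
  # 3. Start the stack, then probe readiness with the smoke test.
  echo "docker compose -f ${compose_file} up -d"
  echo "bash infra/scripts/smokeTest.sh"
  # 4. Manual step: no command is named for this check.
  echo "# confirm audit replay and outbox queues drain before reopening high-risk traffic"
}

# Example value only; substitute the release you are rolling back to.
rollback_plan "2026.06.22-rc.1"
```

Printing the plan instead of executing it lets an operator review the exact version pin before anything is stopped.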
## Troubleshooting | Symptom | Check | | --- | --- | | Postgres or Redis never becomes healthy | Check secret files, mounted volumes, and host port conflicts on `5432` or `6379`. | | API ready fails | Confirm migrations completed and `DATABASE_URL_FILE`, `REDIS_URL_FILE`, and `CARACAL_ADMIN_TOKEN_FILE` resolve. | | STS or Gateway fails in `rc`/`stable` | Confirm `AUDIT_HMAC_KEY`, `STREAMS_HMAC_KEY`, and `GATEWAY_STS_HMAC_KEY` are hex-encoded and at least 32 bytes. | | Gateway rejects upstreams | Operator-provisioned private upstreams are allowed by default; if set, confirm the host is in `UPSTREAM_HOST_ALLOWLIST`. Cloud metadata, loopback, CGNAT, and multicast are always blocked. | | Audit gaps appear after Redis outage | Keep STS and Gateway replay volumes intact and wait for replay metrics to drain. | ## Next Step Use [Configure Service Environment](/operations/env-vars/) to review required service variables and secret-file settings. --- # Deploy with Helm # URL: https://docs.caracal.run/operations/kubernetes-helm/ # Markdown: https://docs.caracal.run/markdown/operations/kubernetes-helm.md # Type: workflow # Concepts: # Requires: --- The Helm chart in `infra/helm/caracal` renders production-shaped Kubernetes resources for API, STS, Gateway, Audit, Coordinator, and optional Control. ## Rendered Resources | Resource | Purpose | | --- | --- | | Deployment | API, Audit, Coordinator, and Control when enabled. | | StatefulSet | STS and Gateway when replay persistence is enabled. | | Job | Pre-install/pre-upgrade Postgres migrations. | | Service | ClusterIP service per enabled component. | | Ingress | Optional external API and Gateway endpoints. | | NetworkPolicy | Default-deny with explicit internal, storage, DNS, OTLP, HTTPS, and custom egress controls. | | PDB/HPA | Availability and scaling controls per service. | | ServiceMonitor/PrometheusRule | Metrics scraping and built-in alert recipes. 
| ## Profile Selection The chart ships three overlays on top of `values.yaml`. Pick one explicitly — `values.production.yaml` is the canonical enterprise install path: | Overlay | `global.mode` | Posture | Use for | | --- | --- | --- | --- | | `values.production.yaml` | `stable` | Fail-closed; replicas, PDBs, anti-affinity baked in | **Production (canonical)** | | `values.rc.yaml` | `rc` | Same fail-closed trust boundary; release-candidate build may be functionally unstable | Staging / pre-release validation | | `values.dev.yaml` | `dev` | Relaxed, single-host conveniences | Local evaluation only | Always pass the chosen overlay with `--values`; never install on bare `values.yaml`. ## Minimal Install Shape ```bash helm upgrade --install caracal infra/helm/caracal \ --namespace caracal \ --create-namespace \ --values infra/helm/caracal/values.production.yaml ``` Before installing, provide the runtime Secret named by `secrets.runtimeSecretName` or set `secrets.create=true` with plaintext values only for a controlled evaluation environment. For a managed-cloud install with the runtime Secret provisioned by the External Secrets Operator, a production overlay, and cert-manager TLS ingress, follow [Deploy on Managed Kubernetes](/operations/cloud-reference-deployments/). 
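
As a guard against installing on bare `values.yaml`, the overlay choice can be made explicit in a small wrapper. The sketch below maps `global.mode` to the overlay files listed under Profile Selection. The function name `overlay_for_mode` is hypothetical. The sketch also assumes all three overlays sit in `infra/helm/caracal`, which this page only shows for `values.production.yaml`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: resolve the Helm overlay for a given global.mode.
# Assumes all three overlays live in infra/helm/caracal.
overlay_for_mode() {
  local chart_dir="infra/helm/caracal"
  case "${1:-}" in
    stable) echo "${chart_dir}/values.production.yaml" ;;
    rc)     echo "${chart_dir}/values.rc.yaml" ;;
    dev)    echo "${chart_dir}/values.dev.yaml" ;;
    *)
      echo "unknown mode '${1:-}': expected stable, rc, or dev" >&2
      return 1
      ;;
  esac
}

# Print the install command for review instead of running it.
if overlay="$(overlay_for_mode stable)"; then
  echo "helm upgrade --install caracal infra/helm/caracal --namespace caracal --create-namespace --values ${overlay}"
fi
```

An unknown or empty mode returns a non-zero status, so a calling script stops before `helm` is invoked without an overlay.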
## Service Ports | Component | Port | | --- | --- | | API | `3000` | | STS | `8080` | | Gateway | `8081` | | Audit | `9090` | | Coordinator | `4000` | | Control | in-process in API (`3000`) when enabled | ## Production Values to Review | Area | Values | | --- | --- | | Images | `global.registry`, `global.tag`, `global.imagePullPolicy`, `global.mode` | | Secrets | `secrets.runtimeSecretName`, `secrets.database.*`, `secrets.redis.*`, token secret refs | | Network | `networkPolicy.*`, `ingress.gateway.*`, `ingress.api.*` | | Reliability | `services..replicas`, `pdb`, `hpa`, resource requests/limits | | Replay | `replayPersistence.enabled`, `storageClassName`, `size` | | Observability | `serviceMonitor.enabled`, `alertRules.enabled`, `global.otelExporterOtlpEndpoint` | ## Validation Flow ```bash helm lint infra/helm/caracal helm template caracal infra/helm/caracal --values infra/helm/caracal/values.production.yaml >/tmp/caracal.yaml kubectl -n caracal rollout status deploy/caracal-api kubectl -n caracal get pods,svc ``` Run readiness probes through the Service or Ingress endpoints after rollout. ## Rollback ```bash helm -n caracal history caracal helm -n caracal rollback caracal ``` Migrations are forward-only. Rollbacks must use application versions compatible with the already-applied schema. ## Troubleshooting | Symptom | Check | | --- | --- | | Migration Job fails | Check database host, port, user, database name, password key, and `sslMode`. | | Pods fail readiness | Inspect dependency readiness, mounted secrets, and service-specific required env vars. | | Gateway Ingress works but upstream calls fail | Check NetworkPolicy egress and upstream allowlist configuration. | | STS or Gateway loses audit replay | Ensure replay persistence is enabled and PVCs are not deleted during rollouts. | ## Next Step Use [Choose a Cloud Profile](/operations/cloud-native-profiles/) when the Helm deployment needs managed storage, ingress, external secrets, or production network policy. 
--- # Choose a Cloud Profile # URL: https://docs.caracal.run/operations/cloud-native-profiles/ # Markdown: https://docs.caracal.run/markdown/operations/cloud-native-profiles.md # Type: reference # Concepts: # Requires: --- Cloud-native Caracal deployments should use the Helm chart with externalized state: managed Postgres, managed Redis, platform secrets, ingress, NetworkPolicy, and cluster observability. For a concrete, copy-paste version of this profile — including External Secrets manifests, a production overlay, cert-manager TLS ingress, and a per-cloud (EKS/GKE/AKS) substitution table — see [Deploy on Managed Kubernetes](/operations/cloud-reference-deployments/). ## Profile Map | Profile | Values to adjust | | --- | --- | | Evaluation | `global.mode=rc`, short retention, limited replicas, temporary secrets, no public ingress unless required. | | Production | `global.mode=stable`, managed Postgres/Redis, TLS ingress, NetworkPolicy, PDB/HPA, ServiceMonitor, PrometheusRule. | | Private cluster | Disable public ingress; expose API/Gateway through internal load balancers or service mesh. | | Regulated workload | Enforce external secret manager, immutable backups, audit export, restricted egress, and incident evidence retention. | ## Managed Dependency Contract | Dependency | Requirements | | --- | --- | | Postgres | TLS-capable connection, migration role, durable backups, point-in-time recovery, sufficient connection capacity. | | Redis | Streams support, authentication, memory policy sized for audit/revocation/outbox traffic, persistence according to recovery goals. | | Secrets | Runtime Secret with database, Redis, admin, Coordinator, zone KEK, audit HMAC, stream HMAC, and Gateway-STS HMAC keys. | | Observability | Metrics scraping for all services and alerts equivalent to the chart PrometheusRule. | ## Network Stance The chart NetworkPolicy allows Caracal pod-to-pod traffic and storage egress. 
Add explicit DNS, HTTPS, identity provider, object store, and provider API egress only as needed. ```yaml networkPolicy: enabled: true allowOpenDns: false allowOpenHttps: false dnsEgress: [] extraEgress: [] extraIngress: [] ``` ## Ingress Stance Expose only the endpoints required by your environment: | Endpoint | When to expose | | --- | --- | | Gateway | Required for protected resource traffic through Caracal Gateway. | | API | Required for Console/Admin clients outside the cluster. | | STS | Expose only when token exchange clients cannot reach it privately. | | Audit/Coordinator/Control | Prefer private access. | The chart currently has optional Ingress templates for Gateway and API. ## Validation 1. Render manifests with production values. 2. Confirm no plaintext secrets are committed. 3. Confirm NetworkPolicy egress covers required provider APIs and object stores. 4. Confirm ServiceMonitor or equivalent metrics scraping is active. 5. Confirm backup and restore evidence exists for Postgres and runtime secrets. ## Next Step Use [Deploy on Managed Kubernetes](/operations/cloud-reference-deployments/) for a concrete External Secrets, cert-manager, managed Postgres, and managed Redis deployment. --- # Deploy on Managed Kubernetes # URL: https://docs.caracal.run/operations/cloud-reference-deployments/ # Markdown: https://docs.caracal.run/markdown/operations/cloud-reference-deployments.md # Type: workflow # Concepts: # Requires: --- This page turns the [Choose a Cloud Profile](/operations/cloud-native-profiles/) map into a concrete, copy-paste deployment. The chart is cloud-agnostic by design; the only per-cloud surface is the secret-store authentication, the storage class, and the ingress annotations. The reference artifacts live in `infra/helm/caracal/examples`: - `values.cloud-managed.yaml` — production overlay with externalized state, cert-manager TLS ingress, ServiceMonitor, and tightened egress. 
- `external-secrets/` — External Secrets Operator stores for AWS, GCP, and Azure plus one cloud-agnostic `ExternalSecret`. ## Prerequisites | Component | Why | | --- | --- | | Managed Postgres | Authoritative product, policy, audit, and delegation state. | | Managed Redis | Streams for audit, revocation, policy invalidation, and coordination. | | [External Secrets Operator](https://external-secrets.io/) | Provisions `caracal-runtime` from your secret manager so no plaintext secret is rendered. | | [cert-manager](https://cert-manager.io/) | Issues TLS certificates for the Gateway and API ingress. | | Prometheus Operator | Consumes the chart `ServiceMonitor` and `PrometheusRule`. | ## 1. Store Secret Material Load the Caracal secret material into your secret manager under the keys the `ExternalSecret` reads: | Secret manager key | Contents | | --- | --- | | `caracal/postgres` | `username`, `password`, `host`, `port`, `database`. | | `caracal/redis` | `password`, `host`, `port`, `database`. | | `caracal/zone-kek` | Zone key-encryption key. | | `caracal/audit-hmac-key`, `caracal/streams-hmac-key`, `caracal/gateway-sts-hmac-key` | Integrity keys. | | `caracal/admin-token`, `caracal/coordinator-token` | Control-plane tokens. | | `caracal/sts-admin-token`, `caracal/audit-admin-token`, `caracal/metrics-bearer` | Optional service tokens. | ## 2. Apply the Secret Store and ExternalSecret Pick the store for your cloud and apply it with the agnostic `ExternalSecret`: ```bash cd infra/helm/caracal/examples kubectl apply -f external-secrets/secretstore-aws.yaml kubectl apply -f external-secrets/externalsecret-runtime.yaml ``` The `ExternalSecret` composes `databaseUrl` and `redisUrl` and materializes the exact keys the chart mounts, so `secrets.create` stays `false`. ## 3. 
Install the Chart with the Overlay ```bash helm upgrade --install caracal infra/helm/caracal \ --namespace caracal --create-namespace \ -f infra/helm/caracal/examples/values.cloud-managed.yaml \ --set secrets.database.host= \ --set secrets.redis.host= ``` Stable mode enforces externalized Postgres and Redis hosts and forbids in-chart plaintext secrets, so the render fails fast if either is left at a default. ## Per-Cloud Substitutions | Setting | AWS (EKS) | GCP (GKE) | Azure (AKS) | | --- | --- | --- | --- | | Secret store | `secretstore-aws.yaml` (IRSA) | `secretstore-gcp.yaml` (Workload Identity) | `secretstore-azure.yaml` (Workload Identity) | | Managed Postgres | RDS / Aurora PostgreSQL | Cloud SQL for PostgreSQL | Azure Database for PostgreSQL | | Managed Redis | ElastiCache / MemoryDB | Memorystore for Redis | Azure Cache for Redis | | `replayPersistence.storageClassName` | `gp3` | `premium-rwo` | `managed-csi` | | `ingress.*.className` | `alb` | `gce` | `azure-application-gateway` | | Ingress annotations | ALB scheme/target-type | GCE managed cert or cert-manager | AGIC annotations | Keep `networkPolicy.allowOpenDns` and `allowOpenHttps` `false`; set `dnsEgress` and `extraEgress` to the named CIDRs of your managed Postgres, Redis, identity provider, and provider APIs. ## Upgrades and Rollback Caracal database migrations are forward-only by design. Plan upgrades as roll-forward changes and only roll back application images when they remain schema-compatible. See [Upgrade Caracal](/operations/upgrade/). ## Why No Terraform Module Caracal ships standardized, portable interfaces — External Secrets Operator, cert-manager, and the Helm chart — rather than per-cloud infrastructure-as-code modules. This keeps the deployment surface agnostic and avoids coupling Caracal to a single provider's resource model. Use your own IaC to provision the managed Postgres, Redis, secret manager, and cluster, then apply these artifacts. ## Validation 1. 
Confirm the `ExternalSecret` reports `SecretSynced` and `caracal-runtime` exists. 2. Render manifests and confirm no plaintext secret is present. 3. Confirm cert-manager issued the Gateway and API TLS certificates. 4. Confirm the `ServiceMonitor` is scraped and `PrometheusRule` alerts are loaded. 5. Run a canary token exchange and a Gateway-protected request, then confirm audit evidence. ## Next Step Use [Harden Production](/operations/tls-hardening/) before opening external API or Gateway ingress to production traffic. --- # Package an Install Kit # URL: https://docs.caracal.run/operations/enterprise-install-kit/ # Markdown: https://docs.caracal.run/markdown/operations/enterprise-install-kit.md # Type: workflow # Concepts: # Requires: --- Use the enterprise install kit pattern when a platform team needs a controlled bundle of Caracal deployment assets, values, secrets instructions, verification steps, and rollback guidance. Keep the open-source and enterprise products isolated: do not import, copy, or depend on files across product roots. ## Kit Contents | Artifact | Purpose | | --- | --- | | Release version | Exact Caracal image tag and chart version. | | Helm values | Environment-specific overrides for registry, tag, secrets, network policy, ingress, resources, replay persistence, and observability. | | Secret manifest | Required keys and where the platform secret manager stores them. | | Runbook | Install, verify, rollback, backup, restore, and incident steps. | | Evidence checklist | Rendered manifest, migration result, readiness output, smoke test, alert wiring, backup proof, and owner sign-off. | ## Required Boundaries - Keep OSS local ports and enterprise local ports non-overlapping in examples. - Do not reference private source paths from open-source docs or configs. - Share behavior only through documented deployment contracts: images, chart values, APIs, event topics, and secrets. - Keep customer-specific values outside the repository. 
## Installation Sequence ```mermaid flowchart LR Package[Package kit] --> Review[Security and platform review] Review --> Secrets[Create runtime secrets] Secrets --> Render[Render Helm manifests] Render --> Install[Install or upgrade] Install --> Verify[Readiness, smoke, alerts, backups] Verify --> Handoff[Operations handoff] ``` ## Verification Evidence | Check | Evidence | | --- | --- | | Version pin | Image tags and chart values match the release approval. | | Secrets | Runtime Secret contains database, Redis, admin, Coordinator, zone KEK, audit HMAC, stream HMAC, and Gateway-STS HMAC material. | | Network | NetworkPolicy admits only required ingress and egress. | | Storage | Postgres migrations complete; Redis streams and groups exist. | | Runtime | `/ready` passes for API, STS, Gateway, Audit, and Coordinator. | | Observability | ServiceMonitor and PrometheusRule are installed or equivalent alerts exist. | | Recovery | Backup and restore runbook has been tested for Postgres and runtime secrets. | ## Handoff Package Include links to [Deploy with Helm](/operations/kubernetes-helm/), [Configure Service Environment](/operations/env-vars/), [Back Up and Retain Data](/operations/backup-retention/), [Configure Alerts](/operations/alerts/), and [Respond to Incidents](/operations/incident-response/). ## Next Step Use [Hand Off to Platform Teams](/operations/platform-team-handoff/) when the install bundle is ready for production ownership. --- # Configure Service Environment # URL: https://docs.caracal.run/operations/env-vars/ # Markdown: https://docs.caracal.run/markdown/operations/env-vars.md # Type: reference # Concepts: # Requires: --- Caracal services load configuration from environment variables and `*_FILE` secret variants. Use file-backed secrets in `rc` and `stable` modes. ## Mode Requirements `CARACAL_MODE` selects one of two security postures, not three: - **`dev`** — the only relaxed posture. 
Permits local conveniences (open metrics, `INSECURE_*` overrides, generated/relaxed settings) for development on a single host. Never use it for anything reachable by others. - **Published (`rc` and `stable`)** — the production posture. Both are **security-identical and fail-closed**: valid HMAC keys are required, Gateway fail-open settings are denied, and incomplete service configuration is rejected before readiness succeeds. The difference between `rc` and `stable` is **maturity, not security**. `rc` is a release-candidate build that enforces the full production trust boundary but may be functionally unstable; run `stable` for production workloads and reserve `rc` for staging or pre-release validation. | Mode | Posture | Use for | | --- | --- | --- | | `dev` | Relaxed (single-host only) | Local development | | `rc` | Published — fail-closed, may be functionally unstable | Staging / pre-release validation | | `stable` | Published — fail-closed | Production | Published modes require valid HMAC keys, deny unsafe Gateway fail-open settings, and reject incomplete service configuration before readiness succeeds. ## Shared Service Variables | Variable | Mode | Used by | Meaning | | --- | --- | --- | --- | | `CARACAL_MODE` | all | All services | `dev`, `rc`, or `stable`. | | `PORT` | all | HTTP services | Fixed service port: API `3000`, STS `8080`, Gateway `8081`, Audit `9090`, Coordinator `4000`. Control runs in-process in API. | | `HOST` | all | TypeScript services | Bind host; defaults to `0.0.0.0` in published modes. | | `LOG_LEVEL` | all | Services | Log verbosity. | | `DATABASE_URL` / `DATABASE_URL_FILE` | all | API, STS, Gateway, Audit, Coordinator | Postgres connection. | | `REDIS_URL` / `REDIS_URL_FILE` | all | API, STS, Gateway, Audit, Coordinator | Redis connection. | | `STREAMS_HMAC_KEY` / `STREAMS_HMAC_KEY_FILE` | rc+stable | API, STS, Gateway, Coordinator | Signs Redis stream messages. 
| | `AUDIT_HMAC_KEY` / `AUDIT_HMAC_KEY_FILE` | rc+stable | API, STS, Gateway, Audit, Control | Signs or verifies audit events. | | `METRICS_BEARER` / `METRICS_BEARER_FILE` | rc+stable | STS, Gateway, API, Audit, Coordinator | Metrics endpoint bearer token. Generated as the managed `metricsBearer` secret by `caracal up`; in published mode (`rc`/`stable`) the metrics endpoints fail closed (401) without it. | ## Service-Specific Variables | Service | Variables | | --- | --- | | API | `CARACAL_ADMIN_TOKEN`, `ZONE_KEK`, `GATEWAY_STS_HMAC_KEY`, `STS_URL`, `API_BODY_LIMIT_BYTES`, `API_V1_RATE_LIMIT_PER_MIN`, `API_READY_OUTBOX_DEAD_MAX`, `TRUST_PROXY` | | STS | `ISSUER_URL`, `ZONE_KEK`, `ZONE_KEK_PROVIDER`, `GATEWAY_STS_HMAC_KEY`, `STS_ADMIN_TOKEN`, `MAX_GRANT_TTL_SECONDS`, `OPA_POLL_SECONDS`, `AUDIT_REPLAY_DIR` | | Gateway | `STS_URL`, `GATEWAY_STS_HMAC_KEY`, `UPSTREAM_HOST_ALLOWLIST`, `TLS_CERT_FILE`, `TLS_KEY_FILE`, `MAX_REQUEST_BYTES`, `JTI_FAIL_OPEN`, `AUDIT_REPLAY_DIR` | | Audit | `AUDIT_ADMIN_TOKEN`, `AUDIT_RETENTION_DAYS`, `AUDIT_MAX_DELIVERIES`, `AUDIT_CLAIM_IDLE_SECS`, `AUDIT_TAMPER_ROLLING_HOURS`, `AUDIT_EXPORT_S3_*`, readiness thresholds | | Coordinator | `ISSUER_URL`, `STS_URL`, `AGENT_COORDINATOR_SCOPE`, `CARACAL_COORDINATOR_TOKEN`, sweep intervals, retention windows, rate limits | | Control | `STS_JWKS_URL`, `STS_ISSUER_URL`, `CONTROL_AUDIENCE`, `CONTROL_REDIS_URL`, `CONTROL_API_TOKEN`, `CARACAL_API_URL`, rate and replay controls | ## Runtime Workload Variables These variables belong to `caracal run` and SDK profile loading, not to service deployment: | Variable | Meaning | | --- | --- | | `CARACAL_CONFIG` | Explicit runtime profile path. | | `CARACAL_STS_URL` or `CARACAL_ZONE_URL` | Cloud/custom STS URL override for token exchange. | | `CARACAL_GATEWAY_URL` | Cloud/custom Gateway URL override for SDK transports. | | `CARACAL_ZONE_ID` | Zone identifier. | | `CARACAL_APPLICATION_ID` | Confidential application ID. 
| | `CARACAL_APP_CLIENT_SECRET` | Cloud/custom workload client secret when a file-backed secret is not used. | | `CARACAL_APP_CLIENT_SECRET_FILE` | Cloud/custom workload secret file. | | `CARACAL_RUN_CREDENTIALS` | Inline JSON credential manifest for resource audiences. | | `CARACAL_RUN_CREDENTIALS_FILE` | Cloud/custom JSON credential manifest. | | `CARACAL_RESOURCES` | Comma-separated `resource_id=upstream_prefix` bindings for Gateway routing. | | `CARACAL_RESOURCES_FILE` | JSON resource binding file for Gateway routing. | Local dev and stable runtime launches auto-detect credential files from the OS Caracal config directory. Use the explicit file variables only when a deployment mounts secrets somewhere else. Resource binding files may contain either a flat JSON object such as `{"calendar":"https://api.example.com/v1"}` or an array of `{"resource_id":"calendar","upstream_prefix":"https://api.example.com/v1"}` objects. When both resource variables are set, entries in `CARACAL_RESOURCES` override matching entries from `CARACAL_RESOURCES_FILE`. Local dev and stable runtime launches also resolve service URLs automatically. Use explicit service URL variables only when a deployment does not use the local stack defaults. ## Published-Mode Rules Before a published deployment becomes ready: 1. `GATEWAY_STS_HMAC_KEY` must be hex-encoded and at least 32 bytes for STS and Gateway. 2. `AUDIT_HMAC_KEY` must be present for Audit, Gateway, and Control. 3. Gateway must not run with `JTI_FAIL_OPEN=true`. 4. Gateway allows operator-provisioned private upstreams by default; set `UPSTREAM_HOST_ALLOWLIST` to pin egress to named hosts. 5. STS caps `OPA_POLL_SECONDS` at `300`. ## Troubleshooting | Symptom | Check | | --- | --- | | A service starts locally but fails in Helm | Published modes require stricter HMAC, URL, and secret validation. | | File secret is ignored | Confirm the variable uses the `*_FILE` suffix supported by that service. 
| | Wrong port error | Use the fixed port for the service; several services reject arbitrary `PORT` values. | | Runtime workload cannot find config | Use [Configure Workloads](/runtime-console/config-file/) rather than service env vars. | ## Next Step Use [Harden Production](/operations/tls-hardening/) to confirm published-mode, network, secret, and upstream safety settings. --- # Harden Production # URL: https://docs.caracal.run/operations/tls-hardening/ # Markdown: https://docs.caracal.run/markdown/operations/tls-hardening.md # Type: workflow # Concepts: # Requires: --- Production Caracal should run in published mode (`CARACAL_MODE=stable`, or `rc` for staging), use TLS at external boundaries, and keep stateful dependencies private. `rc` and `stable` enforce the same fail-closed trust boundary; `rc` is a release-candidate build that may be functionally unstable, so use `stable` for production. ## Hardening Checklist | Area | Required stance | | --- | --- | | External TLS | Terminate TLS at ingress, load balancer, service mesh, or Gateway TLS files. | | Internal services | Prefer private ClusterIP or bridge-network traffic. | | Secrets | Use mounted secret files, not inline env vars. | | Containers | Drop Linux capabilities, set no-new-privileges, use read-only filesystems where supported. | | NetworkPolicy | Admit only required ingress and egress. | | Gateway upstreams | Private upstreams are routable by default; set `UPSTREAM_HOST_ALLOWLIST` to pin egress to named hosts. | | Runtime CLI | Do not expose product management as top-level runtime CLI commands. | ## Gateway Upstream Safety Upstream hosts are operator-provisioned through the Control API, never client-supplied, so the Gateway routes to private and on-prem upstreams (internal tools, local MCPs) by default. Dangerous infrastructure ranges that are never a legitimate upstream (cloud metadata at `169.254.0.0/16`, loopback, carrier-grade NAT, and multicast) are always blocked, including under DNS rebinding.
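
To make the always-blocked ranges concrete, the sketch below classifies a literal IPv4 address against them. It is an illustration, not the Gateway's implementation, and the function name `is_blocked_ipv4` is hypothetical. Only `169.254.0.0/16` is stated on this page. The other CIDRs are the conventional ones for each category and are an assumption here: `127.0.0.0/8` for loopback, `100.64.0.0/10` for carrier-grade NAT, and `224.0.0.0/4` for multicast. The sketch ignores IPv6 and DNS resolution, so it does not model the rebinding protection.

```bash
#!/usr/bin/env bash
# Hypothetical check: returns 0 when a dotted-quad IPv4 address falls in one
# of the ranges described as always blocked, 1 otherwise, 2 on bad input.
is_blocked_ipv4() {
  local ip="${1:-}" a b c d
  # Require exactly four decimal octets.
  if ! [[ "$ip" =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]]; then
    return 2
  fi
  # Force base 10 so octets with leading zeros are not read as octal.
  a=$((10#${BASH_REMATCH[1]})); b=$((10#${BASH_REMATCH[2]}))
  c=$((10#${BASH_REMATCH[3]})); d=$((10#${BASH_REMATCH[4]}))
  if (( a > 255 || b > 255 || c > 255 || d > 255 )); then
    return 2
  fi

  (( a == 169 && b == 254 )) && return 0              # cloud metadata, 169.254.0.0/16
  (( a == 127 )) && return 0                          # loopback, 127.0.0.0/8 (assumed)
  (( a == 100 && b >= 64 && b <= 127 )) && return 0   # CGNAT, 100.64.0.0/10 (assumed)
  (( a >= 224 && a <= 239 )) && return 0              # multicast, 224.0.0.0/4 (assumed)
  return 1
}

for ip in 169.254.169.254 127.0.0.1 100.64.0.1 224.0.0.251 10.0.0.5; do
  if is_blocked_ipv4 "$ip"; then
    echo "$ip blocked"
  else
    echo "$ip not in a blocked range"
  fi
done
```

A private address such as `10.0.0.5` falls outside every range here, which matches the default of routing to operator-provisioned private upstreams.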
To pin egress to an explicit set of hosts as a hardening control, set: ```bash UPSTREAM_HOST_ALLOWLIST=internal-api.example.svc,provider.example.com ``` ## HMAC and Signing Keys Published modes require production-grade keys: | Key | Used for | | --- | --- | | `AUDIT_HMAC_KEY` | Audit event integrity. | | `STREAMS_HMAC_KEY` | Redis stream message signatures. | | `GATEWAY_STS_HMAC_KEY` | Gateway-to-STS service exchange. | | `ZONE_KEK` | Zone secret/key encryption. | HMAC keys must be hex-encoded and at least 32 bytes where validated. ## Kubernetes Controls The Helm chart sets non-root pods, RuntimeDefault seccomp, dropped capabilities, read-only root filesystem for service containers, optional NetworkPolicy, PDBs, HPAs, and optional Ingress TLS. ## Troubleshooting | Symptom | Check | | --- | --- | | Published-mode startup fails | Missing or short HMAC key, unsafe URL, forbidden fail-open setting, or wrong service port. | | Gateway cannot reach upstream | NetworkPolicy egress, DNS egress, HTTPS egress, and upstream allowlist. | | Metrics endpoint denied | Confirm `METRICS_BEARER` and scraper configuration. | | TLS works externally but clients fail token validation | Confirm STS `ISSUER_URL` matches the URL clients use for JWKS and token issuer checks. | ## Next Step Use [Rotate Keys and Secrets](/operations/key-management/) to plan secret storage, key overlap, and rotation evidence. --- # Rotate Keys and Secrets # URL: https://docs.caracal.run/operations/key-management/ # Markdown: https://docs.caracal.run/markdown/operations/key-management.md # Type: workflow # Concepts: # Requires: --- Caracal uses secret material for zone encryption, mandate signing, audit integrity, stream signatures, service exchange, admin access, Coordinator access, database access, and Redis access. ## Key Inventory | Material | Purpose | | --- | --- | | `ZONE_KEK` | Encrypts zone secrets and signing material. | | Zone signing keys | Sign mandates exposed through STS JWKS. 
| | `AUDIT_HMAC_KEY` | Signs/verifies audit evidence. | | `STREAMS_HMAC_KEY` | Signs Redis stream messages. | | `GATEWAY_STS_HMAC_KEY` | Authenticates Gateway exchanges to STS. | | `CARACAL_ADMIN_TOKEN` | Bootstrap/operator API access. | | `CARACAL_COORDINATOR_TOKEN` | Coordinator agent/delegation management. | | Postgres and Redis passwords | Storage access. | ## Storage Boundary Generated operator secrets are file-backed inside a private operator directory. Local development stores them outside the repository; released self-hosted runtimes store them under `$CARACAL_HOME/secrets` unless `CARACAL_SECRETS_DIR` points to another operator-managed directory. Cloud deployments should use the platform secret manager and mount secrets into services through `*_FILE` paths or `CARACAL_SECRETS_DIR`. Never mount operator secrets into agent workspaces or containers. Agents should receive only application client secrets at provisioning time or short-lived resource mandates from `caracal run`. ## Rotation Principles - Rotate one class of material at a time. - Keep old verification material available until existing tokens, stream messages, or audit replay windows drain. - Verify `/ready`, audit ingestion, revocation propagation, and Gateway exchange after each rotation. - Record who rotated, when, why, new secret location, and rollback window. ## Suggested Rotation Order ```mermaid flowchart LR Prepare[Prepare new secret] --> Deploy[Deploy secret and config] Deploy --> Roll[Roll dependent service] Roll --> Verify[Verify readiness, audit, JWKS, and revocation] Verify --> Retire[Retire old material after safety window] ``` ## Rotation Checklist | Key | Services to roll | Verify | | --- | --- | --- | | `ZONE_KEK` | API and STS after coordinated key plan. | Zone secrets decrypt and STS can read active signing material. | | Zone signing key | STS, then verifiers after JWKS cache windows. | JWKS exposes expected key IDs and old mandates expire or verify during overlap. 
| | `AUDIT_HMAC_KEY` | Producers and Audit verifier: API, STS, Gateway, Audit, Control. | Audit ingestion accepts new events and rejects mismatched signatures. | | `STREAMS_HMAC_KEY` | Producers and consumers: API, STS, Gateway, Coordinator, Audit. | Consumers accept new stream messages and pending entries drain. | | `GATEWAY_STS_HMAC_KEY` | API, STS, and Gateway. | Gateway exchange succeeds and failed HMAC attempts are denied. | | `CARACAL_ADMIN_TOKEN` | API, Console/Control clients, Control plugin. | Old token fails; Console and automation authenticate with the replacement. | | `CARACAL_COORDINATOR_TOKEN` | Coordinator and Console/operator clients. | Agent/delegation views load and old token fails. | ## Troubleshooting | Symptom | Check | | --- | --- | | JWKS clients reject mandates after rotation | Issuer URL, JWKS cache age, key overlap window, and STS JWKS health. | | Audit DLQ grows | Producer and Audit `AUDIT_HMAC_KEY` mismatch. | | Stream consumers reject messages | Producer and consumer `STREAMS_HMAC_KEY` mismatch. | | Gateway exchange fails | `GATEWAY_STS_HMAC_KEY` mismatch between Gateway and STS. | ## Next Step Use [Operate PostgreSQL](/operations/postgres/) and [Operate Redis Streams](/operations/redis/) to verify durable state and event propagation. --- # Operate PostgreSQL # URL: https://docs.caracal.run/operations/postgres/ # Markdown: https://docs.caracal.run/markdown/operations/postgres.md # Type: workflow # Concepts: # Requires: --- Postgres stores Caracal control-plane state: zones, providers, applications, resources, grants, sessions, policies, audit events, agent sessions, delegation edges, outboxes, gateway bindings, admin tokens, and step-up challenges. 
## Storage Model

```mermaid
flowchart TB
  API[API writes product state] --> PG[(Postgres)]
  Coordinator[Coordinator writes agents and delegation] --> PG
  Audit[Audit ingests audit events] --> PG
  STS[STS reads policy, grants, keys, sessions] --> PG
  Gateway[Gateway reads bindings and revocation snapshots] --> PG
  PG --> Outbox[event_outbox and caracal_outbox]
```

## Migrations

The migration runner:

- applies `/migrations/*.up.sql` files in sorted order;
- records versions in `public.schema_migrations` (schema-qualified so it survives migrations that reset `search_path`);
- serializes concurrent runners with a Postgres advisory lock;
- rejects unsafe migration filenames;
- is used by Compose `dbMigrate` and Helm pre-install/pre-upgrade Job.

The schema starts from a single consolidated baseline (`0001_baseline.up.sql`). Later releases append expand-only migrations on top of it. Each migration must be backward compatible with the previous application version so it can be applied while that version still serves traffic; contract-phase changes are deferred to a later release and tagged `-- caracal:phase contract`. `validateMigrations.sh` enforces this, which is what lets [`caracal upgrade`](/operations/upgrade/) migrate without a maintenance window.

```bash
PGHOST=localhost PGPORT=5432 PGUSER=caracal PGDATABASE=caracal \
PGPASSWORD_FILE="${CARACAL_SECRETS_DIR:-${CARACAL_HOME:-$HOME/.local/share/caracal}/secrets}/postgresPassword" \
bash infra/postgres/scripts/migrate.sh
```

Production tooling is forward-only and does not reference down migrations.

## Verification

```bash
bash infra/postgres/scripts/validateMigrations.sh
```

The validation script checks migration idempotency, concurrent migrators, expand-only schema discipline, expected tables, append-only audit permissions, immutable policy-version trigger, fail-closed row-level security, and the rolling audit partition window.
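
The ordering and filename rules listed under Migrations can be illustrated with a short sketch. This is not the code in `migrate.sh`: the function names `is_safe_migration_name` and `list_migrations` are hypothetical, and the filename pattern is an assumption modeled on `0001_baseline.up.sql`.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; the real runner's rules may differ.

# Assumed safe pattern: digits, underscore, lowercase slug, then ".up.sql".
is_safe_migration_name() {
  [[ "$1" =~ ^[0-9]+_[a-z0-9_]+\.up\.sql$ ]]
}

# Print the migration filenames found in a directory in byte-sorted order.
# Fails on the first filename that does not match the assumed pattern.
list_migrations() {
  local dir="$1" path name
  local -a names=()
  for path in "$dir"/*.up.sql; do
    [ -e "$path" ] || continue   # the glob matched nothing
    name="$(basename "$path")"
    if ! is_safe_migration_name "$name"; then
      echo "unsafe migration filename: $name" >&2
      return 1
    fi
    names+=("$name")
  done
  if [ "${#names[@]}" -gt 0 ]; then
    printf '%s\n' "${names[@]}" | LC_ALL=C sort
  fi
}
```

Sorting with `LC_ALL=C` keeps the order independent of the operator's locale, which matters when zero-padded version prefixes decide which migration runs first.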
## Capacity Controls

| Area | Settings |
| --- | --- |
| Compose Postgres | `POSTGRES_MAX_CONNECTIONS`, `POSTGRES_SHARED_BUFFERS`, `POSTGRES_EFFECTIVE_CACHE_SIZE`, `POSTGRES_WORK_MEM`, `POSTGRES_LOG_MIN_DURATION_MS` |
| Service pools | `DB_POOL_MAX`, statement timeout, idle-in-transaction timeout, connection timeout, idle timeout |
| Helm | Use managed Postgres sizing plus service resource requests/limits. |

## Backup and Restore

Back up the whole database. Audit events are append-only and partitioned; restore must preserve audit tables, admin audit tables, outboxes, agent/delegation state, gateway bindings, and key material references.

## Troubleshooting

| Phase | Symptom | First check |
| --- | --- | --- |
| Bootstrap | Migration Job fails | Database secret, host, port, user, database, SSL mode, and advisory-lock connectivity. |
| Operations | API outbox grows | Redis reachability, outbox dispatcher logs, DB pool saturation, and `event_outbox` dead rows. |
| Performance | Audit queries slow | Partition health, indexes, retention, and query time windows. |
| Access control | RLS errors | Confirm service role and zone context handling before widening permissions. |

## Next Step

Use [Operate Redis Streams](/operations/redis/) to verify stream groups, pending entries, invalidation, revocation, and audit delivery.

---

# Operate Redis Streams

# URL: https://docs.caracal.run/operations/redis/
# Markdown: https://docs.caracal.run/markdown/operations/redis.md
# Type: workflow
# Concepts:
# Requires:

---

Redis Streams move operational events between Caracal services. Redis is not the durable source of product state; Postgres remains authoritative.
## Streams and Groups

| Stream | Consumer groups |
| --- | --- |
| `caracal.audit.events` | `audit-ingestor`, `siem-export` |
| `caracal.audit.events.dlq` | `audit-dlq-observer` |
| `caracal.policy.invalidate` | `opa-engine` |
| `caracal.sessions.revoke` | `sts-revocation` |
| `caracal.keys.invalidate` | `sts-keys` |
| `caracal.agents.lifecycle` | `coordinator-relay` |
| `caracal.invocations.lifecycle` | `invocations-observer` |
| `caracal.delegations.invalidate` | `delegations-observer` |
| `caracal.providers.ratelimit` | initialized as a one-entry stream |

## Provision

```bash
REDIS_HOST=localhost REDIS_PORT=6379 \
REDIS_PASSWORD_FILE="${CARACAL_SECRETS_DIR:-${CARACAL_HOME:-$HOME/.local/share/caracal}/secrets}/redisPassword" \
bash infra/redis/provision-streams.sh
```

The provisioner is idempotent, creates consumer groups with `MKSTREAM`, and warns when an existing stream length exceeds the intended bound.

## Verify

```bash
REDIS_HOST=localhost REDIS_PORT=6379 \
REDIS_PASSWORD_FILE="${CARACAL_SECRETS_DIR:-${CARACAL_HOME:-$HOME/.local/share/caracal}/secrets}/redisPassword" \
bash infra/redis/scripts/verify.sh
```

Verification checks stream existence, consumer group existence, and provisioner idempotency.

## Event Path

```mermaid
flowchart LR
  API[API transaction] --> Outbox[event_outbox]
  Outbox --> Redis[Redis Streams]
  Coordinator[Coordinator outbox] --> Redis
  STS --> Redis
  Gateway --> Redis
  Redis --> Audit[Audit ingestor]
  Redis --> STSConsumers[STS invalidation/revocation]
  Redis --> GatewayConsumers[Gateway revocation]
```

Stream messages are signed with `STREAMS_HMAC_KEY` in published modes.

## Durability and Eviction

Redis is a correctness-critical store, not a disposable cache: revocation entries and unacknowledged stream messages must survive memory pressure and restarts.
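
One way to confirm that an external Redis meets this bar is to compare its live settings with the values this page requires (`maxmemory-policy noeviction`, `appendonly yes`, `appendfsync everysec`). The sketch below is a minimal illustration, not part of Caracal: `check_setting` is a hypothetical helper, and it assumes `redis-cli` prints the `CONFIG GET` reply as two lines (name, then value) when its output is piped.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare one live Redis setting with a required value.
# Reads the two-line reply of `CONFIG GET <name>` on stdin.
check_setting() {
  local name="$1" required="$2" key="" value=""
  read -r key || true     # first line: the setting name (ignored)
  read -r value || true   # second line: the live value; empty if unreadable
  if [ "$value" = "$required" ]; then
    echo "ok: $name=$value"
  else
    echo "MISMATCH: $name=${value:-unreadable} (required: $required)"
    return 1
  fi
}

# Usage against a live server (assumes REDISCLI_AUTH carries the password):
#   redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" CONFIG GET maxmemory-policy \
#     | check_setting maxmemory-policy noeviction
#   redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" CONFIG GET appendonly \
#     | check_setting appendonly yes
#   redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" CONFIG GET appendfsync \
#     | check_setting appendfsync everysec
```

An empty reply is reported as `unreadable` rather than treated as a pass, because some managed providers restrict `CONFIG GET`; in that case confirm the settings through the provider's own console.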
Configure every Caracal Redis with:

| Setting | Required value | Why |
| --- | --- | --- |
| `maxmemory-policy` | `noeviction` | Any `allkeys-*`/`volatile-*` policy can silently drop revocation entries and pending stream messages, weakening revocation correctness. |
| `appendonly` | `yes` | Persists stream and revocation state across restarts. |
| `appendfsync` | `everysec` | Bounds data loss on crash to roughly one second. |

The bundled Redis image (`infra/redis/redis.conf`) already sets these. **External or managed Redis is your responsibility**: many managed tiers default to `allkeys-lru`. On startup each consumer service (STS, Gateway, Audit) reads the live `maxmemory-policy` and logs a warning when it is not `noeviction`, so a misconfigured store is visible in service logs immediately. The check never blocks startup and is skipped when the provider restricts `CONFIG GET`.

## Troubleshooting

| Symptom | Check |
| --- | --- |
| Audit DLQ grows | HMAC failures, malformed producers, Redis memory pressure, and Audit database writes. |
| Policy changes do not take effect | `caracal.policy.invalidate` group health and STS policy age metrics. |
| Revocation is delayed | `caracal.sessions.revoke` pending entries and Gateway/STS consumer health. |
| Provisioner warns about length | Plan an explicit stream retention/reset; provisioning does not rewrite stream policy silently. |

## Next Step

Use [Scale Capacity](/operations/scale-capacity/) when storage, streams, or service pools approach operational limits.

---

# Scale Capacity

# URL: https://docs.caracal.run/operations/scale-capacity/
# Markdown: https://docs.caracal.run/markdown/operations/scale-capacity.md
# Type: reference
# Concepts:
# Requires:

---

Scale Caracal around three bottlenecks: token exchange and policy evaluation in STS, upstream proxying in Gateway, and durable writes in Postgres/Audit.
## Scaling Levers

| Component | Primary levers |
| --- | --- |
| API | Replicas, `DB_POOL_MAX`, outbox batch/interval settings, request rate limits. |
| STS | Replicas, OPA policy age, Redis invalidation health, `MAX_GRANT_TTL_SECONDS`, replay persistence. |
| Gateway | Replicas, `MAX_REQUEST_BYTES`, `STS_TIMEOUT`, `UPSTREAM_TIMEOUT`, upstream allowlist, revocation snapshot health. |
| Audit | Replicas, Postgres write capacity, DLQ thresholds, retention, S3 export settings. |
| Coordinator | Replicas, DB pool, sweeper intervals, outbox batch, service-agent leases. |
| Postgres | Connection limits, indexes, partitions, storage IOPS, backup windows. |
| Redis | Stream memory, pending entries, consumer lag, persistence, network latency. |

## Helm Defaults

The chart defaults to two replicas for API, STS, Gateway, Audit, and Coordinator. Gateway has a higher maximum HPA ceiling because protected traffic fans through it.

| Service | Default port | Default max HPA replicas |
| --- | --- | --- |
| API | `3000` | `8` |
| STS | `8080` | `8` |
| Gateway | `8081` | `16` |
| Audit | `9090` | `8` |
| Coordinator | `4000` | `8` |

## Capacity Signals

| Signal | Meaning |
| --- | --- |
| Postgres pool ratio near `0.9` | Service pool saturation; inspect queries and pool size. |
| Audit consumer lag | Audit ingestion cannot keep up with Redis stream input. |
| Audit replay backlog age | STS/Gateway cannot emit audit events to Redis/Audit promptly. |
| Gateway STS circuit open | Gateway is fast-failing because STS exchange is unhealthy. |
| Revocation propagation lag | Access-safety state is not reaching Gateway within the expected window. |
| Readiness flapping | Pods or dependencies are unstable under current load. |

## Scale Procedure

1. Identify the bottleneck from metrics and logs.
2. Scale stateless service replicas first when storage is healthy.
3. Increase Postgres and Redis capacity before raising service pools.
4. Verify readiness, lag, replay backlog, and DLQ after each change.
5. Document the new limit and alert threshold.

## Worked Escalation Patterns

| Signal | First action | Expected outcome |
| --- | --- | --- |
| Postgres pool ratio stays near `0.9` | Inspect slow queries, then raise storage capacity or service pool size one step. | `/ready` stabilizes and pool saturation falls before replicas increase further. |
| Gateway STS circuit opens | Check STS readiness and exchange latency, then scale STS or reduce Gateway fan-in. | Gateway stops fast-failing protected requests and audit evidence resumes. |
| Audit consumer lag grows | Check Postgres write IOPS, partition health, and Audit replicas before extending retention or exports. | Redis pending entries and DLQ growth stop increasing. |
| Revocation propagation lags | Check Redis latency, stream consumers, and Gateway snapshot freshness. | Gateway denial decisions reflect current revocation state. |

## Troubleshooting

| Symptom | First check |
| --- | --- |
| Higher replicas make failures worse | Postgres connection pressure or Redis latency. |
| Gateway latency spikes | STS exchange latency, upstream timeout, JWKS cache, and private upstream egress. |
| Audit cannot catch up | Postgres write IOPS, partition health, Audit replicas, and Redis pending entries. |

## Next Step

Use [Monitor Health and Metrics](/operations/observability/) to turn capacity signals into readiness gates and operator dashboards.

---

# Monitor Health and Metrics

# URL: https://docs.caracal.run/operations/observability/
# Markdown: https://docs.caracal.run/markdown/operations/observability.md
# Type: workflow
# Concepts:
# Requires:

---

Every service exposes health/readiness endpoints; most services also expose metrics. Use readiness for automation gates and health for liveness.
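
As a rough sketch of a readiness-based automation gate, the helper below polls a `/ready` URL until it answers successfully or the attempts run out. Nothing here ships with Caracal: `wait_ready`, `probe_http`, and the `PROBE` override are hypothetical names, and the host and port in the usage comment are assumptions for a local stack.

```bash
#!/usr/bin/env bash
# Hypothetical gate: block until a readiness endpoint succeeds.

# PROBE names the command used for each check; override it to test the loop
# without a live service.
PROBE="${PROBE:-probe_http}"

probe_http() {
  # -f fails on HTTP errors, -sS stays quiet but reports failures,
  # --max-time bounds each probe so a hung service cannot stall the gate.
  curl -fsS --max-time 3 -o /dev/null "$1"
}

wait_ready() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i
  for ((i = 1; i <= attempts; i++)); do
    if "$PROBE" "$url"; then
      echo "ready: $url (attempt $i)"
      return 0
    fi
    sleep "$delay"
  done
  echo "not ready after $attempts attempts: $url" >&2
  return 1
}

# Example gate (assumed local API address):
#   wait_ready "http://localhost:3000/ready" && echo "proceed with rollout"
```

Gating on `/ready` rather than `/health` follows the split described above: a process can be alive while its dependencies are still unavailable.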
## Endpoint Map

| Service | Health | Readiness | Metrics |
| --- | --- | --- | --- |
| API | `/health` | `/ready` | `/metrics` |
| STS | `/health` | `/ready` | `/metrics`, `/metrics.json` |
| Gateway | `/health` | `/ready` | `/metrics`, `/metrics.json` |
| Audit | `/health` | `/ready` | `/metrics`, `/metrics.json` |
| Coordinator | `/health` | `/ready` | `/metrics` |
| Control | `/health` | `/ready` | Not primary operator surface |

## Metrics Authentication

In published builds (`CARACAL_MODE=rc` or `stable`), every metrics endpoint fails closed: it returns `401` unless `METRICS_BEARER` is set and the scraper presents `Authorization: Bearer <token>`. This applies to API, STS, Gateway, Audit, and Coordinator, all of which are reachable on the internal service network.

`caracal up` generates the managed `metricsBearer` secret and mounts it into every service via `METRICS_BEARER_FILE`; `caracal doctor` discovers the same secret and authenticates its metrics probes automatically. Point external scrapers at the same secret file. In `dev` mode metrics stay open for local inspection.

## Readiness Ladder

```mermaid
flowchart TB
  Health[Process health] --> Storage[Postgres and Redis]
  Storage --> Streams[Streams, outbox, revocation, policy invalidation]
  Streams --> Service[Service-specific readiness]
  Service --> Smoke[End-to-end smoke test]
```

| Rung | What it proves | User-facing impact when it fails |
| --- | --- | --- |
| Process health | The service process can answer liveness. | Restart loops or dead containers. |
| Postgres and Redis | Durable state and stream/cache dependencies are reachable. | Management writes, token exchange, audit, or revocation can fail. |
| Streams and outbox | Event delivery paths are draining. | Decisions may succeed while evidence or invalidation lags. |
| Service readiness | Service-specific invariants are met. | That service should not receive production traffic. |
| End-to-end smoke test | API, STS, Gateway, Audit, and Coordinator work together. | User workflows may fail even when individual services look healthy. |

## Local Checks

```bash
caracal status
caracal status --ready
bash infra/scripts/smokeTest.sh
```

`smokeTest.sh` probes API `/ready` and `/health`, Gateway `/ready`, STS `/ready`, Audit `/ready`, and Coordinator `/ready`.

## Kubernetes Checks

```bash
kubectl -n caracal get pods
kubectl -n caracal get servicemonitor,prometheusrule
kubectl -n caracal logs deploy/caracal-api
```

Enable `serviceMonitor.enabled` when using Prometheus Operator. Keep chart alert rules enabled or provide equivalent alerts.

## Operator Diagnostics

Use Console `diagnostics` for API health, readiness, zone diagnostics, and local preflight checks. Use Console `audit` and `request trace` views for decision investigation.

## Troubleshooting

| Symptom | Check |
| --- | --- |
| Health passes but readiness fails | Dependency, stream, outbox, policy, revocation, or audit readiness. |
| Metrics scrape returns unauthorized | Published builds require `METRICS_BEARER`; set it and match the scraper's bearer token. |
| Readiness flaps | CPU throttling, OOM, Postgres/Redis latency, probe timeouts, or dependency restarts. |
| Smoke test fails only for API `/health` | Confirm the API liveness endpoint responds in the deployment shape being tested. |

## Next Step

Use [Configure Alerts](/operations/alerts/) to wire the readiness, audit, revocation, policy, and capacity signals into on-call response.

---

# Configure Alerts

# URL: https://docs.caracal.run/operations/alerts/
# Markdown: https://docs.caracal.run/markdown/operations/alerts.md
# Type: reference
# Concepts:
# Requires:

---

The Helm chart can render PrometheusRule alerts for policy freshness, audit health, outbox health, Gateway exchange, revocation safety, database saturation, and readiness flapping.
## Built-In Alerts

| Alert | Severity | First response |
| --- | --- | --- |
| `CaracalSTSOPACompileErrors` | warning | Inspect policy activation history and STS logs; affected policy loads fail closed. |
| `CaracalSTSPolicyBundleStale` | warning | Check Redis policy invalidation and STS policy age. |
| `CaracalSTSProviderRefreshErrors` | warning | Check Redis coordination and upstream provider availability. |
| `CaracalSTSProviderCircuitOpen` | warning | Restore provider or credential health; provider-backed grants fail closed. |
| `CaracalAuditDLQNonEmpty` | warning | Inspect Audit DLQ records and producer HMAC/signature failures. |
| `CaracalAuditDLQGrowth` | critical | Stop risky rollouts and recover Audit/Redis/Postgres. |
| `CaracalAuditConsumerLagHigh` | warning | Scale Audit or restore Postgres/Redis performance. |
| `CaracalGatewayAuditReplayBacklogOld` | warning | Recover Redis/Audit and confirm Gateway replay drains. |
| `CaracalSTSAuditReplayBacklogOld` | warning | Recover Redis/Audit and confirm STS replay drains. |
| `CaracalAuditTamperDetected` | critical | Treat as a security incident. |
| `CaracalAPIOutboxDeadMessages` | critical | Reconcile outbox rows and Redis consumers before continuing rollouts. |
| `CaracalAPIOutboxPendingOldest` | warning | Check Redis health, API outbox workers, DB pool, and stream memory. |
| `CaracalGatewaySTSExchangeErrors` | warning | Check Gateway readiness, STS readiness, bindings, and service exchange key. |
| `CaracalGatewaySTSCircuitOpen` | critical | Restore STS or service exchange before routing protected traffic. |
| `CaracalGatewayRevocationSnapshotStale` | critical | Treat as access-safety incident until snapshot freshness returns. |
| `CaracalGatewayRevocationPropagationLag` | critical | Restore revocation stream consumers and pending-entry reclamation. |
| `CaracalGatewayRevocationReloadErrors` | critical | Fix Postgres/Redis snapshot load before relying on protected routes. |
| `CaracalPostgresPoolSaturation` | warning | Inspect long queries, pool sizing, and connection leaks. |
| `CaracalReadinessFlapping` | warning | Check dependencies, throttling, OOM, and probe timeouts. |

## CaracalSTSOPACompileErrors

STS is reporting OPA policy compilation failures. Inspect policy activation history and STS logs for the failing bundle; affected policy loads fail closed until the bundle compiles successfully.

## CaracalSTSPolicyBundleStale

STS policy age is above the configured freshness threshold. Check Redis policy invalidation, PostgreSQL polling, and STS readiness before relying on newly activated policies.

## CaracalSTSProviderRefreshErrors

STS provider refresh coordination is failing. Restore Redis coordination and upstream provider availability; provider-backed grants may fail closed while refresh results cannot be coordinated.

## CaracalSTSProviderCircuitOpen

STS has opened a provider refresh circuit after repeated provider failures. Restore the provider or credential health before retrying provider-backed grants.

## CaracalAuditDLQNonEmpty

The Audit dead-letter queue contains records. Inspect Audit logs, DLQ records, producer HMAC/signature failures, and Redis stream health before treating evidence capture as healthy.

## CaracalAuditDLQGrowth

The Audit dead-letter queue is growing. Stop risky rollouts and recover Audit, Redis, or Postgres before continuing changes that depend on complete audit evidence.

## CaracalAuditConsumerLagHigh

Audit ingestion is behind the Redis stream. Scale Audit or restore Postgres and Redis performance until consumer lag returns below the configured threshold.

## CaracalGatewayAuditReplayBacklogOld

Gateway has audit replay files waiting on disk beyond the configured threshold. Recover Redis or Audit and confirm Gateway replay drains before continuing risky rollouts.

## CaracalSTSAuditReplayBacklogOld

STS has audit replay files waiting on disk beyond the configured threshold.
Recover Redis or Audit and confirm STS replay drains before continuing risky rollouts.

## CaracalAuditTamperDetected

Audit chain verification detected a mismatch, ordering break, or stored HMAC failure. Treat this as a security incident and preserve evidence before attempting recovery.

## CaracalAPIOutboxDeadMessages

The API transactional outbox has abandoned messages. Recover Redis or stream consumers, inspect API logs, and reconcile affected outbox rows before continuing rollouts.

## CaracalAPIOutboxPendingOldest

The API transactional outbox is not draining promptly. Check Redis health, API outbox workers, database pool saturation, and stream memory pressure.

## CaracalGatewaySTSExchangeErrors

Gateway cannot reliably exchange mandates with STS. Check Gateway readiness, STS readiness, route bindings, and the service exchange key before routing protected traffic.

## CaracalGatewaySTSCircuitOpen

Gateway is fast-failing STS exchanges after repeated STS-unavailable failures. Restore STS or service exchange health before routing protected traffic that requires exchanged authority.

## CaracalGatewayRevocationSnapshotStale

Gateway cannot prove it has a fresh revocation baseline. Treat this as an access-safety incident until Postgres and Redis revocation snapshot loading are healthy.

## CaracalGatewayRevocationPropagationLag

Revocation stream messages are reaching Gateway outside the configured safety window. Restore Redis consumer health and confirm pending entries are reclaimed.

## CaracalGatewayRevocationReloadErrors

Gateway cannot refresh revocation state from Postgres. Treat this as an access-safety incident until reloads succeed and snapshot freshness returns.

## CaracalPostgresPoolSaturation

A service is holding most of its Postgres connection pool. Inspect long-running queries, statement timeouts, pool sizing, and connection leaks before the pool exhausts.

## CaracalReadinessFlapping

A Caracal pod is repeatedly toggling Ready or NotReady.
Check dependency health, CPU throttling, OOM events, and readiness probe timeouts.

## Alert Routing

| Severity | Route |
| --- | --- |
| Critical access-safety or tamper alerts | Security incident response. |
| Critical data-flow alerts | On-call plus release owner; freeze rollouts. |
| Warning readiness/capacity alerts | Service owner or platform queue. |

## First-Response Rule

When an alert involves audit, revocation, or policy freshness, prefer fail-closed recovery over traffic continuation. Restore evidence and revocation guarantees before resuming high-risk changes.

## Next Step

Use [Troubleshoot by Symptom](/operations/troubleshooting/) when an alert or user report starts from an SDK error, HTTP status, or request ID.

---

# Troubleshoot by Symptom

# URL: https://docs.caracal.run/operations/troubleshooting/
# Markdown: https://docs.caracal.run/markdown/operations/troubleshooting.md
# Type: workflow
# Concepts:
# Requires:

---

Most Caracal failures are easy to fix once you know **where** they happened. This page starts from the symptom you can see (usually an SDK error or an HTTP status) and routes you to the failing surface, the object to inspect, and the tool that confirms it. It complements [Debug Infrastructure Issues](/operations/debugging/), which works from infrastructure inward; start here when you have an application-level failure and an error or request ID.
## Locate the Surface First

```mermaid
flowchart TD
  Start[SDK or HTTP call failed] --> HasID{Have a request ID?}
  HasID -->|Yes| Trace[Open Console request trace]
  HasID -->|No| Status{caracal status --ready healthy?}
  Status -->|No| Infra[Infrastructure or readiness issue]
  Status -->|Yes| Symptom{What did you get back?}
  Symptom -->|Config or startup error| Config[SDK / runtime config]
  Symptom -->|401| AuthN[Authentication surface]
  Symptom -->|403 at token exchange| STS[STS policy decision]
  Symptom -->|403 at the resource| Verifier[Gateway or in-process verifier]
  Symptom -->|Call succeeded, no audit| Audit[Audit pipeline]
  Trace --> Symptom
```

The Console **request trace** is the fastest disambiguator: it ties one request ID to the resource, subject, scopes, determining policies, diagnostics, and result, so you can see whether the denial came from STS policy or from a resource verifier.

:::note[FAQ]
[What is the difference between a 403 from STS and a 403 from Gateway?](/reference/faq/#faq-016)
:::

## Route by Symptom

| Symptom | Surface | Inspect | Tool |
| --- | --- | --- | --- |
| SDK throws `config_missing` or cannot start | SDK / runtime config | Runtime profile, secret file, and required environment variables. | `caracal status --json`, profile file |
| `401` from API or Console | Authentication | Admin or Control token, or workload credential source. | [Use the Console](/runtime-console/console/) credential inputs |
| `403` from a token exchange (`access_denied`) | STS policy | Active policy-set activation, application ID, subject, requested scopes, and grant. | `request trace`, Console `audit` |
| `403` at the resource (`invalid_token`, `scope_insufficient`) | Gateway or in-process verifier | Mandate validity, issuer/JWKS trust, `X-Caracal-Resource`, binding, and verifier scope checks. | [Proxy Through Gateway](/api/gateway/), verifier logs |
| `scope_insufficient` even after a policy allow | Resource definition | Whether the resource defines the scope and the policy authorizes it. | Console `resource`, `request trace` |
| Call works but no audit event appears | Audit pipeline | Selected zone, time window, Redis stream health, and Audit readiness. | Console `diagnostics`, `audit` |
| Access continues after revocation | Resource verifier | Whether revocation consumers run on the resource server. | [Sessions and Revocation](/concepts/sessions-revocation/) |

## Denied SDK Call Steps

When an SDK call returns access denied and you are not sure why, walk these in order. Each step either fixes the problem or hands you to the next surface.

1. **Confirm the runtime is healthy.** Run `caracal status --ready`. If a dependency is down, fix that first.
2. **Confirm configuration.** Check the runtime profile, secret file, selected zone, and resource identifier the SDK is using. A wrong resource ID or zone produces denials that look like policy failures.
3. **Trace the request.** Open Console **request trace** with the request ID. It shows the resource, subject, requested scopes, determining policies, and result.
4. **Check the grant and policy.** Confirm a grant binds the application and subject to the resource scopes, and that the active policy set authorizes them. See [Debug Authorization Decisions](/guides/authorize-access/).
5. **Check policy-set activation.** A policy that exists but is not the active set will not take effect. See [Activate a Policy Set](/guides/activate-policy-set/).
6. **Check the requested scopes.** Confirm the SDK requests scopes the resource defines and the policy allows; a mistyped scope is denied silently.
7. **Check the Gateway route or verifier.** If STS allowed the exchange but the resource rejected it, inspect the Gateway binding and upstream allowlist, or the in-process verifier's scope and identity checks.
See [Proxy Through Gateway](/api/gateway/).
8. **Confirm the audit trace.** Open Console `audit` for the request ID to read the final decision and diagnostics.

## Diagnostic Bundle

Caracal exposes the triage bundle through existing surfaces rather than a separate command. Gather these together when you open an incident or escalate:

| Item | Where |
| --- | --- |
| Config summary and selected zone | `caracal status --json`, runtime profile |
| Service readiness and dependencies | Console `diagnostics` (Doctor: `health`, `readiness`) |
| Visible zones, resources, and policy sets | Console `diagnostics` (Doctor: `zones`) |
| Deployment and operator environment checks | Console `diagnostics` (Doctor: `preflight`) |
| Resource-to-provider binding and policy allow | [Check Provider Readiness](/examples/provider-preflight/) (`examples/providerPreflight`) |
| Last denial and request trace | Console `audit` and `request trace` |

The Console `diagnostics` Doctor view runs `health`, `readiness`, `zones`, and `preflight` in one place, including strict readiness for automation gates. Use it as the single bundle the rest of this page points back to.

:::note[FAQ]
[Where is the diagnostic bundle or doctor command?](/reference/faq/#faq-017)
:::

## Related Pages

- [Error Codes](/reference/errors/)
- [Debug Infrastructure Issues](/operations/debugging/)
- [Inspect Diagnostics and Audit](/runtime-console/observability/)
- [Debug Authorization Decisions](/guides/authorize-access/)
- [Recover from Failures](/operations/failure-modes/)

## Next Step

Use [Debug Infrastructure Issues](/operations/debugging/) when the symptom points to readiness, storage, streams, service configuration, or deployment state.
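
The command-line items in the Diagnostic Bundle table can be captured in one pass before escalating. The sketch below is only an illustration: `capture` and `collect_bundle` are hypothetical helpers, the output directory name is an assumption, and the Console-only items still have to be gathered from the Console.

```bash
#!/usr/bin/env bash
# Hypothetical triage helper; not a Caracal command.

# Run a command, store its combined output under $BUNDLE_DIR, and record its
# exit status so a failing check is still kept as evidence.
capture() {
  local label="$1"
  shift
  local status=0
  "$@" >"$BUNDLE_DIR/$label.txt" 2>&1 || status=$?
  echo "$label exit=$status" >>"$BUNDLE_DIR/index.txt"
}

collect_bundle() {
  BUNDLE_DIR="${1:-caracal-triage-$(date -u +%Y%m%dT%H%M%SZ)}"
  mkdir -p "$BUNDLE_DIR"
  capture status-json caracal status --json
  capture status-ready caracal status --ready
  echo "bundle written to $BUNDLE_DIR"
}

# Usage:
#   collect_bundle            # timestamped directory in the current folder
#   collect_bundle /tmp/inc1  # explicit directory
```

Recording the exit status next to each output keeps a failed readiness check in the bundle instead of aborting the collection.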

---

# Debug Infrastructure Issues

# URL: https://docs.caracal.run/operations/debugging/
# Markdown: https://docs.caracal.run/markdown/operations/debugging.md
# Type: workflow
# Concepts:
# Requires:

---

Debug from the boundary inward: runtime lifecycle, readiness, storage, streams, service config, then request-specific audit evidence. For an application-level failure where you have an error or request ID but not the failing surface, start from the symptom-first [Troubleshoot by Symptom](/operations/troubleshooting/) instead.

## Triage Flow

```mermaid
flowchart TD
  Symptom[Symptom] --> Ready[caracal status --ready or Kubernetes readiness]
  Ready --> Storage[Postgres and Redis]
  Storage --> Streams[Redis streams and outboxes]
  Streams --> Service[Service logs and metrics]
  Service --> Audit[Audit/explain request evidence]
  Audit --> Fix[Apply focused remediation]
```

## Commands

| Environment | Commands |
| --- | --- |
| Local | `caracal status --ready`, `docker compose ps`, `docker compose logs <service>` |
| Helm | `kubectl -n caracal get pods,svc,jobs`, `kubectl -n caracal describe pod <pod>`, `kubectl -n caracal logs <pod>` |
| Storage | `infra/postgres/scripts/validateMigrations.sh`, `infra/redis/scripts/verify.sh` |
| App-level | Console `diagnostics`, `audit`, and `request trace` |

## Common Cases

| Symptom | Likely area |
| --- | --- |
| `401` or `403` from API | Admin token, scope, Control token, or workload credential source. |
| STS exchange fails | Zone ID, application ID, grant, policy, client secret, step-up, or STS readiness. |
| Gateway fails before upstream | Mandate verification, STS exchange, binding, revocation snapshot, or upstream allowlist. |
| Agent views fail | Coordinator URL/token, selected zone, or Coordinator readiness. |
| Audit event missing | Redis stream health, Audit readiness, DLQ, replay backlog, or request never reaching protected boundary. |

## Request Investigation

1. Capture request ID, zone ID, subject, resource, and timestamp.
2. Open Console `audit` or `request trace`.
3. Confirm policy decision, scopes, target resource, session, agent session, and delegation edge.
4. Compare token claims with resource-server verifier settings.
5. Check revocation and step-up state when authority appears valid but access fails.

## Escalation Bundle

Include readiness output, relevant logs, service versions, Helm values diff, Redis stream status, Postgres migration status, request ID, audit explanation, and any recent secret or policy changes.

## Next Step

Use [Recover from Failures](/operations/failure-modes/) when infrastructure debugging identifies a degraded dependency, service, stream, or safety guarantee.

---

# Recover from Failures

# URL: https://docs.caracal.run/operations/failure-modes/
# Markdown: https://docs.caracal.run/markdown/operations/failure-modes.md
# Type: workflow
# Concepts:
# Requires:

---

Caracal is designed to fail closed for access-safety boundaries. Recovery should restore evidence, revocation, and policy freshness before reopening risky traffic.

## Failure Matrix

| Failure | User impact | Recovery |
| --- | --- | --- |
| Postgres unavailable | API/STS/Gateway/Audit/Coordinator readiness fails or degrades. | Restore Postgres, confirm migrations, check pools, replay outboxes. |
| Redis unavailable | Streams, revocation, policy invalidation, audit ingestion, and coordination lag. | Restore Redis, verify streams/groups, watch pending entries and replay backlog. |
| STS unavailable | Token exchange and Gateway exchanges fail. | Restore STS readiness, policy bundle freshness, JWKS, and HMAC config. |
| Gateway unhealthy | Protected upstream traffic fails before provider dispatch. | Check bindings, STS exchange, revocation snapshot, upstream allowlist, and audit replay. |
| Audit unhealthy | Evidence ingestion delayed; DLQ or replay grows. | Recover Audit/Postgres/Redis, replay DLQ, verify tamper chain. |
| Coordinator unhealthy | Agent/delegation views and lifecycle management fail. | Restore Coordinator readiness, token, DB, Redis, and sweeper health. |
| Control unhealthy | Automation dispatch unavailable. | Use Console/Admin SDK directly if appropriate; restore Control gate, token, JWKS, Redis, and API reachability. |

## Recovery Order

```mermaid
flowchart TD
  Freeze[Freeze risky rollouts] --> Storage[Restore Postgres and Redis]
  Storage --> Services[Restore STS, API, Gateway, Audit, Coordinator]
  Services --> Evidence[Drain audit replay, DLQ, outboxes]
  Evidence --> Safety[Confirm revocation and policy freshness]
  Safety --> Resume[Resume traffic or rollout]
```

## Data-Flow Reconciliation

1. Check Postgres migrations and expected tables.
2. Check Redis stream groups and pending entries.
3. Check API and Coordinator outbox tables.
4. Check Audit DLQ and audit replay directories.
5. Check Gateway revocation snapshot freshness.

## Troubleshooting

| Symptom | Action |
| --- | --- |
| Requests fail closed after outage | Confirm STS policy freshness, revocation snapshot, and audit replay drain. |
| Events appear duplicated | Verify idempotent consumers and dedupe keys before manual cleanup. |
| Rollback does not restore service | Schema may have moved forward; roll forward with a compatible fix. |

## Next Step

Use [Run Failure Drills](/operations/failure-drills/) to rehearse the recovery path before a production incident.

---

# Run Failure Drills

# URL: https://docs.caracal.run/operations/failure-drills/
# Markdown: https://docs.caracal.run/markdown/operations/failure-drills.md
# Type: workflow
# Concepts:
# Requires:

---

A failure drill injects one fault, confirms the **expected alert** fires, observes readiness behavior, then validates recovery. Run these against a non-production cluster before you depend on Caracal in production. The alerts referenced here are the [PrometheusRule recipes](/operations/alerts/) shipped by the chart; the recovery steps extend [Recover from Failures](/operations/failure-modes/).
Caracal fails closed for access-safety boundaries, so a healthy drill ends with evidence, revocation, and policy freshness restored **before** traffic resumes. ## How to Run a Drill 1. Confirm a green baseline: all `/ready` endpoints pass and no alerts are firing. 2. Inject the fault from the drill. 3. Confirm the expected alert transitions to firing within its `for` window. 4. Observe the documented readiness and traffic behavior. 5. Remove the fault and complete the recovery validation. 6. Record time-to-detect and time-to-recover. ## Drill Catalog ### Redis outage | Field | Value | | --- | --- | | Inject | Scale managed Redis to unavailable, or block the Redis egress port. | | Expected alerts | `CaracalAuditConsumerLagHigh`, `CaracalGatewayRevocationSnapshotStale`, `CaracalGatewayRevocationReloadErrors`, and replay backlog via `CaracalGatewayAuditReplayBacklogOld` / `CaracalSTSAuditReplayBacklogOld`. | | Expected behavior | Streams, revocation refresh, and audit ingestion lag; STS and Gateway write audit to replay volumes; readiness degrades for affected services. | | Recover | Restore Redis, verify streams and consumer groups, drain replay backlog and DLQ, confirm revocation snapshot is fresh before resuming risky traffic. | ### Postgres outage | Field | Value | | --- | --- | | Inject | Fail over or block the managed Postgres endpoint. | | Expected alerts | `CaracalPostgresPoolSaturation`, `CaracalAPIOutboxPendingOldest`, and `CaracalReadinessFlapping`. | | Expected behavior | API, STS, Gateway, Audit, and Coordinator readiness fail or degrade; outbox delivery stalls. | | Recover | Restore Postgres, confirm migrations applied, check connection pools, replay outboxes, confirm readiness stabilizes. | ### STS unreachable from Gateway | Field | Value | | --- | --- | | Inject | Scale STS to zero or block Gateway-to-STS traffic. | | Expected alerts | `CaracalGatewaySTSExchangeErrors` then `CaracalGatewaySTSCircuitOpen`. 
| | Expected behavior | Gateway token exchanges fail closed; protected upstream traffic is rejected before provider dispatch. | | Recover | Restore STS readiness, JWKS, HMAC config, and policy bundle freshness; confirm the circuit closes and a canary exchange succeeds. | ### Stale policy bundle | Field | Value | | --- | --- | | Inject | Pause policy distribution or hold the STS policy bundle past its freshness budget. | | Expected alerts | `CaracalSTSPolicyBundleStale`. | | Expected behavior | STS keeps evaluating the last good bundle; new policy activations do not take effect. | | Recover | Restore distribution, confirm the active policy set version, and verify the alert clears. | ### Bad policy activation | Field | Value | | --- | --- | | Inject | Activate a Rego policy set with a compile error in a test zone. | | Expected alerts | `CaracalSTSOPACompileErrors`. | | Expected behavior | STS rejects the broken bundle and continues on the last good policy; the activation does not widen access. | | Recover | Roll the policy set forward to a valid version and confirm the alert clears. | ### Provider credential refresh failure | Field | Value | | --- | --- | | Inject | Revoke or expire the upstream provider credential the Gateway brokers. | | Expected alerts | `CaracalSTSProviderRefreshErrors` then `CaracalSTSProviderCircuitOpen`. | | Expected behavior | Provider-backed exchanges fail closed; resources without brokered credentials are unaffected. | | Recover | Restore the provider credential, confirm refresh succeeds, and verify the circuit closes. | ### Revocation propagation lag | Field | Value | | --- | --- | | Inject | Revoke a session and delay the revocation stream consumer. | | Expected alerts | `CaracalGatewayRevocationPropagationLag`. | | Expected behavior | Revocation eventually reaches the Gateway; the drill measures the propagation window against your budget. 
| | Recover | Clear the consumer delay, confirm the revoked session is denied at the Gateway, and verify the alert clears. | ### Audit evidence pipeline backpressure | Field | Value | | --- | --- | | Inject | Pause the audit consumer or fault the audit datastore. | | Expected alerts | `CaracalAuditDLQNonEmpty`, `CaracalAuditDLQGrowth`, `CaracalAPIOutboxDeadMessages`. | | Expected behavior | Evidence ingestion is delayed; DLQ and outbox backlogs grow but are not lost. | | Recover | Restore the consumer and datastore, replay the DLQ and outboxes, and verify the tamper chain is intact. | ### Audit tamper detection | Field | Value | | --- | --- | | Inject | In a disposable environment only, modify a stored audit record out of band. | | Expected alerts | `CaracalAuditTamperDetected`. | | Expected behavior | The integrity check flags the break; treat as a [security incident](/operations/incident-response/). | | Recover | Restore from a trusted backup, identify the blast radius, and follow incident response. | ## After the Drill For every drill, confirm the green baseline returns: readiness passes, the alert clears, the DLQ and replay backlogs are empty, and revocation and policy freshness are confirmed. Capture the detect and recover timings so you can set realistic alert thresholds and on-call expectations. ## Next Step Use [Back Up and Retain Data](/operations/backup-retention/) to make sure recovery evidence, secrets, audit records, and durable state can be restored. --- # Back Up and Retain Data # URL: https://docs.caracal.run/operations/backup-retention/ # Markdown: https://docs.caracal.run/markdown/operations/backup-retention.md # Type: workflow # Concepts: # Requires: --- Backups must preserve both access state and evidence state. A restore that loses audit, revocation, keys, or delegation data can produce unsafe or unauditable behavior. 
## What to Protect | Asset | Why it matters | | --- | --- | | Postgres database | Product state, policies, grants, sessions, audit events, agents, delegations, outboxes, gateway bindings. | | Runtime secrets | Database/Redis credentials, admin token, Coordinator token, zone KEK, HMAC keys, service exchange keys. | | STS/Gateway replay volumes | Audit replay files during Redis/Audit outages. | | Redis snapshot or managed backup | Optional operational recovery for stream pending entries; Postgres remains authoritative. | | Audit exports | Long-term evidence and SIEM/compliance integration. | ## Backup Flow ```mermaid flowchart LR PG[(Postgres)] --> Backup[Encrypted backup] Secrets[Secret manager] --> Backup Replay[Replay volumes] --> Backup Audit[Audit export] --> Archive[Immutable archive] Backup --> RestoreTest[Scheduled restore test] ``` ## Retention Controls | Area | Controls | | --- | --- | | Audit database | `AUDIT_RETENTION_DAYS`, partitions, audit export watermarks. | | Coordinator data | `DELEGATION_RETENTION_DAYS`, `OUTBOX_RETENTION_DAYS`, sweeper intervals. | | Redis streams | Provisioner intended max lengths and managed Redis retention. | | Backups | Platform backup policy and legal/compliance requirements. | ## Restore Validation 1. Restore Postgres into an isolated environment. 2. Restore required secrets into the environment secret store. 3. Run migration verification. 4. Start services and verify `/ready`. 5. Confirm audit query, policy-set activation state, Gateway bindings, sessions, agents, and delegation records. 6. Run a canary token exchange and protected Gateway request. ## Troubleshooting | Symptom | Check | | --- | --- | | Restored STS cannot decrypt keys | `ZONE_KEK` does not match the database secrets. | | Audit chain verification fails | Missing audit rows, wrong `AUDIT_HMAC_KEY`, or partial restore. | | Gateway cannot route | Missing gateway binding rows or stale binding revision. 
| | Revocation state is incomplete | Restore Postgres revocation/session state and replay Redis revocation events where needed. | ## Next Step Use [Respond to Incidents](/operations/incident-response/) to define containment, evidence preservation, and recovery validation. --- # Respond to Incidents # URL: https://docs.caracal.run/operations/incident-response/ # Markdown: https://docs.caracal.run/markdown/operations/incident-response.md # Type: workflow # Concepts: # Requires: --- Treat incidents involving audit tamper evidence, stale revocation state, STS/Gateway fail-open risk, secret exposure, or policy corruption as security incidents. ## Severity Triggers | Trigger | Severity | | --- | --- | | `CaracalAuditTamperDetected` | Critical security incident. | | Gateway revocation snapshot stale or propagation lag | Critical access-safety incident. | | Gateway STS circuit open during protected traffic | Critical availability/access incident. | | Audit DLQ growth or replay backlog aging | Critical evidence pipeline incident when sustained. | | Secret exposure | Critical until rotated and blast radius is understood. | | Policy activation causes unexpected broad allow | Critical authorization incident. | ## Response Flow ```mermaid flowchart TD Detect[Detect alert or report] --> Contain[Contain risky traffic and freeze rollouts] Contain --> Preserve[Preserve logs, audit, DB, Redis, and config evidence] Preserve --> Diagnose[Diagnose root cause] Diagnose --> Recover[Recover service and safety invariants] Recover --> Validate[Validate readiness, audit, revocation, and policy] Validate --> Review[Post-incident review] ``` ## First-Hour Runbook | Symptom | Severity | Contain | Preserve | Validate recovery | | --- | --- | --- | --- | --- | | Audit tamper alert | Critical | Freeze changes and stop destructive cleanup. | Audit tables, exports, DLQ, service logs. | Tamper sweep result, audit ingestion, and timeline are recorded. 
| | Revocation stale | Critical | Fail closed at Gateway or resource servers. | Redis streams, revocation snapshots, session IDs. | Revoked sessions are denied and readiness is stable. | | Bad policy allow | Critical | Activate last known-good policy set. | Policy set version, request IDs, audit/explain output. | Canary allow/deny decisions match expected policy. | | Secret exposure | Critical | Rotate exposed material and invalidate affected sessions. | Secret location, key IDs, affected services, audit records. | Old material no longer authenticates. | ## Evidence to Preserve - Alert name, firing time, and labels. - Request IDs, zone IDs, resource IDs, policy-set versions, agent session IDs, delegation edge IDs. - Console audit/explain output. - Service logs for API, STS, Gateway, Audit, Coordinator, and Control. - Redis stream status, pending entries, and DLQ contents. - Postgres backup or snapshot before manual remediation. - Helm values or Compose config diff. ## Containment Actions | Incident | Containment | | --- | --- | | Audit tamper | Freeze changes, preserve database and audit exports, rotate only after evidence capture. | | Revocation stale | Fail closed at resource servers or Gateway, restore Redis/Postgres consumers, verify snapshot freshness. | | Bad policy allow | Activate last known-good policy set, verify canary denies/allows, inspect audit impact. | | Secret exposure | Rotate exposed material, roll dependent services, invalidate sessions/grants as needed. | ## Recovery Complete When - Relevant `/ready` endpoints pass. - Audit DLQ/replay/outbox backlogs are understood or drained. - Revocation and policy freshness metrics are healthy. - Canary token exchange and Gateway requests match expected decisions. - Evidence and timeline are preserved for review. ## Next Step Use [Plan a Platform Rollout](/operations/platform-rollout-kit/) when the environment is stable and ready for controlled change. 
--- # Plan a Platform Rollout # URL: https://docs.caracal.run/operations/platform-rollout-kit/ # Markdown: https://docs.caracal.run/markdown/operations/platform-rollout-kit.md # Type: workflow # Concepts: # Requires: --- Use this runbook for version upgrades, configuration changes, chart changes, secret rotations, ingress changes, and policy rollout waves. ## Rollout Gates | Gate | Pass condition | | --- | --- | | Source verification | Values, env vars, ports, and secrets match current chart and service config. | | Render validation | `helm lint` and `helm template` succeed with environment values. | | Migration safety | Migrations are forward-only and complete before service rollout. | | Readiness | API, STS, Gateway, Audit, and Coordinator `/ready` endpoints pass. | | Event health | Redis streams, outbox dispatch, audit ingestion, revocation propagation, and replay backlog are healthy. | | Rollback plan | Previous image tag, chart revision, values, and compatible schema plan are documented. 
| ## Rollout Sequence ```mermaid sequenceDiagram participant Owner as Release owner participant Helm as Helm participant DB as Postgres participant Pods as Caracal pods participant Obs as Observability Owner->>Helm: render and diff Helm->>DB: run migration Job Helm->>Pods: roll services Pods->>Obs: expose health, readiness, metrics Owner->>Obs: confirm gates ``` ## Execution ```bash helm -n caracal diff upgrade caracal infra/helm/caracal -f values.production.yaml helm -n caracal upgrade caracal infra/helm/caracal -f values.production.yaml kubectl -n caracal rollout status deploy/caracal-api kubectl -n caracal rollout status deploy/caracal-audit kubectl -n caracal rollout status deploy/caracal-coordinator ``` When replay persistence is enabled, STS and Gateway render as StatefulSets: ```bash kubectl -n caracal rollout status statefulset/caracal-sts kubectl -n caracal rollout status statefulset/caracal-gateway ``` ## Stop Conditions Stop the rollout when any of these occur: - Migration Job fails. - STS or Gateway cannot prove readiness. - Audit DLQ grows or replay backlog ages. - Gateway revocation snapshot becomes stale. - API outbox dead messages appear. - Postgres pool saturation or Redis memory pressure persists. ## Rollback Use `helm rollback` only after confirming the previous app version is compatible with the current database schema. If schema compatibility is uncertain, roll forward with a corrected app image instead of applying down migrations. ## Next Step Use [Deploy Policy Changes](/operations/policy-deployment/) for policy-specific rollout gates, simulation, activation, and rollback. --- # Deploy Policy Changes # URL: https://docs.caracal.run/operations/policy-deployment/ # Markdown: https://docs.caracal.run/markdown/operations/policy-deployment.md # Type: workflow # Concepts: # Requires: --- Policy changes affect STS decisions and Gateway access. Treat them as production changes with simulation, activation, audit review, and rollback readiness. 
## Deployment Sequence ```mermaid sequenceDiagram participant Author participant Console as Console/Admin API participant API participant Redis participant STS participant Audit Author->>Console: create policy version Console->>API: validate and store version Author->>Console: simulate policy set Console->>API: activate policy set version API->>Redis: publish policy invalidate STS->>Redis: consume invalidate STS->>Audit: emit decisions using current bundle ``` ## Pre-Deployment 1. Confirm the policy uses current resource IDs, scopes, and canonical terminology. 2. Validate syntax and invariants through Console or Admin API. 3. Simulate expected allow and deny cases. 4. Confirm STS readiness and policy age metrics are healthy. 5. Prepare rollback policy-set version. ## Activation Use Console `policy` and `policy set` views or Admin SDK/API automation. Do not use top-level `caracal` runtime commands for policy management. ## Verification | Check | Expected | | --- | --- | | API activation response | New policy-set version is active. | | Redis `caracal.policy.invalidate` | STS consumers receive invalidation. | | STS policy age | Returns under alert threshold. | | Audit decisions | Expected allow/deny records appear for canary requests. | | Gateway behavior | Protected upstream access follows the new decision. | ## Rollback Activate the last known-good policy-set version. Then verify STS policy freshness, canary decisions, audit records, and Gateway behavior. ## Troubleshooting | Symptom | Check | | --- | --- | | Activation succeeds but decisions do not change | Policy invalidation stream, STS policy age, and STS logs. | | Simulation differs from live decisions | Input shape, active grants, resource IDs, subject/session claims, and step-up state. | | Policy fails closed unexpectedly | Rego compile errors, missing scope/resource, or invalid grant. | ## Next Step Use [Upgrade Caracal](/operations/upgrade/) for image, chart, migration, and runtime configuration upgrades. 
--- # Upgrade Caracal # URL: https://docs.caracal.run/operations/upgrade/ # Markdown: https://docs.caracal.run/markdown/operations/upgrade.md # Type: workflow # Concepts: # Requires: --- Caracal upgrades are forward-only at the database layer. Every release ships **expand-only** schema changes: new migrations are backward compatible with the previous application version, so the database can be migrated while the old version is still serving. This removes the maintenance window — the schema moves first, then services roll onto it. Contract-phase changes (dropping, renaming, retyping, or tightening a column to `NOT NULL`) are split into a later release and must be tagged `-- caracal:phase contract`. CI (`validateMigrations.sh`) rejects untagged contract changes so the no-window guarantee cannot silently regress. ## One-Command Upgrade (Runtime) For the Compose-based runtime, `caracal upgrade` performs the full expand-then-roll sequence in order: ```bash caracal upgrade ``` It stages the new images (`pull`, or `build` in dev), applies the expand-phase migrations against the running database, then rolls the services and waits for readiness. If the migration step fails it aborts before touching services, leaving the previous version running. Pass `--no-pull` to roll already-staged images without fetching. ## Pre-Upgrade 1. Read release notes and changed chart values. 2. Render the new Helm/Compose configuration. 3. Confirm secrets and required env vars still match service config. 4. Back up Postgres and runtime secrets. 5. Confirm Redis streams/groups and audit health. 6. Prepare canary token exchange and Gateway request. ## Helm Upgrade ```bash helm -n caracal diff upgrade caracal infra/helm/caracal -f values.production.yaml helm -n caracal upgrade caracal infra/helm/caracal -f values.production.yaml kubectl -n caracal get jobs kubectl -n caracal rollout status deploy/caracal-api ``` The migration Job runs as a pre-install/pre-upgrade hook when migrations are enabled. 
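
The expand-then-contract rule described at the top of this page can be approximated locally before CI runs. The sketch below is an illustration only and is not the contents of `validateMigrations.sh`: the function name `check_migration_phase`, the statement patterns it looks for, and the assumption that the tag sits on its own SQL comment line are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real validateMigrations.sh.
# Returns 0 when a migration file is expand-only or carries the contract tag,
# and 1 when it contains a contract-style statement without the tag.
check_migration_phase() {
  local file="$1"
  # Contract-style statements named in the text: drop, rename, retype, NOT NULL tightening.
  local contract_re='DROP[[:space:]]+(COLUMN|TABLE)|RENAME[[:space:]]+(COLUMN|TO)|ALTER[[:space:]]+COLUMN[[:space:]]+[^[:space:]]+[[:space:]]+(TYPE|SET[[:space:]]+DATA[[:space:]]+TYPE|SET[[:space:]]+NOT[[:space:]]+NULL)'
  if grep -Eiq "$contract_re" "$file"; then
    # The tag format is taken from the text: "-- caracal:phase contract".
    if grep -Eq '^--[[:space:]]*caracal:phase[[:space:]]+contract' "$file"; then
      return 0
    fi
    echo "untagged contract change: $file" >&2
    return 1
  fi
  return 0
}
```

A pre-commit hook could call `check_migration_phase` on each changed SQL file and stop on the first non-zero result. Pattern matching like this is deliberately coarse, so treat the CI check as the authority.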
## Compose Upgrade (Manual) `caracal upgrade` automates this. To run it by hand: ```bash export CARACAL_VERSION= docker compose -f infra/docker/runtime-compose.yml pull docker compose -f infra/docker/runtime-compose.yml run --rm dbMigrate docker compose -f infra/docker/runtime-compose.yml up -d bash infra/scripts/smokeTest.sh ``` ## Post-Upgrade Validation | Check | Expected | | --- | --- | | `/ready` endpoints | API, STS, Gateway, Audit, Coordinator pass. | | Migrations | `schema_migrations` includes the expected versions. | | Redis | Streams and consumer groups exist; pending entries are bounded. | | Audit | DLQ empty or understood; replay backlog drains. | | Token exchange | STS exchange succeeds for canary application/resource. | | Gateway | Canary protected request reaches upstream and produces audit evidence. | ## Rollback Prefer roll-forward fixes after migrations. If rolling back app images, verify the previous version understands the current schema and config values. ## Troubleshooting | Symptom | Action | | --- | --- | | Migration hook fails | Stop rollout, inspect Job logs, restore only if no migration committed; otherwise roll forward. | | New pods fail config validation | Compare env vars and mounted secret keys with current service config. | | Readiness fails after app rollback | Schema or Redis stream expectations may no longer match the old app. | ## Next Step Use [Export Audit Evidence](/operations/compliance-audit-integration/) when audit retention, SIEM export, or compliance evidence needs a durable path. --- # Export Audit Evidence # URL: https://docs.caracal.run/operations/compliance-audit-integration/ # Markdown: https://docs.caracal.run/markdown/operations/compliance-audit-integration.md # Type: workflow # Concepts: # Requires: --- Caracal audit records provide decision evidence for STS, Gateway, Coordinator, policy evaluation, Control, and administrative activity. Treat audit as a protected evidence pipeline. 
## Evidence Path ```mermaid flowchart LR Producer[API, STS, Gateway, Coordinator, Control] --> Redis[caracal.audit.events] Redis --> Audit[Audit service] Audit --> PG[(audit_events)] Audit --> DLQ[caracal.audit.events.dlq] Audit --> Export[S3-compatible export] Export --> SIEM[SIEM or archive] ``` Audit events are HMAC-protected in published modes. Audit storage includes tamper checks and append-only permission validation. ## Integration Points | Need | Surface | | --- | --- | | Search recent events | Console `audit` or Audit API search. | | Trace one request | Console `request trace` or API request ID lookup. | | SIEM export | `caracal.audit.events` consumer or Audit export path. | | Evidence archive | Postgres backup plus object-store export. | | Tamper response | `CaracalAuditTamperDetected` alert and incident runbook. | ## Operational Controls - Monitor Audit DLQ, consumer lag, tamper metrics, and export watermarks. - Preserve audit HMAC keys according to retention requirements. - Keep audit partitions and backup retention aligned with legal hold requirements. - Do not manually mutate `audit_events`; verification expects append-only behavior. ## Troubleshooting | Symptom | Check | | --- | --- | | SIEM is missing events | Redis group lag, export watermark, Audit readiness, and network egress. | | DLQ has HMAC failures | Producer key mismatch or corrupted stream payload. | | Request explanation missing | Confirm request ID, zone, retention window, and that the protected request reached Caracal. | ## Next Step Use [Hand Off to Platform Teams](/operations/platform-team-handoff/) when deployment, recovery, audit, alerting, and ownership evidence are ready. 
--- # Hand Off to Platform Teams # URL: https://docs.caracal.run/operations/platform-team-handoff/ # Markdown: https://docs.caracal.run/markdown/operations/platform-team-handoff.md # Type: reference # Concepts: # Requires: --- Use this checklist before a platform team accepts production ownership of a Caracal environment. ## Handoff Checklist | Area | Evidence | | --- | --- | | Ownership | Service owner, on-call rotation, escalation path, security owner. | | Versioning | Image tags, chart values, release notes, rollback revision. | | Secrets | Secret inventory, storage location, rotation owner, last rotation date. | | Storage | Postgres backup/restore proof, Redis stream verification, retention policy. | | Network | Ingress, NetworkPolicy, DNS, TLS, upstream egress, provider endpoints. | | Observability | Dashboards, ServiceMonitor, PrometheusRule or equivalent alerts, log access. | | Runbooks | Deployment, upgrade, incident, backup, restore, policy deployment, failure recovery. | | Access | Console/Admin access, Control API credentials, audit access, break-glass policy. | | Compliance | Audit export path, evidence retention, tamper-response plan. | ## Day-2 Operations | Cadence | Action | | --- | --- | | Daily | Check readiness, critical alerts, audit DLQ, outbox health, and revocation freshness. | | Weekly | Review capacity, backups, policy changes, and pending stream entries. | | Monthly | Rotate or review secrets, restore test, audit export review, dependency updates. | | Per release | Render diff, migration review, rollout gates, rollback compatibility check. | ## Escalation Package When escalating, include environment, version, recent changes, readiness output, alert names, request IDs, audit explanations, Redis stream status, migration status, and whether any access-safety guarantee is degraded. 
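
The Kubernetes side of the escalation package can be gathered with a small helper. This is a minimal sketch under stated assumptions: the namespace and release name `caracal` follow the Helm examples elsewhere in these docs, while the function name `collect_escalation_bundle` and the output layout are hypothetical. It does not cover request IDs, audit explanations, or Redis stream status, which still need to be added by hand.

```bash
#!/usr/bin/env bash
# Sketch only: namespace and release "caracal" follow the Helm examples in these docs;
# the output layout and the function name are hypothetical.
collect_escalation_bundle() {
  local ns="${1:-caracal}" release="${2:-caracal}" out="${3:-./caracal-bundle}"
  mkdir -p "$out/logs"

  # Chart revision history and user-supplied values.
  helm -n "$ns" history "$release" > "$out/helm-history.txt" 2>&1
  helm -n "$ns" get values "$release" > "$out/helm-values.yaml" 2>&1

  # Pod state, migration Jobs, and recent events.
  kubectl -n "$ns" get pods -o wide > "$out/pods.txt" 2>&1
  kubectl -n "$ns" get jobs > "$out/jobs.txt" 2>&1
  kubectl -n "$ns" get events --sort-by=.lastTimestamp > "$out/events.txt" 2>&1

  # Last hour of logs per pod; errors land in the file instead of aborting the run.
  local pod
  for pod in $(kubectl -n "$ns" get pods -o name 2>/dev/null); do
    kubectl -n "$ns" logs "$pod" --all-containers --since=1h \
      > "$out/logs/${pod##*/}.log" 2>&1
  done
  echo "$out"
}
```

Review `helm-values.yaml` and the logs for secret material before sharing the bundle outside the team.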
--- # Understand Architecture # URL: https://docs.caracal.run/architecture/ # Markdown: https://docs.caracal.run/markdown/architecture.md # Type: landing # Concepts: # Requires: --- Caracal is a pre-execution authority system for AI agents. It separates control-plane management, token exchange, protected-resource routing, audit evidence, and agent/delegation coordination into independent services backed by Postgres and Redis Streams. ## System Overview ```mermaid flowchart LR Console[Console/Admin SDK] --> API[Control-Plane API] Workload[SDK or caracal run] --> STS[STS] Client[Protected request] --> Gateway[Gateway] Agent[Agent SDK] --> Coordinator[Coordinator] API --> Postgres[(Postgres)] STS --> Postgres Gateway --> Postgres Coordinator --> Postgres API --> Redis[(Redis Streams)] STS --> Redis Gateway --> Redis Coordinator --> Redis Redis --> Audit[Audit] Audit --> Postgres Control[Control API] --> API ``` ## Core Design Choices | Choice | Effect | | --- | --- | | Postgres as source of truth | Product state, policy versions, grants, sessions, audit rows, agents, delegations, and outboxes are durable. | | Redis Streams for propagation | Audit, invalidation, revocation, key, agent, invocation, and delegation events move asynchronously. | | STS for mandate issuance | Every protected access path receives a scoped, short-lived mandate. | | Gateway for protected upstreams | Gateway verifies inbound authority, exchanges with STS, blocks unsafe routing, and emits audit evidence. | | Coordinator for agent authority | Agent sessions, service leases, delegation edges, and invocation lifecycle stay explicit. | | Console/API boundary | Human management uses Console; automation uses Control/Admin APIs; runtime CLI remains lifecycle-only. 
| ## System Flows | Need | Page | | --- | --- | | Identify services, clients, and dependencies | [Map the System](/architecture/system-topology/) | | Understand STS mandate issuance | [Exchange Tokens](/architecture/token-exchange-flow/) | | Understand agent sessions and delegation | [Coordinate Agents](/architecture/delegation-flow/) | ## State and Safety | Need | Page | | --- | --- | | Understand Redis Streams and outboxes | [Propagate Events](/architecture/event-streams/) | | Understand durable data ownership | [Store State](/architecture/storage-model/) | | Understand signing, HMACs, JWKS, and rotation | [Manage Keys](/architecture/crypto-keys/) | | Understand security and command boundaries | [Enforce Boundaries](/architecture/trust-boundaries/) | ## Next Step Start with [Map the System](/architecture/system-topology/) before tracing request and state flows. For service-by-service detail, continue to [Understand Services](/services/). For deployment detail, use [Operations](/operations/). --- # Map the System # URL: https://docs.caracal.run/architecture/system-topology/ # Markdown: https://docs.caracal.run/markdown/architecture/system-topology.md # Type: architecture # Concepts: # Requires: --- Caracal has six primary HTTP services plus Postgres and Redis. 
## Topology ```mermaid flowchart TB subgraph Clients Console[Console] SDK[SDKs and caracal run] ResourceClient[Protected-resource clients] Automation[Control/Admin automation] end subgraph Caracal API[API :3000] STS[STS :8080] Gateway[Gateway :8081] Audit[Audit :9090] Coordinator[Coordinator :4000] Control[Control plugin in API, optional] end Postgres[(Postgres)] Redis[(Redis Streams)] Upstream[Protected upstreams] Console --> API Console --> Coordinator SDK --> STS ResourceClient --> Gateway Automation --> Control Control --> API Gateway --> Upstream API --> Postgres STS --> Postgres Gateway --> Postgres Audit --> Postgres Coordinator --> Postgres API --> Redis STS --> Redis Gateway --> Redis Audit --> Redis Coordinator --> Redis ``` ## Service Responsibilities | Service | Responsibility | | --- | --- | | API | Zones, applications, providers, resources, policies, policy sets, grants, step-up challenges, admin audit, and API outbox. | | STS | OAuth token exchange, mandate issuance, policy evaluation, JWKS, step-up status, policy simulation, signing-key rotation internals. | | Gateway | Protected reverse proxy, inbound mandate verification, per-request STS exchange, SSRF guard, revocation checks, audit replay. | | Audit | Redis audit ingestion, DLQ, tamper checks, retention, search, metrics. | | Coordinator | Agent sessions, agent services, delegations, invocations, sweeper jobs, Coordinator outbox. | | Control | Optional remote management invoke endpoint gated by token auth, replay protection, rate limiting, and runtime gate file. | ## Deployment Shapes | Shape | Source | | --- | --- | | Local development | `infra/docker/docker-compose.yml` through `caracal up`. | | Self-hosted Compose | `infra/docker/runtime-compose.yml`. | | Kubernetes | `infra/helm/caracal`. | ## Next Step Use [Exchange Tokens](/architecture/token-exchange-flow/) to follow how workloads receive scoped mandates. 
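
The ports in the topology diagram can be turned into a quick readiness sweep. The sketch below is illustrative: the ports come from the diagram, but the `localhost` default, the assumption that `/ready` is served on the same port as each service, and the function names `ready_url` and `check_all_ready` are not confirmed by these docs. Control is omitted because it runs as an optional plugin inside the API.

```bash
#!/usr/bin/env bash
# Ports come from the topology diagram; the host default and serving /ready
# on the same port are assumptions for this sketch.
CARACAL_HOST="${CARACAL_HOST:-localhost}"

# Print the assumed readiness URL for one service name.
ready_url() {
  local port
  case "$1" in
    api) port=3000 ;;
    sts) port=8080 ;;
    gateway) port=8081 ;;
    audit) port=9090 ;;
    coordinator) port=4000 ;;
    *) echo "unknown service: $1" >&2; return 2 ;;
  esac
  printf 'http://%s:%s/ready\n' "$CARACAL_HOST" "$port"
}

# Probe every service; return 1 if any probe fails.
check_all_ready() {
  local svc failed=0
  for svc in api sts gateway audit coordinator; do
    if curl -fsS --max-time 5 -o /dev/null "$(ready_url "$svc")"; then
      echo "ready: $svc"
    else
      echo "NOT ready: $svc" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

In Kubernetes the same idea usually runs through a port-forward or from inside the cluster, with `CARACAL_HOST` set to the relevant service address.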
## Related Pages - [Understand Services](/services/) - [Deploy with Docker Compose](/operations/docker-compose/) - [Deploy with Helm](/operations/kubernetes-helm/) --- # Exchange Tokens # URL: https://docs.caracal.run/architecture/token-exchange-flow/ # Markdown: https://docs.caracal.run/markdown/architecture/token-exchange-flow.md # Type: architecture # Concepts: # Requires: --- STS implements the token exchange boundary. Workloads, SDKs, `caracal run`, and Gateway submit existing authority plus requested resources and receive scoped mandates. ## Exchange Sequence ```mermaid sequenceDiagram participant Client as SDK/caracal run/Gateway participant STS participant PG as Postgres participant Redis participant Audit Client->>STS: POST /oauth/2/token STS->>PG: authenticate app, load grants/resources/policies/sessions STS->>Redis: observe invalidation and revocation state STS->>STS: evaluate policy and constraints alt step-up required STS-->>Client: interaction_required with challenge id else allowed STS-->>Client: mandate JWT STS->>Redis: signed audit event Redis->>Audit: audit-ingestor consumes end ``` ## Inputs | Input | Purpose | | --- | --- | | `grant_type` | OAuth token-exchange request type. | | `subject_token` | Existing user, service, application, or agent authority. | | `actor_token` | Optional actor authority for delegated exchange. | | `resource` | Target resource identifiers. | | `scope` | Requested resource scopes. | | `zone_id` | Tenant boundary. | | `application_id` and client credential | Authenticates the calling application. | | `session_id`, `agent_session_id`, `delegation_edge_id` | Authority anchors for session/delegation checks. | | `challenge_id` / `challenge_response` | Step-up completion. | ## TTL Contracts | Mandate type | Contract | | --- | --- | | Resource mandate | Capped at 15 minutes. | | Session mandate | Capped at 60 minutes. | | Runtime-injected credential | `caracal run` injects credentials capped at 15 minutes after STS exchange. 
| Gateway performs a per-request exchange and rejects inbound tokens that are too close to expiry before proxying. ## Gateway-Authenticated Exchange Gateway exchanges with STS using a request signature, timestamp, and nonce over the token exchange request. STS verifies the Gateway HMAC key and consumes the nonce before trusting the Gateway-authenticated path. ## Next Step Use [Coordinate Agents](/architecture/delegation-flow/) to see how agent sessions and delegation edges feed token exchange. ## Related Pages - [Issue Mandates](/services/sts/) - [Mandates](/concepts/mandate/) - [Step-Up Challenges](/concepts/step-up/) - [Use STS Endpoint](/api/sts/) --- # Coordinate Agents # URL: https://docs.caracal.run/architecture/delegation-flow/ # Markdown: https://docs.caracal.run/markdown/architecture/delegation-flow.md # Type: architecture # Concepts: # Requires: --- Coordinator owns agent runtime state. It records agent sessions, service leases, invocation lifecycle, and delegation edges that STS later uses to validate delegated exchanges. ## Flow ```mermaid sequenceDiagram participant SDK as Agent SDK participant Coord as Coordinator participant PG as Postgres participant Redis as Redis Streams participant STS SDK->>Coord: create or update agent session Coord->>PG: persist session and leases Coord->>Redis: publish caracal.agents.lifecycle SDK->>Coord: create delegation edge Coord->>PG: persist edge and graph epoch Coord->>Redis: publish delegation invalidation SDK->>STS: exchange with agent_session_id/delegation_edge_id STS->>PG: validate session, edge, constraints, and graph state ``` ## Coordinator State | State | Purpose | | --- | --- | | `agent_sessions` | Agent session tree, parent-child relationships, status, leases, and root subject session. | | `agent_services` | Service agents and heartbeat health. | | `delegation_edges` | Source/target sessions, resource/scope bounds, expiry, constraints, and status. 
| | `agent_invocations` | Invocation lifecycle and deadline tracking. | | `delegation_graph_epochs` | Graph invalidation and traversal consistency. | | `caracal_outbox` | Durable event publication for Coordinator-produced topics. | ## Event Topics | Topic | Meaning | | --- | --- | | `caracal.agents.lifecycle` | Agent session lifecycle. | | `caracal.invocations.lifecycle` | Invocation lifecycle. | | `caracal.delegations.invalidate` | Delegation graph invalidation. | | `caracal.sessions.revoke` | Session, agent, and delegation revocation propagation. | ## Operator Boundaries Human operators use Console `agent session` and `delegation` views. Automation uses Coordinator API/Admin SDK surfaces. Top-level `caracal` runtime commands do not manage agent sessions or delegation. ## Next Step Use [Propagate Events](/architecture/event-streams/) to understand how lifecycle, invalidation, revocation, and audit messages move. ## Related Pages - [Coordinate Agent State](/services/coordinator/) - [Agent Delegation](/concepts/delegation/) - [Manage Agents and Delegation](/runtime-console/agents/) --- # Propagate Events # URL: https://docs.caracal.run/architecture/event-streams/ # Markdown: https://docs.caracal.run/markdown/architecture/event-streams.md # Type: architecture # Concepts: # Requires: --- Caracal uses Postgres outboxes for durable enqueue and Redis Streams for asynchronous delivery. Published modes sign stream messages with `STREAMS_HMAC_KEY`. 
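
As a rough illustration of what signing a stream message can involve, the sketch below computes an HMAC over a serialized payload and verifies it on the consumer side. HMAC-SHA256, the sorted-key JSON serialization, and the `sign_message` / `verify_message` helpers are assumptions made for this example; the documentation only states that published modes sign stream messages with `STREAMS_HMAC_KEY`, so the real wire format may differ.

```python
import hashlib
import hmac
import json


def sign_message(key: bytes, payload: dict) -> str:
    """Return a hex HMAC over a canonical JSON encoding of the payload.

    Assumption: HMAC-SHA256 over sorted-key JSON; Caracal's format may differ.
    """
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def verify_message(key: bytes, payload: dict, signature: str) -> bool:
    """Recompute the HMAC and compare it in constant time."""
    expected = sign_message(key, payload)
    return hmac.compare_digest(expected, signature)


# Hypothetical 32-byte key and event, for illustration only.
key = b"0" * 32
event = {"topic": "caracal.policy.invalidate", "zone_id": "zone_prod"}
signature = sign_message(key, event)
```

A consumer following this pattern would drop any message whose payload or key does not reproduce the attached signature.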
## Event Pipeline ```mermaid flowchart LR Tx[Postgres transaction] --> Outbox[event_outbox or caracal_outbox] Outbox --> Redis[Redis Streams] Redis --> Consumers[Consumers] Consumers --> Effects[Policy reload, revocation, audit ingest, relay] Producers[STS/Gateway audit emitters] --> Replay[Audit replay directory] Replay --> Redis ``` ## Streams | Stream | Producers | Consumers | | --- | --- | --- | | `caracal.audit.events` | API, STS, Gateway, Coordinator, Control | Audit `audit-ingestor`, SIEM exporters | | `caracal.audit.events.dlq` | Audit | DLQ observers | | `caracal.policy.invalidate` | API | STS policy loader | | `caracal.sessions.revoke` | API/Coordinator | STS and resource/Gateway revocation consumers | | `caracal.keys.invalidate` | API/STS | STS key caches | | `caracal.agents.lifecycle` | Coordinator | Coordinator lifecycle relay job | | `caracal.invocations.lifecycle` | Coordinator | Invocation observers | | `caracal.delegations.invalidate` | Coordinator | Delegation observers | | `caracal.providers.ratelimit` | Redis provisioner/provider coordination | Provider rate-limit coordination | ## Outbox Behavior | Outbox | Owner | Behavior | | --- | --- | --- | | `event_outbox` | API | Durable enqueue inside API transactions, cooperative dispatcher, signed Redis `XADD`, retry/backoff, dead-row metrics. | | `caracal_outbox` | Coordinator | Dedupe by producer/topic/dedupe key and publishes Coordinator topics. | ## Audit Replay STS and Gateway use replay directories under `/var/lib/caracal/audit-replay`. When Redis or Audit is unavailable, replay files preserve pending audit events so they can drain after recovery. ## Next Step Use [Store State](/architecture/storage-model/) to understand which data is durable, transient, or recoverable. 
## Related Pages - [Operate Redis Streams](/operations/redis/) - [Ingest Audit Evidence](/services/audit/) - [Use Event Topics](/api/event-topics/) --- # Store State # URL: https://docs.caracal.run/architecture/storage-model/ # Markdown: https://docs.caracal.run/markdown/architecture/storage-model.md # Type: architecture # Concepts: # Requires: --- Caracal separates durable state from propagation state. | Store | Role | | --- | --- | | Postgres | Durable product, authority, audit, coordination, and outbox state. | | Redis Streams | Event propagation, invalidation, revocation, and consumer coordination. | | Replay volumes | Temporary audit emission buffer for STS and Gateway. | | Secret manager/files | Runtime credentials and HMAC/encryption material. | ## Postgres Ownership | Domain | Tables | | --- | --- | | Product model | `zones`, `providers`, `applications`, `resources`, `application_dependencies` | | Authority | `sessions`, `delegated_grants`, `grants`, `step_up_challenges` | | Policy | `policies`, `policy_versions`, `policy_sets`, `policy_set_versions`, `policy_set_bindings` | | Audit | `audit_events`, partitions, `audit_export_watermark`, `audit_ingest_alerts`, `admin_audit_events` | | Agents | `agent_sessions`, `agent_topology`, `agent_services`, `agent_invocations`, `delegation_edges`, `delegation_graph_epochs` | | Operations | `event_outbox`, `caracal_outbox`, `admin_tokens`, `gateway_resource_bindings`, `gateway_binding_revision`, `resource_rate_limits` | ## Integrity Controls - Migrations are forward-only in production tooling. - `schema_migrations` records applied versions. - Audit role cannot update or delete `audit_events`. - Policy versions are immutable. - Zone-scoped tables use fail-closed row-level security. - Audit partitions are maintained for the rolling window. ## Redis Ownership Redis keeps delivery state, pending entries, consumer groups, rate-limit coordination, and revocation/invalidation streams. 
Redis does not replace Postgres for durable product state. ## Restore Priority 1. Postgres and runtime secrets. 2. STS/Gateway replay volumes. 3. Redis streams/pending entries when available. 4. Audit exports and SIEM archive. ## Next Step Use [Manage Keys](/architecture/crypto-keys/) to understand how signing, encryption, and HMAC keys protect stored and propagated state. ## Related Pages - [Operate PostgreSQL](/operations/postgres/) - [Operate Redis Streams](/operations/redis/) - [Back Up and Retain Data](/operations/backup-retention/) --- # Manage Keys # URL: https://docs.caracal.run/architecture/crypto-keys/ # Markdown: https://docs.caracal.run/markdown/architecture/crypto-keys.md # Type: architecture # Concepts: # Requires: --- Caracal uses separate keys for encryption, mandate signing, audit integrity, stream integrity, and service-to-service exchange. ## Key Map | Material | Used by | Purpose | | --- | --- | --- | | Zone signing keys | STS | Sign mandate JWTs and publish public keys through JWKS. | | `ZONE_KEK` | API and STS | Encrypt/decrypt zone secrets and key material. | | `AUDIT_HMAC_KEY` | API, STS, Gateway, Audit, Control | Protect audit event integrity. | | `STREAMS_HMAC_KEY` | API, STS, Gateway, Coordinator, Audit/consumers | Sign Redis stream messages in published modes. | | `GATEWAY_STS_HMAC_KEY` | Gateway and STS | Authenticate Gateway token-exchange requests to STS. | | Admin and Coordinator tokens | API, Coordinator, Console, Control | Authenticate management and agent/delegation operations. | ## Mandate Verification ```mermaid sequenceDiagram participant STS participant Client participant Resource STS->>Client: signed mandate JWT Resource->>STS: GET /.well-known/jwks.json STS-->>Resource: public keys with cache headers Client->>Resource: bearer mandate Resource->>Resource: verify signature, issuer, audience, scopes, token use, revocation ``` STS JWKS responses are cacheable; rotation plans must preserve overlap until verifier caches expire. 
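
The overlap requirement can be made concrete with a small calculation. The sketch below follows one common rotation strategy and is not Caracal's documented procedure: publish the new public key first, start signing with it only after older verifier caches have expired, and keep the old public key published until the last mandate it signed has expired. The `rotation_schedule` helper, its parameters, and the 5-minute cache lifetime are hypothetical; the 60-minute figure is the session-mandate cap from the TTL contracts.

```python
from datetime import datetime, timedelta, timezone


def rotation_schedule(published_at, cache_max_age, max_mandate_ttl):
    """Return (start_signing_at, retire_old_key_at) for a key rotation.

    Assumption: a generic overlap strategy, not a Caracal-specific rule.
    - Verifiers that cached the JWKS before the new key was published
      refresh within `cache_max_age`, so signing with the new key waits.
    - Mandates signed by the old key live at most `max_mandate_ttl`, so the
      old public key stays published for that long after the switch.
    """
    start_signing_at = published_at + cache_max_age
    retire_old_key_at = start_signing_at + max_mandate_ttl
    return start_signing_at, retire_old_key_at


# Hypothetical values: 5-minute JWKS cache lifetime, 60-minute longest mandate.
published = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
start, retire = rotation_schedule(
    published,
    cache_max_age=timedelta(minutes=5),
    max_mandate_ttl=timedelta(minutes=60),
)
```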
## Gateway Exchange Signature Gateway signs STS exchange requests with timestamp, request ID, method, path, and form body. STS checks the signature, skew, request ID, and nonce before accepting the Gateway-authenticated path. ## Published-Mode Requirements In `rc` and `stable`, HMAC keys used by services must be present and at least 32 bytes where validated. Missing keys cause startup or readiness failure rather than silent downgrade. ## Next Step Use [Enforce Boundaries](/architecture/trust-boundaries/) to connect keys, services, commands, and data stores to trust boundaries. ## Related Pages - [Rotate Keys and Secrets](/operations/key-management/) - [Identity Package](/sdks/identity/) - [Enforce Boundaries](/architecture/trust-boundaries/) --- # Enforce Boundaries # URL: https://docs.caracal.run/architecture/trust-boundaries/ # Markdown: https://docs.caracal.run/markdown/architecture/trust-boundaries.md # Type: architecture # Concepts: # Requires: --- Caracal’s trust model depends on clear boundaries. Do not blur runtime lifecycle, product management, token issuance, protected-resource routing, and audit evidence. ## Boundary Map | Boundary | Trusted side | Untrusted or constrained side | | --- | --- | --- | | Runtime CLI | Local lifecycle and `caracal run` config | Product-management state and admin actions. | | Console/Admin API | Authenticated operators and Control/Admin clients | Anonymous clients and expired/insufficient tokens. | | STS | Valid application credentials, policy/grant/session state | Malformed subject tokens, invalid client secrets, unsatisfied step-up. | | Gateway | Verified inbound mandate, configured binding, revocation-fresh state | Arbitrary upstream URLs, path traversal, replayed/expiring tokens. | | Audit | HMAC-verified stream events and append-only database role | Tampered stream payloads or mutable evidence. | | Redis | Signed operational messages | Unsigned or mismatched stream messages in published modes. 
| | Postgres | Service roles and fail-closed RLS | Cross-zone reads without zone context. | | Control | Enabled gate, verified JWT, replay-safe JTI, rate limit | Disabled gate, replayed tokens, unsupported commands. | ## Gateway Fail-Closed Checks Gateway denies before upstream dispatch when: - bearer token is missing, malformed, too large, expiring, revoked, replayed, or signature-invalid; - `X-Caracal-Resource` is missing; - no binding exists for the token zone and resource; - path traversal is detected; - STS exchange fails or the STS circuit is open; - upstream host safety rules reject the destination. ## Control Boundary Control is optional. It requires a gate file, JWKS-backed JWT verification, replay protection, rate limits, and an API token for dispatch. It must not be exposed as a top-level runtime CLI command. ## Related Pages - [Harden Production](/operations/tls-hardening/) - [Automate Management](/services/control/) - [Protect Upstreams](/services/gateway/) --- # Operate Runtime and Console # URL: https://docs.caracal.run/runtime-console/ # Markdown: https://docs.caracal.run/markdown/runtime-console.md # Type: landing # Concepts: # Requires: --- Use this section after installation when you need to operate Caracal locally or understand which operator surface owns a task. Caracal has two local operator surfaces: | Surface | Use it for | Do not use it for | | --- | --- | --- | | `caracal` | Local stack lifecycle, readiness checks, local reset, token-injected process execution, and Console launch. | Zones, applications, providers, resources, policies, audit, agents, delegation, or Control setup. | | `caracal console` / `caracal-console` | Human management for zones, applications, providers, resources, policies, policy sets, authority sessions, Control, audit, request traces, agent sessions, delegation, and diagnostics. | Starting or stopping the stack, purging local runtime state, or running workloads. 
| The runtime CLI stays small so lifecycle and workload execution do not depend on admin tokens, zone selection, or Control credentials. Product management belongs to the Console, Control API, or Admin SDK. ## Recommended Path 1. [Choose the Right Surface](/runtime-console/cli-and-console/) to confirm whether a task belongs in `caracal`, Console, Control API, or Admin SDK. 2. [Start and Check the Stack](/runtime-console/stack/) with `caracal up` and `caracal status --ready`. 3. [Use the Console](/runtime-console/console/) to run guided setup and create the first runtime profile. 4. [Configure Workloads](/runtime-console/config-file/) when a workload needs explicit profile, credential, or service endpoint settings. 5. [Run Workloads](/runtime-console/runtime/) with `caracal run -- `. 6. [Manage Product Objects](/runtime-console/admin/) for ongoing zone, application, provider, resource, policy, policy-set, and Control work. 7. [Inspect Diagnostics and Audit](/runtime-console/observability/) when debugging health, readiness, audit, or request decisions. 8. [Manage Agents and Delegation](/runtime-console/agents/) when workloads create agent sessions or delegation edges. ## Command Map | Outcome | Surface | | --- | --- | | Start, stop, check, or purge the local runtime stack | `caracal` | | Launch the terminal Console | `caracal console` | | Run an agent process with scoped resource tokens | `caracal run -- ` | | Create a zone, app, provider, resource, policy, or policy set | Console | | Enable Control automation | Console `control` | | Script management workflows | Control API or Admin SDK | | Inspect audit events, request traces, diagnostics, agents, or delegation | Console | ## Next Step Start with [Choose the Right Surface](/runtime-console/cli-and-console/) if you are deciding where an operation belongs. If you already know you need a local stack, go directly to [Start and Check the Stack](/runtime-console/stack/). 
--- # Choose the Right Surface # URL: https://docs.caracal.run/runtime-console/cli-and-console/ # Markdown: https://docs.caracal.run/markdown/runtime-console/cli-and-console.md # Type: reference # Concepts: # Requires: --- Use this page before reaching for a command. Caracal separates local runtime lifecycle from product management so workload execution does not depend on admin tokens or selected zones. ## Ownership Rule ```text local lifecycle and process execution -> caracal human product management -> caracal console / caracal-console scripted product management -> Control API or Admin SDK ``` The top-level runtime CLI intentionally contains only `up`, `down`, `status`, `purge`, `run`, and `console`. :::note[FAQ] [Why are zone and policy commands in the Console instead of the `caracal` CLI?](/reference/faq/#faq-015) ::: ## Runtime CLI Tasks | Outcome | Command | | --- | --- | | Start the local stack | `caracal up` | | Stop the local stack | `caracal down` | | Stop and remove volumes | `caracal down -v` | | Probe service health | `caracal status` | | Probe dependency readiness | `caracal status --ready` | | Emit structured status | `caracal status --json` | | Remove local stack/runtime state | `caracal purge` | | Run a process with resource tokens | `caracal run -- ` | | Open the Console | `caracal console` | Runtime CLI commands must not require admin tokens, selected zones, product credentials, or Control credentials. 
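
Because these commands need no product-management context, they are straightforward to script. The sketch below polls `caracal status --ready` until it succeeds, relying on the behavior documented in Start and Check the Stack that the command exits `0` only when the selected probes pass. The `wait_until_ready` helper, the attempt count, and the delay are illustrative assumptions, not part of Caracal.

```python
import subprocess
import time


def wait_until_ready(attempts=30, delay_seconds=2.0, run=subprocess.run, sleep=time.sleep):
    """Poll `caracal status --ready` until it exits 0 or attempts run out.

    `run` and `sleep` are injectable so the loop can be exercised without a
    real stack; the attempt count and delay are arbitrary example values.
    """
    for attempt in range(attempts):
        result = run(["caracal", "status", "--ready"], check=False)
        if result.returncode == 0:
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

A wrapper script might run `caracal up` first and then call `wait_until_ready()` before launching a workload with `caracal run`.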
## Console Tasks | Outcome | Console menu | | --- | --- | | Create or select zones | `zone` | | Register confidential agent applications | `application` | | Configure provider credential sources | `provider` | | Register protected resources and scopes | `resource` | | Author access policies | `policy` | | Simulate and activate policy sets | `policy set` | | Inspect authority sessions | `authority session` | | Enable and manage Control API access | `control` | | Search audit records and trace selected requests | `audit` | | Trace one known request ID | `request trace` | | Inspect or manage agent sessions | `agent session` | | Inspect or revoke delegation edges | `delegation` | | Run health, readiness, zone, and preflight checks | `diagnostics` | Product-management workflows are not mirrored as top-level runtime CLI commands. Keeping them in the Console prevents command drift and keeps interactive operations tied to the same engine/admin helpers. ## Automation Tasks Use the Admin SDK from trusted operator environments when scripts need to create zones. Use the Control API for zone-bound automation over applications, resources, policies, authority sessions, agents, delegation edges, audit queries, and Control credentials. Enable Control from the Console `control` menu, then call the authenticated HTTP surface from CI or automation. Automation must run with operator credentials only in trusted control-plane environments. Agent workloads should use `caracal run` or application credentials and must not receive `CARACAL_ADMIN_TOKEN`, `CARACAL_COORDINATOR_TOKEN`, `CARACAL_SECRETS_DIR`, or mounted operator secret files. ## Troubleshooting | Symptom | Check | | --- | --- | | `caracal console` is hidden or unavailable | Use `caracal-console` directly, or reinstall so the Console binary/workspace shim is available. Runtime lifecycle commands remain available. | | A command for zones, policies, or audit is missing from `caracal` | Open the Console. 
Those workflows are intentionally Console-owned. |
| Console fails in CI | Use the Control API or Admin SDK. The Console requires an interactive TTY. |
| Runtime command asks for product-management context | Treat it as a bug; runtime lifecycle must not depend on admin or zone state. |

## Next Step

Use [Start and Check the Stack](/runtime-console/stack/) when you need local services running.

---

# Start and Check the Stack
# URL: https://docs.caracal.run/runtime-console/stack/
# Markdown: https://docs.caracal.run/markdown/runtime-console/stack.md
# Type: reference
# Concepts:
# Requires:

---

Runtime lifecycle commands manage local infrastructure only. They delegate to Docker Compose and keep product state changes in the Console.

## Service Layout

| Service | Default local port |
| --- | --- |
| API | `3000` |
| STS | `8080` |
| Gateway | `8081` |
| Audit | `9090` |
| Coordinator | `4000` |

Control runtime exposure is managed through the Console `control` menu, not by targeting a stack service directly from the runtime CLI.

## Start

```bash
caracal up
```

Development mode rebuilds local images. Release modes use installed compose assets and the selected Caracal version.

## Stop

```bash
caracal down
caracal down -v
```

Use `-v` only when you intend to remove Docker volumes.

## Check Health and Readiness

```bash
caracal status
caracal status --ready
caracal status --ready --json
```

`status` probes service health endpoints. `status --ready` probes readiness endpoints and dependencies. The command exits `0` only when the selected probes pass.

## Purge Local State

```bash
caracal purge
```

`purge` can remove stack artifacts, volumes, logs, config, runtime state, secrets, example containers and images, cached assets, or all local runtime state depending on the selected purge target. Use it when you intentionally want a clean local environment.
After a full purge, recreate product state through the Console: zones, applications, providers, resources, policies, Control keys, and generated runtime profiles. ## Troubleshooting | Symptom | Likely cause | | --- | --- | | `status` passes but `status --ready` fails | A service is alive but one of its dependencies or readiness checks is not ready. | | Console cannot connect after `up` | Check `caracal status --ready`; custom deployments should confirm the Management API URL override. | | Workloads cannot exchange tokens | Readiness for STS, Gateway, and audit must pass before debugging workload config. | | Local tokens are stale after `purge` and `up` | Prefer local generated secret files over exported shell tokens for localhost Console sessions. | ## Next Step Continue to [Use the Console](/runtime-console/console/) to create the first zone, application, provider, resource, policy, and runtime profile. --- # Use the Console # URL: https://docs.caracal.run/runtime-console/console/ # Markdown: https://docs.caracal.run/markdown/runtime-console/console.md # Type: reference # Concepts: # Requires: --- `caracal-console` is the terminal Console for setting up and operating protected agent access. It fetches from the management API, renders a full-screen TTY interface, and keeps product management out of the top-level runtime CLI. ## Launch ```bash caracal console ``` or: ```bash caracal-console ``` The Console requires an interactive TTY. In non-interactive environments, use the Control API or Admin SDK. ## Credentials and Endpoints | Input | Purpose | | --- | --- | | `CARACAL_API_URL` | Cloud/custom Management API URL override. | | `CARACAL_ADMIN_TOKEN` or `CARACAL_ADMIN_TOKEN_FILE` | Admin bearer token for management views. | | `CARACAL_COORDINATOR_URL` | Cloud/custom Coordinator URL override for `agent session` and `delegation`. | | `CARACAL_COORDINATOR_TOKEN` or `CARACAL_COORDINATOR_TOKEN_FILE` | Coordinator token for agent and delegation views. 
| For default local services, generated secret files from `caracal up` are preferred over exported token values so local reset cycles do not keep stale shell credentials. Those files live outside the source workspace by default and must not be mounted into agent workspaces. ## Screen Model ```text +------------------------------------------------------------------+ | Caracal Console API endpoint | | > menu > selected view breadcrumb | +------------------------------------------------------------------+ | | | view content | | | +------------------------------------------------------------------+ | status message | | key hints | +------------------------------------------------------------------+ ``` The Console redraws when content changes and reacts to terminal resize events. ## Main Menu | Group | Key | Label | Zone required | | --- | --- | --- | --- | | start | `s` | `guided setup` | No | | manage | `1` | `zone` | No | | manage | `2` | `application` | Yes | | manage | `3` | `provider` | Yes | | manage | `4` | `resource` | Yes | | manage | `5` | `policy` | Yes | | manage | `6` | `policy set` | Yes | | sessions | `7` | `authority session` | Yes | | sessions | `r` | `agent session` | Yes | | sessions | `g` | `delegation` | Yes | | observe | `a` | `audit` | Yes | | observe | `t` | `request trace` | Yes | | runtime | `c` | `control` | Yes | | runtime | `d` | `diagnostics` | No | Press `z` to open the zone picker before navigating to zone-scoped views. ## Sessions Views The `sessions` group observes the three runtime identity objects. They are read-only in the Console — authority is minted by the STS and delegation is authored through the agent SDK; the Console lists and inspects them. 
| View | What it shows | Object | | --- | --- | --- | | `authority session` | STS-minted subject sessions for a user or application | a `sessions` row | | `agent session` | A spawned agent's runtime session, lifecycle, and lease health | an `agent_sessions` row | | `delegation` | Authority edges between agent sessions (who narrowed whom) | a `delegation_edges` row | An **authority session** is the root credential a principal holds after a token exchange. An **agent session** is one spawned agent under an application; a *delegated* session is simply an agent session that has an inbound delegation edge — not a separate object. A **delegation** edge connects two agent sessions, so inbound and outbound edges are anchored on the agent session id. The `agent session` view supports filtering (`f`) by `status` (active, suspended, terminated), `lifecycle` (task, service), `application`, and `label`. The `expires` column surfaces lease health: a task agent shows its TTL deadline; a service agent shows its heartbeat-lease deadline (or `no lease` if it has stopped renewing). Use `i`/`o` on an agent row to view its inbound and outbound delegation edges. ## Guided Setup `guided setup` creates or selects the first working objects: 1. Zone 2. Agent app 3. Provider 4. Resource 5. Access policy 6. Review and create Important guided fields include `upstream URL`, `first request path`, `runtime profile`, `write profile files`, `profile path`, `secret file`, `resource identifier`, and `resource scopes`. Use file writing when you want the Console to create the runtime profile and secret file locally. Leave it disabled when a platform secret manager or CI system will provide those values. 
## Navigation Keys

| Key | Action |
| --- | --- |
| `q`, `ctrl-c` | Quit |
| `h`, `esc`, `left` | Back |
| `up`, `k` | Move up |
| `down`, `j` | Move down |
| `pgup`, `pgdn` | Page through long lists |
| `home`, `g` | First row |
| `end`, `G` | Last row |
| `enter` | Open or submit |
| `r` | Reload |
| `z` | Zone picker |
| `?` | Contextual help |

Forms can expose advanced options for manual identifiers, raw JSON, shadow versions, overwrite behavior, and other non-default settings.

## Troubleshooting

| Symptom | Check |
| --- | --- |
| Console exits immediately | Ensure stdin is a TTY. |
| `401 Unauthorized` | Check `CARACAL_ADMIN_TOKEN` or `CARACAL_ADMIN_TOKEN_FILE`. |
| `403 Forbidden` | The token lacks the required management scope. |
| Connection error | Run `caracal status --ready`; custom deployments should confirm the Management API URL override. |
| Coordinator views fail | Confirm the Coordinator token and, for custom deployments, the Coordinator URL override. |
| A zone-scoped view will not open | Press `z` and select a zone first. |

## Next Step

Use [Configure Workloads](/runtime-console/config-file/) when a workload needs a runtime profile, credential manifest, secret file, or service endpoint override.

---

# Configure Workloads
# URL: https://docs.caracal.run/runtime-console/config-file/
# Markdown: https://docs.caracal.run/markdown/runtime-console/config-file.md
# Type: config
# Concepts:
# Requires:

---

Runtime config is workload input for `caracal run` and SDK credential loaders. It is not Console state and it does not create, select, or manage zones.

## Source Precedence

| Order | Source |
| --- | --- |
| 1 | `CARACAL_CONFIG`, when set and the file exists. |
| 2 | Environment runtime config. |
| 3 | `$XDG_CONFIG_HOME/caracal/caracal.toml` or `~/.config/caracal/caracal.toml`. |

The runtime does not read `./caracal.toml` from the current working directory.
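
The precedence table can be read as a small lookup routine. The sketch below is one interpretation, not the runtime's actual loader: `resolve_config_source` is a hypothetical helper, "environment runtime config" is detected here by the presence of `CARACAL_ZONE_ID` and `CARACAL_APPLICATION_ID`, and a `CARACAL_CONFIG` that is set but missing suppresses the default profile path, following the troubleshooting note that no other profile path is considered once it is set.

```python
import os


def resolve_config_source(env, file_exists=os.path.isfile):
    """Return ("file", path), ("env", None), or None when nothing applies."""
    explicit = env.get("CARACAL_CONFIG")
    # 1. An explicit profile wins when it is set and the file exists.
    if explicit and file_exists(explicit):
        return ("file", explicit)
    # 2. Environment runtime config (assumed detection rule, see lead-in).
    if env.get("CARACAL_ZONE_ID") and env.get("CARACAL_APPLICATION_ID"):
        return ("env", None)
    # Assumption: a set-but-missing CARACAL_CONFIG rules out the default path.
    if explicit:
        return None
    # 3. Default XDG profile; ./caracal.toml is never consulted.
    home = env.get("HOME") or os.path.expanduser("~")
    xdg_home = env.get("XDG_CONFIG_HOME") or os.path.join(home, ".config")
    default_profile = os.path.join(xdg_home, "caracal", "caracal.toml")
    if file_exists(default_profile):
        return ("file", default_profile)
    return None
```

Passing `env` and `file_exists` as parameters keeps the routine easy to test without touching the real environment or file system.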
## Environment Config | Variable | Meaning | | --- | --- | | `CARACAL_STS_URL` or `CARACAL_ZONE_URL` | Cloud/custom STS URL override for token exchange. | | `CARACAL_GATEWAY_URL` | Cloud/custom Gateway URL override used by SDK transports. | | `CARACAL_COORDINATOR_URL` | Cloud/custom Coordinator URL override used by SDKs and Console agent views. | | `CARACAL_ZONE_ID` | Zone identifier. | | `CARACAL_APPLICATION_ID` | Confidential application identifier. | | `CARACAL_APP_CLIENT_SECRET_FILE` | Cloud/custom path to a mounted client-secret file. | | `CARACAL_APP_CLIENT_SECRET` | Inline local-development client secret. | | `CARACAL_RUN_CREDENTIALS_FILE` | Cloud/custom JSON credential manifest file. | | `CARACAL_RUN_CREDENTIALS` | Inline JSON credential manifest. | | `CARACAL_RUN_CONTINUE_ON_FAILURE` | `true` or `false` for required credential failure behavior. | | `CARACAL_RUN_TTL_SECONDS` | Credential exchange TTL, capped at 900 seconds. | | `CARACAL_MCP_GOVERNANCE_MODE` | `block` or `log`. | Set only one of `CARACAL_RUN_CREDENTIALS_FILE` and `CARACAL_RUN_CREDENTIALS`. Local dev and stable runtime launches normally set only `CARACAL_ZONE_ID` and `CARACAL_APPLICATION_ID`. The runtime and SDKs auto-detect the client secret and credential manifest from the OS Caracal config directory: ```text /runtime///client-secret /runtime///credentials.json ``` Local dev and stable service URLs also resolve automatically. Set service URL overrides only for cloud deployments, containers, or custom service locations. 
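
A few of the constraints in this table can be checked before a workload launches. The sketch below is a hedged illustration rather than the runtime's own validation: `validate_runtime_env` is a hypothetical helper, raising `ValueError` is an assumed failure mode, and clamping an oversized TTL to 900 seconds (instead of rejecting it) is one reading of "capped". The 900-second default matches the `ttl_seconds` field reference.

```python
MAX_TTL_SECONDS = 900  # documented cap for CARACAL_RUN_TTL_SECONDS


def validate_runtime_env(env):
    """Check documented constraints and return the effective TTL in seconds."""
    # Only one credential manifest source may be set.
    if env.get("CARACAL_RUN_CREDENTIALS_FILE") and env.get("CARACAL_RUN_CREDENTIALS"):
        raise ValueError(
            "set only one of CARACAL_RUN_CREDENTIALS_FILE and CARACAL_RUN_CREDENTIALS"
        )

    mode = env.get("CARACAL_MCP_GOVERNANCE_MODE")
    if mode is not None and mode not in ("block", "log"):
        raise ValueError("CARACAL_MCP_GOVERNANCE_MODE must be 'block' or 'log'")

    flag = env.get("CARACAL_RUN_CONTINUE_ON_FAILURE")
    if flag is not None and flag not in ("true", "false"):
        raise ValueError("CARACAL_RUN_CONTINUE_ON_FAILURE must be 'true' or 'false'")

    requested = int(env.get("CARACAL_RUN_TTL_SECONDS", MAX_TTL_SECONDS))
    if requested <= 0:
        raise ValueError("CARACAL_RUN_TTL_SECONDS must be positive")
    # Assumption: values above the cap are clamped rather than rejected.
    return min(requested, MAX_TTL_SECONDS)
```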
## TOML Schema

```toml
zone_id = "zone_prod"
application_id = "app_agent"
ttl_seconds = 900

[[credentials]]
env = "OPENAI_API_KEY"
resource = "resource://openai"
credential_type = "provider_token"
upstream_prefix = "https://api.openai.com/v1"

[[optional_credentials]]
env = "CARACAL_MANDATE"
resource = "resource://mandate-aware-api"
credential_type = "caracal_mandate"
on_failure = "warn"

[mcp_governance]
mode = "block"
```

## Field Reference

| Field | Meaning |
| --- | --- |
| `zone_url` | Cloud/custom STS URL override used by `caracal run`; `sts_url` is accepted as an SDK-readable alias/fallback. |
| `gateway_url` | Cloud/custom Gateway endpoint override for SDK transports. |
| `coordinator_url` | Cloud/custom Coordinator endpoint override for SDK agent/delegation features. |
| `zone_id` | Zone created in Console. |
| `application_id` | Confidential application used by the workload. |
| `app_client_secret_file` | Cloud/custom secret-file path override. |
| `app_client_secret` | Inline local-development secret. |
| `ttl_seconds` | Credential exchange TTL; defaults to and is capped at 900 seconds. |
| `continue_on_failure` | Required credential failure behavior; defaults to `false`. |
| `credentials[]` | Required resource tokens. |
| `optional_credentials[]` | Optional resource tokens with `on_failure = "warn"` or `"error"`. |
| `mcp_governance.mode` | `block` or `log` for likely MCP subprocesses. |

Credential entries use `env`, `resource`, optional `upstream_prefix`, and optional `credential_type`. `provider_token` is the `caracal run` golden path and requires a provider with `allow_runtime_injection = true`; `caracal_mandate` is for workloads that consume Caracal mandates directly. SDKs can also use entries as resource audiences and Gateway routing hints.

## Security Rules

- Use auto-detected local credential files for dev and stable runtime launches.
- Use explicit secret-file paths for cloud deployments, containers, or external secret stores.
- Secret files and explicit config files must not be group- or world-writable. - The default profile path must use owner-only permissions on POSIX systems. - Non-local insecure `http://` STS URLs are rejected unless explicitly allowed by environment. - Environment names that affect runtime loading, such as `NODE_OPTIONS`, `LD_PRELOAD`, and dynamic-library path variables, are blocked. ## Cloud and Custom Secret Paths Cloud deployments, containers, and external secret stores can override the auto-detected local paths with `CARACAL_APP_CLIENT_SECRET_FILE`, `CARACAL_RUN_CREDENTIALS_FILE`, or `app_client_secret_file` in a runtime profile. Use these only when the standard OS Caracal config directory is not the correct secret location for the workload. Cloud deployments and custom service locations can also set `CARACAL_STS_URL`, `CARACAL_GATEWAY_URL`, `CARACAL_COORDINATOR_URL`, `sts_url`, `gateway_url`, or `coordinator_url`. Do not set these for the default local stack. ## Generated Profiles Console `guided setup` can generate a complete profile and one-time secret file using real zone, application, provider, resource, policy, and credential values. Enable file writing when you want the Console to write the profile and secret file with owner-only permissions. ## Troubleshooting | Symptom | Check | | --- | --- | | `CARACAL_CONFIG` is set but ignored | The file must exist; when it is set, no other profile path is considered. | | Runtime rejects a profile | Remove unknown fields and check permissions, URL safety, and credential env names. | | Provider key is not injected | Confirm the credential uses `credential_type = "provider_token"` and the provider config allows runtime injection. | | SDK Gateway routing does not happen | Confirm `gateway_url` and `upstream_prefix` are present. | | Optional credential stops the process | Set `on_failure = "warn"` for optional credentials that may be absent. 
|

## Next Step

Use [Run Workloads](/runtime-console/runtime/) when the profile is ready and you need to inject scoped credentials into a process.

---

# Run Workloads
# URL: https://docs.caracal.run/runtime-console/runtime/
# Markdown: https://docs.caracal.run/markdown/runtime-console/runtime.md
# Type: reference
# Concepts:
# Requires:

---

`caracal run` is the runtime execution command. It starts a subprocess with scoped Caracal resource tokens in environment variables and returns the subprocess exit code.

```bash
caracal run -- python3 agent.py
```

Use `--` when the child command accepts flags.

## Execution Flow

```text
runtime config -> MCP governance check -> STS token exchange -> env injection -> child process
```

1. Load runtime config from `CARACAL_CONFIG`, environment runtime config, or the default XDG profile, in that order of precedence.
2. Validate config keys, URL safety, secret-file permissions, and credential environment names.
3. Check `[mcp_governance]` rules before spawning likely MCP server commands.
4. Exchange each required `credentials[]` entry for a resource token.
5. Exchange each `optional_credentials[]` entry and apply its `on_failure` behavior.
6. Spawn the child process without `shell: true`.
7. Forward stdin, stdout, stderr, and the child exit code.

Injected tokens have a 15-minute TTL from the runtime exchange.

## Required Runtime Identity

`caracal run` needs:

| Input | Source |
| --- | --- |
| STS/zone URL | Automatic for local dev/stable; explicit only for cloud/custom deployments. |
| Zone ID | `CARACAL_ZONE_ID` or `zone_id`. |
| Application ID | `CARACAL_APPLICATION_ID` or `application_id`. |
| Client secret | Auto-detected local client-secret file or explicit cloud/custom secret-file config. |
| Credential manifest | Auto-detected local credential manifest or `credentials[]` / `optional_credentials[]` in a profile. |

Console `guided setup` is the recommended way to generate the first profile and one-time client secret file.
Local dev and stable launches auto-detect credential files under the OS Caracal config directory after the zone and application IDs are known. Service URLs use local defaults automatically unless cloud/custom deployment docs configure explicit overrides. ## Step-Up Challenges When STS returns `interaction_required`, the runtime writes the challenge notice, polls the step-up endpoint every two seconds, and retries exchange for up to five minutes. If the challenge is not satisfied in time, the subprocess is not spawned. ## Environment Safety The runtime injects only configured credential environment variables and a small inherited allowlist such as path, home, shell, temporary directory, locale, color, CI, XDG, and Docker-related variables. Credential names that can alter process loading, such as `NODE_OPTIONS`, `LD_PRELOAD`, or dynamic-library path variables, are blocked. ## MCP governance ```toml [mcp_governance] mode = "block" ``` `block` stops likely MCP server subprocesses before launch. `log` records the governance event and continues; outside development it requires an explicit platform override. ## Troubleshooting | Symptom | Check | | --- | --- | | `runtime config not found` | Run Console `guided setup`, set platform env vars, or point `CARACAL_CONFIG` at a profile. | | Secret-file validation fails | Store the secret value in the file and ensure it is not group- or world-writable. | | Token exchange fails | Confirm `caracal status --ready`, zone ID, application ID, client secret, resource ID, and policy coverage. | | Child command never starts | Required credential exchange, step-up, config validation, or MCP governance failed before spawn. | | Child command exits non-zero | The runtime returns the child process exit code after successful token injection. | ## Next Step Use [Manage Product Objects](/runtime-console/admin/) for ongoing Console work after the workload path is running. 
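
The Environment Safety rules and the no-shell spawn step described on this page can be sketched in a few lines. This is a minimal illustration, not the runtime's implementation: the function names `build_child_env` and `run_child`, the exact allowlist entries, and the specific dynamic-library variable names are assumptions chosen for the example.

```python
import subprocess

# Assumption: illustrative sets only. The runtime's real allowlist and
# blocklist are not enumerated on this page.
INHERITED_NAMES = {"PATH", "HOME", "SHELL", "TMPDIR", "LANG", "NO_COLOR", "CI"}
INHERITED_PREFIXES = ("LC_", "XDG_", "DOCKER_")
BLOCKED_CREDENTIAL_NAMES = {
    "NODE_OPTIONS",
    "LD_PRELOAD",
    "LD_LIBRARY_PATH",
    "DYLD_INSERT_LIBRARIES",
    "DYLD_LIBRARY_PATH",
}


def build_child_env(parent_env, credentials):
    """Return the child environment: allowlisted parent vars plus credentials."""
    # Refuse credential names that could change how the child process loads code.
    for name in credentials:
        if name in BLOCKED_CREDENTIAL_NAMES:
            raise ValueError(f"credential env name {name!r} is blocked")
    # Inherit only allowlisted variables; everything else is dropped.
    env = {
        key: value
        for key, value in parent_env.items()
        if key in INHERITED_NAMES or key.startswith(INHERITED_PREFIXES)
    }
    env.update(credentials)
    return env


def run_child(argv, env):
    """Spawn argv as a list (no shell) and return the child's exit code."""
    completed = subprocess.run(argv, env=env, shell=False)
    return completed.returncode
```

Passing the command as an argument list with `shell=False` means token values and arguments are never re-parsed by a shell, which matches the "without `shell: true`" step in the execution flow.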
--- # Manage Product Objects # URL: https://docs.caracal.run/runtime-console/admin/ # Markdown: https://docs.caracal.run/markdown/runtime-console/admin.md # Type: reference # Concepts: # Requires: --- The Console is the supported human interface for product management. Launch it with: ```bash caracal console ``` or: ```bash caracal-console ``` ## Management Flow ```text zone -> application -> provider -> resource -> policy -> policy set -> authority session/audit ``` Console `guided setup` walks the first version of that flow and can write a runtime profile for `caracal run` and SDKs. ## Menu Workflows | Menu label | What it manages | | --- | --- | | `zone` | Zone records and active zone selection. | | `application` | Confidential agent applications and one-time client secrets. | | `provider` | Gateway upstream auth modes for Caracal mandates, OAuth 2.0 authorization code, OAuth 2.0 client credentials, API-key, and bearer-token upstreams. | | `resource` | Protected resource identifiers, scopes, upstream URLs, Gateway application bindings, and upstream credential provider bindings. | | `policy` | Policy content and versions. | | `policy set` | Policy-set composition, simulation, activation, and shadow evaluation. | | `authority session` | Active authority-session records. | | `control` | Control API exposure and Control credentials. | ## Secrets and Credentials Application client secrets are shown once when created. Provider secrets are accepted only in provider create or rotation-style edit flows and are stored sealed by the Control API. Store application secrets in a secret manager, mounted secret file, or Console-generated local profile before leaving the result screen. The Console masks secret-shaped fields and error output. ## Automation Handoff Use the Console `control` menu when management needs to move from interactive operations to automation. Control API calls are authenticated and audited. 
They use the same product-management boundary as the Console; they are not exposed as top-level runtime CLI commands. ## Troubleshooting | Symptom | Check | | --- | --- | | A view says a zone is required | Press `z` to select a zone before opening zone-scoped views. | | A created client secret is no longer visible | Generate or rotate a new secret; one-time secrets are not recoverable. | | Policy activation fails | Run simulation, inspect validation errors, and confirm the policy set references the intended policy versions. | | Control API calls fail | Confirm Control is enabled from the Console `control` menu and the automation token targets the Control resource. | ## Next Step Use [Inspect Diagnostics and Audit](/runtime-console/observability/) when you need health checks, audit records, or request traces. --- # Inspect Diagnostics and Audit # URL: https://docs.caracal.run/runtime-console/observability/ # Markdown: https://docs.caracal.run/markdown/runtime-console/observability.md # Type: reference # Concepts: # Requires: --- Observability workflows live in the Console because they need operator context, selected zones, and structured API responses. ## Diagnostics Open `diagnostics` with `d`. The Console Doctor view runs the shared engine diagnostics across four sections: | Section | Checks | | --- | --- | | `health` | API reachability, admin authentication, and operator clock skew against the platform. | | `readiness` | Per-service `/ready` probes plus threshold evaluation of STS, Gateway, audit, and Coordinator operator metrics. | | `zones` | Visible zones, resources, policy sets, and audit queryability. | | `preflight` | Local secrets, keys, TLS material, and Postgres/Redis reachability. | Doctor does more than report liveness: it evaluates the operator metrics it collects and raises a warning or failure when a signal needs action. 
Notable deeper checks include: | Signal | Result | Why it matters | | --- | --- | --- | | Clock skew vs the API exceeds 30s | `fail` | Token `iat`/`nbf`/`exp` validation and audit ordering break when hosts are not NTP-synced. | | STS OPA `compile_errors` > 0 | `fail` | A policy bundle will not compile, so token exchanges deny or run on a stale bundle. | | Audit `tamper_mismatch_total` or `chain_breaks` > 0 | `fail` | Audit chain integrity is broken — treat as a security incident. | | Audit `dlq_size` or `hmac_failures` > 0 | `warn` | Events are failing processing or HMAC verification before they age out. | | Coordinator outbox `dead` rows, or a large `pending` backlog | `warn` | Spawn and revocation events are not propagating downstream. | Useful Doctor controls include reload, all-zones or selected-zone checks, preflight-only checks, and strict readiness where warnings fail automation gates. The `Next actions` block collects the advice for every non-passing check. This view is deeper than `caracal status`: `status` answers "is the stack up", while Doctor inspects internal health — policy compilation, audit integrity, event propagation, clock alignment, and local key material — that keeps the platform working correctly over time. ## Audit Open `audit` with `a`. Audit records tie STS, Gateway, Coordinator, policy, and Control decisions to request IDs and authority anchors. Use the audit view to filter recent events, inspect denials, and gather incident context. The audit view opens recent events immediately and streams new ones as they arrive. Press `f` for advanced filters: decision, event type, time window, request ID, agent session, agent label, and result limit. Time-window fields accept a relative window such as `15m`, `2h`, or `7d`, an ISO timestamp, or a date — the Console converts the value before querying the API. Use row actions to open the full event group (`enter`) or the decision trace (`x`) for the selected request. 
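
The time-window formats accepted by the audit filter can be illustrated with a small parser. How the Console actually converts these values is not documented here, so treat this as a sketch under assumptions: the function name `window_start` is hypothetical, only the `m`, `h`, and `d` units shown in the examples are handled, and values without a timezone are assumed to be UTC.

```python
import re
from datetime import datetime, timedelta, timezone

# Relative windows such as "15m", "2h", or "7d".
_RELATIVE = re.compile(r"^(\d+)([mhd])$")
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def window_start(value, now=None):
    """Convert a filter value into the UTC datetime where the window starts."""
    now = now or datetime.now(timezone.utc)
    text = value.strip()
    match = _RELATIVE.match(text)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return now - timedelta(**{_UNITS[unit]: amount})
    # Otherwise expect an ISO timestamp or a plain date (YYYY-MM-DD).
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: naive values are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```
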
Audit is an event ledger: it records what authorization decided. It is distinct from the `sessions` views, which show the live state of authority sessions, agent sessions, and delegation edges. Filter audit by `agent session` or `agent label` to connect a decision back to the session that made it. ## Request Trace Open `request trace` with `t` when you already have a request ID — from a log line, error response, or Gateway header — and need the decision trace directly without scanning the audit list. It is the same trace reached by `x` on an audit row, exposed as a direct entry point. Trace output connects the request, selected resource, subject, scopes, determining policies, diagnostics, and result. ## Troubleshooting | Symptom | Check | | --- | --- | | Diagnostics cannot reach API | Run `caracal status --ready`; custom deployments should confirm the Management API URL override. | | Zone diagnostics show no zones | Confirm the admin token has visibility into the expected zone. | | Audit records are missing | Confirm the selected zone, time window, and audit service readiness. | For a symptom-first path from a failing application call to the right surface and tool, see [Troubleshoot by Symptom](/operations/troubleshooting/). :::note[FAQ] [Why is an audit event missing?](/reference/faq/#faq-018) ::: ## Next Step Use [Manage Agents and Delegation](/runtime-console/agents/) when a request involves agent sessions, child sessions, or delegated authority. --- # Manage Agents and Delegation # URL: https://docs.caracal.run/runtime-console/agents/ # Markdown: https://docs.caracal.run/markdown/runtime-console/agents.md # Type: reference # Concepts: # Requires: --- Agent sessions and delegation operations are Console workflows backed by the Coordinator. They are not top-level runtime CLI commands. ## Requirements | Input | Purpose | | --- | --- | | Coordinator endpoint | Automatic for local dev/stable; explicit only for cloud/custom deployments. 
| | `CARACAL_COORDINATOR_TOKEN` or `CARACAL_COORDINATOR_TOKEN_FILE` | Coordinator authority for agent and delegation views. | | Selected zone | Scope for agent session and delegation queries. | For the default local stack, Console prefers the Coordinator token generated by `caracal up` so a purge/up cycle does not leave an old exported token in use. ## Agent Sessions Open `agent session` with `r` to list agent sessions, inspect a session tree, suspend an active subtree, resume a suspended subtree, or terminate a session. Tree views show parent-child relationships so operators can understand inherited authority. ## Delegation Open `delegation` with `g` to inspect active, inbound, and outbound delegation edges. Detail views show source and target sessions, resources, scope bounds, expiry, and traversal state. Revoking a delegation edge ends the delegated authority immediately. Review outbound edges before terminating an agent session so dependent work can be stopped cleanly. ## Audit Correlation Agent session IDs and delegation edge IDs appear in audit events. Use `audit` request tracing or `request trace` when you need the policy decision and diagnostic payload behind an allow or deny. ## Troubleshooting | Symptom | Check | | --- | --- | | Agent session views are empty | Confirm the selected zone and that workloads are creating agent sessions through an SDK. | | Coordinator token missing | Run `caracal up` locally or set `CARACAL_COORDINATOR_TOKEN_FILE` for a remote Coordinator. | | Delegation revocation appears delayed | Confirm resource servers consume revocation state and reject revoked delegation anchors. | --- # Choose an SDK or Package # URL: https://docs.caracal.run/sdks/ # Markdown: https://docs.caracal.run/markdown/sdks.md # Type: page # Concepts: # Requires: --- Use SDKs when you need package-level details after a guide. 
The [Guides](/guides/) section shows task workflows; this section explains package names, install commands, public APIs, runtime requirements, and where each package fits. ## Choose by What You Are Building | I am building... | Start here | Then read | | --- | --- | --- | | TypeScript app or agent workflow | [TypeScript SDK](./typescript/) | [OAuth Package](./oauth/), [A2A Transport](./transport-a2a/) | | Python app, agent, or ASGI service | [Python SDK](./python/) | [ASGI Connector](./connectors/asgi/), [FastMCP Connector](./connectors/fastmcp/), [MCP Auth Transport](./transport-mcp/) | | Go service or agent workflow | [Go SDK](./go/) | [Go net/http Connector](./connectors/nethttp/) | | Express resource server | [Express Connector](./connectors/express/) | [MCP Auth Transport](./transport-mcp/), [Redis Revocation Store](./connectors/redis/) | | FastAPI or Starlette resource server | [ASGI Connector](./connectors/asgi/) | [MCP Auth Transport](./transport-mcp/), [Redis Revocation Store](./connectors/redis/) | | FastMCP server | [FastMCP Connector](./connectors/fastmcp/) | [MCP Auth Transport](./transport-mcp/) | | Custom verification boundary | [Verification Layer Overview](./verification-layer/) | [Identity Package](./identity/), [Revocation Package](./revocation/) | | Admin or provisioning automation | [Admin Package](./admin/) | [Use Management API](/api/control-plane/) | | Durable token-state storage | [Postgres Token State Backend](./connectors/postgres/) | [Redis Revocation Store](./connectors/redis/) | ## Package Map | Area | TypeScript / Node | Python | Go | | --- | --- | --- | --- | | App SDK | `@caracalai/sdk` | `caracalai-sdk` | `github.com/garudex-labs/caracal/packages/sdk/go` | | Identity verification | `@caracalai/identity` | `caracalai-identity` | `github.com/garudex-labs/caracal/packages/identity/go` | | OAuth token exchange | `@caracalai/oauth` | `caracalai-oauth` | `github.com/garudex-labs/caracal/packages/oauth/go` | | Revocation store | 
`@caracalai/revocation` | `caracalai-revocation` | `github.com/garudex-labs/caracal/packages/revocation/go` | | MCP auth transport | `@caracalai/transport-mcp` | `caracalai-transport-mcp` | `github.com/garudex-labs/caracal/packages/transport/mcp/go` | | A2A transport | `@caracalai/transport-a2a` | Not published in this repository | Not published in this repository | | Express connector | `@caracalai/mcp-express` | Not applicable | Not applicable | | FastMCP connector | `@caracalai/mcp-fastmcp` | `caracalai-mcp-fastmcp` | Not applicable | | ASGI connector | Not applicable | `caracalai-asgi` | Not applicable | | net/http connector | Not applicable | Not applicable | `github.com/garudex-labs/caracal/packages/connectors/nethttp/go` | | Redis revocation store | `@caracalai/revocation-redis` | `caracalai-revocation-redis` | `github.com/garudex-labs/caracal/packages/connectors/redis/go` | | Postgres token state backend | `@caracalai/tokenstate-postgres` | Not published in this repository | Not published in this repository | ## Runtime Requirements | Ecosystem | Current package target | | --- | --- | | Node.js | Node `>=22` for published TypeScript packages that declare an engine. | | Python | Python `>=3.12` for published Python packages. | | Go | Module paths under `github.com/garudex-labs/caracal/packages/...`; current modules declare Go `1.26` where specified. | ## Related Guides - [Add SDK to Your App](/get-started/add-sdk-to-your-app/) - [Connect Your App with the SDK](/tutorials/connect-an-agent/) - [Protect a Gateway-Routed HTTP API](/guides/protect-gateway-http/) - [Implement Multi-Agent Delegation](/guides/delegation/) --- # TypeScript SDK # URL: https://docs.caracal.run/sdks/typescript/ # Markdown: https://docs.caracal.run/markdown/sdks/typescript.md # Type: page # Concepts: # Requires: --- `@caracalai/sdk` is the main TypeScript package for agent lifecycle, delegation, Gateway routing, and Caracal context propagation. 
## Install ```bash npm install @caracalai/sdk ``` The package is ESM-only and targets Node `>=22`. To consume it from a CommonJS project, use a dynamic `await import('@caracalai/sdk')` or set `"type": "module"` in your `package.json`. ## Connect and Configure | API | Use it when | | --- | --- | | `new Caracal()` | You want auto-detection from explicit options, `CARACAL_CONFIG`, the default profile, or environment variables. | | `Caracal.fromEnv()` | You want environment-only startup. | | `Caracal.fromConfig(path)` | You want to load a generated runtime profile. | | `Caracal.fromClientSecret(options)` | You want service-root token refresh through STS client-secret exchange. | ```ts import { Caracal } from "@caracalai/sdk"; const caracal = new Caracal(); ``` ## Core methods | Method | Purpose | | --- | --- | | `spawn(fn, options?)` | Create an agent session and bind context while `fn` runs; pass `grant: Grant.narrow(...)` to bound the child's authority. | | `spawnService(options?)` | Start a long-lived service agent; returns a handle with `heartbeat()` and `close()`. Pass `grant: Grant.narrow(...)` to bound the handle's authority. | | `delegate(options, fn)` | Create a delegation edge to an existing agent session. | | `headers(options?)` | Project the current bound context into HTTP headers synchronously. | | `headersAsync(options?)` | Project headers when a root token may require async refresh. | | `bindFromHeaders(headers, fn, options?)` | Bind inbound Caracal envelope headers to the current async context. | | `transport(options?)` | Return a fetch-compatible function that injects Caracal headers and Gateway routing. | | `gatewayRequest(resourceId, path?)` | Build a Gateway URL and `X-Caracal-Resource` header. | | `current()` | Return the currently bound `CaracalContext`, if present. 
| ## Context Propagation ```ts import { Grant } from "@caracalai/sdk"; await caracal.spawn(async () => { await fetch("https://api.example.com/tickets", { headers: await caracal.headersAsync(), }); }, { grant: Grant.narrow(["tickets:read"], { resourceId: "https://api.example.com/tickets", constraints: { maxHops: 1, budget: 5, policyApproved: true, }, ttlSeconds: 600, }), }); ``` `DelegationConstraints` uses camelCase fields: `resources`, `maxDepth`, `maxHops`, `ttlSeconds`, `budget`, `policyApproved`, `expiresAt`, and `broadReason`. ## Protect Inbound Requests Use `bindFromHeaders()` to propagate a Caracal context after an upstream Gateway, connector, or transport verifier has accepted the inbound mandate. Use [Verification Layer Overview](/sdks/verification-layer/) when the TypeScript service must enforce mandate signatures, scopes, targets, agent identity, delegation, or revocation at its own boundary. Header and transport helpers refuse to fall back to the application root token unless you pass `{ allowRoot: true }`. Use that option only for trusted service-root ingress or setup calls. Normal agent work should run inside `spawn()`, `delegate()`, or `bindFromHeaders()`. ## Related pages - [Integrate the TypeScript SDK](/guides/sdk-typescript/) - [Verification Layer Overview](/sdks/verification-layer/) - [Enforce, propagate, or attribute](/concepts/authority-model/#enforce-propagate-or-attribute) - [Agent Delegation](/concepts/delegation/) - [A2A Transport](/sdks/transport-a2a/) --- # Python SDK # URL: https://docs.caracal.run/sdks/python/ # Markdown: https://docs.caracal.run/markdown/sdks/python.md # Type: page # Concepts: # Requires: --- `caracalai-sdk` is the main Python package for async agent lifecycle, delegation, Gateway routing, and ASGI context propagation. ## Install ```bash pip install caracalai-sdk ``` The package requires Python `>=3.12`. 
## Connect and Configure | API | Use it when | | --- | --- | | `Caracal()` | Auto-detect a generated profile or environment variables. | | `Caracal.from_env()` | Load only from `CARACAL_*` environment variables. | | `Caracal.from_config(path)` | Load a Console-generated `caracal.toml`. | | `Caracal.from_client_secret(...)` | Build a client that refreshes application subject tokens through STS. | ```python from caracalai import Caracal caracal = Caracal() ``` ## Core methods | Method | Purpose | | --- | --- | | `async with caracal.spawn(...)` | Create and bind an agent session; pass `grant=Grant.narrow(...)` to bound the child's authority. | | `await caracal.spawn_service(...)` | Start a long-lived service agent; returns a handle with `heartbeat()` and `aclose()`. Pass `grant=Grant.narrow(...)` to bound the handle's authority. | | `async with caracal.delegate(...)` | Create and bind a delegation edge to an existing agent session. | | `caracal.headers(allow_root=False, ctx=None)` | Project the bound context (or an explicit `ctx`) into HTTP headers. | | `async with caracal.bind(ctx)` | Rebind a captured context into a new async task. | | `async with caracal.bind_from_headers(headers, allow_root=False)` | Bind inbound Caracal envelope headers. | | `caracal.transport(allow_root=False, ctx=None, scopes=None, **kwargs)` | Return an `httpx.AsyncClient` that injects Caracal headers and Gateway routing. Pass `ctx=` from thread pools or executors; pass `scopes=` to authorize gateway calls with a scoped resource mandate instead of the raw subject token. | | `caracal.sync_transport(allow_root=False, ctx=None, scopes=None, **kwargs)` | Synchronous `httpx.Client` counterpart. | | `await caracal.fetch(resource_id, path, ctx=None, scopes=None, ...)` | One-call Gateway request to a resource with context and authority injected. | | `caracal.context_middleware(verifier=None)` | ASGI middleware factory: propagates context, and enforces at the boundary when a `verifier` is passed. 
| | `caracal.gateway_request(resource_id, path="/")` | Build a Gateway URL and `X-Caracal-Resource` header. | | `caracal.mint_mandate(resource_id, scopes, ctx=None, ttl_seconds=None)` | Mint a cached resource mandate carrying the bound agent session and delegation edge; requires client-secret credentials. | ## Context Propagation ```python from caracalai import DelegationConstraints constraints = DelegationConstraints( max_hops=1, budget=5, policy_approved=True, ) ``` `DelegationConstraints` uses Python field names: `resources`, `max_depth`, `max_hops`, `ttl_seconds`, `budget`, `policy_approved`, `expires_at`, and `broad_reason`. ## Protect Inbound Requests `context_middleware()` is framework-agnostic and runs on any ASGI app (FastAPI, Starlette, Quart, Django ASGI). Without a verifier it only **propagates**: it binds the inbound Caracal envelope into request context but does not check JWT signatures, audience, scopes, token use, or revocation. Use this when a Gateway already enforced the mandate upstream. Pass `verifier=` to **enforce at the boundary**. The callable receives the bearer token and must raise on failure; back it with `caracalai_identity.verify_token` so the application sees a request only after the mandate is proven. The SDK never inspects token internals itself. ```python from fastapi import FastAPI from caracalai_identity import verify_token from caracalai import Caracal caracal = Caracal.from_env() app = FastAPI() async def verify(token: str) -> None: await verify_token(token, issuer=ISSUER, audience=AUDIENCE) app.add_middleware(caracal.context_middleware(verifier=verify)) ``` See [Enforce, propagate, or attribute](/concepts/authority-model/#enforce-propagate-or-attribute) for which call path verifies authority.
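
The difference between propagating and enforcing can be made concrete with a plain ASGI middleware. This is a conceptual sketch only, not the SDK's `context_middleware()`: the class name `BoundaryMiddleware` is hypothetical, it does not bind any Caracal context, and answering a failed verification with `401` is an assumption borrowed from the shared status mapping used by the verification layers.

```python
class BoundaryMiddleware:
    """Hypothetical stand-in showing propagate-only versus enforce behavior."""

    def __init__(self, app, verifier=None):
        self.app = app
        self.verifier = verifier

    async def __call__(self, scope, receive, send):
        # Propagate-only mode: with no verifier, the request always passes through.
        if scope["type"] != "http" or self.verifier is None:
            await self.app(scope, receive, send)
            return

        # Enforce mode: extract the bearer token and let the verifier decide.
        headers = dict(scope.get("headers") or [])
        auth = headers.get(b"authorization", b"").decode("latin-1")
        scheme, _, token = auth.partition(" ")
        try:
            if scheme.lower() != "bearer" or not token:
                raise PermissionError("missing bearer token")
            await self.verifier(token)  # must raise on failure
        except Exception:
            # Assumption: failed verification is answered with 401 and the
            # application handler never runs.
            await send({
                "type": "http.response.start",
                "status": 401,
                "headers": [(b"content-type", b"text/plain")],
            })
            await send({"type": "http.response.body", "body": b"unauthorized"})
            return

        await self.app(scope, receive, send)
```

The design point is that the middleware itself never parses the token: all acceptance logic lives in the verifier callable, so swapping in a real mandate verifier changes enforcement without touching the application.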
## Related pages - [Integrate the Python SDK](/guides/sdk-python/) - [Protect a FastMCP App](/guides/protect-fastmcp/) - [Verification Layer Overview](/sdks/verification-layer/) - [MCP Auth Transport](/sdks/transport-mcp/) --- # Go SDK # URL: https://docs.caracal.run/sdks/go/ # Markdown: https://docs.caracal.run/markdown/sdks/go.md # Type: page # Concepts: # Requires: --- The Go SDK provides agent lifecycle, delegation, Gateway routing, and context propagation for Go services. ## Install ```bash go get github.com/garudex-labs/caracal/packages/sdk/go ``` ## Connect and Configure ```go import caracal "github.com/garudex-labs/caracal/packages/sdk/go" client, err := caracal.New() if err != nil { return err } ``` `New()` loads `CARACAL_CONFIG`, the default generated profile, or environment variables. `FromEnv()`, `FromConfig()`, and `FromClientSecret()` are available when you need explicit control. ## Core methods | API | Purpose | | --- | --- | | `New()` | Auto-detect profile or environment configuration. | | `FromEnv()` | Load from environment variables. | | `FromConfig(path)` | Load a generated runtime profile. | | `FromClientSecret(options)` | Use STS client-secret exchange for root SDK operations. | | `client.Spawn(ctx, fn, opts...)` | Create an agent session and run `fn` with a bound `context.Context`; set `SpawnOptions.Grant` to bound the child's authority. | | `client.Delegate(ctx, opts, fn)` | Create a delegation edge from the current agent session. | | `client.Headers(ctx, opts...)` | Project the bound context to `http.Header`. | | `client.BindFromRequest(ctx, req, opts...)` | Bind inbound Caracal envelope headers. | | `client.Transport(base, opts...)` | Return an `*http.Client` that injects Caracal headers. | | `client.GatewayRequest(resourceID, path)` | Build explicit Gateway routing metadata. | | `client.Current(ctx)` | Inspect the current Caracal context. 
| ## Context Propagation ```go constraints := &caracal.DelegationConstraints{ MaxHops: 1, Budget: 5, PolicyApproved: true, } ``` Available fields are `Resources`, `MaxDepth`, `MaxHops`, `TTLSeconds`, `Budget`, `PolicyApproved`, `ExpiresAt`, and `BroadReason`. ## Protect Inbound Requests Use `BindFromRequest()` to propagate a Caracal context after an upstream Gateway, connector, or transport verifier has accepted the inbound mandate. Use [Verification Layer Overview](/sdks/verification-layer/) when the Go service must enforce mandate signatures, scopes, targets, agent identity, delegation, or revocation at its own boundary. `Headers` and `Transport` require a bound Caracal context by default. Pass `caracal.RootOptions{AllowRoot: true}` only when the call should intentionally use the application subject token. ## Related pages - [Integrate the Go SDK](/guides/sdk-go/) - [Verification Layer Overview](/sdks/verification-layer/) - [Enforce, propagate, or attribute](/concepts/authority-model/#enforce-propagate-or-attribute) - [Protect a Go net/http Service](/guides/protect-nethttp/) - [Agent Delegation](/concepts/delegation/) --- # Verification Layer Overview # URL: https://docs.caracal.run/sdks/verification-layer/ # Markdown: https://docs.caracal.run/markdown/sdks/verification-layer.md # Type: page # Concepts: # Requires: --- Use this page when you are protecting an inbound resource-server boundary and need to choose the right package layer. Start with the highest-level connector that fits your framework; use lower-level packages only when you are building a custom boundary. ## Which Layer Should I Use? 
| Need | Use | | --- | --- | | Express route middleware | [Express Connector](./connectors/express/) | | FastAPI/Starlette (ASGI) middleware | [ASGI Connector](./connectors/asgi/) | | FastMCP server or tool authentication | [FastMCP Connector](./connectors/fastmcp/) | | Go `net/http` middleware | [Go net/http Connector](./connectors/nethttp/) | | Framework-neutral bearer parsing and mandate verification | [MCP Auth Transport](./transport-mcp/) | | Custom JWT claim verification | [Identity Package](./identity/) | | Shared revocation checks | [Revocation Package](./revocation/) plus [Redis Revocation Store](./connectors/redis/) | | Durable TypeScript token state | [Postgres Token State Backend](./connectors/postgres/) | ## Framework Connectors Connectors adapt the shared transport and identity packages to common server frameworks. They should reject failed requests before your handler or tool runs, attach verified claims to framework context, and preserve the same 401/403 behavior across languages. Use connectors first when your framework is supported. They reduce boilerplate and keep error mapping consistent with the rest of Caracal. ## MCP Auth Transport [MCP Auth Transport](./transport-mcp/) is the reusable verification layer under the connectors. Use it when your framework is unsupported or when you need direct control over bearer parsing, verifier defaults, route-level scopes, targets, agent requirements, delegation requirements, hop limits, and safe error hints. ## Identity Package [Identity Package](./identity/) verifies mandate JWT claims directly. Use it when you are composing a custom verifier or connector. It does not provide the full transport error mapping or revocation-store integration by itself. ## Revocation and Shared State Resource servers must reject mandates anchored to revoked sessions, root sessions, agent sessions, or delegation edges. Use in-memory revocation stores for local development only. 
Use [Redis Revocation Store](./connectors/redis/) for multi-instance resource servers that consume `caracal.sessions.revoke`. Use [Postgres Token State Backend](./connectors/postgres/) only when a TypeScript service needs durable token-state rows. It is not the revocation stream consumer. ## Failure Behavior All HTTP verification layers should preserve the shared status mapping: | Status | Meaning | | --- | --- | | `401` | The credential was missing, invalid, expired, revoked, or from the wrong zone. | | `403` | The mandate verified but lacks required scope, target, agent, delegation, chain, or hop authority. | ## Related Pages - [Protect an MCP Server](/guides/protect-mcp/) - [Protect an Express App](/guides/protect-express/) - [Protect a FastAPI App](/guides/protect-fastapi/) - [Protect a FastMCP App](/guides/protect-fastmcp/) - [Protect a Go net/http Service](/guides/protect-nethttp/) --- # Framework Connectors # URL: https://docs.caracal.run/sdks/connectors/ # Markdown: https://docs.caracal.run/markdown/sdks/connectors.md # Type: page # Concepts: # Requires: --- Framework connectors adapt the lower-level identity and MCP auth transport packages to common server frameworks. Use them before reaching for lower-level verification APIs. ## Connector Map | Connector | Package | Use it for | | --- | --- | --- | | [Express](/sdks/connectors/express/) | `@caracalai/mcp-express` | Protecting Express 5 routes with Caracal mandate verification. | | [ASGI](/sdks/connectors/asgi/) | `caracalai-asgi` | Protecting FastAPI, Starlette, and other Python ASGI apps with Caracal mandate verification. | | [FastMCP](/sdks/connectors/fastmcp/) | `@caracalai/mcp-fastmcp`, `caracalai-mcp-fastmcp` | Verifying FastMCP bearer tokens before tool execution. | | [Go net/http](/sdks/connectors/nethttp/) | `github.com/garudex-labs/caracal/packages/connectors/nethttp/go` | Protecting Go HTTP handlers. 
| ## Choose a Verification Layer | Need | Use | | --- | --- | | Framework-neutral verification | [MCP Auth Transport](/sdks/transport-mcp/) | | Express route middleware | [Express Connector](/sdks/connectors/express/) | | FastAPI/Starlette (ASGI) middleware | [ASGI Connector](/sdks/connectors/asgi/) | | FastMCP tool/server authentication | [FastMCP Connector](/sdks/connectors/fastmcp/) | | Go HTTP middleware | [Go net/http Connector](/sdks/connectors/nethttp/) | | Shared revocation store | [Redis Revocation Store](/sdks/connectors/redis/) | ## Boundary Semantics Every HTTP connector maps verification failures through one canonical status function in `@caracalai/transport-mcp` (`httpStatusForAuthError` in TypeScript, `transportmcp.HTTPStatus` in Go), so the boundary behaves identically across frameworks and languages: - **401** — the credential itself was not accepted: `missing_token`, `invalid_token`, `invalid_zone`, `session_revoked`, `delegation_stale`. - **403** — the mandate verified but its authority is insufficient for the route: `insufficient_scope`, `agent_required`, `delegation_required`, `chain_mismatch`, `hop_count_exceeded`. Connectors never re-derive these status codes; they consume the shared mapping. 
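
The 401/403 split listed above amounts to a lookup from error code to status. The following sketch mirrors that list; it is not the source of `httpStatusForAuthError` or `transportmcp.HTTPStatus`, and the Python function name is hypothetical. The docs do not say how the shared function treats an unrecognized code, so this sketch raises rather than guessing.

```python
# Error codes where the credential itself was not accepted.
UNAUTHORIZED_CODES = frozenset({
    "missing_token",
    "invalid_token",
    "invalid_zone",
    "session_revoked",
    "delegation_stale",
})

# Error codes where the mandate verified but lacks authority for the route.
FORBIDDEN_CODES = frozenset({
    "insufficient_scope",
    "agent_required",
    "delegation_required",
    "chain_mismatch",
    "hop_count_exceeded",
})


def http_status_for_auth_error(code):
    """Map a verification error code to its HTTP status."""
    if code in UNAUTHORIZED_CODES:
        return 401
    if code in FORBIDDEN_CODES:
        return 403
    # Assumption: behavior for unknown codes is unspecified, so fail loudly.
    raise ValueError(f"unknown auth error code: {code!r}")
```

Keeping the mapping in one function is what lets each connector return the same status for the same failure, regardless of framework.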
## Related State Backends - [Redis Revocation Store](/sdks/connectors/redis/) - [Postgres Token State Backend](/sdks/connectors/postgres/) ## Related Guides - [Verification Layer Overview](/sdks/verification-layer/) - [Protect an MCP Server](/guides/protect-mcp/) - [Protect an Express App](/guides/protect-express/) - [Protect a FastAPI App](/guides/protect-fastapi/) - [Protect a FastMCP App](/guides/protect-fastmcp/) - [Protect a Go net/http Service](/guides/protect-nethttp/) --- # Express Connector # URL: https://docs.caracal.run/sdks/connectors/express/ # Markdown: https://docs.caracal.run/markdown/sdks/connectors/express.md # Type: page # Concepts: # Requires: --- `@caracalai/mcp-express` protects Express routes by parsing the bearer token, verifying the mandate through `@caracalai/transport-mcp`, and attaching Caracal claims to the request. ## Install ```bash npm install @caracalai/mcp-express @caracalai/transport-mcp @caracalai/revocation-redis ``` The connector has an Express `^5.0.0` peer dependency and targets Node `>=22`. ## Middleware ```ts import express from "express"; import { caracalAuth } from "@caracalai/mcp-express"; import { createMandateVerifier } from "@caracalai/transport-mcp"; import { RedisRevocationStore } from "@caracalai/revocation-redis"; const app = express(); const verifier = createMandateVerifier({ issuer: "https://sts.example.com", audience: "https://mcp.example.com", zoneId: "zone_prod", revocations: new RedisRevocationStore(redis), }); app.use("/mcp", caracalAuth({ verifier }, { requiredScopes: ["mcp:tool:call"], requiredTargets: ["https://mcp.example.com"], requireAgent: true, })); ``` ## Request shape The middleware attaches Caracal claims to `req.caracal` and `req.caracalClaims` when verification succeeds. Use the exported `CaracalRequest` type when a handler needs typed access. 
```ts import type { CaracalRequest } from "@caracalai/mcp-express"; app.post("/mcp/tools/search", (req: CaracalRequest, res) => { res.json({ subject: req.caracal?.sub }); }); ``` ## Failure behavior Failed verification returns an MCP transport error code such as `missing_token`, `invalid_token`, `insufficient_scope`, or `session_revoked`, plus a safe `error_hint` field. Pair the connector with a shared revocation store when multiple resource-server instances serve the same resource. ## Related pages - [Protect an Express App](/guides/protect-express/) - [MCP Auth Transport](/sdks/transport-mcp/) - [Redis Revocation Store](/sdks/connectors/redis/) --- # FastMCP Connector # URL: https://docs.caracal.run/sdks/connectors/fastmcp/ # Markdown: https://docs.caracal.run/markdown/sdks/connectors/fastmcp.md # Type: page # Concepts: # Requires: --- FastMCP connectors expose small verifier APIs that delegate to the MCP transport packages. Use them when your FastMCP integration needs to authenticate a bearer token before running a tool. ## Install | Ecosystem | Package | | --- | --- | | TypeScript | `npm install @caracalai/mcp-fastmcp @caracalai/transport-mcp @caracalai/revocation` | | Python | `pip install caracalai-mcp-fastmcp` | ## TypeScript verifier ```ts import { extractBearer, verifyFastMcpToken } from "@caracalai/mcp-fastmcp"; import { createMandateVerifier } from "@caracalai/transport-mcp"; import { InMemoryRevocationStore } from "@caracalai/revocation"; const verifier = createMandateVerifier({ issuer: "https://sts.example.com", audience: "https://mcp.example.com", zoneId: "zone_prod", revocations: new InMemoryRevocationStore(), }); const token = extractBearer(request.headers.get("authorization") ?? 
""); if (!token) throw new Error("missing bearer token"); const context = await verifyFastMcpToken(token, verifier, { requiredScopes: ["mcp:tool:call"], requiredTargets: ["https://mcp.example.com"], requireAgent: true, }); console.log(context.sub, context.zoneId, context.scope); ``` `verifyFastMcpToken()` returns `{ sub, zoneId, scope }` or throws `FastMcpAuthError`. ## Python verifier ```python from caracalai_mcp_fastmcp import CaracalAuth, CaracalAuthError auth = CaracalAuth( issuer="https://sts.example.com", audience="https://mcp.example.com", zone_id="zone_prod", required_scopes=["mcp:tool:call"], required_targets=["https://mcp.example.com"], require_agent=True, revocations=revocations, ) try: context = await auth.verify_token(token) except CaracalAuthError as exc: raise RuntimeError(exc.code) from exc ``` ## Boundary The connector verifies tokens; it does not create agent sessions or delegation edges. Use the [Python SDK](/sdks/python/) or [TypeScript SDK](/sdks/typescript/) to create Caracal context before making outbound calls. ## Related pages - [Protect a FastMCP App](/guides/protect-fastmcp/) - [MCP Auth Transport](/sdks/transport-mcp/) - [Sessions and Revocation](/concepts/sessions-revocation/) --- # Go net/http Connector # URL: https://docs.caracal.run/sdks/connectors/nethttp/ # Markdown: https://docs.caracal.run/markdown/sdks/connectors/nethttp.md # Type: page # Concepts: # Requires: --- The Go net/http connector wraps handlers with MCP transport verification and stores verified claims in the request context. 
## Install

```bash
go get github.com/garudex-labs/caracal/packages/connectors/nethttp/go
```

## Middleware

```go
import (
	"net/http"
	"time"

	nethttp "github.com/garudex-labs/caracal/packages/connectors/nethttp/go"
	revocation "github.com/garudex-labs/caracal/packages/revocation/go"
	transportmcp "github.com/garudex-labs/caracal/packages/transport/mcp/go"
)

revocations := revocation.NewInMemoryStore(24 * time.Hour)
verifier := transportmcp.NewVerifier(transportmcp.Options{
	Issuer:      "https://sts.example.com",
	Audience:    "https://api.example.com",
	ZoneID:      "zone_prod",
	Revocations: revocations,
})

handler := nethttp.VerifierMiddleware(verifier.Require(transportmcp.Options{
	RequiredScopes:  []string{"tickets:read"},
	RequiredTargets: []string{"https://api.example.com/tickets"},
	RequireAgent:    true,
}))(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	claims, ok := nethttp.ClaimsFromContext(r.Context())
	if !ok {
		http.Error(w, "missing claims", http.StatusUnauthorized)
		return
	}
	_, _ = w.Write([]byte(claims.Sub))
}))
```

## APIs

| API | Purpose |
| --- | --- |
| `Middleware(opts)` | Return middleware that verifies the bearer token and rejects failed requests. |
| `VerifierMiddleware(verifier)` | Return middleware backed by a reusable verifier with shared defaults. |
| `ClaimsFromContext(ctx)` | Retrieve verified Caracal claims inside a handler. |

## Failure behavior

The middleware maps MCP transport auth errors to HTTP failures before the handler runs and includes a safe `error_hint` in JSON failures. Use a shared revocation store through `transportmcp.Options` in production so revoked sessions are rejected consistently across service instances.
## Related pages - [Protect a Go net/http Service](/guides/protect-nethttp/) - [Go SDK](/sdks/go/) - [MCP Auth Transport](/sdks/transport-mcp/) --- # MCP Auth Transport # URL: https://docs.caracal.run/sdks/transport-mcp/ # Markdown: https://docs.caracal.run/markdown/sdks/transport-mcp.md # Type: page # Concepts: # Requires: --- The MCP auth transport packages verify Caracal mandates without tying the check to Express, FastMCP, or Go net/http. Framework connectors build on this layer. ## Install | Ecosystem | Package | | --- | --- | | TypeScript | `npm install @caracalai/transport-mcp` | | Python | `pip install caracalai-transport-mcp` | | Go | `go get github.com/garudex-labs/caracal/packages/transport/mcp/go` | ## Core APIs | API | Purpose | | --- | --- | | `extractBearer` / `extract_bearer` / `ExtractBearer` | Parse a bearer token from an `Authorization` header. | | `createMandateVerifier` / `NewVerifier` | Build a reusable verifier with secure defaults, JWKS caching, revocation checks, and per-route overrides. | | `create_mandate_verifier` | Python reusable verifier with the same defaults, warmup, and per-route override model. | | `authenticate` / `Authenticate` | Verify one token against identity claims and revocation anchors. | | `checkActiveAuthority` / `check_active_authority` / `CheckActiveAuthority` | Check expiry and revoked anchors for verified claims. 
|

## TypeScript example

```ts
import { createMandateVerifier } from "@caracalai/transport-mcp";
import { InMemoryRevocationStore } from "@caracalai/revocation";

const verifier = createMandateVerifier({
  issuer: "https://sts.example.com",
  audience: "https://mcp.example.com",
  zoneId: "zone_prod",
  revocations: new InMemoryRevocationStore(),
});

const result = await verifier.authorization(req.headers.authorization, {
  requiredScopes: ["mcp:tool:call"],
  requiredTargets: ["https://mcp.example.com"],
  requireAgent: true,
});

if (!result.ok) {
  throw new Error(`${result.error.code}: ${result.error.hint}`);
}
```

Use one verifier per resource server and add route-level scopes or targets through `verifier.authorization(..., overrides)` or `verifier.require(overrides)`. The verifier defaults to resource mandates, checks the STS issuer and audience, enforces zone/scopes/targets/agent/delegation/hop constraints, checks expiry, and queries all revocation anchors for the session, root session, agent session, and delegation edge.

## Python example

```python
from caracalai_revocation import InMemoryRevocationStore
from caracalai_transport_mcp import AuthOptions, create_mandate_verifier

verifier = create_mandate_verifier(
    AuthOptions(
        issuer="https://sts.example.com",
        audience="https://api.example.com",
        expected_zone_id="zone_prod",
        revocations=InMemoryRevocationStore(),
    )
)
await verifier.warmup()

result = await verifier.authorization(
    request.headers.get("authorization"),
    required_scopes=["tickets:read"],
    required_targets=["https://api.example.com/tickets"],
)
if not result.ok:
    raise PermissionError(f"{result.error.code}: {result.error.hint}")
```

## Error codes

`authenticate` and reusable verifiers normalize failures into transport error codes: `missing_token`, `invalid_token`, `invalid_zone`, `insufficient_scope`, `session_revoked`, `delegation_stale`, `agent_required`, `delegation_required`, `chain_mismatch`, and `hop_count_exceeded`.
TypeScript, Python, and Go reusable verifiers include a safe debugging hint for operator logs and API error bodies. ## Related pages - [Verification Layer Overview](/sdks/verification-layer/) - [Protect an MCP Server](/guides/protect-mcp/) - [Express Connector](/sdks/connectors/express/) - [FastMCP Connector](/sdks/connectors/fastmcp/) - [Go net/http Connector](/sdks/connectors/nethttp/) --- # Identity Package # URL: https://docs.caracal.run/sdks/identity/ # Markdown: https://docs.caracal.run/markdown/sdks/identity.md # Type: page # Concepts: # Requires: --- The identity packages verify Caracal mandate JWTs. Use them when you are building a connector, transport, or custom resource-server boundary that accepts mandates directly. ## Install | Ecosystem | Package | | --- | --- | | TypeScript | `npm install @caracalai/identity` | | Python | `pip install caracalai-identity` | | Go | `go get github.com/garudex-labs/caracal/packages/identity/go` | Node packages target Node `>=22`; Python packages require Python `>=3.12`. ## Verification inputs | Option | Meaning | | --- | --- | | `issuer` / `Issuer` | Expected STS issuer. | | `audience` / `Audience` | Expected audience for the mandate. | | `zoneId` / `expected_zone_id` / `ZoneID` | Zone claim check. When omitted, the verifier reads the unverified `zone_id` claim to select the zone keyset, then proves it through signature verification. | | `requiredScopes` / `required_scopes` / `RequiredScopes` | Scopes every accepted mandate must contain. | | `requiredTargets` / `required_targets` / `RequiredTargets` | Target resources every accepted mandate must include. | | `requiredUse` / `required_use` / `RequiredUse` | Token use, usually `resource`. | | `requireAgent` / `require_agent` / `RequireAgent` | Require agent-session claims. | | `requireDelegation` / `require_delegation` / `RequireDelegation` | Require a delegation edge. 
| | `requireChainContains` / `require_chain_contains` / `RequireChainContains` | Require an application in the delegation chain. | | `maxHopCount` / `max_hop_count` / `MaxHopCount` | Cap delegation chain depth. | ## TypeScript example ```ts import { verify } from "@caracalai/identity"; const claims = await verify(token, { issuer: "https://sts.example.com", audience: "https://api.example.com", zoneId: "zone_prod", requiredScopes: ["tickets:read"], requiredTargets: ["https://api.example.com/tickets"], }); ``` ## JWKS caching STS signing keysets are zone-scoped: every fetch hits `{issuer}/.well-known/jwks.json?zone_id={zone}` and is cached per issuer and zone. TypeScript verifiers use an in-memory JWKS cache by default. Build a cache explicitly when a resource server wants shorter local-development TTLs, a custom fetch implementation, or warmup during service boot. ```ts import { createJwksCache, verify } from "@caracalai/identity"; const jwksCache = createJwksCache({ ttlMs: 300_000, fetchTimeoutMs: 5_000 }); await jwksCache.warm("https://sts.example.com", "zone_prod"); const claims = await verify(token, { issuer: "https://sts.example.com", audience: "https://api.example.com", zoneId: "zone_prod", jwksCache, }); ``` Go and Python identity packages also cache issuer JWKS metadata in memory. Python exposes `warm_jwks(issuer, zone_id)` for service boot; Go callers can warm the cache with `GetJWKSContext(ctx, issuer, zoneID)`. ## Failure classes | Error | Meaning | | --- | --- | | `TokenInvalidError` | Signature, issuer, audience, expiry, use, or claim validation failed. | | `ZoneInvalidError` | The token zone did not match the expected zone. | | `ScopeInsufficientError` | A required scope is missing. | | `AgentIdentityRequiredError` | The verifier requires an agent identity. | | `DelegationRequiredError` | The verifier requires delegated authority. | | `ChainMismatchError` | The delegation chain is missing a required application. 
| | `HopCountExceededError` | The mandate exceeds the configured hop limit. | ## When to Use MCP Auth Transport Use [MCP Auth Transport](/sdks/transport-mcp/) when you also need revocation checks, reusable verifier defaults, bearer-header parsing, and safe debugging hints. Use the identity package directly when you are composing your own verifier or connector boundary. --- # Revocation Package # URL: https://docs.caracal.run/sdks/revocation/ # Markdown: https://docs.caracal.run/markdown/sdks/revocation.md # Type: page # Concepts: # Requires: --- The revocation packages define the store contract resource servers use to reject mandates after a session, root session, agent session, or delegation edge has been revoked. ## Install | Ecosystem | Package | | --- | --- | | TypeScript | `npm install @caracalai/revocation` | | Python | `pip install caracalai-revocation` | | Go | `go get github.com/garudex-labs/caracal/packages/revocation/go` | ## Contract | Operation | Meaning | | --- | --- | | `isRevoked(sid)` / `is_revoked(sid)` / `IsRevoked(sid)` | Return whether an anchor is revoked. | | `markRevoked(sid, ttl)` / `mark_revoked(sid, ttl)` / `MarkRevoked(sid, ttl)` | Record an anchor as revoked for a TTL. | ## In-memory stores Use in-memory stores for local development, tests, and single-process examples: | Ecosystem | In-memory API | | --- | --- | | TypeScript | `new InMemoryRevocationStore({ defaultTtlMs, maxEntries })` | | Python | `InMemoryRevocationStore(default_ttl_ms=...)` | | Go | `revocation.NewInMemoryStore(defaultTTL)` | In-memory stores do not share revocation state across processes. Production resource servers should use a shared backend and consume `caracal.sessions.revoke`. ## Production path Use [Redis Revocation Store](/sdks/connectors/redis/) for multi-instance resource servers. The Redis connector reads signed revocation stream messages, marks every revocation anchor, and lets verifiers fail closed when Redis is unavailable. 
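The store contract and the in-memory caveat above can be sketched in a few lines. This is an illustrative stand-in, not the `@caracalai/revocation` implementation. The names `RevocationStoreSketch`, `SketchInMemoryStore`, and `demoTwoProcesses` are assumptions made for this example, as are the promise-returning signatures and the millisecond TTL unit.

```typescript
// Hypothetical sketch of the two-operation store contract described above.
interface RevocationStoreSketch {
  isRevoked(sid: string): Promise<boolean>;
  markRevoked(sid: string, ttlMs?: number): Promise<void>;
}

class SketchInMemoryStore implements RevocationStoreSketch {
  // anchor id -> epoch milliseconds at which the revocation entry expires
  private readonly expiresAt = new Map<string, number>();
  private readonly defaultTtlMs: number;
  private readonly now: () => number;

  constructor(defaultTtlMs: number, now: () => number = () => Date.now()) {
    this.defaultTtlMs = defaultTtlMs;
    this.now = now;
  }

  async markRevoked(sid: string, ttlMs: number = this.defaultTtlMs): Promise<void> {
    this.expiresAt.set(sid, this.now() + ttlMs);
  }

  async isRevoked(sid: string): Promise<boolean> {
    const expiry = this.expiresAt.get(sid);
    if (expiry === undefined) return false;
    if (expiry <= this.now()) {
      this.expiresAt.delete(sid); // the entry outlived its TTL
      return false;
    }
    return true;
  }
}

// Two store instances stand in for two resource-server processes:
// a revocation recorded in one is invisible to the other.
async function demoTwoProcesses(): Promise<[boolean, boolean]> {
  const dayMs = 24 * 60 * 60 * 1000;
  const instanceA = new SketchInMemoryStore(dayMs);
  const instanceB = new SketchInMemoryStore(dayMs);
  await instanceA.markRevoked("sess_123");
  return [await instanceA.isRevoked("sess_123"), await instanceB.isRevoked("sess_123")];
}
```

In this sketch `demoTwoProcesses()` resolves to `[true, false]`, which is the gap a shared backend closes: the second instance would keep accepting a mandate the first has already revoked.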
## Related pages - [Sessions and Revocation](/concepts/sessions-revocation/) - [MCP Auth Transport](/sdks/transport-mcp/) - [Protect an MCP Server](/guides/protect-mcp/) --- # OAuth Package # URL: https://docs.caracal.run/sdks/oauth/ # Markdown: https://docs.caracal.run/markdown/sdks/oauth.md # Type: page # Concepts: # Requires: --- The OAuth packages exchange existing subject authority for resource mandates through the STS `/oauth/2/token` endpoint. ## Install | Ecosystem | Package | | --- | --- | | TypeScript | `npm install @caracalai/oauth` | | Python | `pip install caracalai-oauth` | | Go | `go get github.com/garudex-labs/caracal/packages/oauth/go` | ## Exchange inputs | Option | Meaning | | --- | --- | | Subject token | Existing user, service, agent, or application authority. | | Resource | Resource identifier or audience requested from STS. | | Client secret or assertion | Application authentication for client-secret flows. | | Actor token | Optional actor authority for delegated exchanges. | | Session ID | Subject session anchor. | | Agent session ID | Agent execution anchor. | | Delegation edge ID | Delegated authority anchor. | | Scopes | Requested resource scopes. | | TTL seconds | Requested mandate lifetime. | ## TypeScript example ```ts import { OAuthClient, InteractionRequiredError } from "@caracalai/oauth"; const oauth = new OAuthClient(stsUrl, zoneId, applicationId); try { const token = await oauth.exchange(subjectToken, "https://api.example.com/tickets", { scopes: ["tickets:read"], clientSecret: process.env.CARACAL_APP_CLIENT_SECRET, }); console.log(token.accessToken, token.expiresIn); } catch (error) { if (error instanceof InteractionRequiredError) { console.log("step-up required", error.challengeId); } else { throw error; } } ``` ## Behavior - Successful responses are validated for `access_token`, `token_type`, and `expires_in`. - Token responses are cached by identity, resource, scopes, TTL, and credential context. 
- Concurrent identical exchanges share in-flight work. - Transient HTTP statuses are retried with bounded backoff. - STS `interaction_required` responses surface as `InteractionRequiredError`. ## Related pages - [Step-Up Re-Authentication](/guides/step-up/) - [Mandates](/concepts/mandate/) - [Use STS Endpoint](/api/sts/) --- # A2A Transport # URL: https://docs.caracal.run/sdks/transport-a2a/ # Markdown: https://docs.caracal.run/markdown/sdks/transport-a2a.md # Type: page # Concepts: # Requires: --- `@caracalai/transport-a2a` helps one agent call another agent endpoint through a Caracal-authorized A2A request. ## Install ```bash npm install @caracalai/transport-a2a ``` The package targets Node `>=22` and depends on `@caracalai/oauth` and `@caracalai/sdk`. ## Core API ```ts import { a2aCall } from "@caracalai/transport-a2a"; const response = await a2aCall({ agentUrl: "https://agent.example.com", resource: "https://agent.example.com", method: "summarize", params: { ticketId: "T-123" }, requestId: crypto.randomUUID(), scopes: ["agent:call"], }, subjectToken, zoneId, applicationId, { stsUrl: process.env.CARACAL_STS_URL!, clientSecret: process.env.CARACAL_APP_CLIENT_SECRET, }); ``` ## Request fields | Field | Meaning | | --- | --- | | `agentUrl` | Target agent base URL; the helper posts to `${agentUrl}/a2a`. | | `resource` | Optional STS resource; defaults to `agentUrl`. | | `method` | A2A method name. | | `params` | Method payload. | | `requestId` | Correlation ID; response must echo it. | | `scopes` | STS scopes for the target call. | | `sessionId`, `agentSessionId`, `delegationEdgeId` | Optional context anchors. | | `metadata` | Optional request metadata forwarded in the A2A body. | ## Options | Option | Meaning | | --- | --- | | `stsUrl` | STS base URL. | | `clientSecret`, `clientAssertion`, `clientAssertionType` | Application authentication for token exchange. | | `ttlSeconds` | Requested mandate TTL. 
| | `timeoutMs`, `retries`, `retryBaseMs` | Exchange and request retry controls. | | `retryTransientStatuses` | Retry transient A2A HTTP responses when true. | | `fetchImpl` | Custom fetch-like implementation. | | `oauthClient` | Test or custom token-exchange client. | ## Validation behavior The helper exchanges a mandate, injects Caracal envelope headers, posts the A2A JSON body, and verifies the response contains the same `requestId` and a `result` field. --- # Admin Package # URL: https://docs.caracal.run/sdks/admin/ # Markdown: https://docs.caracal.run/markdown/sdks/admin.md # Type: page # Concepts: # Requires: --- `@caracalai/admin` is the automation client for control-plane objects. It mirrors Console management workflows for scripts and services. ## Install ```bash npm install @caracalai/admin ``` ## Create a client ```ts import { AdminClient } from "@caracalai/admin"; const admin = new AdminClient({ apiUrl: process.env.CARACAL_API_URL!, coordinatorUrl: process.env.CARACAL_COORDINATOR_URL, adminToken: process.env.CARACAL_ADMIN_TOKEN!, coordinatorToken: process.env.CARACAL_COORDINATOR_TOKEN, }); ``` `apiUrl` and `adminToken` are required for API-backed resources. `coordinatorUrl` and `coordinatorToken` are required for agent and delegation methods. 
## API groups | Group | Methods | | --- | --- | | `zones` | `list`, `get`, `create`, `patch`, `delete` | | `applications` | `list`, `get`, `create`, `patch`, `delete`, `dcr` | | `resources` | `list`, `get`, `create`, `patch`, `delete` | | `providers` | `list`, `get`, `create`, `patch`, `delete` | | `policies` | `list`, `get`, `create`, `validate`, `addVersion`, `delete` | | `policyTemplates` | `list`, `get` | | `policySets` | `list`, `get`, `create`, `addVersion`, `simulate`, `activate`, `delete` | | `grants` | `list`, `get`, `create`, `revoke` | | `sessions` | `list` | | `audit` | `list`, `byRequest`, `explain` | | `agents` | `list`, `get`, `children`, `suspend`, `resume`, `terminate` | | `delegations` | `active`, `inbound`, `outbound`, `traverse`, `impact`, `revoke` | ## Policy activation example ```ts const policy = await admin.policies.create(zoneId, { name: "tickets-read", content: policySource, }); const set = await admin.policySets.create(zoneId, "tickets"); const version = await admin.policySets.addVersion(zoneId, set.id, [ { policy_version_id: policy.version.id }, ]); await admin.policySets.activate(zoneId, set.id, version.id); ``` ## Dynamic Client Registration (DCR) DCR is the **only** way to create short-lived, self-registering client identities, and it is **programmatic-only** — Console creates managed applications, not DCR applications. Use `applications.dcr()` from a control-plane workload (per-tenant onboarding, a CI job, a per-integration identity) that already holds an admin token. ```ts const app = await admin.applications.dcr(zoneId, { name: "tenant-acme-job", expires_in: 900, // seconds; capped at 3600 }); // app.client_secret is returned ONCE and never retrievable again. ``` Creation is hardened server-side and cannot be misused as open self-registration: - **Admin token required** — the endpoint sits behind admin-bearer auth; a workload must hold a real, revocable, zone-scoped admin credential. 
Agent SDKs (`caracalai`) cannot create applications. - **Zone feature gate** — refused with `dcr_disabled` unless an operator enabled `dcr_enabled` on the zone. - **Rate-limited and capped** — per-actor request limiting plus a per-zone cap on live DCR applications (`dcr_rate_limit_exceeded` / `dcr_limit_exceeded`). - **Short-lived by construction** — `expires_in` is capped at one hour; expired applications are denied at token authentication and later archived by DCR cleanup. - **Secret hygiene** — the client secret is generated server-side, stored only as a hash, and returned exactly once. - **Authority is still policy-bound** — a DCR application is a credential, not a grant. It authenticates default-deny and receives no tool access until a policy (typically keyed on `registration_method == "dcr"`) grants scopes. DCR applications remain visible read-only in Console under the `dcr` method for audit and inspection. ## Rotating a managed application secret A managed application's client secret can be rotated without recreating the application. Send the new secret on a patch; the server stores only its hash and the previous secret stops working immediately. ```ts await admin.applications.patch(zoneId, appId, { client_secret: newSecret, // store it yourself; the patch response does not echo it }); ``` Rotation is rejected with `client_secret_not_configured` if the application has no secret to replace. The Console exposes the same operation as the **rotate secret** action on managed applications, generating a strong secret and revealing it once. DCR applications are not rotated — they are short-lived and replaced by re-registration. ## Error handling Failed HTTP responses throw `AdminApiError` with `status`, `code`, parsed response details, and the base surface (`api` or `coordinator`). Non-idempotent write methods are not retried; read methods retry transient statuses. ## Boundary Use the Admin package for automation. Use the Console for human workflows. 
Do not expose Admin operations as top-level `caracal` runtime commands.

---

# Redis Revocation Store

# URL: https://docs.caracal.run/sdks/connectors/redis/
# Markdown: https://docs.caracal.run/markdown/sdks/connectors/redis.md
# Type: page
# Concepts:
# Requires:

---

Redis connectors provide shared revocation state for resource servers. They let multiple service instances reject mandates after sessions, roots, agents, or delegation edges are revoked.

## Install

| Ecosystem | Package |
| --- | --- |
| TypeScript | `npm install @caracalai/revocation-redis` |
| Python | `pip install caracalai-revocation-redis` |
| Go | `go get github.com/garudex-labs/caracal/packages/connectors/redis/go` |

## Defaults

| Setting | Default |
| --- | --- |
| Revocation stream | `caracal.sessions.revoke` |
| Consumer group | `resource-revocation` |
| Revocation TTL | 24 hours |
| Signature support | Optional HMAC verification for stream messages. |

## TypeScript store

```ts
import { RedisRevocationStore } from "@caracalai/revocation-redis";

const revocations = new RedisRevocationStore(redis, {
  defaultTtlMs: 24 * 60 * 60 * 1000,
});

await revocations.markRevoked("sess_123");
const blocked = await revocations.isRevoked("sess_123");
```

## Stream consumer

Use the stream consumer when your resource server should learn revocations from the audit/control plane instead of marking them locally.

```ts
import { RedisRevocationConsumer } from "@caracalai/revocation-redis";

const consumer = new RedisRevocationConsumer(redis, revocations, {
  consumerName: "api-1",
  hmacSecret: process.env.CARACAL_REVOCATION_HMAC_SECRET,
});

await consumer.start();
```

Stream events can revoke multiple anchors. Resource servers should treat Redis or signature failures according to their risk posture; high-risk resources should fail closed.
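To make "fail closed" concrete, here is a minimal sketch of a check that denies when the store cannot answer. It is an assumption-laden illustration rather than connector code: `RevocationLookup` and `mustRejectFailClosed` are hypothetical names, and the anchor IDs are placeholders.

```typescript
// Hypothetical minimal view of a revocation store; only the lookup is needed here.
interface RevocationLookup {
  isRevoked(sid: string): Promise<boolean>;
}

// Resolves to true when the request must be rejected.
// Fail closed: if any lookup errors (for example, the backend is unreachable),
// the request is treated as revoked instead of being let through.
async function mustRejectFailClosed(
  store: RevocationLookup,
  anchors: string[],
): Promise<boolean> {
  try {
    const results = await Promise.all(anchors.map((sid) => store.isRevoked(sid)));
    return results.some((revoked) => revoked);
  } catch {
    return true;
  }
}
```

A caller would pass every anchor carried by the mandate (session, root session, agent session, delegation edge) and reject the request when the promise resolves to `true`. A lower-risk resource could instead log the failure and return `false` in the `catch` branch, which is the fail-open posture.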
## Related pages

- [Revocation Package](/sdks/revocation/)
- [Sessions and Revocation](/concepts/sessions-revocation/)
- [MCP Auth Transport](/sdks/transport-mcp/)

---

# Postgres Token State Backend

# URL: https://docs.caracal.run/sdks/connectors/postgres/
# Markdown: https://docs.caracal.run/markdown/sdks/connectors/postgres.md
# Type: page
# Concepts:
# Requires:

---

`@caracalai/tokenstate-postgres` stores token-state rows in Postgres for TypeScript services that need durable mandate state.

## Install

```bash
npm install @caracalai/tokenstate-postgres
```

The package targets Node `>=22`.

## Schema

The package exports `MCP_TOKEN_STATE_DDL`, which creates:

```sql
CREATE TABLE IF NOT EXISTS mcp_token_state (
  zone_id TEXT NOT NULL,
  sub TEXT NOT NULL,
  scope TEXT NOT NULL,
  expires_at TIMESTAMPTZ NOT NULL,
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  PRIMARY KEY (zone_id, sub, scope)
);
```

## Backend

```ts
import { MCP_TOKEN_STATE_DDL, PostgresBackend } from "@caracalai/tokenstate-postgres";

await pool.query(MCP_TOKEN_STATE_DDL);

const backend = new PostgresBackend(pool);
await backend.set({
  zone_id: "zone_prod",
  sub: "agent_123",
  scope: "tickets:read",
  expires_at: new Date(Date.now() + 60_000).toISOString(),
});
```

## Use cases

- Persisting active token state across service restarts.
- Sharing token-state reads across multiple TypeScript resource-server instances.
- Building custom resource-server controls that need a small `(zone_id, sub, scope)` state table.

## Boundary

This package is a token-state backend. It is not the revocation stream consumer. Use [Redis Revocation Store](/sdks/connectors/redis/) when you need to consume `caracal.sessions.revoke`.
--- # Secure Caracal # URL: https://docs.caracal.run/security/ # Markdown: https://docs.caracal.run/markdown/security.md # Type: landing # Concepts: # Requires: --- Security in Caracal centers on pre-execution authority: STS issues scoped mandates, Gateway enforces protected-resource access, Coordinator tracks agent/delegation state, Audit preserves evidence, and Operations keeps storage, streams, secrets, and releases recoverable. ## Security Model ```mermaid flowchart LR Operator[Operator] --> API[API and Console] Workload[Workload] --> STS[STS] Request[Protected request] --> Gateway[Gateway] Agent[Agent SDK] --> Coordinator[Coordinator] API --> Audit[Audit evidence] STS --> Audit Gateway --> Audit Coordinator --> Audit Audit --> Postgres[(Append-only audit)] ``` ## Security Review Path | Need | Page | | --- | --- | | Understand assets, boundaries, threats, and mitigations | [Review the Threat Model](/security/threat-model/) | | Harden a production deployment | [Harden Security Posture](/security/hardening/) | | Verify a release before installing | [Verify a Release](/security/verify-releases/) | | Demonstrate assurance to a reviewer or auditor | [Generate an Evidence Pack](/security/evidence-pack/) | | Report a vulnerability privately | [Report a Vulnerability](/security/disclosure/) | | Operate a security incident | [Respond to Incidents](/operations/incident-response/) | | Evaluate the project for enterprise adoption | [Enterprise Security Readiness](/security/enterprise-readiness/) | ## Core Invariants - STS must fail closed on policy, key, session, revocation, replay, step-up, and signing errors. - Gateway must never trust caller-supplied destinations and must deny before upstream dispatch when authority or routing is unsafe. - Redis stream messages require HMAC signing in published modes where configured. - Audit evidence must remain append-only, tamper-evident, and recoverable through replay/DLQ paths. 
- Secrets must come from secret files or platform secret managers in production. - The open-source product must not depend on enterprise-only code or controls. ## Next Step Read [Review the Threat Model](/security/threat-model/) before hardening a deployment. ## Related Sections - [Enforce Boundaries](/architecture/trust-boundaries/) - [Harden Production](/operations/tls-hardening/) - [Rotate Keys and Secrets](/operations/key-management/) - [Configure Alerts](/operations/alerts/) - [Compare Editions](/enterprise/) --- # Review the Threat Model # URL: https://docs.caracal.run/security/threat-model/ # Markdown: https://docs.caracal.run/markdown/security/threat-model.md # Type: reference # Concepts: # Requires: --- The repository threat model covers the current open-source Caracal runtime, services, packages, deployment assets, and release artifacts. ## Assurance Case Caracal's security requirements are met under an explicit assurance case maintained in [`governance/THREAT_MODEL.md`](https://github.com/Garudex-Labs/caracal/blob/main/governance/THREAT_MODEL.md). It argues the claim from four pillars: - **Threat model:** threats T1–T12 each carry an owner, a required mitigation, and a verification step. - **Trust boundaries:** every boundary names what is untrusted and where mediation happens (see [Enforce Boundaries](/architecture/trust-boundaries/)). - **Secure design principles:** deny-by-default, complete mediation, least privilege, defense in depth, and separation of privilege are enforced in STS, gateway, control, and the data layer. - **Implementation weaknesses countered:** input validation, token-confusion prevention, SSRF blocking, secret redaction, audit/stream integrity, and a signed, provenance-attested release path, each backed by tests and scanners. ## In-Scope Areas | Area | Scope | | --- | --- | | API | Control-plane routes, admin auth, product state, step-up APIs, admin audit. 
| | Coordinator | Agent lifecycle, delegation, invocation, TTL, retention, outbox. | | STS | OAuth token exchange, signing, JWKS, policy decisions, replay, revocation, step-up, audit emission. | | Gateway | Proxy enforcement, upstream safety, STS exchange, bindings, replay, revocation checks. | | Audit | Redis stream consumption, append-only ledger, HMAC chain, tamper sweeps, retention, export. | | Control | Optional invoke endpoint, engine catalog enforcement, JTI replay, rate limits, audit. | | Relay and streams | Coordinator lifecycle relay, Redis Streams signatures, dedupe, pending-entry handling. | | Packages and infra | SDKs, transports, connectors, Docker, Helm, migrations, installers, release artifacts. | Enterprise-only code and customer deployments outside the provided deployment model are out of scope for this open-source threat model. ## Assets | Asset | Why it matters | | --- | --- | | Agent and application authority | Controls what autonomous agents can access and do. | | Policies, grants, zones, resource bindings | Define authorization and proxy destinations. | | Signing keys, KEKs, admin tokens, client secrets, database/Redis credentials | Compromise enables impersonation or service takeover. | | Tokens, sessions, JTIs, revocations, step-up state | Enforce identity, replay prevention, expiry, and emergency denial. | | Audit events and chain state | Provide evidence for incidents and tamper detection. | | Redis Streams and outboxes | Carry lifecycle, invalidation, audit, and revocation events. | | Images, installers, releases, lockfiles | Define what users execute. | ## Threats and Mitigations | Threat | Required mitigation | | --- | --- | | Control-plane mutation bypass | Mandatory auth hooks, schemas, zone guards, and admin audit. | | STS over-issues authority | Deny-by-default policy, grant/session validation, step-up, replay, revocation, and signing checks. 
| | Gateway forwards unsafely | Per-request STS exchange, upstream bindings, SSRF guard, header stripping, timeouts, replay and revocation checks. | | Agent/delegation state races | Transactions, locks, outbox events, dedupe, bounded relay retry. | | Secrets leak | File-secret resolution, redaction, no plaintext key material in responses/logs/audit/examples. | | Agent reads operator secrets | Operator secrets outside source workspaces, private operator directories, no secret mounts in agent containers, workload-only token injection, and separate users/containers for untrusted agents. | | Audit evidence is missing or forgeable | Append-only audit role, HMAC signing, DLQ, replay, tamper checks. | | Stream messages are forged or replayed | HMAC signatures, dedupe, pending-entry reclaim, ack only after durable handling. | | Control becomes unsafe command execution | Gate file, JWT auth, scope checks, engine dispatch only, no shelling out, audit. | | Release artifact compromise | Lockfile review, trusted workflows, image/archive validation, installer secret scans. | | Product boundary drift | Update threat model and reject OSS changes that depend on enterprise-only code. | ## Validation Use targeted suites for the changed boundary: API/Coordinator TypeScript tests, STS/Gateway/Audit Go tests, Control/Engine tests, stream consumer tests, release checks, and broader `pnpm run ci` when shared packages, crypto, config, release, or infra change. ## Next Step Use [Harden Security Posture](/security/hardening/) to translate the threat model into deployment checks. 
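The audit mitigations above rely on HMAC signing and tamper checks over an append-only ledger. As a generic illustration of the HMAC-chain technique, and not Caracal's actual record format or key handling (which this page does not specify), the Python sketch below links each record's tag to the previous tag, so editing or dropping an earlier record invalidates everything after it. The function names, the JSON encoding, and the all-zero seed are assumptions made for the example.

```python
import hashlib
import hmac
import json

SEED = b"\x00" * 32  # hypothetical starting value for the chain


def _tag(key: bytes, prev: bytes, record: dict) -> bytes:
    # Canonical JSON keeps the MAC stable regardless of key order.
    body = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, prev + body, hashlib.sha256).digest()


def chain_records(key: bytes, records: list[dict]) -> list[tuple[dict, str]]:
    """Return (record, hex tag) pairs where each tag covers the previous tag."""
    prev, out = SEED, []
    for record in records:
        prev = _tag(key, prev, record)
        out.append((record, prev.hex()))
    return out


def verify_chain(key: bytes, chained: list[tuple[dict, str]]) -> int:
    """Return the index of the first bad entry, or -1 if the chain is intact."""
    prev = SEED
    for i, (record, tag_hex) in enumerate(chained):
        expected = _tag(key, prev, record)
        if not hmac.compare_digest(expected.hex(), tag_hex):
            return i
        prev = expected
    return -1
```

A tamper sweep in this model is simply a periodic `verify_chain` call whose non-negative result would raise an alert; how Caracal schedules or reports its own sweeps is not described here.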
## Related Pages - [Enforce Boundaries](/architecture/trust-boundaries/) - [Respond to Incidents](/operations/incident-response/) - [Validate Changes](/contributing/testing/) --- # Harden Security Posture # URL: https://docs.caracal.run/security/hardening/ # Markdown: https://docs.caracal.run/markdown/security/hardening.md # Type: reference # Concepts: # Requires: --- Use this checklist before exposing Caracal to production workloads or protected upstreams. ## Deployment Mode | Check | Expected | | --- | --- | | Mode | `CARACAL_MODE=stable` or `rc`; never rely on `dev` defaults for production. | | Ports | Public ingress exposes only required API/Gateway endpoints; storage remains private. | | Network | NetworkPolicy or equivalent restricts ingress and egress. | | Containers | Non-root, dropped capabilities, no-new-privileges, read-only root filesystem where supported. | ## Secrets and Keys | Check | Expected | | --- | --- | | Secret delivery | Mounted secret files or platform secret manager, not inline production secrets. | | HMAC keys | `AUDIT_HMAC_KEY`, `STREAMS_HMAC_KEY`, and `GATEWAY_STS_HMAC_KEY` are strong and rotated under control. | | Zone KEK | `ZONE_KEK` is protected and backed up with the database. | | Admin/Coordinator tokens | Stored privately, rotated, and scoped to operator need. | | Runtime profiles | `caracal.toml` and secret files are owner-only when written locally. | ## Access Enforcement | Check | Expected | | --- | --- | | STS | Fails closed on invalid client credentials, policy denial, revoked sessions, replay, invalid delegation, and unsatisfied step-up. | | Gateway | Requires bearer token and `X-Caracal-Resource`, validates binding, rejects path traversal and unsafe upstreams. | | Control | Disabled unless explicitly needed; invoke endpoint requires gate, JWT, replay protection, rate limit, and audit. 
| | Resource servers | Verify mandate signature, issuer, audience, scopes, token use, agent/delegation requirements, hop limits, and revocation. | ## Audit and Recovery | Check | Expected | | --- | --- | | Audit stream | `caracal.audit.events` and DLQ are monitored. | | Tamper checks | Audit tamper alerts page the security/on-call path. | | Backups | Postgres, runtime secrets, and audit exports are restorable. | | Replay | STS/Gateway audit replay volumes are preserved through rollouts. | ## Next Step Use [Report a Vulnerability](/security/disclosure/) if a hardening review uncovers a suspected vulnerability. ## Related Pages - [Harden Production](/operations/tls-hardening/) - [Rotate Keys and Secrets](/operations/key-management/) - [Back Up and Retain Data](/operations/backup-retention/) --- # Report a Vulnerability # URL: https://docs.caracal.run/security/disclosure/ # Markdown: https://docs.caracal.run/markdown/security/disclosure.md # Type: reference # Concepts: # Requires: --- Report suspected vulnerabilities privately. Do not open public issues for credential exposure, policy bypass, unsafe execution, exploitable operational failures, or other security defects. ## Reporting Channels | Channel | Use for | | --- | --- | | GitHub private advisory | Open-source Caracal vulnerabilities: `https://github.com/Garudex-Labs/caracal/security/advisories/new` | | Email | Sensitive reports, attachments, patches, exploit demonstrations, or enterprise-related reports: `support@garudexlabs.com` | Enterprise-related vulnerabilities must be reported by email, not GitHub advisories. ## Email Format ```text Subject: [SECURITY][caracal] Short description 1. Summary 2. Steps to reproduce 3. Impact 4. Affected area 5. Suggested fix 6. Attachments ``` Keep reports clear, reproducible, and private. Do not include secrets in public channels or public pull requests. ## Response and Disclosure The maintainers aim to review and respond within 7 days.
Resolution timing depends on complexity and validation needs. Public disclosure should wait until a fix or mitigation is available or maintainers decide not to address the issue. ## Related Pages - [Security Policy in the repository](https://github.com/Garudex-Labs/caracal/blob/main/.github/SECURITY.md) - [Respond to Incidents](/operations/incident-response/) - [Review the Threat Model](/security/threat-model/) --- # Use Examples # URL: https://docs.caracal.run/examples/ # Markdown: https://docs.caracal.run/markdown/examples.md # Type: landing # Concepts: # Requires: --- Examples show Caracal integrated into concrete applications and automation scripts. Use them after the first tutorials when you want runnable code that matches a specific integration job. ## Choose an example | Goal | Start here | Code path | | --- | --- | --- | | Need a local protected target for the first Gateway call. | [Run Echo Upstream](/examples/echo-upstream/) | `examples/echoUpstream` | | Want to automate zone setup from a script or pipeline through the Control API. | [Bootstrap Control State](/examples/control-bootstrap/) | `examples/controlBootstrap` | | Need to prove a provider-backed resource is ready. | [Check Provider Readiness](/examples/provider-preflight/) | `examples/providerPreflight` | | Need to turn a denial into a safe policy fix. | [Iterate Policy Safely](/examples/policy-iterate/) | `examples/policyIterate` | | Want to launch a plain CLI agent with injected provider credentials. | [Launch Research Agent](/examples/research-agent/) | `examples/ResearchAgent` | | Want a full app reference lab with agents, providers, Gateway, STS, and Console inspection. | [Run Lynx Capital](/examples/lynx-capital/) | `examples/lynxCapital` | ## Recommended order 1. Start with [Run Echo Upstream](/examples/echo-upstream/) if you have not completed a Gateway-mediated request yet. 2. 
Use [Bootstrap Control State](/examples/control-bootstrap/) when automation should own zone setup instead of manual Console clicks. 3. Run [Check Provider Readiness](/examples/provider-preflight/) before the first real provider-backed Gateway call. 4. Use [Iterate Policy Safely](/examples/policy-iterate/) when an audit denial needs to become a tested policy change. 5. Try [Launch Research Agent](/examples/research-agent/) to see `caracal run` inject provider-native credentials into a plain CLI process that behaves like an existing third-party tool. 6. Study [Run Lynx Capital](/examples/lynx-capital/) when you need a full app topology and live Console inspection path. ## Use examples safely - Start Caracal through the released runtime and Console path described by the example. - Use Console for zones, applications, providers, resources, policies, control keys, and runtime profiles. - Keep example fixtures inside their own `examples/` directory. - Run each example's offline tests before adapting it. - Do not commit provider secrets, admin tokens, or real third-party credentials. ## Related sections - [Get Started](/get-started/) - [Tutorials](/tutorials/) - [Guides](/guides/) - [SDKs](/sdks/) - [Runtime and Console](/runtime-console/) --- # Run Echo Upstream # URL: https://docs.caracal.run/examples/echo-upstream/ # Markdown: https://docs.caracal.run/markdown/examples/echo-upstream.md # Type: workflow # Concepts: # Requires: --- Echo Upstream is a zero-dependency HTTP service under `examples/echoUpstream`. It stands in for the API you would put behind Caracal, so you can verify a Gateway-mediated request end to end without hosting your own upstream. Every response states whether the call was brokered by the Gateway or hit the service directly, and shows the evidence behind that verdict. ## What it demonstrates | Area | Behavior | | --- | --- | | Brokered-call proof | Reports `viaGateway` plus the request ID, trace context, and forwarding metadata the Gateway stamped on the request.
| | Credential handling | Confirms the Gateway injected the brokered credential and redacts credential values from the echoed headers. | | Gateway reachability | Joins the `caracalData` Docker network so Gateway can reach `http://echoUpstream:8088`. | | Local debugging | Exposes `http://127.0.0.1:8088` on the host and logs one line per request, marked `[gateway]` or `[direct]`. | ## Run with Docker ```bash cd examples/echoUpstream docker compose -f compose.yml up --build ``` Check the local health endpoint: ```bash curl http://127.0.0.1:8088/healthz ``` ## Run with Node ```bash cd examples/echoUpstream npm start ``` The server listens on port `8088`. Set `ECHO_PORT` when you need a different local port. ## Use as the protected upstream In Console guided setup or through the Control API, create a resource with this upstream URL: ```text http://echoUpstream:8088 ``` Then send a request to the Gateway with the Caracal access token and resource ID from guided setup. The Console result view prints the exact command for your setup; it has this shape: ```bash curl http://localhost:8081/v1/hello \ -H "Authorization: Bearer $CARACAL_RESOURCE_PIPERNET_TOKEN" \ -H "X-Caracal-Resource: resource://pipernet" ``` ## Read the response A brokered call returns `"viaGateway": true` with a `gateway` section: | Field | What it proves | | --- | --- | | `viaGateway` | The request carried the Gateway's request ID and forwarding metadata. | | `gateway.requestId` | Audit handle — paste it into Console **explain** to trace the policy decision. | | `gateway.credentialInjected` | The Gateway brokered a credential the client never held. | | `request.headers` | The headers the upstream actually received, with credentials redacted. | Calling `http://127.0.0.1:8088/` directly returns `"viaGateway": false`, which makes the difference between a protected and an unprotected path visible. 
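If you want a script rather than a person to read that response, a small parser is enough. The sketch below is a minimal example that relies only on the field names listed in the table above (`viaGateway`, `gateway.requestId`, `gateway.credentialInjected`); the `summarize_echo` helper and the sample payloads are made up for illustration, and real responses carry more fields.

```python
import json


def summarize_echo(raw: str) -> dict:
    """Extract the brokered-call evidence from an Echo Upstream response body."""
    body = json.loads(raw)
    gateway = body.get("gateway") or {}
    return {
        # Only a literal JSON `true` counts as a brokered call.
        "brokered": body.get("viaGateway") is True,
        "request_id": gateway.get("requestId"),
        "credential_injected": bool(gateway.get("credentialInjected")),
    }


# Illustrative payloads only; the values are placeholders.
brokered = '{"viaGateway": true, "gateway": {"requestId": "req-123", "credentialInjected": true}}'
direct = '{"viaGateway": false}'

print(summarize_echo(brokered))
print(summarize_echo(direct))
```

A smoke test could assert `brokered` is true and then pass `request_id` to whatever audit lookup you use.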
## Test ```bash cd examples/echoUpstream npm test ``` ## Next step Continue to [Bootstrap Control State](/examples/control-bootstrap/) when you want repeatable setup for demo resources and policies. --- # Bootstrap Control State # URL: https://docs.caracal.run/examples/control-bootstrap/ # Markdown: https://docs.caracal.run/markdown/examples/control-bootstrap.md # Type: workflow # Concepts: # Requires: --- Control Bootstrap is the canonical Control API automation example under `examples/controlBootstrap`. A small CI/CD-style pipeline keeps one agent's environment — application, provider, resource, and policy — matching a declared plan, without adding product-management verbs to the runtime CLI. ## What it demonstrates | Area | Behavior | | --- | --- | | Automation surface | Calls `/v1/control/invoke` instead of using a root admin token. | | Identity model | Uses scoped, short-lived control keys created in Console, one scope tier per pipeline stage. | | Reconciliation | `apply` creates missing objects, patches drifted ones, and publishes a new policy version on content drift. | | CI gating | `verify` is a read-only drift check that exits non-zero when the zone does not match the plan. | | Safety model | Uses replay protection, rate limits, scoped control permissions, and audit. | The plan describes the PiperNet reporter agent's environment: the agent application, the `provider://pipernet-mandate` provider, the `resource://pipernet` resource wired to it, and the baseline policy that allows `read`. ## Console setup 1. Start the runtime and open Console: ```bash caracal up caracal status --ready caracal console ``` 2. Create or select the target zone. 3. Create a control key with only the scopes the stage needs: read/write on app, provider, resource, and policy for `apply`; read for `verify`; read/delete for `teardown`. 4. Save the one-time `client_id` and `client_secret`. STS resolves the zone from the bound control key. 
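The `apply` and `verify` behavior described in the table above amounts to a plan-versus-live diff. The Python sketch below shows that idea under stated assumptions: objects are plain dictionaries keyed by name, only fields declared in the plan count as drift, and the exit code `1` stands in for the "non-zero" the page mentions. It is not the example's `apply.mjs` or `verify.mjs`, and `diff_plan` and `verify_exit_code` are hypothetical names.

```python
def diff_plan(plan: dict[str, dict], live: dict[str, dict]) -> list[tuple[str, str]]:
    """Return (action, name) pairs needed to make `live` match `plan`."""
    actions = []
    for name, desired in plan.items():
        current = live.get(name)
        if current is None:
            actions.append(("create", name))
        elif any(current.get(field) != value for field, value in desired.items()):
            # Only fields declared in the plan are compared.
            actions.append(("patch", name))
    return actions


def verify_exit_code(plan: dict[str, dict], live: dict[str, dict]) -> int:
    """Read-only gate: 0 when in sync, 1 (assumed) when any change is needed."""
    return 1 if diff_plan(plan, live) else 0


# Hypothetical plan and live state for illustration.
plan = {
    "resource://pipernet": {"provider": "provider://pipernet-mandate"},
    "baseline-policy": {"allow": ["read"]},
}
live = {"resource://pipernet": {"provider": "provider://other", "id": "r1"}}

print(diff_plan(plan, live))
print(verify_exit_code(plan, live))
```

Idempotency falls out of the same function: once the live state matches the plan, `diff_plan` returns an empty list and a rerun has nothing to do.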
## Run the pipeline ```bash cd examples/controlBootstrap cp env.example .env $EDITOR .env . .env npm run apply npm run verify ``` `apply` is idempotent: rerunning it against an in-sync zone changes nothing, and rerunning it against a drifted zone converges the drift. Run teardown when you want to remove the environment: ```bash npm run teardown ``` ## Files to study | File | Purpose | | --- | --- | | `controlClient.mjs` | Exchanges client credentials at STS and calls the Control API. | | `plan.mjs` | Declares the desired environment, drift checks, scope tiers, and env-driven config. | | `apply.mjs` | Reconciles the live zone with the plan. | | `verify.mjs` | Read-only drift gate for CI. | | `teardown.mjs` | Removes the environment in reverse dependency order. | ## Test ```bash cd examples/controlBootstrap npm test ``` The tests use a fake zone and mock transport and do not call a live Caracal stack. ## Next step Continue to [Check Provider Readiness](/examples/provider-preflight/) before sending traffic through a provider-backed Gateway resource. --- # Check Provider Readiness # URL: https://docs.caracal.run/examples/provider-preflight/ # Markdown: https://docs.caracal.run/markdown/examples/provider-preflight.md # Type: workflow # Concepts: # Requires: --- Provider Preflight is an automatable readiness check under `examples/providerPreflight`. Use it after creating a provider-backed resource, before cutting traffic over, and from CI before each deploy. It walks the same chain a real Gateway request depends on — control plane, Gateway dependencies, application identity, provider binding and configuration, network paths, and the active policy decision — and reports exactly which link is broken and how to fix it. ## What it checks Checks run in five phases: | Phase | Check | What it confirms | | --- | --- | --- | | readiness | Admin API readiness | The control plane responds ready on `GET /ready`. 
| | readiness | Gateway readiness | The Gateway responds ready, which verifies its bindings, Postgres, Redis, revocation freshness, audit replay, and STS. | | dependencies | Resource binding | The resource resolves to exactly one credential provider and a Gateway application. | | dependencies | Application | The application exists, is not expired, and matches the resource's routing binding. | | configuration | Provider configuration | Kind-required fields are present and `allowed_token_hosts` covers the token endpoint. | | configuration | Scope coverage | Every requested scope is declared on the resource. | | configuration | Runtime injection | Runtime-injection eligibility is enabled when required. | | connectivity | Token endpoint host | OAuth token endpoints are HTTPS and publicly routable. | | connectivity | Callback reachability | Authorization-code callback origins are HTTPS and reachable. | | connectivity | Upstream reachability | The protected upstream is reachable, with a warning when it resolves to a private address. | | authorization | Policy authorization | Simulating the active policy set with the input shape STS uses for real token exchanges returns `allow`. | The preflight fails closed. Any failed check exits non-zero, and every failure or warning prints a `fix:` line with the concrete remediation. ## Run the preflight ```bash cd examples/providerPreflight CARACAL_API_URL=http://127.0.0.1:3000 \ CARACAL_ADMIN_TOKEN= \ PREFLIGHT_ZONE_ID= \ PREFLIGHT_RESOURCE_ID= \ PREFLIGHT_APPLICATION_ID= \ PREFLIGHT_SCOPES=pipernet:read,pipernet:write \ npm run preflight ``` Optional settings: | Variable | Purpose | | --- | --- | | `PREFLIGHT_GATEWAY_URL` | Probes the Gateway's `/ready` endpoint; without it the Gateway phase reports a warning instead of validating. | | `PREFLIGHT_REQUIRE_RUNTIME_INJECTION=true` | Requires providers to allow runtime injection. | | `PREFLIGHT_OUTPUT=json` | Emits the full report as JSON for CI pipelines and dashboards. 
| Exit codes: `0` when all checks pass, `1` when any check fails, `2` when the preflight itself cannot run. ## Network position `Upstream reachability` and `Callback reachability` run from the host executing the script. Run it from a network position comparable to Gateway, or inside the cluster, when you need the reachability result to match production routing. ## Test ```bash cd examples/providerPreflight npm test ``` The tests inject network, DNS, and policy responses and do not call live systems. ## Next step Continue to [Iterate Policy Safely](/examples/policy-iterate/) when a denied request needs to become a tested policy change. --- # Iterate Policy Safely # URL: https://docs.caracal.run/examples/policy-iterate/ # Markdown: https://docs.caracal.run/markdown/examples/policy-iterate.md # Type: workflow # Concepts: # Requires: --- Policy Iterate is an audit-driven policy rollout loop under `examples/policyIterate`. Use it when a real request is denied and you want to test a candidate policy change against the exact redaction-safe audit input — and prove it does not change other decisions — before activation. ## Loop 1. **Diagnose** — a request is denied and you capture its audit `request_id`. The audit explain endpoint reconstructs the denied `policy_input` along with the diagnostics and determining policies. 2. **Simulate** — you edit the policy, stage a candidate policy-set version, and the example replays the denied input against it through the same OPA engine that serves live traffic. 3. **Regress** — the example replays your expected-decision cases against the same candidate, so loosening a policy for one caller cannot silently change decisions for everyone else. 4. **Decide** — activation is gated on evidence: the candidate allows the denied input, the rollout contract validates, simulation produced no warnings, and every regression case keeps its expected decision. 5. 
**Activate** — with `ACTIVATE=true` and a clean verdict, the example activates the version and polls activation status until the STS runtime reports it loaded. ## Run the iteration ```bash cd examples/policyIterate CARACAL_API_URL=http://127.0.0.1:3000 \ CARACAL_ADMIN_TOKEN= \ CARACAL_ZONE_ID= \ DENIED_REQUEST_ID= \ POLICY_SET_ID= \ CANDIDATE_VERSION_ID= \ REGRESSION_FILE=./regressions.json \ npm run iterate ``` The run is a dry run by default: it narrates each phase on stderr, prints a JSON report on stdout, and exits `0` only when the verdict is clean. Re-run with `ACTIVATE=true` to roll out the version and wait for propagation. `REGRESSION_FILE` is optional and points to a JSON array of `{ name, expect, input }` cases — see `regressions.example.json` in the example directory for the format. ## Claims note Actor and subject claims are not written to audit, so reconstructed input does not contain them. For a claim-dependent denial, add the relevant `context.actor_claims` to a regression case built from the printed `policyInput` and iterate with that case instead. ## Test ```bash cd examples/policyIterate npm test ``` The tests inject the Admin API transport and do not call a live Caracal stack. ## Next step Continue to [Launch Research Agent](/examples/research-agent/) to see runtime credential injection for a plain CLI agent. --- # Launch Research Agent # URL: https://docs.caracal.run/examples/research-agent/ # Markdown: https://docs.caracal.run/markdown/examples/research-agent.md # Type: workflow # Concepts: # Requires: --- Research Agent is a `caracal run` example under `examples/ResearchAgent`. It launches a normal Node.js CLI agent and injects provider-native credentials only into the child process after Caracal authorizes the run. ## What it demonstrates | Resource | Injected env | Used for | | --- | --- | --- | | `resource://google-drive` | `GOOGLE_DRIVE_ACCESS_TOKEN` | Searching and exporting Google Drive documents. 
| | `resource://google-calendar` | `GOOGLE_CALENDAR_ACCESS_TOKEN` | Reading relevant Calendar events. | | `resource://openai` | `OPENAI_API_KEY` | Answering the terminal question with model context. | The agent has no Caracal SDK dependency. It reads provider-native environment variables and behaves like an existing third-party terminal tool. ## Console setup Use Console to create or select: | Object | Purpose | | --- | --- | | Zone | Owns the application, providers, resources, and policies. | | Application | Workload identity used by `caracal run` to call STS. | | Google Drive provider | Returns a Drive read token. | | Google Calendar provider | Returns a Calendar read token. | | OpenAI provider | Returns an OpenAI-compatible credential. | | Resources | Map `resource://google-drive`, `resource://google-calendar`, and `resource://openai` to providers. | | Policy | Allows the run application to request all three resources. | Enable runtime injection on each provider and map each run credential with `credential_type = "provider_token"`. ## Write the runtime profile Use Console-generated values for the zone id and application id. Store the one-time application secret in the local runtime secret path and write a `caracal.toml` profile: ```toml zone_id = "zone_prod" application_id = "app_support_research_agent" ttl_seconds = 900 continue_on_failure = false [[credentials]] env = "GOOGLE_DRIVE_ACCESS_TOKEN" resource = "resource://google-drive" credential_type = "provider_token" [[credentials]] env = "GOOGLE_CALENDAR_ACCESS_TOKEN" resource = "resource://google-calendar" credential_type = "provider_token" [[credentials]] env = "OPENAI_API_KEY" resource = "resource://openai" credential_type = "provider_token" ``` Do not export `GOOGLE_DRIVE_ACCESS_TOKEN`, `GOOGLE_CALENDAR_ACCESS_TOKEN`, or `OPENAI_API_KEY` yourself. Caracal injects them into the child process after STS authorization. 
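Because the agent only reads provider-native environment variables, any program can follow the same pattern. The real agent is a Node.js script (`agent.mjs`); the Python sketch below only illustrates the idea of checking for the injected variables and failing fast without ever printing their values. The variable names come from the profile above; `missing_credentials`, `preflight`, and the choice of return code `2` are assumptions for the example.

```python
import os

REQUIRED = (
    "GOOGLE_DRIVE_ACCESS_TOKEN",
    "GOOGLE_CALENDAR_ACCESS_TOKEN",
    "OPENAI_API_KEY",
)


def missing_credentials(env) -> list[str]:
    """Return the required variable names that are absent or empty in `env`."""
    return [name for name in REQUIRED if not env.get(name)]


def preflight(env=os.environ) -> int:
    """Print presence only (never values) and return a process exit code."""
    missing = missing_credentials(env)
    for name in REQUIRED:
        state = "missing" if name in missing else "present"
        print(f"[agent] {name} {state}")
    # Assumed convention: non-zero means the launcher did not inject everything.
    return 2 if missing else 0
```

An entry point would call `sys.exit(preflight())` before making any network request, so a process started outside `caracal run` stops immediately.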
## Launch the agent ```bash cd examples/ResearchAgent export CARACAL_CONFIG="$HOME/.config/caracal/research-agent/caracal.toml" caracal run -- node agent.mjs ``` The agent confirms the injected credentials (values masked) and opens an interactive prompt: ```text [agent] credential preflight (values masked, injected by launcher): [agent] GOOGLE_DRIVE_ACCESS_TOKEN present -> Google Drive (read-only scope) [agent] GOOGLE_CALENDAR_ACCESS_TOKEN present -> Google Calendar (read-only scope) [agent] OPENAI_API_KEY present -> OpenAI Caracal run research agent ready. Ask about Drive docs or Calendar events. Type "exit" to quit. > ``` Started directly with `node agent.mjs`, the preflight fails with exit code 2 before any network call. Credentials disappear with the child-process environment when the process exits. ## Test ```bash cd examples/ResearchAgent pnpm test ``` The tests do not contact Google, OpenAI, or Caracal. ## Next step Continue to [Run Lynx Capital](/examples/lynx-capital/) when you want a full app reference lab with live Console inspection. --- # Run Lynx Capital # URL: https://docs.caracal.run/examples/lynx-capital/ # Markdown: https://docs.caracal.run/markdown/examples/lynx-capital.md # Type: workflow # Concepts: # Requires: --- Lynx Capital is a runnable reference under `examples/lynxCapital`. It models a finance-operations platform: an LLM swarm of orchestrators, regional workflows, and thousands of ephemeral domain workers executes payout cycles across twenty partner providers, with every agent and every provider call governed by Caracal. It is the primary reference for modelling permission boundaries, spawned agents, providers, resources, and policies on Caracal. ## Architecture | Building block | Role | | --- | --- | | Applications | `lynx-operations`, `lynx-intake`, `lynx-ledger`, `lynx-compliance`, `lynx-treasury`, `lynx-payments`, `lynx-audit` — one **managed application** per permission boundary, each holding only its own partner authority. 
| | Agents | Every spawned agent — orchestrator or worker — is its own **agent session** under its role's application, labeled `[role, lynx-swarm]` with run and agent metadata, narrowed by a delegation edge to its role's scopes and views. | | Providers | Twenty partner **credential providers** (`provider://`), each registered in the exact config shape its kind supports: API key, bearer token, OAuth client credentials, OAuth authorization code, Caracal mandate, or none. | | Resources | Per-application **resource views** (`resource://-`). The Gateway binds each view to exactly one application, so shared partners expose one view per boundary, each carrying only that boundary's scopes. | | Policy set | `lynx-finance-ops`: default-deny base, generated bindings and grants data, and one shared decisions policy allowing exactly each application's role-granted mandate mints and gateway calls. | The model is declared once in `config/tenancy.yaml`; the SDK seam, agent runner, provisioning, and policy all read from it. ### Why one application per permission boundary A Caracal application is a credential and trust boundary, and the Gateway binds each resource to exactly one application. Splitting the swarm by permission boundary means a payments worker and an audit worker can both reach the same partner — through different views, with different scopes — while a compromised intake agent can never present payment authority. Agents are sessions, not applications: each spawn gets its own identity, labels, delegation edge, and audit trail without minting new application credentials. ### Spawned agents The swarm's runner gives every agent its own session via the SDK's `spawn`: orchestrators inherit under the operations boundary; each domain worker spawns under its application's per-run dispatcher root with `Grant.narrow(role scopes, views, max_hops=1, run TTL)`. 
Ad-hoc partner-integration workers resolve their boundary, scope, and view dynamically from the requested provider operation. Logs and policy decisions identify exactly which agent did what. ### Customer attribution and confinement Workers acting on one customer's records — invoicing, dunning, payment application — spawn with a `customer:` label and a `customer_id` metadata key. The metadata key makes per-customer audit a direct filter over the shared zone trail; the label is policy input, and the base policy confines customer-labeled agents to the customer-record scopes, so a worker dunning one customer can never mint treasury or payment-rail authority. This is the [Serve Your Own Customers](/guides/serve-customers/) pattern applied to app-only work: one zone, customer separation carried by sessions, labels, metadata, and policy. ## Setup flow ```mermaid flowchart LR Install[Install Python deps] --> Zone[Console: zone + Control key] Zone --> Provision[scripts/provision.py] Provision --> Objects[Applications + providers + views + policy set] Objects --> Env[Export per-application credentials] Env --> Reference[scripts/reference.py] Reference --> Inspect[Inspect sessions and delegation in Console] ``` ## Commands ```bash cd examples/lynxCapital python -m venv .venv source .venv/bin/activate pip install -e ".[dev]" cp -n .env.example .env ``` The workload `.env` carries the zone and one `LYNX_CARACAL__APPLICATION_ID` / `_CLIENT_SECRET` pair per boundary. Provisioning uses a separate operator file and a scoped Control key created once in Console. ```bash cp -n .env.provision.example .env.provision # set CONTROL_CLIENT_ID / _SECRET . 
.env.provision python scripts/provision.py # applications, providers, views, policy set (idempotent) python scripts/reference.py # SDK walkthrough: labeled spawns, narrowed grants, mandates python scripts/teardown.py # remove the provisioned objects ``` `provision.py` prints the per-application credential exports as it creates each application; each client secret is returned exactly once. It also renders the application-id bindings into the policy library before authoring it, so policy decisions key on the real control-plane UUIDs. ## Policies `policies/` is an importable, OPA-tested library. The base policy default-denies and owns the decision contract; generated data documents carry the application bindings and the resource-view grants; one shared decisions policy allows mandate mints (scope ∩ delegation edge, role label granted, view owned by the caller) and gateway uses (mandate target includes the view), naming the deciding application boundary in every decision. Expected access behavior is documented in `policies/README.md`. ```bash opa test policies/ -v ``` ## SDK integration Application code uses two seams: `app/caracal.py` (per-application runtimes, worker authority, mandate minting, gateway calls) and `app/agents/runner.py` (per-agent session lifecycle): ```python handle = await runner.aspawn("payment-execution", "payments.us", parent=fc, layer="worker") result = partners.call("meridian-pay", "create_payout", payload, authority=handle.authority) ``` Every partner call resolves the operation's scope from the model, verifies the calling agent's grant client-side, mints (or reuses) a resource mandate for the agent's view, and posts through the Gateway — which re-evaluates policy, natively enforces the resource's declared operation authority, injects the provider credential, and forwards upstream. Agents never hold partner secrets. 
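The client-side grant check mentioned above can be pictured as a set intersection between what a call needs and what the agent's delegation edge allows. The following sketch is a simplified model under stated assumptions, not the code in `app/caracal.py`: the `Authority` dataclass, `effective_scopes`, `can_call`, and the scope and view strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authority:
    """Hypothetical stand-in for a spawned agent's narrowed grant."""
    scopes: frozenset[str]
    views: frozenset[str]


def effective_scopes(requested: set[str], authority: Authority) -> set[str]:
    """Scopes that survive intersection with the delegation edge."""
    return requested & authority.scopes


def can_call(requested: set[str], view: str, authority: Authority) -> bool:
    """Allow only when the view is granted and no requested scope was dropped."""
    if view not in authority.views:
        return False
    return effective_scopes(requested, authority) == requested


# Placeholder names; real scopes and views come from the tenancy model.
worker = Authority(
    scopes=frozenset({"payouts:create", "payouts:read"}),
    views=frozenset({"view-payments"}),
)

print(can_call({"payouts:create"}, "view-payments", worker))
print(can_call({"treasury:transfer"}, "view-payments", worker))
```

A local check like this only saves a round trip for requests that would be denied anyway; as the text above says, the Gateway still re-evaluates policy on every call.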
## Tests

```bash
opa test policies/ -v
pytest tests/
```

The tests cover the policy decision suite, the identity-model and provisioning-plan builders, the runner and authority seams, and the provider transports, topology, and lifecycle of the bundled workload.

## Bundled demo workload

The repository ships the FastAPI and LangGraph swarm with a simulated payout cycle against local provider fixtures under `_mock/`.

```bash
docker compose -f _mock/docker-compose.yml up -d --build --wait
python -m uvicorn app.main:app --reload --port 8000
docker compose -f _mock/docker-compose.yml down
```

Open `http://localhost:8000`; the guided `/setup` wizard teaches the one-zone, per-boundary-application, provider, resource-view, and policy-library flow.

## Related examples

- [Run Echo Upstream](/examples/echo-upstream/)
- [Launch Research Agent](/examples/research-agent/)

---
# Use API Reference
# URL: https://docs.caracal.run/api/
# Markdown: https://docs.caracal.run/markdown/api.md
# Type: landing
# Concepts:
# Requires:
---

Use this section when you need wire-level behavior. For task workflows, start in [Guides](/guides/). For service ownership and operations, use [Understand Services](/services/) and [Operations](/operations/).

## API Surfaces

| Surface | Base | Purpose |
| --- | --- | --- |
| [Use Management API](/api/control-plane/) | API service `/v1` | Zones, applications, providers, resources, policies, policy sets, grants, sessions, audit, step-up, and templates. |
| [Use Coordinator API](/api/coordinator/) | Coordinator service | Agent sessions, agent services, invocations, delegation edges, and SDK lifecycle endpoints. |
| [Use STS Endpoint](/api/sts/) | STS service | OAuth token exchange, JWKS, step-up status, and internal policy operations. |
| [Proxy Through Gateway](/api/gateway/) | Gateway service | Protected reverse-proxy behavior rather than CRUD endpoints. |
| [Use Event Topics](/api/event-topics/) | Redis Streams | Audit, invalidation, revocation, agent, invocation, and delegation topics. |

## Service Ports

| Service | Local port |
| --- | --- |
| API | `3000` |
| Coordinator | `4000` |
| STS | `8080` |
| Gateway | `8081` |
| Audit | `9090` |
| Control | API `3000` when enabled |

## Error Shape

Caracal service errors use the shared OAuth-compatible shape:

```json
{
  "error": "access_denied",
  "error_description": "policy denied request",
  "requestId": "018f..."
}
```

See [Error Codes](/reference/errors/) for canonical codes.

## Reading Path

| Need | Page |
| --- | --- |
| Manage product objects over HTTP | [Use Management API](/api/control-plane/) |
| Manage agent and delegation runtime state | [Use Coordinator API](/api/coordinator/) |
| Exchange authority for mandates | [Use STS Endpoint](/api/sts/) |
| Understand Gateway proxy requirements | [Proxy Through Gateway](/api/gateway/) |
| Consume or verify stream contracts | [Use Event Topics](/api/event-topics/) |

## Next Step

Start with [Use Management API](/api/control-plane/) when automating product setup.

---
# Use Management API
# URL: https://docs.caracal.run/api/control-plane/
# Markdown: https://docs.caracal.run/markdown/api/control-plane.md
# Type: api
# Concepts:
# Requires:
---

The Control-Plane REST API is served by the API service on port `3000`. Management routes are registered under `/v1` and are protected by admin authentication.

## Health and Diagnostics

| Method | Path | Purpose |
| --- | --- | --- |
| `GET` | `/health` | Liveness check. |
| `GET` | `/ready` | Dependency and service readiness. |
| `GET` | `/metrics` | Service metrics. |
| `GET` | `/docs` | Optional API docs when enabled. |

## Core Resources

| Resource | Collection | Item | Extra |
| --- | --- | --- | --- |
| Zones | `GET`, `POST /v1/zones` | `GET`, `PATCH`, `DELETE /v1/zones/:id` | — |
| Applications | `GET`, `POST /v1/zones/:zoneId/applications` | `GET`, `PATCH`, `DELETE /v1/zones/:zoneId/applications/:id` | `POST /v1/zones/:zoneId/applications/dcr` |
| Providers | `GET`, `POST /v1/zones/:zoneId/providers` | `GET`, `PATCH`, `DELETE /v1/zones/:zoneId/providers/:id` | — |
| Resources | `GET`, `POST /v1/zones/:zoneId/resources` | `GET`, `PATCH`, `DELETE /v1/zones/:zoneId/resources/:id` | — |

## Policy and Access

| Area | Read | Write |
| --- | --- | --- |
| Policy validation | — | `POST /v1/policies/validate` |
| Policies | `GET /v1/zones/:zoneId/policies`, `GET /v1/zones/:zoneId/policies/:id` | `POST /v1/zones/:zoneId/policies`, `POST /v1/zones/:zoneId/policies/:id/versions`, `DELETE /v1/zones/:zoneId/policies/:id` |
| Policy sets | `GET /v1/zones/:zoneId/policy-sets`, `GET /v1/zones/:zoneId/policy-sets/:id`, `GET /v1/zones/:zoneId/policy-sets/:id/versions/:versionId`, `GET /v1/zones/:zoneId/policy-sets/:id/activation-status` | `POST /v1/zones/:zoneId/policy-sets`, `POST /v1/zones/:zoneId/policy-sets/:id/versions`, `POST /v1/zones/:zoneId/policy-sets/:id/activate`, `POST /v1/zones/:zoneId/policy-sets/:id/simulate`, `DELETE /v1/zones/:zoneId/policy-sets/:id` |
| Policy templates | `GET /v1/policy-templates` | — |
| Grants | `GET /v1/zones/:zoneId/grants`, `GET /v1/zones/:zoneId/grants/:id` | `POST /v1/zones/:zoneId/grants`, `DELETE /v1/zones/:zoneId/grants/:id` |

Use Console when you are performing these operations interactively. Use the Admin SDK or Control API when automation needs the same management behavior.
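The zone-scoped routes in the tables above share one shape, so a small path builder keeps automation from hand-assembling strings. The sketch below is illustrative, not the Admin SDK: `zone_route` and `management_request` are hypothetical helpers, the `localhost:3000` base comes from the local port table, and the `Authorization: Bearer` header is an assumption because this page only says the routes require admin authentication. The request is prepared but never sent.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "http://localhost:3000"  # local API port; adjust for your deployment


def zone_route(zone_id, collection, item_id=None, *suffix):
    """Build a zone-scoped path such as /v1/zones/:zoneId/policy-sets/:id/activate."""
    parts = ["v1", "zones", zone_id, collection]
    if item_id is not None:
        parts.append(item_id)
    parts.extend(suffix)
    # Percent-encode each segment so an identifier cannot inject extra path parts.
    return "/" + "/".join(quote(part, safe="") for part in parts)


def management_request(method, path, body=None, token=None):
    """Prepare (but do not send) a management request."""
    data = None if body is None else json.dumps(body).encode()
    req = urllib.request.Request(API_BASE + path, data=data, method=method)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    if token:
        # Assumption: admin authentication is carried as a bearer token.
        req.add_header("Authorization", f"Bearer {token}")
    return req


req = management_request("POST", zone_route("z1", "policy-sets", "ps1", "activate"), body={})
print(req.get_method(), req.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it. The empty JSON body in the usage line is a placeholder; this page does not document request bodies.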
## Audit, Sessions, and Step-Up

| Resource | Methods and paths |
| --- | --- |
| Audit | `GET /v1/zones/:zoneId/audit`, `GET /v1/zones/:zoneId/audit/by-request/:requestId`, `GET /v1/zones/:zoneId/audit/by-request/:requestId/explain` |
| Sessions | `GET /v1/zones/:zoneId/sessions` |
| Agent sessions | `GET /v1/zones/:zoneId/agent-sessions` (filter by `status`, `lifecycle`, `label`, `parent_id`, `application_id`; `format=csv` to export) |
| Step-up challenges | `GET /v1/zones/:zoneId/step-up-challenges`, `GET /v1/zones/:zoneId/step-up-challenges/:id`, `POST /v1/zones/:zoneId/step-up-challenges/:id/satisfy` |
| Zone events | Zone-scoped event routes under `/v1/zones/:zoneId/...` |

## Usage Notes

- Use Console for human workflows and Admin SDK or Control API for automation.
- For scoped, non-interactive provisioning, drive the Control API with a control key. See [Automate Management](/services/control/#non-interactive-automation) and the `controlBootstrap` example.
- Prefer declarative reconciliation over scripting individual writes: [Declarative Management](/services/control/#declarative-management) converges a zone from a desired-state document with idempotent apply, dry-run plan, and CI-friendly verify.
- Policy activation and simulation are API operations, but top-level `caracal` runtime commands do not expose them.
- Writes that produce downstream state changes enqueue signed Redis stream events through the API outbox.

## Next Step

Continue to [Use Coordinator API](/api/coordinator/) when automation needs agent session, invocation, or delegation endpoints.

## Related Pages

- [Admin Package](/sdks/admin/)
- [Manage Product Objects](/runtime-console/admin/)
- [Manage Product State](/services/api/)

---
# Use Coordinator API
# URL: https://docs.caracal.run/api/coordinator/
# Markdown: https://docs.caracal.run/markdown/api/coordinator.md
# Type: api
# Concepts:
# Requires:
---

Coordinator is served on port `4000`. It owns agent runtime state and delegation graph state.
## Health and Metrics

| Method | Path | Purpose |
| --- | --- | --- |
| `GET` | `/health` | Liveness check. |
| `GET` | `/ready` | Dependency and job readiness. |
| `GET` | `/metrics` | Service metrics. |

## Agent Management

| Method | Path | Purpose |
| --- | --- | --- |
| `POST` | `/zones/:zoneId/agents` | Create an agent session. |
| `GET` | `/zones/:zoneId/agents` | List agent sessions. |
| `GET` | `/zones/:zoneId/agents/:id` | Inspect one session. |
| `GET` | `/zones/:zoneId/agents/:id/children` | List child sessions. |
| `PATCH` | `/zones/:zoneId/agents/:id/suspend` | Suspend a session subtree. |
| `PATCH` | `/zones/:zoneId/agents/:id/resume` | Resume a suspended session subtree. |
| `DELETE` | `/zones/:zoneId/agents/:id` | Terminate a session. |

## Service Agents and Invocations

| Method | Path | Purpose |
| --- | --- | --- |
| `POST` | `/zones/:zoneId/agent-services` | Register or update a service agent. |
| `GET` | `/zones/:zoneId/agent-services` | List service agents. |
| `POST` | `/zones/:zoneId/agents/:id/heartbeat` | Refresh a service-agent lease. |
| `POST` | `/zones/:zoneId/invocations` | Create an invocation. |
| `GET` | `/zones/:zoneId/invocations/:id` | Inspect an invocation. |
| `PATCH` | `/zones/:zoneId/invocations/:id/start` | Mark an invocation running. |
| `PATCH` | `/zones/:zoneId/invocations/:id/cancel` | Cancel an invocation. |
| `PATCH` | `/zones/:zoneId/invocations/:id/complete` | Complete an invocation. |

## Delegation

| Method | Path | Purpose |
| --- | --- | --- |
| `POST` | `/zones/:zoneId/delegations` | Create a delegation edge. |
| `GET` | `/zones/:zoneId/delegations/active` | List active delegation edges. |
| `GET` | `/zones/:zoneId/delegations/inbound/:sessionId` | List inbound edges for a session. |
| `GET` | `/zones/:zoneId/delegations/outbound/:sessionId` | List outbound edges for a session. |
| `GET` | `/zones/:zoneId/delegations/:id/traverse` | Traverse a delegation edge. |
| `GET` | `/zones/:zoneId/delegations/:id/impact` | Compute revocation impact. |
| `GET` | `/zones/:zoneId/agents/:sessionId/effective-authority` | Compute effective authority for a session. |
| `PATCH` | `/zones/:zoneId/delegations/:id/revoke` | Revoke delegated authority. |

## SDK Lifecycle Endpoints

| Method | Path |
| --- | --- |
| `POST` | `/v1/begin` |
| `POST` | `/v1/end` |
| `POST` | `/v1/spawn-child` |
| `POST` | `/v1/exchange` |
| `POST` | `/v1/delegate-to-existing-agent` |
| `POST` | `/v1/revoke-delegation` |
| `POST` | `/v1/verify` |

SDKs usually call these endpoints through the language-specific Caracal client rather than direct HTTP.

## Next Step

Continue to [Use STS Endpoint](/api/sts/) to see how agent sessions and delegation anchors participate in token exchange.

## Related Pages

- [Coordinate Agent State](/services/coordinator/)
- [Coordinate Agents](/architecture/delegation-flow/)
- [Manage Agents and Delegation](/runtime-console/agents/)

---
# Use STS Endpoint
# URL: https://docs.caracal.run/api/sts/
# Markdown: https://docs.caracal.run/markdown/api/sts.md
# Type: api
# Concepts:
# Requires:
---

STS is served on port `8080` and issues scoped Caracal mandate JWTs.

## Public Endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| `POST` | `/oauth/2/token` | OAuth token exchange for resource, session, Gateway, or delegated mandates. |
| `GET` | `/.well-known/jwks.json?zone_id={zone}` | Public signing keys for mandate verification, scoped per zone. |
| `GET` | `/step-up/{id}` | Step-up challenge status. |
| `GET` | `/health` | Liveness check. |
| `GET` | `/ready` | Readiness check. |
| `GET` | `/metrics` | Prometheus metrics. |
| `GET` | `/metrics.json` | JSON metrics. |

## Token Exchange Request

`POST /oauth/2/token` accepts form-encoded parameters.

| Parameter | Purpose |
| --- | --- |
| `grant_type` | OAuth grant type. |
| `subject_token`, `subject_token_type` | Existing authority to exchange. |
| `actor_token` | Optional actor authority. |
| `resource` | One or more target resource identifiers. |
| `scope` | Requested scopes. |
| `zone_id` | Zone boundary. |
| `application_id` | Calling application. |
| `client_secret`, `client_assertion`, `client_assertion_type` | Application authentication. |
| `session_id`, `agent_session_id`, `delegation_edge_id` | Session and delegation anchors. |
| `ttl_seconds` | Requested TTL. |
| `challenge_id`, `challenge_response` | Step-up completion. |

## Responses

Successful exchanges return an OAuth-style token response with `access_token`, `token_type`, `expires_in`, and related fields. Step-up returns `interaction_required` with challenge data. Errors use the shared `error` and `error_description` shape.

## Internal Endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| `POST` | `/internal/policy/simulate` | Simulate policy input. |
| `GET` | `/internal/policy/status/{zoneID}` | Inspect policy load status. |
| `POST` | `/internal/zones/{zoneID}/signing-key/rotate` | Rotate zone signing key. |

Internal endpoints are for service/admin integration, not normal application traffic.

## Next Step

Use [Proxy Through Gateway](/api/gateway/) to understand how Gateway validates inbound authority and exchanges with STS per request.

## Related Pages

- [Exchange Tokens](/architecture/token-exchange-flow/)
- [OAuth Package](/sdks/oauth/)
- [Mandates](/concepts/mandate/)

---
# Proxy Through Gateway
# URL: https://docs.caracal.run/api/gateway/
# Markdown: https://docs.caracal.run/markdown/api/gateway.md
# Type: api
# Concepts:
# Requires:
---

Gateway is served on port `8081`. It is not a CRUD API; it is a protected reverse proxy for configured resources.

## Operator Endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| `GET` | `/health` | Liveness check. |
| `GET` | `/ready` | Readiness check. |
| `GET` | `/metrics` | Prometheus metrics. |
| `GET` | `/metrics.json` | JSON metrics. |
| `POST` | `/internal/revocations/reload` | Reload revocation snapshot. |

## Request Requirements

| Input | Required | Purpose |
| --- | --- | --- |
| `Authorization: Bearer ...` | yes | Inbound Caracal mandate. |
| `X-Caracal-Resource` | yes | Resource identifier used to find a Gateway binding. |
| Request path | yes | Forwarded to the configured upstream after traversal checks. |

Gateway ignores client attempts to set `X-Caracal-Client-ID`; the client ID is bound by Gateway configuration.

## Denial Checks

Gateway rejects before upstream dispatch. Treat these checks as the safety gate in front of every protected upstream.

| Stage | Checks |
| --- | --- |
| Request preflight | Bearer token exists, token size is bounded, `X-Caracal-Resource` exists, and request path is not traversal. |
| Token validation | Token is well formed, not expiring inside the preflight window, signature-valid, not replayed, not revoked, and includes a zone. |
| Binding resolution | `(zone_id, resource)` has a Gateway binding and the bound upstream passes host safety checks. |
| STS exchange | Gateway’s STS circuit is closed and the signed exchange succeeds. |

## Forwarding Behavior

After checks pass, Gateway performs a signed STS exchange, receives a resource mandate, forwards the request to the bound upstream, and emits audit evidence. Request and upstream timeouts are controlled by Gateway service config.

## Next Step

Continue to [Use Event Topics](/api/event-topics/) to understand the Redis Stream topics that carry audit, invalidation, revocation, and agent lifecycle events.

## Related Pages

- [Protect Upstreams](/services/gateway/)
- [Enforce Boundaries](/architecture/trust-boundaries/)
- [Protect an MCP Server](/guides/protect-mcp/)

---
# Use Event Topics
# URL: https://docs.caracal.run/api/event-topics/
# Markdown: https://docs.caracal.run/markdown/api/event-topics.md
# Type: api
# Concepts:
# Requires:
---

Caracal uses Redis Streams for propagation. Published modes sign stream messages with `STREAMS_HMAC_KEY`.
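The sign-then-verify pattern behind that key can be sketched as follows. This is a minimal illustration, not Caracal's wire format: the `_sig` field name comes from this page, but HMAC-SHA256, hex encoding, and the sorted-key JSON canonical form are all assumptions. Check [Wire Contracts](/reference/interoperability-contracts/) for the real contract before relying on it.

```python
import hashlib
import hmac
import json


def _canonical(fields):
    # Assumed canonical form: compact, sorted-key JSON over every field
    # except the signature itself.
    body = {k: v for k, v in fields.items() if k != "_sig"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign(fields, key):
    """Return a copy of the message with an HMAC-SHA256 `_sig` field added."""
    sig = hmac.new(key, _canonical(fields), hashlib.sha256).hexdigest()
    return {**fields, "_sig": sig}


def verify(fields, key):
    """Reject unsigned or mismatched messages; compare in constant time."""
    sig = fields.get("_sig")
    if not isinstance(sig, str):
        return False
    expected = hmac.new(key, _canonical(fields), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig.encode(), expected.encode())


key = b"example-streams-hmac-key"  # stands in for STREAMS_HMAC_KEY
message = sign({"topic": "caracal.policy.invalidate", "zone_id": "z1"}, key)
print(verify(message, key))
print(verify({**message, "zone_id": "z2"}, key))
```

The first check passes and the second fails because the payload changed after signing. A consumer that drops messages failing `verify` gets the reject-on-mismatch behavior required of published modes.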
## Topics

| Topic | Producers | Consumers |
| --- | --- | --- |
| `caracal.audit.events` | API, STS, Gateway, Coordinator, Control | Audit `audit-ingestor`, SIEM exporters |
| `caracal.audit.events.dlq` | Audit | DLQ observers |
| `caracal.policy.invalidate` | API | STS policy loader |
| `caracal.sessions.revoke` | API, Coordinator | STS, Gateway, resource-server revocation consumers |
| `caracal.keys.invalidate` | API, STS | STS key caches |
| `caracal.agents.lifecycle` | Coordinator | Coordinator lifecycle relay job |
| `caracal.invocations.lifecycle` | Coordinator | Invocation observers |
| `caracal.delegations.invalidate` | Coordinator | Delegation observers |
| `caracal.providers.ratelimit` | Provisioner/provider coordination | Provider rate-limit coordination |

## Consumer Groups

| Topic | Groups |
| --- | --- |
| `caracal.audit.events` | `audit-ingestor`, `siem-export` |
| `caracal.audit.events.dlq` | `audit-dlq-observer` |
| `caracal.policy.invalidate` | `opa-engine` |
| `caracal.sessions.revoke` | `sts-revocation` |
| `caracal.keys.invalidate` | `sts-keys` |
| `caracal.agents.lifecycle` | `coordinator-relay` |
| `caracal.invocations.lifecycle` | `invocations-observer` |
| `caracal.delegations.invalidate` | `delegations-observer` |

## Message Integrity

Signed stream messages include the `_sig` field. Consumers in published modes must reject unsigned or mismatched messages for streams that require origin verification.

## Related Pages

- [Propagate Events](/architecture/event-streams/)
- [Operate Redis Streams](/operations/redis/)
- [Wire Contracts](/reference/interoperability-contracts/)

---
# Understand Services
# URL: https://docs.caracal.run/services/
# Markdown: https://docs.caracal.run/markdown/services.md
# Type: landing
# Concepts:
# Requires:
---

Caracal services are small, explicit runtime components. Each service owns a bounded part of the authority lifecycle and exposes health/readiness endpoints for operations.
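Because every service exposes the same `/health` and `/ready` paths, a local stack check can be one small loop. The sketch below assumes the default local ports listed in this documentation and a plain-HTTP `localhost` deployment; `probe_urls` and `is_ok` are hypothetical helper names, and Control is omitted because it is served from the API port.

```python
import http.client
import urllib.error
import urllib.request

# Default local ports from the service tables; adjust for your deployment.
LOCAL_PORTS = {"api": 3000, "coordinator": 4000, "sts": 8080, "gateway": 8081, "audit": 9090}


def probe_urls(host="localhost"):
    """Map each service to its liveness and readiness URLs."""
    return {
        name: {
            "health": f"http://{host}:{port}/health",
            "ready": f"http://{host}:{port}/ready",
        }
        for name, port in LOCAL_PORTS.items()
    }


def is_ok(url, timeout=2.0):
    """Return True only for a 2xx response; any failure counts as not ready."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def stack_status(host="localhost"):
    """Probe every service once and return {service: {check: bool}}."""
    return {
        name: {check: is_ok(url) for check, url in urls.items()}
        for name, urls in probe_urls(host).items()
    }
```

Calling `stack_status()` after `caracal up` gives a quick per-service view. A service that reports `health` but not `ready` usually points at a dependency rather than the process itself.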
## Service Map

| Service | Port | Owns |
| --- | --- | --- |
| [Manage Product State](/services/api/) | `3000` | Product state, management routes, policy/grant resources, admin audit, API outbox. |
| [Coordinate Agent State](/services/coordinator/) | `4000` | Agent sessions, service leases, delegation edges, invocations, Coordinator outbox. |
| [Issue Mandates](/services/sts/) | `8080` | Token exchange, mandate issuance, JWKS, policy evaluation, step-up status. |
| [Protect Upstreams](/services/gateway/) | `8081` | Protected reverse proxy, per-request exchange, revocation checks, upstream safety. |
| [Ingest Audit Evidence](/services/audit/) | `9090` | Audit ingestion, DLQ, tamper checks, retention, search. |
| [Automate Management](/services/control/) | API `3000` | Optional in-process remote management invocation through shared engine dispatch. |

## Dependency Map

```mermaid
flowchart LR
  API --> Postgres[(Postgres)]
  API --> Redis[(Redis)]
  Coordinator --> Postgres
  Coordinator --> Redis
  STS --> Postgres
  STS --> Redis
  Gateway --> Postgres
  Gateway --> Redis
  Gateway --> STS
  Audit --> Postgres
  Audit --> Redis
  Control --> API
  Control --> Redis
```

## Service Reading Path

| Path | Pages |
| --- | --- |
| Management plane | [Manage Product State](/services/api/) → [Coordinate Agent State](/services/coordinator/) |
| Authority path | [Issue Mandates](/services/sts/) → [Protect Upstreams](/services/gateway/) |
| Evidence and automation | [Ingest Audit Evidence](/services/audit/) → [Automate Management](/services/control/) |

## Next Step

Start with [Manage Product State](/services/api/) to understand where Caracal product objects are owned.
## Related Sections

- [Understand Architecture](/architecture/)
- [Operations](/operations/)
- [Use API Reference](/api/)

---
# Manage Product State
# URL: https://docs.caracal.run/services/api/
# Markdown: https://docs.caracal.run/markdown/services/api.md
# Type: reference
# Concepts:
# Requires:
---

The API service owns product-management state and route handlers for zones, applications, providers, resources, policies, policy sets, grants, step-up challenges, policy templates, and zone events.

## Runtime

| Property | Value |
| --- | --- |
| Port | `3000` |
| Health | `GET /health` |
| Readiness | `GET /ready` |
| Metrics | `GET /metrics` |
| Main route prefix | `/v1` |
| Optional docs route | `/docs` when API docs are enabled |

## Dependencies

| Dependency | Purpose |
| --- | --- |
| Postgres | Product state, admin tokens, policy versions, grants, sessions, outbox, audit admin events. |
| Redis | Event streams and outbox dispatch. |
| STS | Service URL used by management flows that need STS coordination. |
| Runtime secrets | Admin token, zone KEK, stream HMAC, audit HMAC, Gateway-STS HMAC, database and Redis URLs. |

## Event Output

API writes `event_outbox` rows in the same transaction as product-state changes. The dispatcher signs payloads with `STREAMS_HMAC_KEY` and publishes Redis stream events such as policy invalidation and audit records.

## Operational Signals

| Signal | Meaning |
| --- | --- |
| API readiness | Database, Redis, outbox dead rows, and service config are healthy enough for management operations. |
| `API_READY_OUTBOX_DEAD_MAX` | Readiness threshold for dead outbox rows. |
| Admin auth failures | Token, scope, or rate-limit issue for Console/Admin clients. |
| Outbox age/dead metrics | Redis or dispatcher is not draining management events. |

## Next Step

Use [Coordinate Agent State](/services/coordinator/) to understand agent sessions, invocations, and delegation runtime state.
## Related Pages

- [Use Management API](/api/control-plane/)
- [Manage Product Objects](/runtime-console/admin/)
- [Operate PostgreSQL](/operations/postgres/)

---
# Coordinate Agent State
# URL: https://docs.caracal.run/services/coordinator/
# Markdown: https://docs.caracal.run/markdown/services/coordinator.md
# Type: reference
# Concepts:
# Requires:
---

Coordinator owns agent and delegation runtime state. SDKs and Console use it to create, inspect, suspend, resume, terminate, traverse, and revoke agent/delegation state.

## Runtime

| Property | Value |
| --- | --- |
| Port | `4000` |
| Health | `GET /health` |
| Readiness | `GET /ready` |
| Metrics | `GET /metrics` |
| Auth scope | `AGENT_COORDINATOR_SCOPE`, default `agent:lifecycle` in deployment templates |

## Dependencies

| Dependency | Purpose |
| --- | --- |
| Postgres | Agent sessions, service leases, invocations, delegation edges, graph epochs, Coordinator outbox. |
| Redis | Lifecycle, invocation, delegation, and revocation streams. |
| STS/JWKS | Token validation and authority alignment. |
| Coordinator token | Console/operator access to agent and delegation views. |

## Jobs

| Job | Purpose |
| --- | --- |
| TTL sweeper | Expires stale sessions and delegation state. |
| Service lease sweeper | Marks stale service agents unhealthy. |
| Deadline enforcer | Handles invocation deadlines. |
| Retention cleaner | Removes expired historical coordination records. |
| Outbox publisher | Publishes Coordinator outbox messages to Redis. |
| Lifecycle relay | Observes `caracal.agents.lifecycle` delivery with HMAC origin verification and dedupe. |

## Event Topics

Coordinator publishes `caracal.agents.lifecycle`, `caracal.invocations.lifecycle`, `caracal.delegations.invalidate`, and session revocation events.

## Next Step

Use [Issue Mandates](/services/sts/) to understand how STS validates sessions and delegation edges during token exchange.
## Related Pages

- [Use Coordinator API](/api/coordinator/)
- [Coordinate Agents](/architecture/delegation-flow/)
- [Manage Agents and Delegation](/runtime-console/agents/)

---
# Issue Mandates
# URL: https://docs.caracal.run/services/sts/
# Markdown: https://docs.caracal.run/markdown/services/sts.md
# Type: reference
# Concepts:
# Requires:
---

STS is the authority issuance boundary. It authenticates applications, validates sessions and delegated authority, evaluates policies, handles step-up challenges, and returns scoped mandate JWTs.

## Runtime

| Property | Value |
| --- | --- |
| Port | `8080` |
| Token exchange | `POST /oauth/2/token` |
| JWKS | `GET /.well-known/jwks.json?zone_id={zone}` |
| Step-up status | `GET /step-up/{id}` |
| Health/readiness | `GET /health`, `GET /ready` |
| Metrics | `GET /metrics`, `GET /metrics.json` |
| Internal policy simulation | `POST /internal/policy/simulate` |

## Dependencies

| Dependency | Purpose |
| --- | --- |
| Postgres | Applications, grants, resources, policies, sessions, step-up challenges, signing keys. |
| Redis | Policy/key invalidation, revocation, provider coordination, audit emission. |
| Zone KEK | Decrypt signing and secret material. |
| Gateway HMAC key | Verify Gateway-authenticated exchanges. |
| Audit replay dir | Persist audit events while Redis/Audit is unavailable. |

## Issuance Rules

| Rule | Behavior |
| --- | --- |
| Resource mandate TTL | Capped at 15 minutes. |
| Session mandate TTL | Capped at 60 minutes. |
| `MAX_GRANT_TTL_SECONDS` | Defaults to `3600`. |
| `OPA_POLL_SECONDS` | Defaults to `60`, capped at `300`. |
| Published modes | Require `GATEWAY_STS_HMAC_KEY` and audit/stream keys where configured. |

## Failure Posture

Invalid client credentials, missing resources, invalid subject tokens, revoked sessions, unsatisfied step-up, policy denial, invalid delegation, and failed Gateway signatures deny exchange.
Gateway-authenticated mandate use is additionally held to the resource's native operation floor: an operation that is not declared on an enforced resource, or whose required scope is absent from the mandate, is denied with `operation_not_permitted` independently of policy.

## Next Step

Use [Protect Upstreams](/services/gateway/) to understand how Gateway validates inbound authority and performs per-request STS exchange.

## Related Pages

- [Exchange Tokens](/architecture/token-exchange-flow/)
- [Use STS Endpoint](/api/sts/)
- [Step-Up Challenges](/concepts/step-up/)

---
# Protect Upstreams
# URL: https://docs.caracal.run/services/gateway/
# Markdown: https://docs.caracal.run/markdown/services/gateway.md
# Type: reference
# Concepts:
# Requires:
---

Gateway is the protected-resource ingress. It validates inbound Caracal authority, checks replay and revocation, exchanges with STS, and forwards to configured upstreams only after safety checks pass.

## Runtime

| Property | Value |
| --- | --- |
| Port | `8081` |
| Health/readiness | `/health`, `/ready` |
| Metrics | `/metrics`, `/metrics.json` |
| Revocation reload | `POST /internal/revocations/reload` |

## Request Path

```mermaid
sequenceDiagram
  participant Client
  participant Gateway
  participant STS
  participant Upstream
  participant Audit
  Client->>Gateway: bearer token + X-Caracal-Resource
  Gateway->>Gateway: verify JWT, jti, revocation, binding, path, upstream safety
  Gateway->>STS: signed per-request exchange
  STS-->>Gateway: resource mandate
  Gateway->>Upstream: proxied request
  Gateway->>Audit: audit event or replay file
```

## Deny-Before-Upstream Checks

- Missing, malformed, oversized, expiring, replayed, revoked, or signature-invalid bearer token.
- Missing `X-Caracal-Resource`.
- No binding for `(zone_id, resource)`.
- Client-supplied `X-Caracal-Client-ID`.
- Path traversal.
- Operation (method and path) not declared on an enforced resource, or missing the operation's required scope.
- STS exchange failure or open STS circuit.
- Unsafe upstream destination.

## Configuration Highlights

| Variable | Purpose |
| --- | --- |
| `STS_URL` | STS endpoint for exchange and JWKS. |
| `GATEWAY_STS_HMAC_KEY` | Signs Gateway exchange requests. |
| `MAX_REQUEST_BYTES` | Request body limit, default 10 MiB. |
| `UPSTREAM_HOST_ALLOWLIST` | Optional egress allowlist pinning upstreams to named hosts. |
| `JTI_FAIL_OPEN` | Forbidden in published modes. |
| `AUDIT_REPLAY_DIR` | Replay directory for audit events. |

## Next Step

Use [Ingest Audit Evidence](/services/audit/) to understand how Gateway, STS, API, Coordinator, and Control evidence becomes searchable audit state.

## Related Pages

- [Proxy Through Gateway](/api/gateway/)
- [Enforce Boundaries](/architecture/trust-boundaries/)
- [Harden Production](/operations/tls-hardening/)

---
# Ingest Audit Evidence
# URL: https://docs.caracal.run/services/audit/
# Markdown: https://docs.caracal.run/markdown/services/audit.md
# Type: reference
# Concepts:
# Requires:
---

Audit consumes signed audit events from Redis, verifies integrity, writes append-only audit rows to Postgres, manages DLQ, and exposes operator search and metrics.

## Runtime

| Property | Value |
| --- | --- |
| Port | `9090` |
| Health/readiness | `GET /health`, `GET /ready` |
| Metrics | `GET /metrics`, `GET /metrics.json` |
| Search | `GET /api/audit/search` |
| DLQ | `GET /api/audit/dlq`, `GET /api/audit/dlq/{id}`, `POST /api/audit/dlq/replay` |

## Consumer Behavior

Audit consumes `caracal.audit.events` with the `audit-ingestor` consumer group. On startup it drains pending entries for its consumer, periodically claims orphaned entries, retries until `AUDIT_MAX_DELIVERIES`, and moves permanent failures to `caracal.audit.events.dlq`.

## Integrity Controls

| Control | Meaning |
| --- | --- |
| `AUDIT_HMAC_KEY` | Verifies producer-signed events in published modes. |
| Tamper checks | Detect content hash mismatch, chain breaks, and HMAC failures. |
| Append-only database role | Audit role cannot update or delete `audit_events`. |
| Retention | `AUDIT_RETENTION_DAYS`, partitions, and optional export watermarks. |

## Readiness Signals

| Signal | Meaning |
| --- | --- |
| DLQ threshold | `AUDIT_READY_DLQ_MAX` controls readiness tolerance. |
| Consumer lag | `AUDIT_READY_LAG_MAX` controls accepted stream lag. |
| PEL age | `AUDIT_READY_PEL_OLDEST_SECS_MAX` controls pending-entry staleness. |

## Next Step

Use [Automate Management](/services/control/) when remote automation needs the same product-management operations available in Console.

## Related Pages

- [Audit and Request Traces](/concepts/audit-ledger/)
- [Export Audit Evidence](/operations/compliance-audit-integration/)
- [Configure Alerts](/operations/alerts/)

---
# Automate Management
# URL: https://docs.caracal.run/services/control/
# Markdown: https://docs.caracal.run/markdown/services/control.md
# Type: reference
# Concepts:
# Requires:
---

Control is an optional automation surface for remote management operations. It runs as an in-process plugin inside the API service and dispatches through the shared engine after gate, authentication, replay, rate-limit, and scope checks.

## Runtime

Control has no separate deployable or port. The plugin is mounted inside API when `CARACAL_CONTROL_ENABLED=true` and serves on the API port. A runtime gate file (`CONTROL_GATE_FILE`) is the instant on/off switch: when the file is absent the endpoint returns `503` without a restart.
| Property | Value |
| --- | --- |
| Host | API service (port `3000`) |
| Health | `GET /health` |
| Readiness | `GET /ready` |
| Invoke | `POST /v1/control/invoke` |
| Enablement | `CARACAL_CONTROL_ENABLED=true` |
| Runtime gate | `CONTROL_GATE_FILE` present |
| Helm default | disabled (`control.enabled=false`) |

## Invoke Flow

```mermaid
sequenceDiagram
  participant Client
  participant API as API (control plugin)
  participant Redis
  participant Engine
  Client->>API: POST /v1/control/invoke
  API->>API: gate enabled?
  API->>API: verify JWT and scope
  API->>Redis: mark JTI replay key
  API->>API: subject rate limit
  API->>Engine: dispatch supported management operation
```

## Required Config

| Variable | Purpose |
| --- | --- |
| `CARACAL_CONTROL_ENABLED` | Mounts the control plugin inside API. |
| `STS_JWKS_URL` | JWT verification keys. |
| `STS_ISSUER_URL` | Expected issuer. |
| `CONTROL_AUDIENCE` | Expected token audience. |
| `CONTROL_API_TOKEN` | API token used for downstream dispatch. |
| `CONTROL_GATE_FILE` | Runtime gate file that must exist before invoke is available. |

## Non-Interactive Automation

Control is the automation counterpart to Console. A CI job, provisioning script, or onboarding tool drives the same management operations a human performs in Console, because both run through the shared engine dispatch. Automation never receives the root admin token; it uses a scoped control key.

1. Create a control key once in Console from the Control menu. Console returns a one-time `client_id` and `client_secret` and records the `control::` scopes the key may request, along with an optional maximum token TTL. The key is bound to the zone where it was created.
2. Exchange the key for a short-lived token at the STS token endpoint.
   ```bash
   curl -s -X POST "$STS_URL/oauth/2/token" \
     -d grant_type=client_credentials \
     -d application_id="$CONTROL_CLIENT_ID" \
     -d client_secret="$CONTROL_CLIENT_SECRET" \
     -d resource=caracal-control \
     -d scope="control:resource:write control:policy:write"
   ```

3. Invoke a management command with the returned token.

   ```bash
   curl -s -X POST "$CARACAL_API_URL/v1/control/invoke" \
     -H "authorization: Bearer $TOKEN" \
     -H "content-type: application/json" \
     -d '{"command":"resource","subcommand":"create","flags":{"name":"PiperNet","identifier":"resource://pipernet","scopes":["pipernet.read"]}}'
   ```

STS resolves the bound zone from the authenticated control key; standard bootstrap workflows must not ask users to provide a separate zone id. Supplying `zone_id` on a Control key token exchange remains accepted for compatibility but is deprecated.

Each token is short-lived, replay-protected, rate-limited, and audited, so least-privilege automation stays the easy path. The `controlBootstrap` example ships a reusable client plus an apply/verify/teardown pipeline that keeps an agent's environment in sync with a declared plan through this flow.

## Declarative Management

`resource create` and the other per-object verbs are imperative: the caller must know whether an object already exists and choose create or patch. For real automation, declare the desired state once and let the server converge to it. The reconciliation logic lives in the engine, so every adopter — any language, any device — gets identical behavior without reimplementing a client-side diff.

Three operations cover the full lifecycle, all keyed on each object's stable identity (application `name`, provider and resource `identifier`, policy and policy-set `name`):

| Command | Subcommand | Effect |
| --- | --- | --- |
| `ensure` | `application`, `identity-provider`, `resource`, `policy`, `policy-set` | Create-or-patch a single object from its `--spec`. Safe to repeat. |
| `state` | `apply` | Reconcile a whole zone toward a desired-state `--document`: create missing objects, patch drifted ones, publish a new policy version on content change, and optionally `--prune` objects not in the document. |
| `state` | `plan` / `verify` | Read-only. Return the same reconciliation report without writing. `verify` is the CI gate: the report's `drift` field is `true` when the live zone does not match the document. |

A desired-state document is a flat object set. Pass it as the JSON-encoded `document` flag:

```json
{
  "objects": [
    {
      "kind": "identity-provider",
      "spec": {
        "identifier": "provider://pipernet-mandate",
        "name": "PiperNet Mandate",
        "kind": "oidc",
        "config": { "issuer": "https://login.hooli.example" }
      }
    },
    {
      "kind": "resource",
      "spec": {
        "identifier": "resource://pipernet",
        "name": "PiperNet",
        "scopes": ["pipernet.read"]
      }
    },
    {
      "kind": "policy",
      "spec": {
        "name": "pipernet-baseline",
        "content": "package pipernet\nallow := true"
      }
    }
  ],
  "prune": false
}
```

```bash
curl -s -X POST "$CARACAL_API_URL/v1/control/invoke" \
  -H "authorization: Bearer $TOKEN" \
  -H "content-type: application/json" \
  -d "{\"command\":\"state\",\"subcommand\":\"apply\",\"flags\":{\"document\":\"$(jq -Rs . < plan.json | sed 's/^.//;s/.$//')\",\"idempotency-key\":\"deploy-2026-06-22\"}}"
```

The result is a reconciliation report:

```json
{
  "ok": true,
  "result": {
    "ok": true,
    "dryRun": false,
    "prune": false,
    "zoneId": "",
    "outcomes": [
      {
        "kind": "resource",
        "identity": "resource://pipernet",
        "action": "create",
        "applied": true,
        "id": "..."
      }
    ],
    "summary": { "created": 1, "updated": 0, "unchanged": 2, "pruned": 0, "failed": 0 },
    "drift": true
  }
}
```

Key properties that make this safe to wire into a pipeline:

- **Idempotent.** Re-running `apply` against an in-sync zone changes nothing.
- **Convergent and resumable.** Each object is reconciled independently; a failure on one is recorded in its outcome and does not abort the rest, so a re-run heals partial state.
No partial-zone rollback is needed. - **Least-privilege.** `state` and `ensure` hold no scope of their own. They authorize each object against the per-noun scope it touches — `apply` of a resource needs `control:resource:write`, `verify` needs only `control:resource:read`, and `--prune` needs `control::delete`. A key scoped to read can run `plan`/`verify` but never write. - **Dry-run on demand.** Add `dry-run` to any `apply`, or call `plan`, to get the full report with `applied: false` and no writes. - **Zone-safe.** A document whose object `spec` names a different `zone_id` than the credential's bound zone fails with `zone_mismatch` instead of provisioning the wrong zone. The `idempotency-key` flag is recorded on the audit event so a retried deployment correlates to its original attempt. Because reconciliation is convergent, the key is for traceability — a retry never double-creates regardless of the key. ## Run From Another Device The Control API is the machine surface; Console is the human surface. When Console and the API service run on a cloud host, an operator's laptop or a CI runner drives management remotely over the network — there is no requirement to run the script on the same machine as the API. A remote caller needs only three things: - The control key's `client_id` and `client_secret` (created once in Console). - The STS token endpoint URL and the API invoke URL, both reachable over TLS. - The scopes the key was granted. The call is two HTTPS requests: exchange the key for a short-lived token at `$STS_URL/oauth/2/token`, then `POST $CARACAL_API_URL/v1/control/invoke`. Nothing else is device-specific. Run both services behind TLS, keep the control key secret in the runner's secret store, and the same script works from any host. See [Harden Production](/operations/tls-hardening/) for terminating TLS in front of STS and API. ## Self-Describing Catalog A generic client should not hardcode the command list, scopes, or document schema. 
`catalog describe` returns the remotely invocable surface so an agent or tool can discover it at runtime: ```bash curl -s -X POST "$CARACAL_API_URL/v1/control/invoke" \ -H "authorization: Bearer $TOKEN" \ -H "content-type: application/json" \ -d '{"command":"catalog","subcommand":"describe"}' ``` The result lists every command with its subcommands, the `control::` scope each requires, and the flags it accepts, plus the declarative `kinds` with their identity fields and the desired-state document schema. Any control key can call it; it grants no management authority of its own. ## Structured Errors Every failure returns a typed envelope instead of a bare string, so automation can branch on a stable `code` and surface the operator-facing `remediation`: ```json { "ok": false, "error": { "code": "denied", "reason": "missing scope control:resource:write", "remediation": "Grant the control key control:resource:write, or run plan/verify with a read-scoped key." } } ``` | `code` | HTTP | Meaning | | --- | --- | --- | | `denied` | `403` | The key lacks the scope, or policy denied the operation. | | `invalid` | `400` | The request or a flag (for example a malformed `document`) is not well-formed. | | `zone_mismatch` | `409` | A `spec` targets a zone other than the credential's bound zone. | | `conflict` | `409` | The object collides with existing state. | | `not_found` | `404` | A referenced object does not exist. | | `unsupported` | `501` | The command or subcommand is not exposed remotely. | | `upstream` | `502` | A downstream dependency failed. | The successful shape is unchanged: `{ "ok": true, "result": ... }`. Correlate any response with the `x-request-id` response header in Console **request trace**. ## FAQ: Before You Raise an Issue **Should I script `resource create` and `policy create` calls?** Prefer `state apply` or `ensure`. 
The imperative verbs require the caller to know whether an object exists; the declarative operations converge from any starting state and are safe to repeat. **My second `apply` reports `unchanged` everywhere — is that a bug?** No. That is the idempotent path. `apply` only writes when an object is missing or drifted. Check the `summary` and `drift` fields to confirm. **`apply` failed partway. Did it leave the zone half-configured?** Each object is reconciled independently and failures are recorded per object, not rolled back. Re-run `apply`: it heals the objects that failed and skips the ones already in place. **I got `denied` but the key has scopes.** `state`/`ensure` authorize each object against its own noun. Applying a resource needs `control:resource:write`; pruning needs `control:resource:delete`. Grant the exact `control::` named in the error `reason`, or use `plan`/`verify` with a read-scoped key. **I got `zone_mismatch`.** The document declares a `zone_id` (or `zone`) that differs from the zone your control key is bound to. Remove the field to use the credential's zone, or use a key issued for the target zone. **Can I run this from my laptop or CI against a cloud deployment?** Yes. The Control API is built for remote use. Exchange the key for a token at STS and POST to the invoke URL over TLS from any host. See **Run From Another Device**. **How do I see exactly what would change before applying?** Call `state plan`, or add `dry-run` to `apply`. Both return the full report with `applied: false` and write nothing. **How do I gate CI on configuration drift?** Run `state verify` and fail the job when the report's `drift` field is `true`. **Where do grants fit?** Grants are not part of the declarative object set; express access through `policy` objects in the document. Manage grants with the dedicated grant operations when you need them directly. ## Manage Control Keys Control keys are created and managed in Console from the **Control** menu. 
A key is a managed, zone-bound application whose only authority is the `control::` scopes you grant it — never the root admin token. When you create a key, Console opens a grouped permission picker: - Each command is a group shown as `control::*` (for example `control:agent:*`). Toggling a group grants every action under that command. - Reveal a group to expand its branch and grant a single action — `read`, `write`, or `delete` — instead of the whole command. - Selections are checkboxes that persist while you move through the tree; the key is written with exactly the scopes you keep when you save. A key can never exceed Console's own zone-bound management surface. The picker offers only the commands Control exposes: local-only operations (the Control lifecycle itself), non-zone operations, and hidden commands are never grantable. A token later exchanged from the key is additionally checked against the scopes recorded on the key, so a token can only narrow, never widen, the granted set. ## Safety Behavior Control returns `503` when the gate is disabled, `401` for authentication or replay failures, and `429` for rate limiting. Dispatch failures return the structured envelope described in [Structured Errors](#structured-errors): `403` denied, `400` invalid, `409` zone mismatch or conflict, `404` not found, `501` unsupported, and `502` upstream. Control is a product-management surface. Do not expose it as a top-level `caracal` runtime command. ## Related Pages - [Choose the Right Surface](/runtime-console/cli-and-console/) - [Enforce Boundaries](/architecture/trust-boundaries/) - [Use Management API](/api/control-plane/) - [Bootstrap Control State](/examples/control-bootstrap/) --- # Use Reference # URL: https://docs.caracal.run/reference/ # Markdown: https://docs.caracal.run/markdown/reference.md # Type: landing # Concepts: # Requires: --- Reference pages are for exact names, defaults, limits, compatibility claims, and schema contracts. 
Use them after you understand the workflow or when validating automation. ## Find Answers | Need | Page | | --- | --- | | Direct answers to common questions | [FAQ](/reference/faq/) | | Canonical terms | [Glossary](/reference/glossary/) | | Error response codes | [Error Codes](/reference/errors/) | ## Configure and Run | Need | Page | | --- | --- | | Runtime and service configuration keys | [Configuration Keys](/reference/configuration/) | | Configuration source order | [Configuration Order](/reference/config-precedence/) | | Ports, TTLs, limits | [Defaults and Limits](/reference/defaults-and-limits/) | | Runtime CLI exit behavior | [CLI Exit Codes](/reference/runtime-exit-codes/) | ## Versions and Contracts | Need | Page | | --- | --- | | Supported runtimes and deployment targets | [Compatibility](/reference/compatibility/) | | Package, image, and release naming | [Release Map](/reference/release-package-runtime-map/) | | JSON schemas and fixtures | [Wire Contracts](/reference/interoperability-contracts/) | | Community and Enterprise feature boundary | [Compare Editions](/enterprise/) | ## Source-of-Truth Rule When Reference conflicts with API, Operations, or SDK pages, treat code and generated schemas as source of truth and update the docs section coherently. ## Next Step Start with [FAQ](/reference/faq/) for recurring modeling, runtime, troubleshooting, and edition questions. --- # FAQ # URL: https://docs.caracal.run/reference/faq/ # Markdown: https://docs.caracal.run/markdown/reference/faq.md # Type: reference # Concepts: # Requires: --- import FaqRegistryScript from '../../../components/FaqRegistryScript.astro';
FAQ-001 Platform

What problem does Caracal solve?

Caracal gives agents and automated workflows short-lived, policy-approved authority instead of long-lived credentials. The agent asks for scoped authority at the moment it acts, STS evaluates policy, Gateway or a connector enforces the mandate, and Audit records the decision and result.

See Authority and Enforcement and the Caracal Mental Model.

Related: FAQ-002, FAQ-003, FAQ-016

FAQ-002 Platform

Is Caracal an identity provider, secrets manager, or API gateway?

No. Caracal is an authority broker for agent and workload actions. Its Gateway can sit in front of HTTP resources as a protected boundary and can broker provider credentials, but Caracal does not replace your IdP, your static config store, or your general API management layer.

Use an IdP for human login, a secret manager for static application configuration, and Caracal when an agent or service needs scoped, auditable authority for a resource.

Related: FAQ-001, FAQ-009, FAQ-013

FAQ-003 Architecture

What should a zone represent?

A zone should represent a trust boundary: the set of resources, sessions, policies, signing keys, and audit records that are allowed to share authority state. Use a separate zone when two workloads need independent signing keys, policy activation, or audit trails. Use one zone with separate resources when the same trust boundary protects multiple upstreams.

For examples, see Model Your Application in Caracal.

Related: FAQ-004, FAQ-009, FAQ-011

FAQ-004 Architecture

Does the open-source edition include managed multi-tenancy?

No. The open-source Community Edition gives you zones as a manual isolation primitive. You can model customers, environments, or trust tiers with zones and automate them through the Admin API, but managed tenant, team, SSO, and hosted lifecycle features are Enterprise Edition features.

To serve many of your own customers from one deployment without per-customer zones, see Serve Your Own Customers.

See Compare Editions for the commercial feature boundary.

Related: FAQ-003, FAQ-019

FAQ-005 Identity

What is the difference between an application, principal, and agent session?

An application is registered software that authenticates to Caracal. A principal is the acting identity, such as a user, service, or agent. An agent session is a time-bounded execution context spawned from a subject session or application context.

Policy decisions consider all of them: which application is asking, which principal is acting, which session anchors the request, and which resource scopes are requested.

Related: FAQ-006, FAQ-007, FAQ-008

FAQ-006 Identity

Should I create one application per agent?

No. This is the most common modeling mistake, and it does not match how Caracal scales. An application and an agent are different layers:

  • An application is the credentialed security boundary. It is operator-provisioned (managed) or dynamically registered (DCR), holds a server-owned secret, and is the identity Caracal authenticates. Creating one is a deliberate, secret-bearing act.
  • An agent session is the scalable runtime unit. The workload that already holds the application credential creates agent sessions at runtime — no secret, no registration, no Console step. One application backs many concurrent agent sessions (up to 200 per application by default).

The default model is one managed application per durable workload, with many agent sessions under it. When that workload fans out — spawning sub-agents, delegating a narrower slice of authority, or building an agent hierarchy for a single request — each of those is a new agent session under the same application, not a new application. Policy and audit still tell them apart by agent session, lifecycle, labels, and delegation chain (see FAQ-008).

One application per agent breaks this. Managed applications are operator-provisioned, so a workload cannot mint one per spawned agent at runtime. DCR can register at runtime, but a DCR application is meant to expire and bind exactly one session, so per-request fan-out would register and tear down an application for every sub-agent — adding registration latency, credential sprawl, and registry churn for no policy or audit benefit. Reach for an application per agent only when a spawned agent genuinely needs its own isolated, expiring credential and a registry-visible identity; then use a DCR application with one session, as described in FAQ-007.

See Identities and Applications and the Caracal Mental Model.

Related: FAQ-005, FAQ-007, FAQ-008

FAQ-007 Identity

When should I use a managed application versus DCR?

Use a managed application for durable software you intentionally operate: a backend service, Gateway application, orchestrator, or agent runtime — including the runtime that spawns and fans out child agents. Ordinary spawn fan-out does not need DCR: every spawned child is an agent session under the runtime's own managed application. Reach for DCR only when an identity needs its own isolated, auto-expiring credential boundary — for example a per-tenant or per-integration identity. DCR applications are registered through the zone DCR endpoint as a control-plane action (the SDK spawn() primitive never registers applications), always expire, and bind exactly one session, so a DCR application is not reused across agents and is not a spawn parent for further sessions.

See Identities and Applications.

Related: FAQ-005, FAQ-006, FAQ-008

FAQ-008 Identity

If many agents share one managed application, can policy and audit still tell them apart?

Yes, with one important distinction between attribution and credential isolation.

The agent's identity in a decision is the agent session, not the application. You do not pre-register agents in the Console; the workload that holds the application credential creates each agent session at runtime and declares its labels then, while the coordinator records the session's lifecycle (task for an ordinary session, service for a heartbeat-leased one). The token exchange carries the acting agent_session_id, lifecycle, and labels, plus the delegation edge and chain, alongside the application_id. Policy reads these as input.principal.agent_session_id, input.principal.lifecycle, and input.principal.labels, and audit records the same fields with the parent and delegation context. The Console agent list shows the lifecycle and labels of every session, so you can see which agent is which under a shared application. So two agents under one managed application present different sessions, labels, and delegation paths, and both policy and audit can tell them apart without referencing the application.

What is actually enforced is the authority, not the label. Spawning is gated by application ownership, the asserted agent_session_id is bound to the owning application at exchange, and requested scopes must stay within the delegation chain, which is itself bounded by the parent's authority. The labels, by contrast, are asserted by the credentialed workload and are not delegation-constrained. They are a sound basis for policy and audit when you control and trust the spawner, but they are not a defense against a compromised workload mislabeling its own sub-agents. For that, the boundary is scopes, delegation, and policy.

A single managed application is therefore the right fit for one durable workload that spawns many agents: per-agent attribution comes from the agent session, lifecycle, and labels, not from one application per agent. To investigate a specific agent, the zone audit endpoint filters directly by agent_session_id (one exact session) or by label (a role across a fleet). Reach for a DCR application per agent only when a spawned agent needs its own isolated, expiring credential and a registry-visible identity. What that buys you is credential isolation, not additional policy or audit distinguishability.
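
As a rough illustration of the attribution described above, the sketch below builds two policy inputs for sessions that share one managed application and applies a toy label rule to them. Only the field names `agent_session_id`, `lifecycle`, `labels`, and `application_id` come from this answer; the placement of `application_id`, every value, and the helper functions are hypothetical, and a real policy would be written in the policy language rather than Python.

```python
# Illustrative only: values, the placement of application_id, and the rule
# itself are assumptions, not the documented Caracal input schema.
def make_input(agent_session_id, labels, lifecycle="task"):
    """Build a policy input shaped like the fields named in the text."""
    return {
        "application_id": "app-orchestrator",  # shared by both sessions
        "principal": {
            "agent_session_id": agent_session_id,
            "lifecycle": lifecycle,
            "labels": list(labels),
        },
    }


def allow_ticket_comment(policy_input):
    """Toy rule: only task sessions labelled 'support-triage' may comment."""
    principal = policy_input["principal"]
    return principal["lifecycle"] == "task" and "support-triage" in principal["labels"]


triage = make_input("session-a", ["support-triage"])
pricing = make_input("session-b", ["pricing-worker"])

print(allow_ticket_comment(triage))   # True
print(allow_ticket_comment(pricing))  # False
```

Both inputs carry the same `application_id`, yet the rule separates them by label. Because labels are asserted by the workload, a rule like this is only as trustworthy as the spawner.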

See Identities and Applications and Model Your Application in Caracal.

Related: FAQ-005, FAQ-006, FAQ-007

FAQ-009 Resources

What is the difference between a resource and a provider?

A resource is the protected target and policy audience: the thing a mandate authorizes access to. A provider describes how Gateway authenticates upstream: no credential, Caracal mandate, OAuth, API key, or bearer token.

Keep target identity, scopes, upstream URL, and Gateway binding on the resource. Keep secrets, token endpoints, OAuth settings, API keys, and bearer tokens on the provider.

Related: FAQ-010, FAQ-011, FAQ-013

FAQ-010 Resources

Why must the resource identifier stay stable if the upstream URL can change?

Policies, grants, mandates, and audit records refer to the resource identifier. If you use a mutable deployment hostname as the identifier, changing infrastructure also changes your authority boundary and breaks audit continuity. Use a stable audience URI such as resource://billing, then change the upstream URL when routing changes.

Related: FAQ-003, FAQ-009, FAQ-011

FAQ-011 Resources

How should I design scopes?

Use small action-oriented scopes such as payments:read, tickets:comment, or mcp:tool:call. Do not encode environment, tenant, user, or hostname into scope names when that data belongs in the zone, principal, resource, or policy input.

Scopes answer "what action is allowed?" Resource identifiers answer "what target is protected?"

Related: FAQ-003, FAQ-009, FAQ-012

FAQ-012 Resources

Do I manage grants directly?

In the current Console flow, you usually define resources, scopes, applications, subjects, and policies rather than managing grants as a separate daily object. Grants are an authorization input that policy can read; the active policy set still makes the final allow, deny, or step-up decision.

If access is denied, inspect the active policy, subject, application, resource, and scopes through request trace.

Related: FAQ-011, FAQ-016

FAQ-013 Security

Is an application secret the same as a provider credential?

No. An application secret authenticates the application to Caracal. A provider credential authenticates Gateway or STS to an upstream provider such as Google, Slack, OpenAI, or an internal API. Agents should authenticate to Caracal and receive short-lived mandates; they should not receive long-lived provider credentials.

See Define Resources and Providers.

Related: FAQ-002, FAQ-009, FAQ-014

FAQ-014 Security

When should I use per-user OAuth instead of a shared provider credential?

Use per-user OAuth (oauth2_authorization_code) when the upstream call must act as a specific user and that user must consent. Use a shared service credential (oauth2_client_credentials, api_key, or bearer_token) when the agent acts as the application or service, not as an individual user.

The concrete setup fields are in Provider Recipes.

Related: FAQ-013, FAQ-016

FAQ-015 Runtime

Why are zone and policy commands in the Console instead of the caracal CLI?

The top-level caracal CLI is intentionally limited to local runtime lifecycle and process execution: up, down, status, purge, run, and console. Product-management workflows such as zones, applications, providers, resources, policies, audit, diagnostics, agents, and delegation live in the Console, Admin API, and SDKs so they use one management surface and do not drift into duplicated CLI commands.

See Choose the Right Surface.

Related: FAQ-017, FAQ-018

FAQ-016 Operations

What is the difference between a 403 from STS and a 403 from Gateway?

A 403 from STS means the exchange was authenticated but policy did not allow the requested resource scopes, or a step-up/grant/session condition blocked issuance. A 403 from Gateway or a verifier means the request reached a protected boundary but the mandate, resource binding, scope check, revocation state, or route safety check failed.

Use request trace with the request ID to identify the surface before changing policy or resource configuration.

Related: FAQ-011, FAQ-012, FAQ-017

FAQ-017 Operations

Where is the diagnostic bundle or doctor command?

The diagnostic bundle is exposed through existing surfaces instead of a separate top-level command. Use caracal status --json for runtime status, Console diagnostics for Doctor checks (health, readiness, zones, preflight), audit for recent decisions, and request trace for a known request ID.

See Troubleshoot by Symptom.

Related: FAQ-015, FAQ-016, FAQ-018

FAQ-018 Operations

Why is an audit event missing?

First confirm the request reached a Caracal-protected boundary. If it did, check the selected zone, time window, request ID, Audit service readiness, Redis stream health, replay backlog, and DLQ. If the request failed before STS, Gateway, Coordinator, or a connector emitted evidence, there may be no action-result event for that boundary.

Start with Inspect Diagnostics and Audit and Debug Infrastructure Issues.

Related: FAQ-016, FAQ-017

FAQ-019 Editions

Which features are Enterprise Edition only?

Managed multi-tenancy, hosted management UI, managed Gateway and service operation, SSO, SCIM provisioning, teams, organization-level RBAC, commercial support, and enterprise lifecycle features are Enterprise Edition features. The Community Edition remains self-hosted and uses zones, Admin API automation, Console workflows, SDKs, and connectors.

See Compare Editions or book an enterprise call.

Related: FAQ-004

FAQ-020 Identity

Two agents share the same application and labels — how do I tell which one acted?

Every agent has exactly one canonical identity: the server-minted agent_session_id. It is unique, returned to the caller the moment spawn() or service() creates the session, and stamped onto every token exchange and audit event that session produces. That id — not the application, lifecycle, or labels — is the answer to "which agent did this?". Even two sessions that share the same application, labels, and metadata are still separated by their agent_session_id in audit.

Identical sessions are interchangeable on purpose. A hundred ["pricing-worker"] sessions fanned out under one application are meant to be fungible — that is how fan-out works, and it is why labels are a descriptor rather than a unique name. When you need to tell sessions apart by meaning rather than by raw id, give them distinguishing labels, attach business correlation in metadata, or propagate a trace_id through the work.

To investigate, the zone audit endpoint filters directly on these fields: query agent_session_id to follow one exact session end to end, or label to scope to a role across a whole fleet of sessions. The Console audit view exposes both filters.

To see the sessions themselves — including ones that have ended — the management API exposes GET /v1/zones/{zone}/agent-sessions, filterable by status (active, suspended, terminated, expired), lifecycle, label, parent_id, and application_id. Rows are never deleted on termination — a session keeps its row and its status flips to terminated or suspended — so this is the authoritative place to review past agents. Add format=csv to the same endpoint to export the filtered set for offline analysis.
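
The session listing above can be sketched as a small URL builder. The path and the filter names come from this answer; the base URL, the zone value, and the assumption that each filter maps to a query parameter of the same name are hypothetical, so check the management API reference before relying on them.

```python
from urllib.parse import quote, urlencode

# Filter names taken from the text; treating them as query parameters is an assumption.
ALLOWED_FILTERS = {"status", "lifecycle", "label", "parent_id", "application_id", "format"}


def agent_sessions_url(base_url, zone, **filters):
    """Build the agent-sessions listing URL for one zone."""
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    path = f"/v1/zones/{quote(zone, safe='')}/agent-sessions"
    query = urlencode(sorted(filters.items()))
    return f"{base_url.rstrip('/')}{path}" + (f"?{query}" if query else "")


# Hypothetical host and zone: export terminated pricing workers as CSV.
url = agent_sessions_url(
    "https://caracal.example",
    "zone-1",
    status="terminated",
    label="pricing-worker",
    format="csv",
)
print(url)
```

The builder rejects unknown filter names up front, which keeps a typo from silently returning an unfiltered list.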

Related: FAQ-006, FAQ-008, FAQ-021

FAQ-021 Security

A spawned agent inherits its application's policy and also carries a grant — which wins?

Both apply, and they compose as a strict intersection — a logical AND — so the narrower of the two always wins. They are two independent layers with different jobs: a grant (the delegation edge a narrowing spawn(grant=…) creates) caps which scopes the token may carry at all, while policy decides whether the action is allowed. Neither layer can ever add authority; each one can only subtract.

At token exchange a resource is released only if it passes every gate: the requested scopes must be within the resource's own scopes and within the grant edge's scopes (the edge is itself re-validated to be within the parent's authority), the resource must fall inside the delegation, and policy must return allow. The effective authority is therefore policy ∩ grant ∩ resource ∩ delegation. This holds in both directions: if policy is the narrower of the two, policy wins and the grant cannot widen past it; if the grant is the narrower, the grant wins, because the token cannot request scopes outside the grant and resources outside the delegation are rejected even when policy would have allowed them.

This means there is no "clash" to resolve when a spawned agent runs under its parent's application. A plain spawn() carries no grant edge, so the child runs at the application's full, policy-bounded authority — deliberately identical to the parent, which is what inheriting means. Reach for spawn(grant=Grant.narrow([...])) when the child should hold strictly less; the grant then sub-bounds the authority further while policy continues to apply on top. Because every layer is subtractive, inheriting an application's policy and carrying a grant can only ever narrow access, never expand it.
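
The subtractive composition described above can be sketched with plain sets. This is a simplified model, not Caracal's implementation: policy is reduced to a set of allowed scopes, `grant_scopes=None` stands in for a plain `spawn()` with no grant edge, and all function and parameter names are hypothetical.

```python
def authority_ceiling(resource_scopes, policy_scopes, grant_scopes=None):
    """Intersect every layer; no layer can add a scope another layer lacks."""
    ceiling = set(resource_scopes) & set(policy_scopes)
    if grant_scopes is not None:  # None models a plain spawn() with no grant edge
        ceiling &= set(grant_scopes)
    return ceiling


def is_released(requested, resource, resource_scopes, policy_scopes,
                grant_scopes=None, delegated_resources=None):
    """Release only when the resource is inside the delegation and scopes fit the ceiling."""
    if delegated_resources is not None and resource not in delegated_resources:
        return False
    requested = set(requested)
    ceiling = authority_ceiling(resource_scopes, policy_scopes, grant_scopes)
    return bool(requested) and requested <= ceiling


scopes = {"tickets:read", "tickets:comment"}
tickets = "resource://tickets"

# Grant is narrower than policy: the grant wins.
print(is_released({"tickets:comment"}, tickets, scopes, scopes, grant_scopes={"tickets:read"}))  # False
# Policy is narrower than the grant: policy wins.
print(is_released({"tickets:comment"}, tickets, scopes, {"tickets:read"}, grant_scopes=scopes))  # False
# Plain spawn, policy allows both scopes.
print(is_released({"tickets:read"}, tickets, scopes, scopes))  # True
# Resource outside the delegation is rejected even though policy allows it.
print(is_released({"tickets:read"}, tickets, scopes, scopes,
                  delegated_resources={"resource://billing"}))  # False
```

Since every step is an intersection or a subset check, adding a layer can only shrink the released set, which matches the "narrower always wins" rule.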

See Delegation and Policy.

Related: FAQ-006, FAQ-008, FAQ-011

FAQ-022 Identity

If spawned children inherit the parent's application, when is a DCR application actually used?

A DCR application is used by authenticating as it, not by spawning under it. Because spawn() always runs under the caller's own application, a DCR application — bound to exactly one session — cannot be a spawn parent. Its single root session is created by a workload that already holds the DCR credentials.

The credential boundary between durable managed identities and short-lived DCR identities is named by registration_method (managed vs DCR), not by an agent's lifecycle. A session's lifecycle is either task (the default) or service (heartbeat-leased), and a DCR application cannot host a service session, so its one session is always a task session. A short-lived worker is therefore not a separate lifecycle — it is an ordinary task session with a TTL (see FAQ-023).

The control plane registers the DCR application through the Admin API DCR endpoint, which returns a one-time client secret and a short expiry; the SDK never registers applications. An orchestrator injects those credentials into an independently launched workload (for example, a per-tenant container), and that workload, configured with the DCR application_id and secret, authenticates with the client_credentials grant and creates its one root session. The application then expires (one hour or less) and is archived by DCR cleanup.

The credential split is deliberate: minting a new credentialed identity is a privileged control-plane action, so a runtime cannot register applications for itself. The SDK consumes a DCR application by being configured with its credentials; only an operator or orchestrator with Admin API access mints one.

Because a DCR root has no parent and no delegation edge, its authority is decided entirely by policy, not by inheritance — and policy is default-deny, so a DCR identity opens no tools until a policy grants it scopes. Policies receive input.principal.registration_method, so you write one policy class targeting registration_method == "dcr" (optionally narrowed by labels, resource, or zone) that covers every DCR application; you do not author a policy per DCR app. Pair that with per-tenant or per-job resources to keep each DCR identity scoped to its own data.
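
The "one policy class for every DCR application" idea can be sketched as a default-deny predicate. Only `input.principal.registration_method` and the value `"dcr"` come from this answer; the `resource` key, the `"managed"` value, the tenant resource prefix, and the function name are hypothetical, and a real rule would live in the zone's policy language.

```python
def allow_dcr_class(policy_input, tenant_resource_prefix="resource://tenant-"):
    """Default-deny: True only for DCR principals calling per-tenant resources."""
    principal = policy_input.get("principal", {})
    if principal.get("registration_method") != "dcr":
        return False  # this rule says nothing about managed applications
    resource = policy_input.get("resource", "")  # key name is an assumption
    return resource.startswith(tenant_resource_prefix)


# Two different DCR applications are covered by the same rule.
tenant_a = {"principal": {"registration_method": "dcr"}, "resource": "resource://tenant-a-data"}
tenant_b = {"principal": {"registration_method": "dcr"}, "resource": "resource://tenant-b-data"}
# A DCR identity reaching for a shared resource, and a managed app, both fall through to deny.
shared = {"principal": {"registration_method": "dcr"}, "resource": "resource://billing"}
managed = {"principal": {"registration_method": "managed"}, "resource": "resource://tenant-a-data"}

for case in (tenant_a, tenant_b, shared, managed):
    print(allow_dcr_class(case))  # True, True, False, False
```

The rule never names an individual application, so registering a new DCR application needs no policy change.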

So DCR has a real, distinct purpose that managed applications do not serve: a per-tenant, per-job, or per-integration identity that is credential-isolated, independently revocable, auto-expiring, and registry-visible. What it does not add is extra policy or audit distinguishability — the agent session already provides that. Reach for DCR when you need an isolated, short-lived credential boundary for an independently launched unit of work; use one managed application with many agent sessions for ordinary spawn fan-out.

See Identities and Applications and FAQ-007.

Related: FAQ-006, FAQ-007, FAQ-008

FAQ-023 Identity

How do I model an orchestrator that spawns managers and short-lived workers, and where does a spawned agent's authority come from?

This is the mainstream multi-agent shape, and it is modelled under one managed application. Every node is an agent session under that app. A short-lived worker is not a separate lifecycle: a spawned agent is task-scoped — it dies when its block exits — and "least-privilege" is expressed by a narrowed grant. An optional ttl_seconds adds a hard upper bound for the case where the agent should also expire on a clock (see FAQ-025):

  • The orchestrator is the durable root session — or a service() handle when it needs a heartbeat lease.
  • Each manager is a plain spawn() that inherits the application's authority.
  • Each task worker is spawn(grant=Grant.narrow([...])) — bounded to a subset of authority and auto-terminated when its block exits — with ttl_seconds=… added only when you also want a wall-clock cap.

On where authority comes from: a spawned agent's authority comes from its application, not from its parent. The SDK authenticates with the client_credentials grant using the configured application_id and secret, so every session under that application already acts under the application's own authority — which is why spawn() needs no parent and a parentless (root) spawn is well-defined. A parent matters only when you narrow authority: grant=Grant.narrow([...]) creates a delegation edge that the server re-validates as a subset of the parent's effective authority, and a cross-application grant goes through delegate(to=peer) with receiver consent. With the default grant=inherit and no edge, the child simply shares the application's authority, bounded by policy.

See Identities and Applications, Delegation, and FAQ-022.

Related: FAQ-006, FAQ-008, FAQ-022

FAQ-024 Security

If A narrows a grant to B, and B spawns C without narrowing, does C stay bounded by B's restriction?

Yes — by default. inherit carries the parent's effective authority forward, so least-privilege is transitive down a same-application spawn tree. Take A spawns B with grant=narrow([tickets:read]), then B spawns C:

  • If B spawns C with inherit (the default), the coordinator mirrors B's delegation edge onto C in the same transaction: a B → C edge is recorded holding B's exact scopes, resource, constraints, and expiry. C is therefore bounded by B's tickets:read slice — it does not regain the application's full authority. A narrowed intermediary contains its descendants without you having to re-narrow at every hop.
  • If B spawns C with a narrowing grant, a B → C edge is recorded and the coordinator rejects it unless C ⊆ B (scopes, TTL, hop count, budget, and resource are all checked). So B can never pass on authority it does not hold.
  • If B is a root session (no inbound edge — it already holds the application's full authority), an inherit child of B creates no edge and runs under the application's authority bounded by policy. There is nothing narrower to carry forward, so this is correct, not an escalation.

Transitive inheritance is enforced for same-application spawns, where the mirrored edge is escalation-proof by construction (the child copies the parent edge and can never broaden it). It is not auto-created across applications: a cross-application grant must go through delegate(to=peer) with receiver consent. The hard, server-enforced boundary that contains every session is still the application plus policy (default-deny); to place a subtree behind a real cross-trust boundary, give it a different application via delegate(to=peer).
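
The three cases above can be traced with a small model of delegation edges. This is a sketch under stated assumptions, not the coordinator's implementation: `Edge`, `effective`, `spawn_edge`, and `APP_AUTHORITY` are hypothetical names, and only scopes are modelled (the text says TTL, hop count, budget, and resource are checked as well).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical application authority, used only for this illustration.
APP_AUTHORITY = frozenset({"tickets:read", "tickets:write", "billing:read"})


@dataclass(frozen=True)
class Edge:
    """Hypothetical delegation edge: the scopes a parent hands to a child."""
    parent: str
    child: str
    scopes: frozenset


def effective(edge: Optional[Edge]) -> frozenset:
    # A root session has no inbound edge and acts under the application's authority.
    return APP_AUTHORITY if edge is None else edge.scopes


def spawn_edge(parent: str, parent_edge: Optional[Edge], child: str,
               narrow=None) -> Optional[Edge]:
    """Return the child's inbound edge, or None when nothing narrower exists."""
    if narrow is None:
        # inherit: mirror the parent's edge if it has one; a root parent creates no edge.
        if parent_edge is None:
            return None
        return Edge(parent, child, parent_edge.scopes)
    requested = frozenset(narrow)
    # A narrowing grant is rejected unless it is a subset of the parent's authority.
    if not requested <= effective(parent_edge):
        raise PermissionError("requested grant is not a subset of the parent's authority")
    return Edge(parent, child, requested)
```

Calling `spawn_edge("A", None, "B", narrow=["tickets:read"])` and then `spawn_edge("B", b_edge, "C")` yields a B to C edge holding only `tickets:read`, which is the transitive containment the answer describes.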

See Delegation and FAQ-023.

Related: FAQ-021, FAQ-023, FAQ-008

FAQ-025 Identity

What is the difference between a task and a service lifecycle — and how do I model a task-and-die worker versus a time-limited one?

Every runtime actor is an agent session; lifecycle only describes how it runs. There are two lifecycles, and they map directly onto the two SDK primitives:

  • Task — created with spawn(), recorded as lifecycle = "task". It lives for the duration of its task: when its block exits it is terminated automatically. This single behavior covers both of the cases you are distinguishing. A "do one task and die" worker (for example a search sub-agent) is just spawn() whose block returns when the task is done. A "live up to N seconds then expire" worker is the same spawn() with ttl_seconds=N, which adds a hard wall-clock cap enforced by the TTL sweeper. The difference between "task-and-die" and "time-limited" is whether you set a TTL, not a different lifecycle.
  • Service — created with service(), recorded as lifecycle = "service". It is the genuinely different lifecycle: it outlives any single task and is kept alive by a heartbeat lease, so the runtime can detect and reap it if it stops reporting. A service is governed only by that lease — it is not subject to the wall-clock TTL sweeper that retires tasks, so a faithfully heartbeated service is never terminated on a timer. Each heartbeat() renews the lease (extends the deadline); it does not just check status. The SDKs offer an opt-in background auto-heartbeat (heartbeat_interval / heartbeatIntervalMs) so the lease stays current even while your code is blocked awaiting a long provider stream.

The stored lifecycle column carries exactly these two values, and only service changes runtime behavior (the heartbeat lease). The default is task, named for the lifecycle rather than the actor, so it never collides with the "agent" umbrella that every session already belongs to. How long a worker lives is therefore expressed by task scope and an optional TTL, not by a separate lifecycle: reach for service() when you need a durable, heartbeat-leased agent, and otherwise use spawn().
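
The split between the TTL sweeper and the heartbeat lease can be illustrated with a toy model. Everything here is an assumption made for the example: `ToyAgent`, `heartbeat`, `sweep`, and the field names are hypothetical and do not mirror the real coordinator schema. Time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyAgent:
    """Hypothetical session record; field names are illustrative only."""
    lifecycle: str                          # "task" or "service"
    ttl_deadline: Optional[float] = None    # optional wall-clock cap, tasks only
    lease_deadline: Optional[float] = None  # heartbeat lease, services only
    alive: bool = True


def heartbeat(agent: ToyAgent, now: float, lease_seconds: float) -> None:
    # A heartbeat renews the lease by moving its deadline forward.
    if agent.lifecycle != "service":
        raise ValueError("only services hold a heartbeat lease")
    agent.lease_deadline = now + lease_seconds


def sweep(agents, now: float) -> None:
    """Retire tasks past their TTL and services whose lease has lapsed."""
    for agent in agents:
        if not agent.alive:
            continue
        if agent.lifecycle == "task":
            # TTL sweeper: only tasks with a cap are retired on a timer.
            if agent.ttl_deadline is not None and now >= agent.ttl_deadline:
                agent.alive = False
        elif agent.lease_deadline is not None and now >= agent.lease_deadline:
            # Lease reaper: a service is retired only when it stops heartbeating.
            agent.alive = False
```

A task without a TTL is untouched by the sweep (in the real system it ends when its block exits, which this sketch does not model), while a service stays alive for as long as its lease keeps being renewed.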

See Identities and Applications and FAQ-023.

Related: FAQ-022, FAQ-023, FAQ-007

FAQ-026 Identity

Can two DCR applications have different policies, and can a DCR agent spawn children?

Different policies per DCR app: yes. Policy evaluation receives the full principal, including the specific input.principal.id (the application id), input.principal.labels, and input.principal.registration_method. Matching on registration_method == "dcr" is just the convenient way to write one rule that covers every DCR app; when two DCR apps need different authority, target their distinct application ids or labels, or scope them to different resources. There is no requirement that all DCR apps share a policy.
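
The targeting options above can be shown as plain decision logic. Real Caracal policy is written in Rego; this Python function only mimics how rules might match on the principal. The application id, label value, scope names, and the assumption that labels arrive as a list of strings are all hypothetical.

```python
def allowed_scopes(principal: dict) -> set:
    """Toy decision logic keyed on the principal fields named in the text."""
    scopes = set()
    # One broad rule that covers every DCR application.
    if principal.get("registration_method") == "dcr":
        scopes.add("tickets:read")
    # A narrower rule that targets one application by id (hypothetical id).
    if principal.get("id") == "app-reporting":
        scopes.add("reports:read")
    # A rule keyed on a label (assumed here to be a list of strings).
    if "tier:trusted" in principal.get("labels", []):
        scopes.add("tickets:write")
    return scopes
```

Two DCR applications share the broad rule but end up with different authority because the id rule and the label rule match only one of them each.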

Can a DCR-application agent spawn children: no, in practice. A DCR application binds exactly one agent session (a second spawn under the same DCR app is rejected with dcr_application_already_bound), so a DCR session cannot create further sessions under its own application — it is a leaf. Spawning a child under a different application would require cross-application spawn authority that a DCR workload is not granted. So a DCR identity is a single, isolated, non-parent session by design.

Which sessions can be parents: any session under a managed application can spawn children via spawn(), with one lifecycle rule — a task parent cannot spawn a service child (task_agent_cannot_spawn_service). A task is bounded by its work or its TTL, so it must not mint a durable, lease-renewable session that could outlive it; a service parent may spawn either lifecycle. The only structural non-parent is the DCR session described above — it can be neither a parent (dcr_application_cannot_spawn) nor a spawned child (dcr_application_cannot_be_child). So the practical answer is that managed sessions spawn (task parents to task children, service parents to either) and DCR sessions are isolated leaves.
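
The parent and child rules reduce to a small check. The sketch below is a hypothetical helper, not coordinator code: the function name, its parameters, and the order in which the checks run are assumptions, while the three error codes are the ones quoted in the text.

```python
from typing import Optional


def spawn_error(parent_lifecycle: str, child_lifecycle: str,
                parent_is_dcr: bool = False,
                child_is_dcr: bool = False) -> Optional[str]:
    """Return the error code for a disallowed spawn, or None when it is allowed."""
    if parent_is_dcr:
        # A DCR session is a leaf and cannot be a parent.
        return "dcr_application_cannot_spawn"
    if child_is_dcr:
        # A DCR session cannot be created as a spawned child.
        return "dcr_application_cannot_be_child"
    if parent_lifecycle == "task" and child_lifecycle == "service":
        # A task must not mint a durable session that could outlive it.
        return "task_agent_cannot_spawn_service"
    return None
```

Under these assumptions a task parent may spawn a task child, a service parent may spawn either lifecycle, and any spawn that involves a DCR session is rejected.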

See Identities and Applications and FAQ-022.

Related: FAQ-022, FAQ-025, FAQ-007

## Next Step

Use [Glossary](/reference/glossary/) when you need canonical terms for concepts, API names, Console labels, and examples.

---

# Glossary

# URL: https://docs.caracal.run/reference/glossary/
# Markdown: https://docs.caracal.run/markdown/reference/glossary.md
# Type: reference
# Concepts:
# Requires:

---

Use these terms consistently across docs, API names, Console labels, and examples.

| Term | Meaning |
| --- | --- |
| Agent | Workload identity that performs actions through Caracal-controlled authority. |
| Agent app | Confidential application registered for an agent workload; one agent app backs many agent sessions. |
| Agent session | Coordinator record representing one agent execution or child session. |
| Application | Registered client in a zone; confidential applications can exchange credentials. |
| Audit ledger | Append-only evidence stream and database records for decisions and operations. |
| Console | Terminal management surface launched with `caracal console` or `caracal-console`. |
| Control API | Optional authenticated automation surface for remote management dispatch. |
| Delegation edge | Bounded authority from one agent session to another. |
| Gateway | Reverse proxy that verifies inbound authority, exchanges with STS, and forwards to upstreams. |
| Grant | Access assignment connecting an application or subject to a resource and scopes. |
| Mandate | Short-lived JWT carrying scoped Caracal authority. |
| Policy | Rego content that participates in allow/deny decisions. |
| Policy set | Activated bundle of policy versions for a zone. |
| Principal | User, service, application, or agent identity participating in authority. |
| Provider | Credential source or upstream integration for a protected resource. |
| Resource | Protected API, tool, MCP server, provider target, or upstream identifier. |
| Root session | Originating session at the root of a delegation chain; its ID is propagated as the authority root and checked as a revocation anchor. |
| Runtime profile | `caracal.toml` or environment configuration used by `caracal run` and SDKs. |
| Service agent | Long-lived agent session started with the SDK `service()` handle; it holds a heartbeat lease and is retired explicitly rather than when a block exits. |
| STS | Security Token Service that performs token exchange and mandate issuance. |
| Step-up challenge | Additional approval required before STS issues authority. |
| Subject session | Original authenticated user or service context that initiates a chain of agent authority. |
| Zone | Tenant and trust boundary for product state, policies, grants, sessions, and audit. |

## Naming Rules

- Use `Caracal`, not informal product nicknames.
- Use `mandate` for Caracal-issued JWT authority, not generic “token” when the distinction matters.
- Use `Console` for the terminal UI and `Control API` for automation.
- Use top-level `caracal` only for runtime lifecycle and `caracal run`.

## Next Step

Use [Error Codes](/reference/errors/) when a service, SDK, Gateway, or verifier returns a machine-readable error.

---

# Error Codes

# URL: https://docs.caracal.run/reference/errors/
# Markdown: https://docs.caracal.run/markdown/reference/errors.md
# Type: reference
# Concepts:
# Requires:

---

Caracal packages and services use a shared machine-readable error shape.

```json
{
  "error": "invalid_token",
  "error_description": "bearer signature invalid",
  "requestId": "018f...",
  "details": {}
}
```

`details` appears only when a service or package provides structured context. The same code can be emitted by different surfaces: an `access_denied` from STS is a policy decision, while an `access_denied` surfaced by an in-process verifier is a missing scope on an otherwise valid mandate. Use the `requestId` with Console **request trace** to confirm which surface produced the error before acting.

## Well-Known Codes

The **Likely source** column names the surface that most often emits the code, and **First check** is the fastest thing to confirm. When a call fails, start from the surface, not the code.

| Code | Meaning | Likely source | First check |
| --- | --- | --- | --- |
| `access_denied` | Request is authenticated but not authorized. | STS policy decision, or an in-process verifier scope check. | Run `request trace`: confirm the active policy set allows the application, subject, resource, and scopes. |
| `invalid_token` | Token is missing, malformed, expired, invalid, replayed, or fails verification. | Gateway or a resource verifier. | Confirm the mandate is unexpired, the verifier trusts the issuing zone JWKS, and the token was not already consumed. |
| `resource_not_found` | Requested resource does not exist or is unavailable in the current scope. | API, or Gateway binding resolution. | Confirm the resource identifier and that a Gateway binding exists for `(zone, resource)`. |
| `internal_error` | Service failed unexpectedly. | Any service. | Check service readiness and logs for the `requestId`. |
| `policy_eval_failed` | Policy evaluation failed. | STS policy engine. | Confirm the active policy set compiles and that required `input` fields are present. |
| `provider_rate_limited` | Provider or provider coordination rate limit denied work. | Gateway upstream or provider coordination. | Inspect provider limits and retry/backoff; confirm the provider grant is active. |
| `interaction_required` | Step-up challenge is required. | STS step-up gate. | Complete the step-up flow; see [Step-Up Re-Authentication](/guides/step-up/). |
| `sts_unavailable` | STS could not satisfy an exchange. | STS, or Gateway's STS circuit. | Run `caracal status --ready`; confirm STS readiness, JWKS, and policy bundle freshness. |
| `credential_expired_not_renewable` | Credential is too close to expiry or cannot be renewed. | STS provider-token refresh. | Reconnect the provider grant; confirm OAuth refresh configuration. |
| `payload_too_large` | Request body or token exceeded the configured limit. | Gateway or API preflight. | Reduce payload size or adjust the configured limit. |
| `zone_invalid` | Zone claim or route zone is invalid. | Gateway or STS. | Confirm the request targets the correct zone and the mandate carries a matching zone claim. |
| `scope_insufficient` | Required scope is missing. | A resource verifier, or Gateway. | Confirm the resource defines the scope and the active policy authorizes it for this subject. |
| `operation_not_permitted` | The Gateway operation is not declared on an enforced resource, or its required scope is absent from the mandate. | STS native operation floor, on Gateway-authenticated mandate use. | Declare the operation on the resource (`operations`), or set `operation_enforcement` to `transport_uniform`; confirm the mandate carries the operation's scope. |
| `agent_identity_required` | A verifier requires an agent identity. | In-process verifier or MCP transport. | Run the call from an agent session, not a bare subject session. |
| `delegation_required` | A verifier requires delegated authority. | In-process verifier. | Confirm the call carries a delegation edge granting the required scopes. |
| `chain_mismatch` | Delegation chain does not contain a required application/session. | Verifier delegation check. | Inspect the delegation edge in the Console; confirm the chain includes the required hop. |
| `hop_count_exceeded` | Delegation hop count exceeds the allowed maximum. | Coordinator or verifier. | Reduce delegation depth or raise the configured hop limit. |
| `http_request_failed` | HTTP call failed before a valid response could be used. | SDK or Gateway upstream call. | Confirm endpoint URL, network reachability, and upstream allowlist. |
| `config_missing` | Runtime or service configuration is missing. | SDK or service startup. | Confirm the runtime profile, secret file, and required environment variables. |

Transport packages may map these codes into framework-specific HTTP responses.

## Unknown Failure Surface

If you only have an SDK error and not the surface, start from the symptom-first [Troubleshoot by Symptom](/operations/troubleshooting/). It routes a denied or failing call to the right surface, the object to inspect, and the diagnostic tool to use.

:::note[FAQ]
[What is the difference between a 403 from STS and a 403 from Gateway?](/reference/faq/#faq-016)
:::

## Next Step

Use [Configuration Keys](/reference/configuration/) when an error points to missing runtime, service, or deployment configuration.

## Related Pages

- [Troubleshoot by Symptom](/operations/troubleshooting/)
- [Debug Infrastructure Issues](/operations/debugging/)
- [MCP Auth Transport](/sdks/transport-mcp/)
- [Proxy Through Gateway](/api/gateway/)

---

# Configuration Keys

# URL: https://docs.caracal.run/reference/configuration/
# Markdown: https://docs.caracal.run/markdown/reference/configuration.md
# Type: config
# Concepts:
# Requires:

---

Caracal has three configuration domains:

| Domain | Used by | Choose it when |
| --- | --- | --- |
| Runtime workload config | `caracal run` and SDK clients. | A workload needs Caracal-issued resource credentials. |
| Service environment config | API, STS, Gateway, Audit, Coordinator, and Control. | A service needs URLs, secrets, limits, or readiness settings. |
| Deployment values | Helm, Compose, Postgres, and Redis. | Operators size, schedule, expose, or secure infrastructure. |

```mermaid
flowchart TD
  Need{What are you configuring?}
  Need -->|workload credentials| Runtime[Runtime profile and CARACAL_CONFIG]
  Need -->|service behavior| Env[Service environment variables]
  Need -->|deployment shape| Deploy[Helm values or Compose files]
  Runtime --> Precedence[Config precedence reference]
  Env --> Ops[Environment variables reference]
  Deploy --> Platform[Operations deployment pages]
```

## Runtime Profile Fields

| Field | Meaning |
| --- | --- |
| `zone_url` | Cloud/custom STS URL override for token exchange. |
| `sts_url` | Cloud/custom SDK-readable STS alias/fallback. |
| `coordinator_url` | Cloud/custom SDK/Console Coordinator URL override. |
| `gateway_url` | Cloud/custom Gateway URL override for SDK transports. |
| `zone_id` | Zone identifier. |
| `application_id` | Confidential application ID. |
| `app_client_secret_file` | Cloud/custom secret-file path override. |
| `app_client_secret` | Inline local-development secret. |
| `ttl_seconds` | `caracal run` exchange TTL, capped at 900 seconds. |
| `continue_on_failure` | Required credential failure behavior. |
| `credentials[]` | Required resource credentials. |
| `optional_credentials[]` | Optional resource credentials with `on_failure`. |
| `mcp_governance.mode` | `block` or `log` for likely MCP subprocesses. |

Credential entries use `env`, `resource`, optional `upstream_prefix`, and optional `credential_type`. Use `provider_token` for direct `caracal run` provider-key injection and `caracal_mandate` for mandate-aware workloads. Local dev and stable runtime launches auto-detect the client secret and credential manifest from the OS Caracal config directory. Use explicit secret-file paths and service URLs only for cloud deployments, containers, or custom infrastructure.

## Core Service Environment Keys

| Key | Services |
| --- | --- |
| `CARACAL_MODE` | All services. |
| `DATABASE_URL` / `DATABASE_URL_FILE` | API, STS, Gateway, Audit, Coordinator. |
| `REDIS_URL` / `REDIS_URL_FILE` | API, STS, Gateway, Audit, Coordinator. |
| `STREAMS_HMAC_KEY` / `STREAMS_HMAC_KEY_FILE` | Stream producers and consumers. |
| `AUDIT_HMAC_KEY` / `AUDIT_HMAC_KEY_FILE` | Audit producers and Audit service. |
| `GATEWAY_STS_HMAC_KEY` / `GATEWAY_STS_HMAC_KEY_FILE` | API, STS, Gateway. |
| `ZONE_KEK` / `ZONE_KEK_FILE` | API and STS. |
| `CARACAL_ADMIN_TOKEN` / `CARACAL_ADMIN_TOKEN_FILE` | API and management clients. |
| `CARACAL_COORDINATOR_TOKEN` / `CARACAL_COORDINATOR_TOKEN_FILE` | Coordinator and Console agent/delegation views. |

## Deployment Values

Helm values live under `infra/helm/caracal/values.yaml`. Compose environment and secrets are defined by `infra/docker/docker-compose.yml` and `infra/docker/runtime-compose.yml`.

## Next Step

Use [Configuration Order](/reference/config-precedence/) to understand which file, environment variable, or deployment value wins.

## Related Pages

- [Configure Workloads](/runtime-console/config-file/)
- [Configure Service Environment](/operations/env-vars/)
- [Choose a Cloud Profile](/operations/cloud-native-profiles/)

---

# Configuration Order

# URL: https://docs.caracal.run/reference/config-precedence/
# Markdown: https://docs.caracal.run/markdown/reference/config-precedence.md
# Type: config
# Concepts:
# Requires:

---

## Runtime Workload Config

`caracal run` and SDK loaders use this order:

1. `CARACAL_CONFIG`, when set and the file exists.
2. Environment runtime config.
3. `$XDG_CONFIG_HOME/caracal/caracal.toml`.
4. `~/.config/caracal/caracal.toml`.

The runtime does not read `./caracal.toml` from the current working directory.

## Environment Runtime Config Detection

Environment config is considered when workload identity variables are present. Local dev and stable launches use `CARACAL_ZONE_ID` and `CARACAL_APPLICATION_ID`, then auto-detect credential files from the OS Caracal config directory.

Cloud and custom deployments can provide explicit `CARACAL_APP_CLIENT_SECRET_FILE`, `CARACAL_RUN_CREDENTIALS`, `CARACAL_RUN_CREDENTIALS_FILE`, or `CARACAL_RESOURCES_FILE` values when mounted secret or config paths differ from the local convention. Set only one of `CARACAL_RUN_CREDENTIALS` and `CARACAL_RUN_CREDENTIALS_FILE`.

Resource bindings resolve from profile credentials, then `CARACAL_RESOURCES_FILE`, then `CARACAL_RESOURCES`. Later bindings with the same resource ID override earlier bindings, so a short environment override can replace one entry from a mounted JSON file without duplicating the file.

## File-Secret Resolution

Services support `*_FILE` variants for configured secret keys. File values are resolved before validation so deployment templates can mount secrets instead of placing sensitive material directly in environment variables.

## Deployment Values

| Deployment path | Precedence model |
| --- | --- |
| Docker Compose | Shell environment overrides defaults in compose files; secrets are mounted from files. |
| Helm | CLI `--set` and later values files override earlier chart values; runtime Secret keys feed mounted service env. |
| Console generated profile | Explicit values from guided setup write a profile and secret file when file writing is enabled. |

## Troubleshooting

| Symptom | Check |
| --- | --- |
| Expected profile is ignored | `CARACAL_CONFIG` may point elsewhere or environment runtime config may be winning. |
| Service fails with missing secret | Confirm the `*_FILE` variable name is supported by that service and the file is readable. |
| Helm values render unexpected defaults | Check values file order and CLI overrides. |

## Next Step

Use [Defaults and Limits](/reference/defaults-and-limits/) to confirm ports, TTLs, request limits, and stream defaults.
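
The runtime workload config order and the resource-binding override rule on this page can be sketched as two small functions. This is an illustration, not the loader's code: `pick_runtime_config` and `merge_bindings` are hypothetical names, the set of existing files is passed in so the example touches no disk, and two details are assumptions: that steps 3 and 4 apply only when the file exists, and that `CARACAL_ZONE_ID` plus `CARACAL_APPLICATION_ID` are the identity variables that trigger environment config.

```python
import posixpath


def pick_runtime_config(env: dict, existing_files: set) -> str:
    """Return which source wins, following the four-step order on this page."""
    explicit = env.get("CARACAL_CONFIG")
    if explicit and explicit in existing_files:
        return f"file:{explicit}"
    # Environment runtime config is considered when identity variables are present.
    if env.get("CARACAL_ZONE_ID") and env.get("CARACAL_APPLICATION_ID"):
        return "environment"
    candidates = []
    if env.get("XDG_CONFIG_HOME"):
        candidates.append(
            posixpath.join(env["XDG_CONFIG_HOME"], "caracal", "caracal.toml")
        )
    candidates.append(
        posixpath.join(env.get("HOME", "~"), ".config", "caracal", "caracal.toml")
    )
    for path in candidates:
        if path in existing_files:
            return f"file:{path}"
    return "none"


def merge_bindings(*layers: list) -> dict:
    """Merge binding layers in order; a later entry replaces an earlier one
    that has the same resource ID (assumed to be the `resource` field)."""
    merged = {}
    for layer in layers:
        for binding in layer:
            merged[binding["resource"]] = binding
    return merged
```

Passing the layers as profile credentials, then the `CARACAL_RESOURCES_FILE` contents, then the `CARACAL_RESOURCES` contents reproduces the described behavior: one short environment entry can replace a single binding from a mounted file.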
---

# Defaults and Limits

# URL: https://docs.caracal.run/reference/defaults-and-limits/
# Markdown: https://docs.caracal.run/markdown/reference/defaults-and-limits.md
# Type: reference
# Concepts:
# Requires:

---

## Ports

| Component | Port |
| --- | --- |
| API | `3000` |
| STS | `8080` |
| Gateway | `8081` |
| Audit | `9090` |
| Coordinator | `4000` |
| Postgres | `5432` |
| Redis | `6379` |

## Token and Authority Lifetimes

| Limit | Default |
| --- | --- |
| STS resource mandate cap | 15 minutes |
| STS session mandate cap | 60 minutes |
| STS `MAX_GRANT_TTL_SECONDS` | `3600` |
| DCR application lifetime default and maximum | 3600 seconds |
| `caracal run` injected credential TTL | 900 seconds |
| Runtime step-up polling | every 2 seconds for up to 5 minutes |
| Gateway expiring-token preflight window | 35 seconds |

## Service Limits

| Limit | Default |
| --- | --- |
| API body limit | `1_048_576` bytes |
| API request timeout | `30_000` ms |
| Gateway max request bytes | 10 MiB |
| Gateway STS timeout | 5 seconds |
| Gateway upstream timeout | 30 seconds |
| Gateway STS circuit failure limit | 3 failures |
| Gateway STS circuit open window | 10 seconds |
| STS `OPA_POLL_SECONDS` | 60 seconds, max 300 |
| Control body limit | 64 KiB |
| Control rate capacity | 60 per window |
| Control rate window | 60 seconds |
| Control replay TTL | 3600 seconds |

## Agent and Delegation Limits

| Limit | Default |
| --- | --- |
| Concurrent agent sessions per zone | 50 |
| Concurrent agent sessions per application | 200 |
| Child agents per parent session | 10 |
| Delegation depth | 10 |
| Agent labels per session | 32 |
| Agent label length | 64 characters |
| STS request rate per zone, resource, and acting application | 1000 per minute |

## Storage and Stream Defaults

| Default | Value |
| --- | --- |
| Audit retention | 365 days |
| Audit max deliveries before DLQ | 8 |
| Audit claim idle | 30 seconds |
| Audit tamper rolling window | 4 hours |
| Redis audit stream intended max length | 1,000,000 |
| Redis audit DLQ intended max length | 100,000 |
| Redis policy/revocation/key stream intended max length | 10,000 |

## Helm Defaults

| Service | Replicas | Max HPA replicas |
| --- | --- | --- |
| API | 2 | 8 |
| STS | 2 | 8 |
| Gateway | 2 | 16 |
| Audit | 2 | 8 |
| Coordinator | 2 | 8 |
| Control | disabled | 2 when enabled |

## Next Step

Use [CLI Exit Codes](/reference/runtime-exit-codes/) when automating top-level `caracal` runtime commands.

---

# CLI Exit Codes

# URL: https://docs.caracal.run/reference/runtime-exit-codes/
# Markdown: https://docs.caracal.run/markdown/reference/runtime-exit-codes.md
# Type: reference
# Concepts:
# Requires:

---

Top-level `caracal` commands are limited to local runtime lifecycle, status, purge, `run`, and Console launch.

## Exit Behavior

| Command | Success | Failure |
| --- | --- | --- |
| `caracal up` | `0` after stack start succeeds. | Non-zero when Compose/build/start fails. |
| `caracal down` | `0` after stack stop succeeds. | Non-zero when Compose stop fails. |
| `caracal status` | `0` when health probes pass. | `1` when service health fails. |
| `caracal status --ready` | `0` when readiness probes pass. | `1` when dependency readiness fails. |
| `caracal purge` | `0` after selected state is removed. | Non-zero when cleanup fails. |
| `caracal run -- ` | Child process exit code after credentials are acquired. | Non-zero before spawn when config, credential exchange, step-up, or governance fails. |
| `caracal console` | Console process exit code. | Non-zero when Console launcher or TTY requirements fail. |

## Structured Output

`caracal status --json` emits machine-readable status. Console-owned diagnostics are available through the Console `diagnostics` view and management command catalog, not as top-level runtime CLI workflows.

## Next Step

Use [Compatibility](/reference/compatibility/) before changing supported runtime, package manager, deployment, or docs build targets.
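
For automation built on the exit behavior documented for `caracal status --ready` (`0` when readiness probes pass, `1` when dependency readiness fails), a readiness gate might look like the sketch below. The helper names, retry count, and delay are assumptions for illustration; the only documented contract used is the command and its exit code. The `run` and `sleep` parameters exist so the logic can be exercised without a real stack.

```python
import subprocess
import time


def stack_is_ready(run=subprocess.run) -> bool:
    """Return True when `caracal status --ready` exits 0."""
    result = run(["caracal", "status", "--ready"])
    return result.returncode == 0


def wait_until_ready(attempts: int = 10, delay: float = 2.0,
                     run=subprocess.run, sleep=time.sleep) -> bool:
    """Poll readiness a bounded number of times; the defaults are arbitrary."""
    for attempt in range(attempts):
        if stack_is_ready(run):
            return True
        if attempt < attempts - 1:
            # Wait between attempts, but not after the final failure.
            sleep(delay)
    return False
```

A CI step could call `wait_until_ready()` after `caracal up` and fail the job when it returns `False`.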
## Related Pages

- [Choose the Right Surface](/runtime-console/cli-and-console/)
- [Start and Check the Stack](/runtime-console/stack/)
- [Run Workloads](/runtime-console/runtime/)

---

# Compatibility

# URL: https://docs.caracal.run/reference/compatibility/
# Markdown: https://docs.caracal.run/markdown/reference/compatibility.md
# Type: reference
# Concepts:
# Requires:

---

## Language Runtimes

| Area | Current target |
| --- | --- |
| TypeScript/Node packages | Node `>=22` where package engines are declared. |
| Python packages | Python `>=3.12`. |
| Go modules | Go `1.26` where modules declare a version. |
| Root package manager | `pnpm@11.1.1`. |

## Deployment Targets

| Target | Source |
| --- | --- |
| Local development | `caracal up` and `infra/docker/docker-compose.yml`. |
| Self-hosted Compose | `infra/docker/runtime-compose.yml` with versioned GHCR images. |
| Kubernetes | `infra/helm/caracal`. |
| Docs site | Astro `^6.3.1` and Starlight `0.39.2`. |

## Platform Notes

- Runtime and Console binaries are documented for Linux, macOS, and Windows launchers where release packaging provides them.
- Local OSS ports must remain distinct from any enterprise product ports.
- Published modes are `rc` and `stable`; both enforce stricter service config.

## Next Step

Use [Release Map](/reference/release-package-runtime-map/) to map product versions to packages, images, and release surfaces.

## Related Pages

- [Release Map](/reference/release-package-runtime-map/)
- [Deploy with Helm](/operations/kubernetes-helm/)
- [Set Up Locally](/contributing/setup/)

---

# Release Map

# URL: https://docs.caracal.run/reference/release-package-runtime-map/
# Markdown: https://docs.caracal.run/markdown/reference/release-package-runtime-map.md
# Type: reference
# Concepts:
# Requires:

---

## Repository Release Surfaces

| Surface | Current source |
| --- | --- |
| Root package manager | `package.json` declares `pnpm@11.1.1`. |
| Release scripts | `scripts/release.sh`, `scripts/release.mjs`, `release.config.json`, Changesets. |
| Docker runtime compose | `infra/docker/runtime-compose.yml`. |
| Helm chart | `infra/helm/caracal/Chart.yaml` and values files. |
| Docs build | `docs/package.json`. |

## Service Images

Caracal ships two application images. Each runs multiple roles selected by the container `command`; this keeps the trust boundary (separate processes/pods per role) while cutting build, signing, and supply-chain surface.

| Image name | Roles | Runtime |
| --- | --- | --- |
| `caracal-node` | API, Coordinator | Node |
| `caracal-go` | STS, Gateway, Audit | Go |
| `caracal-postgres` | Postgres | - |
| `caracal-redis` | Redis | - |

Role selection:

- `caracal-node` → `command: ["/app/api/dist/main.js"]` or `["/app/coordinator/dist/main.js"]`.
- `caracal-go` → `command: ["/usr/local/bin/sts"]`, `["/usr/local/bin/gateway"]`, or `["/usr/local/bin/audit"]`.

Runtime Compose references `${CARACAL_REGISTRY:-ghcr.io/garudex-labs/}:v${CARACAL_VERSION}`.

## Published Package Names

| Area | TypeScript | Python | Go |
| --- | --- | --- | --- |
| SDK | `@caracalai/sdk` | `caracalai-sdk` | `github.com/garudex-labs/caracal/packages/sdk/go` |
| Identity | `@caracalai/identity` | `caracalai-identity` | `github.com/garudex-labs/caracal/packages/identity/go` |
| OAuth | `@caracalai/oauth` | `caracalai-oauth` | `github.com/garudex-labs/caracal/packages/oauth/go` |
| Revocation | `@caracalai/revocation` | `caracalai-revocation` | `github.com/garudex-labs/caracal/packages/revocation/go` |
| MCP transport | `@caracalai/transport-mcp` | `caracalai-transport-mcp` | `github.com/garudex-labs/caracal/packages/transport/mcp/go` |
| A2A transport | `@caracalai/transport-a2a` | Not published in this repository | Not published in this repository |

## Versioning Notes

- Pin exact image tags in production values.
- Run migration and readiness validation after image or chart changes.
- Do not use product-management runtime CLI aliases as release surfaces; management automation belongs to Control/Admin APIs.

## Next Step

Use [Wire Contracts](/reference/interoperability-contracts/) when validating SDK, connector, exporter, or provider-plugin compatibility.

---

# Wire Contracts

# URL: https://docs.caracal.run/reference/interoperability-contracts/
# Markdown: https://docs.caracal.run/markdown/reference/interoperability-contracts.md
# Type: reference
# Concepts:
# Requires:

---

Caracal publishes schema files under `docs/public/schemas/` and test fixtures under `tests/shared/fixtures/interoperability/`.

## Schema Files

| Contract | Schema |
| --- | --- |
| Agent connector manifest | `caracal-agent-connector-manifest-2026-05-21.schema.json` |
| Audit event | `caracal-audit-event-2026-05-21.schema.json` |
| Audit exporter manifest | `caracal-audit-exporter-manifest-2026-05-21.schema.json` |
| Gateway upstream manifest | `caracal-gateway-upstream-manifest-2026-05-21.schema.json` |
| JWT claims | `caracal-jwt-claims-2026-05-21.schema.json` |
| Policy input | `caracal-policy-input-2026-05-20.schema.json` |
| Policy pack manifest | `caracal-policy-pack-manifest-2026-05-21.schema.json` |
| Policy result | `caracal-policy-result-2026-05-20.schema.json` |
| Provider credential plugin manifest | `caracal-provider-credential-plugin-manifest-2026-05-21.schema.json` |
| Resource verifier manifest | `caracal-resource-verifier-manifest-2026-05-21.schema.json` |
| Revocation event | `caracal-revocation-event-2026-05-21.schema.json` |
| Token response | `caracal-token-response-2026-05-21.schema.json` |
| W3C baggage | `caracal-w3c-baggage-2026-05-21.schema.json` |

## Fixture Files

Fixtures include valid examples for audit events, audit exporter manifests, Gateway upstream manifests, JWT claims, policy input/results, policy pack manifests, provider credential plugins, resource verifier manifests, revocation events, token responses, W3C baggage, trace context headers, and stream signature canonicalization vectors.

## Usage

- Use schemas when building connectors, exporters, or provider plugins.
- Use fixtures when adding SDK or interoperability tests.
- Preserve schema filenames when documenting versioned contracts.

## Next Step

Use [Compare Editions](/enterprise/) when deciding between self-hosted Community Edition and managed Enterprise Edition.

## Related Pages

- [Use Event Topics](/api/event-topics/)
- [MCP Auth Transport](/sdks/transport-mcp/)
- [Validate Changes](/contributing/testing/)

---

# Compare Editions

# URL: https://docs.caracal.run/enterprise/
# Markdown: https://docs.caracal.run/markdown/enterprise.md
# Type: page
# Concepts:
# Requires:

---

Caracal ships under a dual-license model. Everything in this documentation describes the **Community Edition**, licensed under Apache 2.0 and free to self-host. The **Enterprise Edition** is a commercial product from Garudex Labs that adds managed multi-tenant operation, a hosted control plane, single sign-on, and a fully managed data plane on top of the same authority model.

The two editions share one model, one mandate format, and one set of SDKs. Enterprise does not fork the concepts; it removes the operational work of running and scaling them yourself.

:::note[FAQ]
[Which features are Enterprise Edition only?](/reference/faq/#faq-019)
:::

:::tip[Talk to us]
For pricing, a demo, or a deployment fit assessment, [book an enterprise call](https://cal.com/rawx18/caracal-enterprise-sales).
:::

## Edition Comparison

| Capability | Community Edition (Apache 2.0) | Enterprise Edition (commercial) |
| --- | --- | --- |
| Authority model, mandates, delegation, policy, audit | Included | Included |
| SDKs and connectors (TypeScript, Python, Go) | Included | Included |
| Zones as a manual isolation primitive | Included | Included |
| Managed multi-tenancy | Self-modeled with zones and the Admin API | Native tenant, organization, and workspace lifecycle |
| Single sign-on (SSO) | Not included | SAML and OIDC SSO with SCIM provisioning |
| Teams and role-based access control | Not included | Org, team, and role management across tenants |
| Control surface | Local Console and self-hosted Admin API | Hosted management UI for all tenants |
| Gateway and services | You deploy and operate every service | Fully managed Gateway, STS, Coordinator, and audit plane |
| Integration effort | Run the full stack, then integrate the SDK | Integrate the SDK against managed endpoints; no services to run |
| Support | Community and issues | Commercial SLA, priority support, and onboarding |

## Enterprise Additions

### Managed multi-tenancy

The Community Edition treats a [zone](/concepts/zone/) as a manual isolation boundary, and you map tenants onto zones yourself as described in [Model Your Application in Caracal](/guides/modeling-recipes/). The Enterprise Edition provides a managed tenancy layer with first-class tenants, organizations, and workspaces, including provisioning, lifecycle, and isolation guarantees handled for you instead of automated by hand through the Admin API.

### Hosted management UI

The Community Edition is operated through the local Console and the self-hosted Admin API. The Enterprise Edition replaces the local Console with a hosted management UI that administers every tenant, application, resource, provider, policy set, session, and audit stream from one place, with role-aware access for your teams.
### Managed data plane In the Community Edition you deploy and operate the Gateway, STS, Coordinator, audit, and supporting stores yourself. In the Enterprise Edition these run as a managed data plane: the Gateway and every service are operated, scaled, and upgraded for you. Your application work reduces to integrating the SDK against managed endpoints — there are no Caracal services for your team to host, patch, or scale. ### Single sign-on and team access The Enterprise Edition adds SAML and OIDC single sign-on with SCIM user provisioning, plus organizations, teams, and role-based access control so administrators and reviewers get scoped access across tenants. None of this is part of the open-source edition. ### Commercial support and assurances Enterprise customers receive a support SLA, priority response, guided onboarding, and the commercial agreement terms covering warranty, indemnification, and enterprise integrations. ## Licensing The Community Edition is governed by the Apache 2.0 license and is free to use, modify, and self-host. Enterprise Components are proprietary and require a written commercial subscription from Garudex Labs. Using the managed multi-tenancy, hosted UI, managed data plane, or SSO components without that agreement is not permitted under Apache 2.0. ## Get Enterprise - [Book an enterprise call](https://cal.com/rawx18/caracal-enterprise-sales) for pricing, a demo, or a deployment fit assessment. - Email `support@garudexlabs.com` for licensing and procurement questions. - Visit [garudexlabs.com](https://www.garudexlabs.com) for company and product details. 
## Related Pages - [Compliance and Operations Readiness](/enterprise/compliance-readiness/) - [Zones](/concepts/zone/) - [Model Your Application in Caracal](/guides/modeling-recipes/) - [Authority and Enforcement](/concepts/authority-model/) --- # Contribute to Caracal # URL: https://docs.caracal.run/contributing/ # Markdown: https://docs.caracal.run/markdown/contributing.md # Type: landing # Concepts: # Requires: --- Caracal uses a multi-language workspace: TypeScript apps/packages, Go services/packages, Python packages/examples, Docker/Helm infra, and Astro docs. ## Contributor Path | Need | Page | | --- | --- | | Prepare your machine | [Set Up Locally](/contributing/setup/) | | Learn project boundaries and naming conventions | [Follow Project Standards](/contributing/style/) | | Work on an issue or pull request | [Make a Change](/contributing/workflow/) | | Run the right checks | [Validate Changes](/contributing/testing/) | ## Maintainer Path | Need | Page | | --- | --- | | Understand review, ownership, and security process | [Understand Governance](/contributing/governance/) | | Prepare, publish, or recover a release | [Release Caracal](/contributing/release/) | ## Current Toolchain | Tool | Version | | --- | --- | | Node.js | 24+ | | pnpm | 11.1.1 | | Docker + Compose v2 | 24+ | | Go | 1.26+ | | Python | 3.14+ | | Bun | latest stable used by package scripts | ## Before You Start Read [Follow Project Standards](/contributing/style/) before editing code, docs, commands, or product boundaries. ## Next Step Start with [Set Up Locally](/contributing/setup/) before making source changes. 
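Before setup, the versions in the Current Toolchain table can be compared against what is installed. The following is a minimal sketch and not part of the repository: the `REQUIREMENTS` keys, the helper names, and the reading of a trailing `+` as "this version or newer" (with a bare version treated as an exact pin) are all assumptions made for illustration.

```python
import re

# Requirement strings copied from the Current Toolchain table.
# Assumption: a trailing "+" means "this version or newer"; a bare version
# such as "11.1.1" is treated as an exact pin. Bun is omitted because
# "latest stable" is not a comparable version number.
REQUIREMENTS = {
    "node": "24+",
    "pnpm": "11.1.1",
    "docker": "24+",
    "go": "1.26+",
    "python": "3.14+",
}


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted number from text like 'v24.1.0' or 'go1.26.2'."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def satisfies(installed: str, requirement: str) -> bool:
    """Return True when an installed version string meets a table requirement."""
    have = parse_version(installed)
    want = parse_version(requirement)
    if requirement.endswith("+"):
        # Tuple comparison treats (24, 1, 0) as newer than (24,).
        return have >= want
    return have == want
```

Pass it the raw output of commands such as `node --version` or `go version`; for example, `satisfies("v24.1.0", REQUIREMENTS["node"])` evaluates to `True`.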
---
# Set Up Locally
# URL: https://docs.caracal.run/contributing/setup/
# Markdown: https://docs.caracal.run/markdown/contributing/setup.md
# Type: workflow
# Concepts:
# Requires:
---

## Prerequisites

| Tool | Version |
| --- | --- |
| Node.js | 24+ |
| pnpm | 11.1.1 |
| Docker + Compose v2 | 24+ |
| Go | 1.26+ |
| Python | 3.14+ |
| Bun | latest stable used by package scripts |

## Install Dependencies

```bash
git clone https://github.com/Garudex-Labs/caracal.git
cd caracal
pnpm install
```

`pnpm install` is the standard dependency install command for the Node workspace. Run `pnpm run setup` when you need the full cross-platform developer environment: it runs `pnpm install`, downloads Go module dependencies, creates `.venv`, installs Python test and style tools, and installs local Python packages in editable mode.

## Start the Local Stack

```bash
pnpm secrets:init
pnpm caracal up
pnpm caracal status --ready
```

The local stack uses `infra/docker/docker-compose.yml`, builds local service images, and binds service ports to loopback.

## Open Console

```bash
pnpm caracal console
```

Use Console for zones, applications, providers, resources, policies, sessions, audit, agents, delegation, diagnostics, and Control. Do not expect top-level runtime CLI commands for product management.

## Stop or Reset

```bash
pnpm caracal down
pnpm caracal purge
```

Use `purge` only when you intentionally want to remove local stack/runtime state.

## Next Step

Read [Follow Project Standards](/contributing/style/) before editing source files.

## Related Pages

- [Runtime and Console](/runtime-console/)
- [Deploy with Docker Compose](/operations/docker-compose/)
- [Validate Changes](/contributing/testing/)

---
# Follow Project Standards
# URL: https://docs.caracal.run/contributing/style/
# Markdown: https://docs.caracal.run/markdown/contributing/style.md
# Type: reference
# Concepts:
# Requires:
---

Caracal style favors small, explicit boundaries and source-aligned documentation.
## Language Style Guides | Language | Required style | Enforcement | | --- | --- | --- | | TypeScript and JavaScript | TypeScript Handbook style plus the repository Prettier profile in `.prettierrc.json`. | `pnpm run style` checks changed TS/JS source files with Prettier. | | Go | Effective Go with canonical `gofmt` formatting. | `pnpm run style` checks changed Go source files with `gofmt -l`. | | Python | PEP 8 layout as formatted by Ruff. | `pnpm run style` checks changed Python source files with `ruff format --check`. | Use `pnpm run style:fix` to format changed files before committing. Pull requests run the same changed-file style gate automatically for primary-language source files. ## Code Conventions | Convention | Apply it | | --- | --- | | Keep changes focused | Avoid unrelated refactors in feature or docs PRs. | | Prefer explicit validation | Fail closed on auth, policy, config, stream, and key errors. | | Preserve product boundaries | Do not couple open-source code to enterprise-only code. | | Respect command ownership | Runtime CLI is lifecycle/setup; Console/Admin/Control own product management. | | Use existing shared layers | Reuse core config, errors, crypto, logging, engine dispatch, and SDK helpers. | ## Documentation Conventions | Page type | Pattern | | --- | --- | | Landing | Purpose, audience, map, recommended reading path. | | Workflow | Prerequisites, steps, validation, troubleshooting, related links. | | Reference | Exact names, defaults, tables, examples, source-of-truth links. | | Architecture | Diagram, component responsibilities, flow, boundaries, related pages. | Do not use docs to preserve stale command names, screenshots, package names, or workflows. Update the whole affected page coherently. ## Project Boundaries - Top-level `caracal` commands are limited to runtime lifecycle, setup, and `caracal run`. 
- Product-management workflows for zones, policies, grants, audit, agents, delegation, and Control belong in Console, Admin SDK, or Control API docs. - Open-source code must not import, reference, or depend on enterprise-only code. - Security-sensitive findings must follow [Report a Vulnerability](/security/disclosure/), not public issues. ## Naming Use canonical terms from [Glossary](/reference/glossary/): zone, principal, resource, grant, policy, policy set, mandate, agent session, delegation edge, audit ledger, Console, Control API, STS, Gateway. ## Next Step Use [Make a Change](/contributing/workflow/) to plan and submit a focused pull request. --- # Make a Change # URL: https://docs.caracal.run/contributing/workflow/ # Markdown: https://docs.caracal.run/markdown/contributing/workflow.md # Type: workflow # Concepts: # Requires: --- ## Standard Flow 1. Start from `main`. 2. Keep the change focused on one component or workflow. 3. Read the relevant service/package instructions before editing. 4. Update docs when behavior, APIs, commands, config, examples, or operations change. 5. Run targeted tests first, then broader checks when shared boundaries changed. 6. Open a pull request with the problem, solution, validation, and risk notes. ## Choose the Area | Area | Common sources | | --- | --- | | Runtime CLI | `apps/runtime`, `packages/engine`, runtime tests. | | Console | `apps/console`, Admin SDK, Console tests. | | API | `apps/api`, migrations, Admin package, API tests. | | Coordinator | `apps/coordinator`, Coordinator tests, SDK tests. | | STS/Gateway/Audit | `services/*`, Go tests, operations docs. | | SDKs/connectors | `packages/*`, language-specific tests, interoperability fixtures. | | Infra | `infra/docker`, `infra/helm`, Postgres/Redis scripts. | | Docs | `docs/src/content/docs`, `docs/astro.config.mjs`, generated docs pages. | ## Command Boundary Do not add top-level runtime CLI commands for zones, policies, grants, audit, agents, delegation, or Control. 
Human workflows belong to Console; automation belongs to Control/Admin APIs.

## Security Reports

Do not discuss suspected vulnerabilities in public issues. Use GitHub private advisories or the email path in [Report a Vulnerability](/security/disclosure/).

## Next Step

Use [Validate Changes](/contributing/testing/) to choose the narrowest useful test command before opening a pull request.

---
# Validate Changes
# URL: https://docs.caracal.run/contributing/testing/
# Markdown: https://docs.caracal.run/markdown/contributing/testing.md
# Type: reference
# Concepts:
# Requires:
---

Run the smallest relevant suite first, then broaden when a change touches shared code, security boundaries, generated artifacts, or deployment behavior.

## Testing Policy

This policy is mandatory and is enforced during review:

- Major new functionality MUST add automated tests covering that functionality, in the same change that introduces it.
- Every bug fix MUST add a regression test that fails without the fix and passes with it.
- Reviewers MUST confirm the required tests exist and run in CI before approving; pull requests that omit them are not merged.

## Root Commands

| Command | Purpose |
| --- | --- |
| `pnpm run build:typescript` | Build TypeScript apps and packages. |
| `pnpm run lint` | Run package linters where present. |
| `pnpm run typecheck` | Run TypeScript type checks. |
| `pnpm run test` | Full TypeScript, Go, and Python test suite. |
| `pnpm run test:typescript` | TypeScript app/package tests. |
| `pnpm run test:go` | Go service/package and interoperability tests. |
| `pnpm run test:python` | Python package tests. |
| `pnpm run ci` | Build, lint, typecheck, and test sequence. |

## Targeted Examples

| Area | Command |
| --- | --- |
| Runtime CLI | `pnpm --dir apps/runtime test` |
| Console | `pnpm --dir apps/console test` |
| API | `pnpm --dir apps/api test` |
| Coordinator | `pnpm --dir apps/coordinator test` |
| STS | `go test ./services/sts/...` |
| Gateway | `go test ./services/gateway/...` |
| Audit | `go test ./services/audit/...` |
| Docs | `pnpm --dir docs build` |
| Lynx Capital | `cd examples/lynxCapital && pytest tests/` |

## CI Mirror

```bash
scripts/testCi.sh
scripts/testCi.sh --smoke
scripts/testCi.sh --go
scripts/testCi.sh --py
scripts/testCi.sh --ts
```

Use broader checks when a change affects auth, crypto, config, release, infra, shared packages, SDK contracts, or interoperability schemas.

## Next Step

After validation, review [Understand Governance](/contributing/governance/) for contribution scale, review ownership, and private security process.

---
# Understand Governance
# URL: https://docs.caracal.run/contributing/governance/
# Markdown: https://docs.caracal.run/markdown/contributing/governance.md
# Type: reference
# Concepts:
# Requires:
---

Caracal is maintained by Garudex Labs. Maintainers listed in `.github/MAINTAINERS` are the primary decision-makers for project areas, reviews, triage, standards, and releases.

## Contribution Scale

| Change size | Expected process |
| --- | --- |
| Small focused fix | Pull request with clear validation. |
| Medium bug or feature | GitHub issue with context and expected outcome before implementation. |
| Large or cross-cutting change | Proposal with problem statement, alternatives, trade-offs, open questions, and smaller sub-issues. |
| Security-sensitive change | Private security process and maintainer coordination. |

## Maintainer Responsibilities

- Review changes in owned areas.
- Enforce repository standards and product boundaries.
- Keep security reports private.
- Approve releases and release workflow changes.
- Preserve open-source and enterprise product isolation.
## Code Review Requirements Every change is proposed as a pull request and reviewed before merge or release. | Requirement | Expectation | | --- | --- | | Independent review | At least one maintainer other than the author approves each pull request; authors do not approve or merge their own changes. | | Area ownership | `.github/CODEOWNERS` owners are requested automatically for their paths. | | What reviewers check | Correctness and edge cases, focused scope, Testing Policy compliance with passing CI, the `pnpm run style` gate, input validation and trust boundaries, secret hygiene, OSS/enterprise isolation, and updated docs. | | Acceptance bar | One approving non-author review, all required CI checks green, resolved comments, and a judgment that the change is worthwhile and free of known disqualifying defects. | | Release approval | Stable releases require `release-approval` from a maintainer other than the release preparer. | The full contributor-facing policy lives in [`.github/CONTRIBUTING.md`](https://github.com/Garudex-Labs/caracal/blob/main/.github/CONTRIBUTING.md#code-review). ## Community Standards The project follows the repository [Code of Conduct](https://github.com/Garudex-Labs/caracal/blob/main/.github/CODE_OF_CONDUCT.md). Harassment, private-information disclosure, and disruptive behavior are not acceptable. ## Security Governance Security concerns must be reported through [Report a Vulnerability](/security/disclosure/). Public issues are not appropriate for vulnerabilities, credential exposure, unsafe execution, or exploitable operational failures. ## Next Step Maintainers preparing a cut should use [Release Caracal](/contributing/release/). 
## Related Pages - [Make a Change](/contributing/workflow/) - [Review the Threat Model](/security/threat-model/) - [Respond to Incidents](/operations/incident-response/) --- # Release Caracal # URL: https://docs.caracal.run/contributing/release/ # Markdown: https://docs.caracal.run/markdown/contributing/release.md # Type: reference # Concepts: # Requires: --- Caracal release artifacts use CalVer product versions and package versions defined in `release.config.json`. ## Release Surfaces | Surface | Source | | --- | --- | | Product version | `release.config.json` product version, currently `2026.06.22-rc.1`. | | Runtime binary | `apps/runtime/dist/caracal-*`. | | Console binary | `apps/console/dist/caracal-console-*`. | | Containers | API, STS, Gateway, Audit, Coordinator, Control, Postgres, Redis, Runtime. | | Helm | `infra/helm/caracal`. | | npm packages | Public `@caracalai/*` packages from `release.config.json`. | | PyPI packages | Public `caracalai-*` packages from `release.config.json`. | ## Build Targets Runtime and Console release builds compile Linux x64/arm64, macOS x64/arm64, and Windows x64 binaries through Bun compile scripts. Go-based service container builds preserve debug information by default and honor native build arguments passed to Docker: `CGO_ENABLED`, `CC`, `CFLAGS`, `CXX`, `CXXFLAGS`, `LDFLAGS`, `GOFLAGS`, `GO_BUILDFLAGS`, and `GO_LDFLAGS`. The Dockerfiles add `-mod=readonly` and `-trimpath`; pass linker options through `GO_LDFLAGS` for diagnostic or release-specific builds. ## Release Flow | Stage | Command pattern | | --- | --- | | RC prepare | `scripts/release.sh rc prepare --base-version --suffix rc.1` | | RC dry run | `scripts/release.sh rc dry-run --base-version --suffix rc.1 --local` | | Stable dry run | `scripts/release.sh stable --dry-run` | | Stable publish | `scripts/release.sh stable` | Protected GitHub workflows publish npm and PyPI packages. Local stable publishing requires explicit approval and environment override. 
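The CalVer product version listed under Release Surfaces (`2026.06.22-rc.1`) can be split into a calendar date and an optional pre-release suffix. The sketch below is illustrative only: the `YYYY.MM.DD[-suffix]` pattern is inferred from that single example, and `ProductVersion` and `parse_product_version` are hypothetical names, not part of the release tooling. `release.config.json` remains the source of truth.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

# Assumed pattern, inferred from the example `2026.06.22-rc.1`:
# YYYY.MM.DD followed by an optional "-suffix" such as "rc.1".
_CALVER = re.compile(r"(\d{4})\.(\d{2})\.(\d{2})(?:-([0-9A-Za-z.]+))?")


class ProductVersion(NamedTuple):
    cut_date: date          # calendar date encoded in the version
    suffix: Optional[str]   # e.g. "rc.1"; None when no suffix is present

    @property
    def is_release_candidate(self) -> bool:
        return self.suffix is not None and self.suffix.startswith("rc.")


def parse_product_version(text: str) -> ProductVersion:
    """Parse a version such as '2026.06.22-rc.1'; raise ValueError otherwise."""
    match = _CALVER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognised product version: {text!r}")
    year, month, day, suffix = match.groups()
    # date() rejects impossible calendar dates (for example month 13).
    return ProductVersion(date(int(year), int(month), int(day)), suffix)
```

With this sketch, `parse_product_version("2026.06.22-rc.1").is_release_candidate` is `True`, while the same date without a suffix parses with `suffix` set to `None`.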
## Changesets Only public `@caracalai/*` packages under `packages/` are released through Changesets. Internal-only packages must stay in the Changesets ignore list. ## Rollback Rule Do not delete published tags. Roll forward with a new CalVer tag. Pinned `v` tags are immutable; floating monthly tags move with the new cut. ## Related Pages - [Release Map](/reference/release-package-runtime-map/) - [Upgrade Caracal](/operations/upgrade/)
--- # Caracal Joins LFX Mentorship # URL: https://docs.caracal.run/blog/caracal-joins-lfx-mentorship/ # Markdown: https://docs.caracal.run/markdown/blog/caracal-joins-lfx-mentorship.md # Type: page # Concepts: # Requires: --- Open source does not grow by code alone. It grows when people contribute, learn, review, document, and help others build with confidence. That is the real value of mentorship in open source. Caracal is a security-first AI authority, delegation, and identity system for autonomous agents. As the project evolves, one of the most important priorities is making it easier for developers to integrate, understand, and operate Caracal in real-world environments. That is why participating in LFX Mentorship is such a strong fit. ## Real work, not a classroom LFX Mentorship gives contributors the chance to work on real open source projects with guidance from maintainers and mentors. It is not a classroom exercise. It is hands-on work inside an active project, with real problems, real collaboration, and real outcomes. For mentees, the value is practical experience. They learn how open source projects are built, maintained, and sustained. They gain exposure to code review, documentation, workflows, communication, and the day-to-day reality of contributing to a project that serves a broader community. They also learn how to think beyond features and focus on maintainability, usability, and long-term project health. ## What the project gets For projects like Caracal, the value is equally important. Mentorship brings fresh perspective, additional contributors, and focused effort on areas that directly improve adoption. In this mentorship, the emphasis is on developer experience and ease of integration and operation. That means making Caracal clearer, smoother, and easier to work with, while keeping its security-first foundation intact.
This year, Caracal's LFX Mentorship project includes Ashutosh Singh and Mohammad Haider Ali. Their work is part of a larger effort to strengthen the project, improve the contributor experience, and show how open source software is sustained over time through collaboration and shared ownership. ## Investing in the ecosystem Open source projects survive when people invest in them. Mentorship is one of the best ways to do that. It helps new contributors grow, helps projects improve, and helps communities stay strong. Caracal is proud to take part in LFX Mentorship and continue building with the open source ecosystem. --- # Blogs # URL: https://docs.caracal.run/blogs/ # Markdown: https://docs.caracal.run/markdown/blogs.md # Type: page # Concepts: # Requires: --- --- # Compliance and Operations Readiness # URL: https://docs.caracal.run/enterprise/compliance-readiness/ # Markdown: https://docs.caracal.run/markdown/enterprise/compliance-readiness.md # Type: reference # Concepts: # Requires: --- This page is a compliance and operations readiness review of the Caracal **Community Edition** (open-source, Apache 2.0, self-hosted), written for enterprise security, compliance, audit, procurement, and operations teams evaluating adoption. It is written conservatively and is intended to be verifiable. Read it with the following ground rules. ## Reading Rules and Non-Claims - **Caracal is not certified.** This project does not hold, claim, or imply certification or attestation under any framework named on this page. No framework is represented as fully satisfied or guaranteed by Caracal. - **Best-effort alignment.** Maintainers implement controls in a best-practice-oriented way and try to keep them aligned with common compliance needs.
That alignment is best effort, not a guarantee, and not every requirement of any framework is implemented or verified. - **Caracal is a control, not a compliance program.** Caracal is infrastructure that can support specific technical controls inside *your* compliance program. Adopting Caracal does not make an organization compliant with any framework. The control environment, the audit, and the attestation remain the adopter's responsibility. - **Evidence over assertion.** Every "covered" or "partially covered" statement below cites a path in this repository or docs so you can confirm it against the exact commit you deploy. Where a control depends on settings outside the source tree (organization policy, cloud configuration, operator action), it is marked accordingly and not asserted as implemented. - **Status vocabulary.** - Covered — Caracal implements a relevant control in code, schema, CI, or the release pipeline, verifiable at the cited path. - Partially covered — Caracal implements part of the control or bounds the risk, but full coverage depends on operator configuration or application behavior. - Not covered — Caracal does not address this item, and it is the adopter's or another system's responsibility within a domain Caracal could otherwise contribute to. - Not applicable — The item does not map to Caracal's function — for example model behavior, model output, an organizational process, or data-subject content. Where Caracal supplies inputs to such an item, that is noted in the row. - **Framework labels.** OWASP Agentic AI categories, NIST AI RMF functions, EU AI Act articles, SOC 2 Trust Services Criteria, and the agentic authority controls (AARM R1–R9) are referenced as evaluation lenses. Control titles evolve across framework versions; reconcile the exact wording of each item against your authoritative copy of the framework. 
The AARM R1–R9 mappings use a working interpretation of agentic-authority control intents and must be reconciled with your authoritative control definitions before use in an audit. --- ## A. Executive Summary for Compliance and Operations Teams Caracal is a pre-execution authority layer for agents and automated workloads. Its Security Token Service (STS) is the sole issuer of scoped, signed mandates; the Gateway enforces protected-resource access before execution; the Coordinator tracks agent and delegation state; and the Audit service preserves tamper-evident decision evidence. Postgres is the system of record with per-zone row-level security; Redis Streams move revocation and operational events. For a compliance and operations reader, the honest summary is: - **What Caracal is good at, and can evidence today:** least-privilege machine authority (short-lived signed mandates instead of long-lived credentials), centralized revocation, zone-based isolation and blast-radius containment, and tamper-evident, HMAC-chained audit. These map cleanly to logical-access, traceability, and accountability controls. - **What Caracal supports but does not own:** human/administrator identity and SSO (intentionally external to the runtime trust boundary — see [the SSO position](#e-oss-vs-enterprise-edition-boundary)), risk management process, data-subject rights, and the overall control environment. Caracal supplies technical evidence into these; it does not satisfy them. - **What Caracal does not address:** model-level agent behavior (prompt injection, hallucination, memory poisoning, goal manipulation). Caracal constrains what an agent is *authorized* to do and records what it *did*; it does not judge whether the agent *should* have wanted to. - **Certification posture:** none claimed. No FIPS validation is claimed ([details](/security/enterprise-readiness/#regulated-enterprise-and-fips-readiness)). 
SOC 2 is an attestation of an operating organization, which the open-source project is not; Caracal provides controls that can support an adopter's SOC 2, not a SOC 2 report. Adoption recommendation in one line: **suitable as a least-privilege authority, revocation, and audit-evidence control for agentic systems, on the explicit understanding that the adopter owns the surrounding compliance program and the items marked partially covered below.** --- ## B. Control-by-Control Matrix ### B.1 OWASP Agentic AI Top 10 (ASI01–ASI10) and Agent Traceability Mapped to the recognized agentic threat categories. Caracal is an authority and audit control, so it addresses the *authority, identity, traceability, and revocation* classes well and the *model-behavior* classes not at all. This is by design. | # | Agentic risk category | Status | Caracal control | | --- | --- | --- | --- | | ASI01 | Memory poisoning | Not applicable | Agent memory and state do not map to Caracal's function. Mitigate in the application. | | ASI02 | Tool misuse | Partially covered | Mandates are scoped per resource and policy-gated before execution; Gateway enforces resource binding and route safety. Caracal cannot prevent misuse *within* legitimately granted scope. | | ASI03 | Privilege compromise / excessive agency | Covered | STS issues least-privilege, short-lived mandates; delegation is re-validated server-side as a subset of parent authority; zones and the application-vs-session ceiling bound escalation; Postgres RLS enforces isolation. | | ASI04 | Resource overload | Partially covered | API rate limiting and a provider rate-limit stream bound request volume; Caracal does not provide general DoS protection or compute quotas. | | ASI05 | Cascading hallucination | Not applicable | Model-output correctness does not map to Caracal's function. 
| | ASI06 | Intent breaking / goal manipulation | Partially covered | Caracal bounds the *authority* an agent can exercise regardless of manipulated intent; it does not detect manipulated reasoning. | | ASI07 | Misaligned / deceptive behavior | Partially covered | Authority bounds + tamper-evident audit constrain and record impact; behavioral alignment is not assessed. | | ASI08 | Repudiation / untraceability | Covered | Append-only, per-zone HMAC-chained audit with a tamper-detection sweep; decisions for STS, Gateway, Coordinator, policy, and admin activity are recorded. | | ASI09 | Identity spoofing / impersonation | Covered | Machine identity: STS is the sole token issuer; mandates are signed; the `identity` package verifies JWT/JWKS, issuer, audience, zone, scope, and delegation claims and fails closed. Human identity is external by design. | | ASI10 | Human-in-the-loop manipulation / fatigue | Partially covered | Step-up re-authentication exists as a policy-driven control; Caracal does not manage approval-fatigue UX. | | — | Agent traceability | Covered | Every decision is attributable to app, session, policy, resource, and result and is preserved as tamper-evident evidence. | ### B.2 NIST AI RMF 100-1 Caracal is a tool used *within* an organization's RMF, not an implementation of the RMF. It contributes most to **Govern** and **Manage**. | RMF function | Status | Caracal contribution | | --- | --- | --- | | Govern | Partially covered | Enforceable, version-controlled policy (Rego); accountability via attributable audit; documented threat model and incident response. The governance *process* is the adopter's. | | Map | Not applicable | Risk and context mapping is an organizational process, not a Caracal function. Its trust-boundary and resource model are inputs an adopter can map against. 
| | Measure | Partially covered | Operational and decision metrics, audit lag/tamper metrics, and exportable decision evidence support measurement; Caracal does not measure model risk. | | Manage | Partially covered | Centralized revocation, short-lived authority, alerting, and incident runbooks support active risk management of agent authority. | ### B.3 EU AI Act (Regulation 2024/1689) Caracal is not an AI system and does not place an AI system on the market. It can supply technical evidence toward specific provider/deployer obligations. It does not make any party compliant. | Obligation area | Status | Caracal contribution | | --- | --- | --- | | Art. 9 — Risk management system | Not applicable | A risk-management system is an organizational process, not a Caracal function. Caracal provides enforcement and revocation controls an adopter can reference within it. | | Art. 12 — Record-keeping / logging | Partially covered | Tamper-evident, append-only audit with export to object storage and SIEM provides durable activity records for the authority layer. Scope is Caracal decisions, not full AI-system lifecycle logging. | | Art. 14 — Human oversight | Partially covered | Step-up re-authentication and revocation give operators intervention points; oversight design remains the adopter's. | | Art. 15 — Accuracy, robustness, cybersecurity | Partially covered | Fail-closed enforcement, signed mandates, RLS isolation, supply-chain controls, and tamper detection support the cybersecurity dimension; accuracy/robustness of the AI model is out of scope. | ### B.4 SOC 2 Type II — Trust Services Criteria SOC 2 attests an operating organization's controls over a period and requires an independent auditor. The open-source project is not an operating service organization, so **Caracal cannot hold a SOC 2 report**. The mapping below shows which Caracal controls can serve as *evidence* inside an adopter's SOC 2. 
| TSC | Status | Caracal control that can serve as evidence |
| --- | --- | --- |
| CC6 — Logical & physical access | Covered | Sole-issuer STS, signed least-privilege mandates, per-zone RLS, service DB roles with `NOBYPASSRLS`, centralized revocation. |
| CC7 — System operations / monitoring | Partially covered | Health/readiness ladders, metrics, alert rules, DLQ and tamper monitoring, incident runbooks. |
| CC8 — Change management | Partially covered | Expand/contract migrations with a CI guard, signed reproducible releases, provenance/SBOM attestations, `caracal upgrade`. |
| A1 — Availability | Partially covered | HA Helm profile (replicas, PDBs, anti-affinity), Redis durability requirements, backup/retention guidance. Operator owns the SLO. |
| C1 — Confidentiality | Partially covered | Per-purpose key separation, zone KEK, RLS, secret files at `0444`, TLS terminated by operator. |
| PI1 — Processing integrity | Partially covered | Fail-closed policy evaluation, HMAC-signed streams, append-only audit with continuity checks. |
| P-series — Privacy | Not applicable | Caracal processes authority metadata, not subject data sets, and does not implement data-subject-rights workflows. |

### B.5 Agentic Authority Controls (AARM R1–R9)

Mapped using a working interpretation of agentic-authority control intents. **Reconcile each control definition against your authoritative source before audit use.**

| Ctrl | Working interpretation | Status | Caracal control |
| --- | --- | --- | --- |
| R1 | Authority is issued by a single, auditable authority | Covered | STS is the sole mandate issuer. |
| R2 | Least-privilege, time-bounded authority | Covered | Short-lived scoped mandates; max-TTL bounds. |
| R3 | Delegation is constrained and re-validated | Covered | Delegation edges re-validated server-side as subsets; cross-application delegation requires receiver consent. |
| R4 | Authority is revocable centrally and promptly | Covered | Redis-backed revocation propagated to STS and Gateway consumers; short mandate TTLs bound exposure. |
| R5 | Actions are traceable to an identity | Covered | Attributable audit per app/session/policy/resource/result. |
| R6 | Evidence is tamper-evident | Covered | Per-zone HMAC hash chain + startup and rolling tamper sweep. |
| R7 | Isolation bounds blast radius | Covered | Zones + per-zone KEK + Postgres RLS. |
| R8 | Keys are separated by purpose and rotatable | Partially covered | Per-purpose keys exist and are documented for rotation; rotation is an operator runbook, not automated verification. |
| R9 | Operational recoverability | Partially covered | Backup/retention, incident response, failure drills, and rollback are documented; execution and testing are the operator's. |

---

## C. Evidence Map to Code, Docs, and Tests

Confirm each item against the commit you deploy.

| Control | Evidence path |
| --- | --- |
| STS sole issuer / signed mandates | `services/sts/`, [Token Exchange Flow](/architecture/token-exchange-flow/) |
| Gateway pre-execution enforcement | `services/gateway/internal/`, [Threat Model](/security/threat-model/) |
| Identity verification (fail-closed) | `packages/identity/` (`verify.go`, `scope.go`), [Identity Package](/sdks/identity/) |
| Token exchange (RFC 8693) | `packages/oauth/` (`client.go`), [OAuth Package](/sdks/oauth/) |
| Zone isolation via RLS | `infra/postgres/migrations/0001_baseline.up.sql` (`ENABLE ROW LEVEL SECURITY`, `CREATE POLICY zone_isolation`, `NOBYPASSRLS` roles) |
| Append-only, HMAC-chained audit | `services/audit/internal/tamper.go`, `services/audit/internal/rehash.go`, [Audit Ledger](/concepts/audit-ledger/) |
| Audit export to object store / SIEM | `services/audit/internal/parquet.go`, `services/audit/internal/server.go`, [Export Audit Evidence](/operations/compliance-audit-integration/) |
| Centralized revocation | `services/gateway/internal/revocations.go`, `packages/revocation/`, [Sessions and Revocation](/concepts/sessions-revocation/) |
| Redis durability + eviction guard | `packages/core/go/redisguard/redisguard.go`, `infra/redis/redis.conf`, [Operate Redis Streams](/operations/redis/) |
| Expand/contract migrations + upgrade | `infra/postgres/scripts/migrate.sh`, `infra/postgres/scripts/validateMigrations.sh`, `apps/runtime/src/commands/stack.ts`, [Upgrade Caracal](/operations/upgrade/) |
| Generated secrets (CSPRNG, file-backed) | `packages/engine/src/secrets.ts`, [Configure Service Environment](/operations/env-vars/) |
| Key inventory and rotation | [Rotate Keys and Secrets](/operations/key-management/) |
| Policy engine (Rego) | [Policy](/concepts/policy/), [Author Policy Data](/guides/author-policy/) |
| Supply-chain controls (CI) | `.github/workflows/{codeql,containerScan,supplyChainScan,scorecard,dependencyReview}.yml`, [Enterprise Security Readiness](/security/enterprise-readiness/) |
| Release signing / SBOM / provenance | [Verify a Release](/security/verify-releases/), [Enterprise Security Readiness](/security/enterprise-readiness/) |
| Generated evidence pack | `infra/scripts/evidencePack.sh`, [Generate an Evidence Pack](/security/evidence-pack/) |
| Threat model | `governance/THREAT_MODEL.md`, [Threat Model](/security/threat-model/) |
| Incident response | `governance/INCIDENT_RESPONSE.md`, [Incident Response](/operations/incident-response/) |
| Backup and retention | [Backup and Retention](/operations/backup-retention/) |

---

## D. Gaps, Limitations, and Non-Claims

Each gap is classified as a **product**, **documentation**, or **operations** gap so it can be triaged and, where appropriate, fed back to the project to improve it.

| Item | Classification | Statement |
| --- | --- | --- |
| FIPS-validated cryptography | Product gap (intentional, documented) | Not claimed. Default artifacts are not built against FIPS-validated modules. Operators with a FIPS mandate must supply a validated runtime/host; not verified by the project. See [FIPS Readiness](/security/enterprise-readiness/#regulated-enterprise-and-fips-readiness). |
| Human/admin SSO (OIDC/SAML, SCIM) | Boundary (Enterprise Edition) | Intentionally outside the runtime trust boundary. OSS admin access uses an internal admin token that must sit behind an authenticating ingress/proxy. Native SSO/SCIM is Enterprise Edition. |
| Key rotation verification | Operations gap | Per-purpose keys and a rotation order are documented, but there is no automated rotation-freshness check. Rotation is an operator runbook. |
| Model reasoning & output threats (ASI01, ASI05; detection aspect of ASI06/07) | Out of scope (by design) | Caracal governs authority and records actions; it does not store agent memory, evaluate model reasoning, or judge output correctness. It still bounds the *authority* an agent can exercise under ASI06/07. |
| Data-subject-rights / privacy workflows | Out of scope | Caracal processes authority metadata, not subject data sets; privacy obligations are the adopter's. |
| SOC 2 report | Not applicable to OSS project | The open-source project is not an operating service organization; it cannot hold a SOC 2 report. It provides controls that support an adopter's SOC 2. |
| HA / availability SLO | Operations gap | HA primitives are shipped (Helm profile); the SLO, capacity, and DR testing are the operator's. |
| Internal checks | Limitation | CI guards (e.g., the expand-only migration guard) and the tamper sweep are best-effort detections, not guarantees of correctness or compliance. |

---

## E. OSS vs Enterprise Edition Boundary

The enforcement core — STS sole issuance, Gateway enforcement, zones, RLS, revocation, and tamper-evident audit — is identical in both editions. **Enforcement strength is not gated behind Enterprise.** The Enterprise Edition adds *managed and organizational* capabilities, not stronger enforcement.
| Capability | Community Edition | Enterprise Edition |
| --- | --- | --- |
| Pre-execution enforcement, revocation, audit | Included | Included (identical) |
| Zones as isolation primitive | Included (manual) | Managed multi-tenancy over the same primitive |
| Human SSO (SAML/OIDC) + SCIM | Not included (front with your own OIDC proxy) | Included |
| Organizations, teams, org-level RBAC | Not included | Included |
| Hosted management UI / managed data plane | Not included | Included |
| Commercial support / lifecycle | Not included | Included |

The intentional position on identity: **STS remains the sole runtime authority and human identity stays external.** This is a deliberate architectural boundary, not a missing OSS feature; an enterprise fronts the Console/Admin API with its existing OIDC-aware ingress today. Enterprise SSO/SCIM is managed convenience over that same seam. See [Compare Editions](/enterprise/).

---

## F. Enterprise Verification Steps

Run these against your own copy after adoption. They are designed to be repeatable and to produce artifacts you can retain.

1. **Pin and verify the supply chain.** Verify release signatures, provenance, and SBOMs for the exact artifacts you run — see [Verify a Release](/security/verify-releases/). Record the digests you deploy.
2. **Generate an evidence pack.** Run `infra/scripts/evidencePack.sh` to capture provenance, schema-enforcement, and runtime-readiness evidence in one artifact — see [Generate an Evidence Pack](/security/evidence-pack/).
3. **Confirm zone isolation.** Inspect `infra/postgres/migrations/0001_baseline.up.sql` for `ENABLE ROW LEVEL SECURITY` and `zone_isolation` policies, and confirm service roles are `NOBYPASSRLS`. Then test cross-zone access denial against a running instance.
4. **Confirm audit tamper-evidence.** Review `services/audit/internal/tamper.go`; on a running instance, attempt an out-of-band mutation of `audit_events` in a test environment and confirm the sweep flags a chain break (`CaracalAuditTamperDetected`).
5. **Confirm revocation propagation.** Revoke a session and confirm the Gateway denies a subsequent call within your bound, using `services/gateway/internal/revocations.go` as the reference path.
6. **Confirm fail-closed posture.** Run in published mode (`CARACAL_MODE=stable`) and confirm metrics are authenticated, HMAC keys are required, and the services refuse to start with incomplete configuration.
7. **Confirm change-window safety.** Exercise `caracal upgrade` and the expand-only guard (`infra/postgres/scripts/validateMigrations.sh`) on a staging copy; confirm a contract-phase DDL change is rejected when untagged.
8. **Confirm Redis durability.** Verify `maxmemory-policy noeviction` and AOF on your Redis; confirm the startup eviction warning fires when set to an unsafe policy — see [Operate Redis Streams](/operations/redis/).
9. **Run an agent-assisted review.** Use the [compliance playbook prompts](#g-agent-assisted-review-playbooks) to have a coding agent reproduce this mapping against your checked-out copy and flag drift.

---

## F.1 Operational Readiness Review

| Domain | Stance | Reference |
| --- | --- | --- |
| Secret management | Generated with a CSPRNG, written as `*_FILE` secrets at mode `0444`; derived DB/Redis URLs auto-rewritten. Provision external secrets via your manager; never commit plaintext. | `packages/engine/src/secrets.ts`, [Configure Service Environment](/operations/env-vars/) |
| Key rotation | Per-purpose keys with a documented inventory, storage boundary, and rotation order. Rotation is an operator runbook; plan windows and dual-key overlap where required. | [Rotate Keys and Secrets](/operations/key-management/) |
| Redis durability | Correctness-critical store, not a cache. Require `noeviction` + AOF; bundled image is safe, external/managed Redis is the operator's responsibility; startup guard warns on unsafe policy. | [Operate Redis Streams](/operations/redis/) |
| Migration & upgrade | Expand/contract migrations with a CI guard and `caracal upgrade` for a no-maintenance-window roll. Always run on staging first. | [Upgrade Caracal](/operations/upgrade/) |
| Observability | Health/readiness ladders, authenticated metrics, alert rules, DLQ/tamper/lag metrics. Wire to your monitoring and SIEM before go-live. | [Observe Caracal](/operations/observability/), [Alerts](/operations/alerts/) |
| Audit export | Append-only `audit_events` plus object-store/SIEM export; preserve audit HMAC keys for the retention window. | [Export Audit Evidence](/operations/compliance-audit-integration/) |
| Rollback & incident response | Severity triggers, first-hour runbook, evidence preservation, containment, and Helm rollback. | [Incident Response](/operations/incident-response/), [Failure Drills](/operations/failure-drills/) |

---

## G. Agent-Assisted Review Playbooks

To let your own team reproduce this review against the codebase you adopt, the repository ships copy-pasteable prompts for three coding agents. Each instructs the agent to run the same compliance and operations review against a checked-out Caracal copy and produce a truthful, evidence-cited report with verification steps and an explicit non-claims section.

| Agent | Path |
| --- | --- |
| Claude Code | `playbooks/compliance/claude-code/CLAUDE.md` |
| GitHub Copilot | `playbooks/compliance/github-copilot/AGENTS.md` |
| OpenAI Codex | `playbooks/compliance/openai-codex/.codex/AGENTS.md` |

The prompts enforce the same conservatism as this page: no certification claims, evidence paths for every assertion, and clear separation of covered, partially covered, not covered, and not applicable.

---

## H. Final Adoption Recommendation

**Adopt Caracal Community Edition as a least-privilege authority, revocation, and audit-evidence control for agentic and automated workloads, with a clear-eyed view of its boundary.**

- It is a strong, evidenceable fit for the *authority, identity, isolation, revocation, and traceability* control classes (ASI03, ASI08, ASI09, agent traceability; SOC 2 CC6; AARM R1–R7).
- It is a *supporting* control — not a solution — for risk-management process, human oversight, availability SLOs, and key-rotation verification (NIST RMF Govern/Manage; EU AI Act Art. 12/14; SOC 2 CC7/CC8/A1; AARM R8–R9). Own these explicitly.
- It does **not** address model-behavior threats or privacy-rights workflows, and it makes **no** certification claim. Do not represent it as compliance, certification, or FIPS validation to a review board.

The deciding question for an architecture and compliance review board is not whether Caracal is certified — it is not, and says so — but whether its enforced, tamper-evident, least-privilege authority model materially reduces the security, implementation, and audit burden of governing agents versus building the same controls in-house. For organizations that need pre-execution authority and durable decision evidence for agents, it does, provided the partially covered items above are owned by the adopter and the not-applicable items are recognized as outside Caracal's function.
## Related Pages

- [Enterprise Security Readiness](/security/enterprise-readiness/)
- [Review the Threat Model](/security/threat-model/)
- [Generate an Evidence Pack](/security/evidence-pack/)
- [Export Audit Evidence](/operations/compliance-audit-integration/)
- [Compare Editions](/enterprise/)

---

# Protect a FastAPI App
# URL: https://docs.caracal.run/guides/protect-fastapi/
# Markdown: https://docs.caracal.run/markdown/guides/protect-fastapi.md
# Type: page
# Concepts:
# Requires:

---

Use `caracalai-asgi` when a FastAPI, Starlette, or other ASGI app should verify Caracal mandates before request handlers run. This is the provider-side boundary: a partner serving Caracal-governed customers verifies each inbound mandate against the zone's keys before doing any work.

## Install

```bash
pip install caracalai-asgi caracalai-revocation
```

Use a Redis-backed revocation store in production. The in-memory store is only suitable for local development and tests.

## Add middleware

```python
from caracalai_asgi import CaracalASGIAuth
from caracalai_revocation import InMemoryRevocationStore
from fastapi import FastAPI, Request

app = FastAPI()

app.add_middleware(
    CaracalASGIAuth,
    audience="resource://billing-api",
    revocations=InMemoryRevocationStore(),
    required_scopes=["billing:read"],
    routes={
        "/payouts": {"required_scopes": ["billing:payout"], "require_delegation": True},
    },
    exclude=["/healthz"],
)


@app.post("/payouts/create")
async def create_payout(request: Request):
    principal = request.state.caracal
    return {"actor": principal.sub, "agent": principal.agent_session_id}
```

With `CARACAL_STS_URL` and `CARACAL_ZONE_ID` set — the standard Caracal workload variables — the middleware resolves the issuer and zone itself; you state only your own audience and revocation store. Requests reach your handlers only after the mandate's signature, issuer, audience, zone, token use, scopes, and revocation anchors all verify.
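Because the middleware resolves its issuer and zone from `CARACAL_STS_URL` and `CARACAL_ZONE_ID`, it can be useful to check both variables before the app is built. The following is a minimal sketch under that assumption; `missing_caracal_env` and `require_caracal_env` are hypothetical helpers written for this example and are not part of `caracalai-asgi`.

```python
import os

# The two workload variables named in this guide.
REQUIRED_VARS = ("CARACAL_STS_URL", "CARACAL_ZONE_ID")


def missing_caracal_env(env=None):
    """Return the required variable names that are unset or blank (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def require_caracal_env(env=None):
    """Raise a readable error that names every missing variable."""
    missing = missing_caracal_env(env)
    if missing:
        raise RuntimeError("missing Caracal workload variables: " + ", ".join(missing))
```

Calling `require_caracal_env()` just before `app.add_middleware(...)` would surface a misconfigured deployment as one clear startup error.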
## Enforce the right boundary

| Option | Use it for |
| --- | --- |
| `required_scopes` | Route-level scope checks. |
| `required_targets` | Resource-target checks. |
| `require_agent` | Reject non-agent mandates. |
| `require_delegation` | Require delegated authority. |
| `max_hop_count` | Limit delegation depth. |
| `routes` | Apply any of the above per path prefix; the longest matching prefix wins. |

## Validate

1. Exchange for a mandate that targets the resource.
2. Call the protected route with `Authorization: Bearer `.
3. Remove a required scope and confirm the route returns `403`.
4. Revoke the session and confirm the route rejects the old mandate.

Related pages: [Mandates](/concepts/mandate/) and [Sessions and Revocation](/concepts/sessions-revocation/).

---

# Releases
# URL: https://docs.caracal.run/releases/
# Markdown: https://docs.caracal.run/markdown/releases.md
# Type: page
# Concepts:
# Requires:

---

---

# ASGI Connector
# URL: https://docs.caracal.run/sdks/connectors/asgi/
# Markdown: https://docs.caracal.run/markdown/sdks/connectors/asgi.md
# Type: page
# Concepts:
# Requires:

---

The ASGI connector protects any Python ASGI application — FastAPI, Starlette, Quart, Django ASGI — with fail-closed Caracal mandate verification. It is pure ASGI: it imports no web framework and delegates every check to [MCP Auth Transport](/sdks/transport-mcp/).
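To make "pure ASGI" and "fail-closed" concrete, here is a minimal sketch of a framework-free middleware that rejects any HTTP request without a bearer credential. It is not the connector's implementation: `BearerGate`, `status_for`, and the `invalid_token` error value are assumptions made for illustration, and no signature, issuer, audience, zone, scope, or revocation check is performed.

```python
import asyncio
import json


class BearerGate:
    """Illustrative pure-ASGI gate; hypothetical, not the caracalai-asgi code."""

    def __init__(self, app, exclude=()):
        self.app = app
        self.exclude = tuple(exclude)

    async def __call__(self, scope, receive, send):
        # Non-HTTP scopes (such as lifespan) and excluded prefixes pass through.
        if scope["type"] != "http" or scope["path"].startswith(self.exclude):
            await self.app(scope, receive, send)
            return
        headers = dict(scope.get("headers", []))
        credential = headers.get(b"authorization", b"")
        if not credential.startswith(b"Bearer ") or len(credential) <= len(b"Bearer "):
            # Fail closed: the wrapped app is never called.
            body = json.dumps(
                {"error": "invalid_token", "error_description": "missing bearer credential"}
            ).encode()
            await send({
                "type": "http.response.start",
                "status": 401,
                "headers": [(b"content-type", b"application/json")],
            })
            await send({"type": "http.response.body", "body": body})
            return
        # A real connector would verify the mandate here before forwarding.
        await self.app(scope, receive, send)


async def _ok_app(scope, receive, send):
    """Stand-in downstream app that always answers 200."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


def status_for(path, headers=()):
    """Drive the gate once with a hand-built scope and return the status code."""
    sent = []

    async def send(message):
        sent.append(message)

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    scope = {"type": "http", "path": path, "headers": list(headers)}
    asyncio.run(BearerGate(_ok_app, exclude=["/healthz"])(scope, receive, send))
    return sent[0]["status"]
```

The gate only uses the `scope`, `receive`, and `send` callables defined by ASGI, which is why the same class could wrap FastAPI, Starlette, or any other ASGI app without importing them.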
## Install

```bash
pip install caracalai-asgi
```

## Add middleware

```python
from caracalai_asgi import CaracalASGIAuth
from caracalai_revocation import InMemoryRevocationStore
from fastapi import FastAPI, Request

app = FastAPI()

app.add_middleware(
    CaracalASGIAuth,
    audience="resource://billing-api",
    revocations=InMemoryRevocationStore(),
    required_scopes=["billing:read"],
    routes={
        "/payouts": {"required_scopes": ["billing:payout"], "require_delegation": True},
    },
    exclude=["/healthz"],
)


@app.get("/balances")
async def balances(request: Request):
    # The Request annotation makes FastAPI inject the request object
    # instead of treating `request` as a query parameter.
    principal = request.state.caracal
    return {"sub": principal.sub, "scopes": principal.scope}
```

`issuer` defaults to `CARACAL_STS_URL` and `expected_zone_id` to `CARACAL_ZONE_ID`, so a provider deployed with the standard Caracal workload variables only states its own audience and revocation store. Construction fails if no issuer can be resolved.

## Options

| Option | Use it for |
| --- | --- |
| `audience` | The provider's own resource identifier; mandates minted for other resources are rejected. |
| `revocations` | Revocation store consulted on every request. Use [Redis Revocation Store](/sdks/connectors/redis/) in production. |
| `required_scopes` / `required_targets` | Default scope and target requirements for every route. |
| `require_agent` / `require_delegation` / `max_hop_count` | Agent identity, delegated authority, and delegation-depth requirements. |
| `routes` | Per-route overrides by path prefix; the longest matching prefix wins. Any option above can be overridden. |
| `exclude` | Path prefixes served without verification (health and readiness probes). |

## Behavior

- Verified claims are stored as `scope["state"]["caracal"]`, surfaced as `request.state.caracal` in Starlette and FastAPI.
- Failed verification answers with the shared status mapping (`http_status_for_auth_error`): 401 for credential failures, 403 for insufficient authority, with the standard `error`/`error_description` JSON body.
- WebSocket connections that fail verification are closed with policy code `1008`; lifespan events pass through.
- Call `await middleware.warmup()` at startup to prefetch the zone JWKS before the first request.

## Boundary

The connector verifies inbound mandates; it does not create agent sessions or delegation edges. Use the [Python SDK](/sdks/python/) to create Caracal context before making outbound calls.

## Related pages

- [Protect a FastAPI App](/guides/protect-fastapi/)
- [MCP Auth Transport](/sdks/transport-mcp/)
- [Sessions and Revocation](/concepts/sessions-revocation/)

---

# Enterprise Security Readiness
# URL: https://docs.caracal.run/security/enterprise-readiness/
# Markdown: https://docs.caracal.run/markdown/security/enterprise-readiness.md
# Type: reference
# Concepts:
# Requires:

---

This page is the single source of truth for evaluating Caracal's security posture and supply-chain trust. It is written for enterprise security, procurement, and platform teams who need to verify that the project follows security and operational best practices before adopting it.

Each control carries a status:

- **Implemented** — enforced in code, CI, or the release pipeline today.
- **Documented** — a stated policy or expectation the project operates under.
- **Planned** — intended, not yet in place.
- **Not claimed** — deliberately not asserted, because it is not independently verified.

Nothing here asserts a compliance status that cannot be proven. Where a control depends on organization settings that are not visible in the repository, it is marked **Documented**, not **Implemented**.

## What Caracal Is

Caracal is a pre-execution authority layer for agents and workloads. Its Security Token Service (STS) issues scoped, signed mandates; the Gateway enforces protected-resource access; the Coordinator tracks agent and delegation state; and the Audit service preserves tamper-evident evidence.
The Community Edition is licensed under Apache 2.0 and is designed to be self-hosted.

The data Caracal handles, where encryption occurs, how keys and tokens are isolated, and how trust boundaries work are covered in [Review the Threat Model](/security/threat-model/). Start there before a deeper review.

## Security Control Summary

| Area | Control | Status |
| --- | --- | --- |
| Static analysis | CodeQL (security-extended) for Go, Python, TypeScript | Implemented |
| Dependency vulnerabilities | OSV-Scanner sweep across npm, Go, Python | Implemented |
| Dependency vulnerabilities | Per-pull-request dependency review before merge | Implemented |
| License compliance | Copyleft license deny-list on introduced dependencies | Implemented |
| Dependency updates | Dependabot for actions, npm, Go, pip, Docker | Implemented |
| Container CVEs | Trivy scan of published images (Critical/High) | Implemented |
| Container hardening | Distroless, non-root, pinned base digests | Implemented |
| Release signing | Sigstore keyless build provenance attestations | Implemented |
| SBOM | SBOM and provenance attached to every image | Implemented |
| Reproducible builds | Archives rebuilt and byte-compared in CI | Implemented |
| Release verification | Post-release attestation and provenance checks | Implemented |
| Supply-chain posture | OpenSSF Scorecard, results in code scanning | Implemented |
| Vulnerability disclosure | Private reporting policy and process | Implemented |
| Threat model | Assets, boundaries, encryption, key isolation | Documented |
| Incident response | Defined response process | Documented |
| Branch protection | Protected `main`, required review | Documented |
| Maintainer MFA | MFA required for maintainers and committers | Documented |
| FIPS validation | FIPS-validated cryptographic modules | Not claimed |

## How Releases Are Signed

Releases are signed with [GitHub Artifact Attestations](https://docs.github.com/en/actions/security-guides/using-artifact-attestations-to-establish-provenance-for-builds), built on the [Sigstore](https://www.sigstore.dev/) keyless model. Signatures are produced in the release workflow using short-lived keys issued to the workflow's OpenID Connect identity, so no long-lived signing key is stored on any distribution server. The trust root is Sigstore's transparency log: verification confirms that an artifact was built by the `Garudex-Labs/caracal` release workflow.

- CLI and Console archives are checksummed in `SHA256SUMS` and carry build-provenance attestations.
- npm packages publish with provenance.
- Container images carry both provenance and SBOM attestations.

Verification steps for archives, images, and packages are in [Verify a Release](/security/verify-releases/). The bundled installers run `gh attestation verify` automatically.

## How SBOMs Are Generated

Every container image is built with SBOM and provenance generation enabled in the release pipeline. The SBOM is attached to the image as an attestation in the registry and can be inspected with the registry tooling:

```sh
docker buildx imagetools inspect ghcr.io/garudex-labs/caracal-: \
  --format '{{ json .SBOM }}'
```

For a source-level dependency inventory across all ecosystems, the weekly OSV-Scanner workflow enumerates and checks the npm, Go, and Python dependency graphs from their lockfiles.

## How Vulnerabilities Are Handled

- **Reporting.** Vulnerabilities are reported privately through GitHub Security Advisories or email, never as public issues. The full policy, required report format, and response timeline are in [Report a Vulnerability](/security/disclosure/).
- **Detection.** CodeQL (security-extended) runs on pull requests and on a schedule. OSV-Scanner runs weekly across every dependency ecosystem and reports into GitHub code scanning. Trivy scans published container images weekly for Critical and High CVEs.
- **Triage.** Findings surface in the repository's code scanning view as a single consolidated feed, so maintainers and evaluators review one source instead of disparate scan logs.
- **Response.** Confirmed incidents follow the documented incident response process, and fixes ship through the standard signed release pipeline.

## How Dependencies Are Reviewed

- **Automated gate.** A scheduled dependency review runs every two days over the changes merged to `main` in that window and fails on newly introduced High-or-higher vulnerabilities and on a copyleft license deny-list. It is skipped automatically when no commits landed in the window.
- **Updates.** Dependabot opens grouped weekly update pull requests for GitHub Actions, npm, Go modules, pip, and Docker base images. Updates pass the same CI, dependency review, and code review as any other change.
- **Pinning.** GitHub Actions are pinned to commit SHAs, container base images are pinned to digests, and language dependencies are resolved from committed lockfiles (`pnpm-lock.yaml`, `go.sum`, Python `*.lock`).

Reviewer guidance for dependency changes is in the repository [contributing guide](https://github.com/Garudex-Labs/caracal/blob/main/.github/CONTRIBUTING.md).

## How Maintainers Enforce Controls

The following are **Documented** expectations for the project's maintainer organization. They depend on GitHub organization and repository settings rather than repository files, so they are stated as policy rather than asserted as independently verifiable from the source tree:

- Multi-factor authentication is required for every maintainer and anyone with commit access.
- The default branch `main` is protected: direct pushes are disallowed and changes land through reviewed pull requests.
- Contributor pull requests require at least one approving maintainer review and passing CI; authors do not approve or merge their own changes.
- Release and publish workflows run from protected CI environments with minimal, scoped permissions.

OpenSSF Scorecard runs weekly and publishes its results, giving an external, repeatable signal for branch protection, token permissions, pinned dependencies, and other repository health checks.

## How Container Images Are Hardened

All service images are built to run safely with least privilege:

- **Minimal base.** Final stages use Google distroless images pinned by digest, with no shell or package manager in the runtime layer.
- **Non-root.** Images run as an unprivileged user.
- **Read-only root filesystem.** Images are designed to run with a read-only root filesystem; only declared state directories under `/var/lib/caracal` need to be writable.
- **Dropped capabilities.** Images require no added Linux capabilities; deployments should drop all capabilities and set `no-new-privileges`.
- **Reproducible, multi-stage builds.** Build and runtime stages are separated so build tooling never ships in the runtime image.

Recommended Kubernetes `securityContext` and runtime hardening settings are in [Harden Security Posture](/security/hardening/).

When a base image carries a CVE that upstream has not yet rebuilt to fix, and no newer pinned digest is available, the finding is tracked in a time-boxed `.trivyignore.yaml` entry with a justification and an expiry date. The suppression expires automatically so the finding resurfaces for re-evaluation once a fixed base image is published, keeping the weekly container scan free of un-actionable noise without losing the audit trail.

## How the Project Approaches Supply-Chain Trust

Caracal aligns with the SLSA build-provenance model and OpenSSF best practices:

- One clean source of truth per signal: code scanning for vulnerabilities and static analysis, Scorecard for repository health, attestations for build provenance. Scans are scheduled rather than duplicated per push to avoid noise.
- Build provenance is generated and verifiable for every released artifact.
- Releases are reproducible: the pipeline rebuilds archives and byte-compares them against the published artifacts before signing.
- Dependencies are pinned, updated on a fixed cadence, and gated on review.

## Regulated-Enterprise and FIPS Readiness

This section states clearly what is and is not supported for regulated customers.

- **Cryptography in use.** Caracal signs mandates and audit evidence and uses HMAC for stream and audit integrity. Cryptographic operations rely on the standard libraries of the Go, Node.js, and Python runtimes used to build and run the services.
- **FIPS status: Not claimed.** The project does **not** claim FIPS 140-2/140-3 validation. The default release artifacts are not built against FIPS-validated cryptographic modules, and no FIPS self-test or certificate is asserted.
- **What regulated customers can do today.** Because services are distributed as container images, an operator who requires FIPS-mode cryptography can run them on a FIPS-validated host or base platform and supply a FIPS-validated runtime where their compliance program requires it. This is the operator's responsibility and is **not** verified by the project.
- **Encryption boundaries.** Transport security (TLS) is terminated and configured by the operator; see [Harden Production](/operations/tls-hardening/). Key and secret isolation is described in [Rotate Keys and Secrets](/operations/key-management/) and the threat model.
- **Planned.** A documented, tested FIPS-mode build path is a candidate for a future release and is currently marked **Planned**, not available.

## What an Enterprise Team Should Review

1. [Review the Threat Model](/security/threat-model/): assets, trust boundaries, encryption, and key and token isolation.
2. [Report a Vulnerability](/security/disclosure/): the vulnerability disclosure policy and response timeline.
3. The repository's **OpenSSF Scorecard** result and **code scanning** alerts for current, independent security signals.
4. [Verify a Release](/security/verify-releases/): confirm you can validate signatures, provenance, and SBOMs for the artifacts you intend to run.
5. [Harden Security Posture](/security/hardening/): the production hardening checklist, including container `securityContext`.
6. The [Regulated-Enterprise and FIPS Readiness](#regulated-enterprise-and-fips-readiness) section if you operate under FIPS or similar mandates.

## Related Pages

- [Compliance and Operations Readiness](/enterprise/compliance-readiness/)
- [Secure Caracal](/security/)
- [Verify a Release](/security/verify-releases/)
- [Harden Security Posture](/security/hardening/)
- [Report a Vulnerability](/security/disclosure/)
- [Compare Editions](/enterprise/)

---

# Generate an Evidence Pack
# URL: https://docs.caracal.run/security/evidence-pack/
# Markdown: https://docs.caracal.run/markdown/security/evidence-pack.md
# Type: workflow
# Concepts:
# Requires:

---

A distributed topology is not the same as a large attack surface. During evaluation that distinction is easy to lose: reviewers see several services, Postgres, and Redis, and assume proportional risk. The evidence pack closes that gap by turning the architecture into demonstrated assurance: a single, reproducible bundle a security or compliance reviewer can run, read, and attach to an approval record.

The pack does not add new trust claims. It runs the verifiers Caracal already ships and captures their output in one place, mapped to the written assurance case in [`governance/THREAT_MODEL.md`](https://github.com/Garudex-Labs/caracal/blob/main/governance/THREAT_MODEL.md).

## What It Proves

| Pillar | Check | Demonstrates |
| --- | --- | --- |
| Supply chain | Release provenance | Every image you run was built by the `Garudex-Labs/caracal` release workflow and carries provenance and SBOM attestations. |
| Enforcement (data) | Schema enforcement | Row-level security is fail-closed, audit is append-only, and policy versions are immutable; the database enforces all three independently of application code. |
| Enforcement (runtime) | Runtime readiness | Every mediation point (API, STS, Gateway, Audit, Coordinator) is live and answering readiness. |
| Assurance case | Threat model | The written threat model the checks above exercise, captured with the pack. |

## Generate the Pack

Run the generator against a deployment. Each check runs only when its inputs are present, so the pack is useful at any stage of evaluation and records which checks were skipped and why.

```bash
# Supply-chain provenance (release tag + gh CLI authenticated)
export CARACAL_VERSION=

# Schema enforcement (point at the running database)
export PGHOST= PGPORT=5432 PGUSER=caracal PGDATABASE=caracal
export PGPASSWORD=

# Runtime readiness (defaults to 127.0.0.1)
export CARACAL_SMOKE_HOST=

bash infra/scripts/evidencePack.sh
```

The pack is written to `evidence/caracal-evidence-/`:

- `REPORT.md`: summary table and per-check narrative with status and links.
- `raw/`: the unmodified output of each verifier, including a copy of the threat model.

The generator exits non-zero if any executed check fails, so it can also gate CI or a release-readiness review. Skipped checks do not fail the run.

## Inputs

| Variable | Used by | Effect when unset |
| --- | --- | --- |
| `CARACAL_VERSION` | Release provenance | Provenance check is skipped. |
| `CARACAL_REGISTRY` | Release provenance | Defaults to `ghcr.io/garudex-labs`. |
| `PGPASSWORD` (+ `PG*`) | Schema enforcement | Schema check is skipped. |
| `CARACAL_SMOKE_HOST` | Runtime readiness | Defaults to `127.0.0.1`. |
| `OUT_DIR` | All | Defaults to `./evidence`. |

Generated packs are ignored by Git. Treat a completed pack as evidence: store it with the change record or compliance artifact it supports.
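
Because skipped checks do not fail the run, it can help to see which checks have their inputs set before generating a pack. The sketch below is a hypothetical preflight helper, not part of Caracal: the function name `evidence_preflight` and its output format are assumptions for illustration, and it treats an empty variable the same as an unset one. It only mirrors the Inputs table above and never calls the real generator.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper (not shipped with Caracal).
# Reports which evidence-pack checks have their inputs set,
# following the documented "Effect when unset" column.
# Assumption: an empty value is treated the same as an unset one.
evidence_preflight() {
  # Release provenance needs CARACAL_VERSION; the registry has a default.
  if [ -n "${CARACAL_VERSION:-}" ]; then
    echo "provenance: run (registry ${CARACAL_REGISTRY:-ghcr.io/garudex-labs})"
  else
    echo "provenance: skipped (CARACAL_VERSION unset)"
  fi

  # Schema enforcement needs PGPASSWORD (plus the other PG* variables).
  if [ -n "${PGPASSWORD:-}" ]; then
    echo "schema: run"
  else
    echo "schema: skipped (PGPASSWORD unset)"
  fi

  # Runtime readiness always has a host because of the documented default.
  echo "readiness: run (host ${CARACAL_SMOKE_HOST:-127.0.0.1})"

  # Output directory, with the documented default.
  echo "output: ${OUT_DIR:-./evidence}"
}
```

Sourcing this file and calling `evidence_preflight` before `bash infra/scripts/evidencePack.sh` makes an unintended skip visible up front; the generated `REPORT.md` remains the authoritative record of what actually ran.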
## Related Pages

- [Verify a Release](/security/verify-releases/)
- [Review the Threat Model](/security/threat-model/)
- [Enterprise Security Readiness](/security/enterprise-readiness/)
- [Export Audit Evidence](/operations/compliance-audit-integration/)

---

# Verify a Release
# URL: https://docs.caracal.run/security/verify-releases/
# Markdown: https://docs.caracal.run/markdown/security/verify-releases.md
# Type: reference
# Concepts:
# Requires:

---

Every published Caracal release is cryptographically signed and verifiable. Verify a download before you install or deploy it, especially when you fetch archives or images directly instead of through the installer.

## How Releases Are Signed

Caracal signs releases with [GitHub Artifact Attestations](https://docs.github.com/en/actions/security-guides/using-artifact-attestations-to-establish-provenance-for-builds), built on the [Sigstore](https://www.sigstore.dev/) keyless model:

- Signatures are produced in the release workflow with short-lived keys issued to the workflow's OpenID Connect identity. No long-lived private signing key exists, so no signing key is stored on the servers that distribute the artifacts.
- The public trust root is Sigstore's transparency log. The verifier confirms the artifact was built by the `Garudex-Labs/caracal` release workflow rather than checking a static public key you download separately.
- Release archives are additionally checksummed in `SHA256SUMS`, npm packages publish with provenance, and container images carry provenance and SBOM attestations.

## Prerequisites

Install the GitHub CLI and authenticate it:

```sh
gh auth login
```

`gh attestation verify` needs network access to the Sigstore transparency log on first use.

## Verify CLI and Console Archives

Download the archives, `SHA256SUMS`, and installers from the release page, then verify checksums and provenance:

```sh
# 1. Confirm archive integrity against the published checksums.
sha256sum --check SHA256SUMS

# 2. Confirm each archive was built by the Caracal release workflow.
gh attestation verify caracal-runtime-*.tar.gz --repo Garudex-Labs/caracal
gh attestation verify caracal-console-*.tar.gz --repo Garudex-Labs/caracal
```

`gh attestation verify` exits non-zero and prints the failing policy if the signature, identity, or transparency-log entry does not match. The bundled installers run the same `gh attestation verify ... --repo Garudex-Labs/caracal` check automatically; pass `--no-verify-provenance` (Unix) or `-NoVerifyProvenance` (Windows) only when you must skip it.

## Verify Container Images

Pull the image, preferably by digest, then verify its build provenance. The example below references the image by release tag:

```sh
gh attestation verify oci://ghcr.io/garudex-labs/caracal-go:vYYYY.MM.DD \
  --repo Garudex-Labs/caracal
```

Inspect the attached SBOM and provenance attestations with the registry tooling:

```sh
# Provenance attestation
docker buildx imagetools inspect ghcr.io/garudex-labs/caracal-go:vYYYY.MM.DD \
  --format '{{ json .Provenance }}'

# SBOM attestation
docker buildx imagetools inspect ghcr.io/garudex-labs/caracal-go:vYYYY.MM.DD \
  --format '{{ json .SBOM }}'
```

## Verify Published Packages

npm packages publish with provenance. `npm audit signatures` checks the registry signatures and provenance attestations of the packages installed in the current project, so run it after installing them and before you run or deploy them:

```sh
npm audit signatures
```

## If Verification Fails

Do not install or run the artifact. Re-download from the official release page, confirm you used the correct release tag, and if verification still fails, [report it privately](/security/disclosure/).

## Related Pages

- [Install Caracal](/get-started/install-caracal/)
- [Harden Security Posture](/security/hardening/)
- [Review the Threat Model](/security/threat-model/)

---

# Vlogs
# URL: https://docs.caracal.run/vlogs/
# Markdown: https://docs.caracal.run/markdown/vlogs.md
# Type: page
# Concepts:
# Requires:

---

import VlogsDirectory from '../../components/VlogsDirectory.astro'
import LandingFooter from '../../components/LandingFooter.astro'