---
title: "Debug Infrastructure Issues"
url: "https://docs.caracal.run/operations/debugging/"
markdown_url: "https://docs.caracal.run/markdown/operations/debugging.md"
description: "Diagnose Caracal deployment, configuration, token exchange, Gateway, audit, and delegation issues."
page_type: "workflow"
concepts: []
requires: []
---

# Debug Infrastructure Issues

Canonical URL: https://docs.caracal.run/operations/debugging/
Markdown URL: https://docs.caracal.run/markdown/operations/debugging.md
Description: Diagnose Caracal deployment, configuration, token exchange, Gateway, audit, and delegation issues.
Page type: workflow
Concepts: none
Requires: none

---

Debug from the boundary inward: runtime lifecycle, readiness, storage, streams, service config, then request-specific audit evidence.

For an application-level failure where you have an error or request ID but not the failing surface, start from the symptom-first [Troubleshoot by Symptom](/operations/troubleshooting/) instead.

## Triage Flow

```mermaid
flowchart TD
  Symptom[Symptom] --> Ready[caracal status --ready or Kubernetes readiness]
  Ready --> Storage[Postgres and Redis]
  Storage --> Streams[Redis streams and outboxes]
  Streams --> Service[Service logs and metrics]
  Service --> Audit[Audit/explain request evidence]
  Audit --> Fix[Apply focused remediation]
```

## Commands

| Environment | Commands |
| --- | --- |
| Local | `caracal status --ready`, `docker compose ps`, `docker compose logs <service>` |
| Helm | `kubectl -n caracal get pods,svc,jobs`, `kubectl -n caracal describe pod <pod>`, `kubectl -n caracal logs <pod>` |
| Storage | `infra/postgres/scripts/validateMigrations.sh`, `infra/redis/scripts/verify.sh` |
| App-level | Console `diagnostics`, `audit`, and `request trace` |

## Common Cases

| Symptom | Likely area |
| --- | --- |
| `401` or `403` from API | Admin token, scope, Control token, or workload credential source. |
| STS exchange fails | Zone ID, application ID, grant, policy, client secret, step-up, or STS readiness. |
| Gateway fails before upstream | Mandate verification, STS exchange, binding, revocation snapshot, or upstream allowlist. |
| Agent views fail | Coordinator URL/token, selected zone, or Coordinator readiness. |
| Audit event missing | Redis stream health, Audit readiness, DLQ, replay backlog, or request never reaching protected boundary. |

## Request Investigation

1. Capture request ID, zone ID, subject, resource, and timestamp.
2. Open Console `audit` or `request trace`.
3. Confirm policy decision, scopes, target resource, session, agent session, and delegation edge.
4. Compare token claims with resource-server verifier settings.
5. Check revocation and step-up state when authority appears valid but access fails.

## Escalation Bundle

Include readiness output, relevant logs, service versions, Helm values diff, Redis stream status, Postgres migration status, request ID, audit explanation, and any recent secret or policy changes.

## Next Step

Use [Recover from Failures](/operations/failure-modes/) when infrastructure debugging identifies a degraded dependency, service, stream, or safety guarantee.
