---
title: "Scale Capacity"
url: "https://docs.caracal.run/operations/scale-capacity/"
markdown_url: "https://docs.caracal.run/markdown/operations/scale-capacity.md"
description: "Size Caracal services, storage, and queues for production traffic."
page_type: "reference"
concepts: []
requires: []
---

# Scale Capacity

Scale Caracal around three bottlenecks: token exchange and policy evaluation in STS, upstream proxying in Gateway, and durable writes in Postgres/Audit.

## Scaling Levers

| Component | Primary levers |
| --- | --- |
| API | Replicas, `DB_POOL_MAX`, outbox batch/interval settings, request rate limits. |
| STS | Replicas, OPA policy age, Redis invalidation health, `MAX_GRANT_TTL_SECONDS`, replay persistence. |
| Gateway | Replicas, `MAX_REQUEST_BYTES`, `STS_TIMEOUT`, `UPSTREAM_TIMEOUT`, upstream allowlist, revocation snapshot health. |
| Audit | Replicas, Postgres write capacity, DLQ thresholds, retention, S3 export settings. |
| Coordinator | Replicas, DB pool, sweeper intervals, outbox batch, service-agent leases. |
| Postgres | Connection limits, indexes, partitions, storage IOPS, backup windows. |
| Redis | Stream memory, pending entries, consumer lag, persistence, network latency. |

## Helm Defaults

The chart defaults to two replicas each for API, STS, Gateway, Audit, and Coordinator. Gateway has a higher HPA replica ceiling because all protected traffic passes through it.

| Service | Default port | Default max HPA replicas |
| --- | --- | --- |
| API | `3000` | `8` |
| STS | `8080` | `8` |
| Gateway | `8081` | `16` |
| Audit | `9090` | `8` |
| Coordinator | `4000` | `8` |
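To estimate how quickly a service reaches its ceiling, you can apply the Kubernetes HPA formula. It computes `desired = ceil(current × currentMetric / targetMetric)`, skips scaling when that ratio is within the default 0.1 tolerance, and clamps the result to the configured bounds. The sketch below is a simplification and makes several assumptions. It handles a single metric only. It treats the chart's two default replicas as the HPA minimum. It ignores stabilization windows, scaling policies, and pod-readiness adjustments. The function name is illustrative and not part of the chart.

```python
import math

# Default max HPA replicas from the Helm defaults table.
MAX_REPLICAS = {"api": 8, "sts": 8, "gateway": 16, "audit": 8, "coordinator": 8}
# Assumption: the chart's default of two replicas is also the HPA floor.
MIN_REPLICAS = 2
# Kubernetes leaves the replica count unchanged when the ratio is within this tolerance.
TOLERANCE = 0.1


def desired_replicas(service: str, current: int, current_metric: float, target_metric: float) -> int:
    """Approximate the HPA calculation for one metric, clamped to the chart bounds."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= TOLERANCE:
        desired = current  # close enough to target: no scaling
    else:
        desired = math.ceil(current * ratio)
    return max(MIN_REPLICAS, min(MAX_REPLICAS[service], desired))
```

Suppose Gateway runs 6 replicas at three times its metric target. The formula asks for 18 replicas, but the result is capped at 16. The same load on API is capped at 8. Once a service sits at its ceiling, further load has to be absorbed by raising the ceiling or by fixing the bottleneck behind it.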

## Capacity Signals

| Signal | Meaning |
| --- | --- |
| Postgres pool ratio near `0.9` | Service pool saturation; inspect queries and pool size. |
| Audit consumer lag | Audit ingestion cannot keep up with Redis stream input. |
| Audit replay backlog age | STS/Gateway cannot emit audit events to Redis/Audit promptly. |
| Gateway STS circuit open | Gateway is fast-failing because STS exchange is unhealthy. |
| Revocation propagation lag | Access-safety state is not reaching Gateway within the expected window. |
| Readiness flapping | Pods or dependencies are unstable under current load. |
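One way to turn the signals above into a repeatable check is sketched below. It evaluates a single metrics snapshot against operator-chosen limits. Only the `0.9` pool ratio comes from the table. The other thresholds, the field names, and the signal labels are hypothetical, so map them to whatever your metrics backend actually exports.

```python
from dataclasses import dataclass

# "Near 0.9" from the Capacity Signals table; tune to your own alert threshold.
POOL_RATIO_THRESHOLD = 0.9


@dataclass
class Limits:
    """Operator-chosen thresholds (hypothetical; the docs do not prescribe values)."""
    max_consumer_lag: int
    max_replay_backlog_age_s: float
    revocation_window_s: float
    max_readiness_transitions: int


@dataclass
class Snapshot:
    """One sample of capacity metrics. Field names are illustrative only."""
    pool_in_use: int
    pool_max: int
    audit_consumer_lag: int
    replay_backlog_age_s: float
    gateway_sts_circuit_open: bool
    revocation_lag_s: float
    readiness_transitions: int  # ready/not-ready flips within the sampling window


def capacity_signals(s: Snapshot, limits: Limits) -> list[str]:
    """Return the capacity signals from the table that this snapshot trips."""
    signals = []
    if s.pool_max > 0 and s.pool_in_use / s.pool_max >= POOL_RATIO_THRESHOLD:
        signals.append("postgres_pool_saturation")
    if s.audit_consumer_lag > limits.max_consumer_lag:
        signals.append("audit_consumer_lag")
    if s.replay_backlog_age_s > limits.max_replay_backlog_age_s:
        signals.append("audit_replay_backlog_age")
    if s.gateway_sts_circuit_open:
        signals.append("gateway_sts_circuit_open")
    if s.revocation_lag_s > limits.revocation_window_s:
        signals.append("revocation_propagation_lag")
    if s.readiness_transitions > limits.max_readiness_transitions:
        signals.append("readiness_flapping")
    return signals
```

For example, a service using 18 of 20 pooled connections trips `postgres_pool_saturation`. In that case, follow the first row of the escalation table below before adding replicas.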

## Scale Procedure

1. Identify the bottleneck from metrics and logs.
2. Scale stateless service replicas first when storage is healthy.
3. Increase Postgres and Redis capacity before raising service pools.
4. Verify readiness, lag, replay backlog, and DLQ after each change.
5. Document the new limit and alert threshold.
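Step 4 can be expressed as a gate that compares metrics sampled before and after a single change. The sketch below rejects the change if any pod is not ready, or if consumer lag, replay backlog age, or DLQ depth grew. The field names are hypothetical. A single pair of samples can also be noisy, so in practice you may prefer to compare averages over a short window.

```python
from dataclasses import dataclass


@dataclass
class Health:
    """Metrics sampled before and after one scaling change (field names are illustrative)."""
    ready_pods: int
    total_pods: int
    consumer_lag: int
    replay_backlog_age_s: float
    dlq_depth: int


def change_is_safe(before: Health, after: Health) -> tuple[bool, list[str]]:
    """Pass only if every pod is ready and no backlog grew after the change."""
    problems = []
    if after.ready_pods < after.total_pods:
        problems.append(f"readiness: {after.ready_pods}/{after.total_pods} pods ready")
    if after.consumer_lag > before.consumer_lag:
        problems.append("consumer lag grew")
    if after.replay_backlog_age_s > before.replay_backlog_age_s:
        problems.append("replay backlog age grew")
    if after.dlq_depth > before.dlq_depth:
        problems.append("DLQ depth grew")
    return (not problems, problems)
```

If the gate fails, roll back or hold the change before making another one. Applying one change at a time keeps each regression attributable to a single lever.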

## Worked Escalation Patterns

| Signal | First action | Expected outcome |
| --- | --- | --- |
| Postgres pool ratio stays near `0.9` | Inspect slow queries, then raise storage capacity or service pool size by one step. | `/ready` stabilizes and pool saturation falls before replicas increase further. |
| Gateway STS circuit opens | Check STS readiness and exchange latency, then scale STS or reduce Gateway fan-in. | Gateway stops fast-failing protected requests and audit evidence resumes. |
| Audit consumer lag grows | Check Postgres write IOPS, partition health, and Audit replicas before extending retention or exports. | Redis pending entries and DLQ depth stop increasing. |
| Revocation propagation lags | Check Redis latency, stream consumers, and Gateway snapshot freshness. | Gateway denial decisions reflect current revocation state. |

## Troubleshooting

| Symptom | First check |
| --- | --- |
| Higher replicas make failures worse | Postgres connection pressure or Redis latency. |
| Gateway latency spikes | STS exchange latency, upstream timeout, JWKS cache, and private upstream egress. |
| Audit cannot catch up | Postgres write IOPS, partition health, Audit replicas, and Redis pending entries. |
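When more replicas make failures worse, a common cause is the worst-case connection count. Every replica can open a full pool, so the total can outgrow Postgres `max_connections`. The sketch below multiplies each service's default HPA ceiling by its pool size and subtracts the result from the usable connection budget. It includes only the services you list. Which services hold Postgres pools, and their pool sizes, are assumptions here, as are the example numbers in the tests. Reserve connections for superusers, migrations, and other clients through `reserved`.

```python
# Default max HPA replicas from the Helm defaults table.
MAX_HPA_REPLICAS = {"api": 8, "sts": 8, "gateway": 16, "audit": 8, "coordinator": 8}


def worst_case_connections(pool_max: dict[str, int]) -> int:
    """Connections if every listed service scales to its HPA ceiling and fills its pool."""
    return sum(MAX_HPA_REPLICAS[svc] * size for svc, size in pool_max.items())


def connection_headroom(pool_max: dict[str, int], max_connections: int, reserved: int) -> int:
    """Positive means the budget fits; negative means scaling to the ceiling can exhaust Postgres."""
    return max_connections - reserved - worst_case_connections(pool_max)
```

With hypothetical pools of 10 (API), 5 (Audit), and 5 (Coordinator), the ceilings allow 160 connections. Against `max_connections = 200` with 10 reserved, that leaves 30 spare. Raising the API pool to 15 pushes headroom to -10. This arithmetic is why the Scale Procedure increases Postgres capacity before raising service pools.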

## Next Step

Use [Monitor Health and Metrics](/operations/observability/) to turn capacity signals into readiness gates and operator dashboards.
