Rate Limits

Adjudon applies up to four independent rate-limit layers to every request, each with its own purpose. There is no single "rate limit" number to quote. There is the per-IP API limiter that protects the platform from a single misconfigured customer. There is the per-user dashboard limiter that protects an org from a single misbehaving browser tab. There is the per-agent ingestion limiter that the operator configures on each agent's policy. And there is the auth-endpoint limiter that protects login from credential stuffing.

This page explains each layer, what triggers a 429, what the SDK should do when one fires, and the small list of routes that carry their own dedicated limiter on top of the shared ones. There is no separate rate-limit API: the platform limits are observed at runtime, not configured. The one exception is the per-agent limit, which is set via the Agents API.

The four layers

Every public request passes through at least two of these in order: the IP-keyed apiLimiter always runs; the user-keyed userApiLimiter runs on JWT-authenticated routes; the IP-keyed authLimiter runs on /api/v1/auth/* routes; and the per-agent limiter runs on POST /traces when authenticated with an agent API key.

Layer          | Window | Limit                  | Key                              | Where it runs
apiLimiter     | 15 min | 1,000                  | client IP                        | every /api/v1/* route
userApiLimiter | 15 min | 600                    | user:<jwt.id> (falls back to IP) | dashboard / JWT-authenticated routes
authLimiter    | 15 min | 100                    | client IP                        | /api/v1/auth/* only
agentRateLimit | 1 min  | configurable per agent | agent ObjectId                   | POST /traces when req.agent is set

Numbers are exact — lifted from backend/server.js:201-261 and backend/middleware/agentRateLimit.js:9.

A single SDK retry never trips a limiter on its own; a tight loop or a runaway worker does. The 600/user/15min budget on the dashboard side translates to roughly 40 req/min sustained, which is generous for hand-driven UI work and tight on accidental polling loops.

The apiLimiter and userApiLimiter stack on JWT-authenticated dashboard routes — the IP limiter still applies, the user limiter applies on top. A single user behind a shared NAT can exhaust the IP budget without ever hitting their own user budget; the API will return 429 RATE_LIMIT_EXCEEDED and the response honours the same envelope as every other error.
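
For orientation, a minimal sketch of how the two stacked shared limiters could be wired with express-rate-limit follows. It is illustrative, not a copy of backend/server.js: the mount points, the req.user shape, and the message body are assumptions; the windows, budgets, and keying follow the table above.

import express from "express";
import rateLimit from "express-rate-limit";

const app = express();

// IP-keyed limiter: 1,000 requests per 15-minute window, on every /api/v1/* route.
const apiLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 1000,
  standardHeaders: true,   // emit RateLimit-Limit / RateLimit-Remaining / RateLimit-Reset
  legacyHeaders: false,
  message: {               // approximation of the apiResponse.fail() envelope
    success: false,
    error: "Too many requests from this IP, please try again after 15 minutes",
    code: "RATE_LIMIT_EXCEEDED",
  },
});

// User-keyed limiter: 600 requests per 15 minutes, stacked on JWT-authenticated routes.
const userApiLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 600,
  // Key on the authenticated user id, falling back to the client IP.
  keyGenerator: (req) => {
    const userId = (req as { user?: { id?: string } }).user?.id;
    return userId ? `user:${userId}` : (req.ip ?? "unknown");
  },
  standardHeaders: true,
  legacyHeaders: false,
});

app.use("/api/v1", apiLimiter);               // always runs first
app.use("/api/v1/dashboard", userApiLimiter); // hypothetical mount point for JWT routes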

What 429 looks like

Every limiter returns the same JSON shape via apiResponse.fail():

{
  "success": false,
  "error": "Too many requests from this IP, please try again after 15 minutes",
  "code": "RATE_LIMIT_EXCEEDED"
}

The error string varies per limiter (auth says "authentication attempts"; the per-agent limiter includes the configured limit and the 60-second window in a data envelope). The code is always RATE_LIMIT_EXCEEDED. Treat the code as the machine-readable contract; the string is for log diagnostics.

The standard RateLimit-* headers (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset) ride along on every response per the express-rate-limit defaults. Reading them is the correct way to plan retry timing client-side.
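
As a sketch of the client-side contract, assuming a fetch-based client (the X-API-Key header below is a placeholder for whatever auth your integration already sends):

// Shape of the envelope every limiter returns (see above).
interface RateLimitError {
  success: false;
  error: string;                 // human-readable, varies per limiter
  code: "RATE_LIMIT_EXCEEDED";   // machine-readable contract
}

async function postTrace(url: string, payload: unknown, apiKey: string): Promise<Response> {
  const res = await fetch(url, {
    method: "POST",
    // "X-API-Key" is a placeholder header name, not the documented auth scheme.
    headers: { "Content-Type": "application/json", "X-API-Key": apiKey },
    body: JSON.stringify(payload),
  });

  if (res.status === 429) {
    const body = (await res.json()) as RateLimitError;
    // RateLimit-Reset counts the seconds until the current window resets.
    const resetSeconds = Number(res.headers.get("RateLimit-Reset") ?? 0);
    console.warn(`${body.code}: window resets in ~${resetSeconds}s`);
  }
  return res;
}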

The per-agent limiter

agentRateLimit is the only configurable limiter. It runs after requireApiKey on the trace-ingestion path and reads agent.policies.maxRequestsPerMinute from the Agent record. If the field is null, 0, or negative, the limiter short-circuits and the request passes through — per-agent throttling is opt-in. Setting it to 120 enforces 120 requests/min per agent, with a fixed-window counter that resets every 60 seconds.

┌────────────────────────────────────────────────────────┐
│ agentRateLimit — fixed-window counter                  │
├────────────────────────────────────────────────────────┤
│                                                        │
│ per-agent map: agentId → { count, windowStart }        │
│                                                        │
│ on request:                                            │
│   if (now - windowStart >= 60s)                        │
│     { count = 0; windowStart = now }                   │
│   count += 1                                           │
│   if (count > limit) → 429 RATE_LIMIT_EXCEEDED         │
│                                                        │
│ periodic prune every 5 min →                           │
│   drop entries whose windowStart > 60s ago             │
│                                                        │
│ storage: in-process Map() — NOT shared across server   │
│   instances (acceptable: per-instance budget is the    │
│   minimum effective rate)                              │
└────────────────────────────────────────────────────────┘

The in-process Map() is deliberate. A Redis-backed limiter would add a network hop to the trace-ingestion p95 budget; the in-process counter is microseconds. The trade-off is that with multiple backend instances behind the load-balancer, an agent can in theory exceed maxRequestsPerMinute by a factor of N (instances). For typical Adjudon deployments (1-3 backend instances on Fly.io Frankfurt) this is a known approximation; for stricter enforcement contact support to switch the deployment to a centralised store.
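
A simplified TypeScript rendering of the counter in the diagram above, under the same assumptions (requireApiKey has already attached req.agent, and the limit lives at agent.policies.maxRequestsPerMinute); it is a sketch of the shape, not a copy of agentRateLimit.js:

import type { NextFunction, Request, Response } from "express";

interface WindowEntry { count: number; windowStart: number }

const WINDOW_MS = 60_000;
const windows = new Map<string, WindowEntry>();

function agentRateLimitSketch(req: Request, res: Response, next: NextFunction) {
  // requireApiKey is assumed to have attached the Agent record earlier in the chain.
  const agent = (req as { agent?: { _id: unknown; policies?: { maxRequestsPerMinute?: number | null } } }).agent;
  const limit = agent?.policies?.maxRequestsPerMinute;
  if (!agent || !limit || limit <= 0) return next(); // opt-in: null/0/negative disables throttling

  const id = String(agent._id);
  const now = Date.now();
  const entry = windows.get(id) ?? { count: 0, windowStart: now };
  if (now - entry.windowStart >= WINDOW_MS) {         // window expired: start a fresh one
    entry.count = 0;
    entry.windowStart = now;
  }
  entry.count += 1;
  windows.set(id, entry);

  if (entry.count > limit) {
    return res.status(429).json({
      success: false,
      error: `Agent rate limit of ${limit} requests per minute exceeded`,
      code: "RATE_LIMIT_EXCEEDED",
    });
  }
  next();
}

// Periodic prune so idle agents do not accumulate entries forever.
setInterval(() => {
  const now = Date.now();
  for (const [id, entry] of windows) {
    if (now - entry.windowStart > WINDOW_MS) windows.delete(id);
  }
}, 5 * 60 * 1000);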

Dedicated limiters on a few routes

A few routes carry an additional limiter on top of the shared ones:

  • CPI compliance reports (/api/v1/cpi/report) — capped at 10/hour per org via cpiRateLimit because PDF generation is expensive. Past that, the dashboard recommends Scheduled Reports for higher volume.
  • Onboarding endpoints (/api/v1/onboarding/*) — capped via onboardingRateLimit to defend the public-facing wizard flows from form-spam. The exact numbers are documented inline in the middleware.
  • Demo-request form (public, unauthenticated) — capped at 5/hour per IP; authenticated admin requests bypass the cap.

These dedicated limiters return the same 429 RATE_LIMIT_EXCEEDED envelope.

What clients should do

When a 429 returns, the SDK and any direct HTTP client should:

  1. Read RateLimit-Reset and wait that long before retrying.
  2. If RateLimit-Reset is absent, fall back to exponential backoff: 1s, 2s, 4s, 8s, 16s capped at 60 seconds.
  3. Do not retry indefinitely. A trace that has been retried five times with backoff and still returns 429 is signalling that the per-agent limit is too low for the workload — raise it via the Agents API or break up the workload.
  4. Respect idempotency. Retries against the same trace payload share the auto-generated key; the Idempotency layer collapses duplicates into a single ingestion.

Adjudon's official SDKs implement this loop natively; direct HTTP clients (curl scripts, hand-rolled integrations) should mirror it.
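
A minimal sketch of that loop, assuming a fetch-style client; sendTrace below is a stand-in for whatever function actually issues the request (idempotency, per step 4, is that function's concern):

async function sendWithRetry(
  sendTrace: () => Promise<Response>,   // stand-in for the real request function
  maxAttempts = 5,
): Promise<Response> {
  let backoffMs = 1_000;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await sendTrace();
    if (res.status !== 429) return res;
    if (attempt === maxAttempts) break;  // stop retrying; let the caller decide what to do

    // Prefer the server's own reset hint; otherwise back off exponentially (1s, 2s, 4s, 8s, 16s, capped at 60s).
    const resetSeconds = Number(res.headers.get("RateLimit-Reset"));
    const waitMs = resetSeconds > 0 ? resetSeconds * 1000 : backoffMs;
    await new Promise((resolve) => setTimeout(resolve, waitMs));
    backoffMs = Math.min(backoffMs * 2, 60_000);
  }
  throw new Error(
    "RATE_LIMIT_EXCEEDED after repeated retries; raise the per-agent limit or split the workload",
  );
}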

What this is NOT

  • Not a unified limiter. Different routes have different budgets; quoting "the rate limit" as a single number without naming the layer is wrong. Read all four layers above.
  • Not a token bucket today. The shared limiters are fixed-window counters via express-rate-limit; the per-agent limiter is a fixed-window counter on an in-process Map. Bursts that fit inside a single window go through; bursts that span window boundaries benefit from the reset. In the worst case, an agent capped at 120/min can land 120 requests just before a window boundary and another 120 just after it, roughly twice the nominal rate over a short span. This is documented exactly because some customers expect bucket-style smoothing; that is a roadmap item, not a current behaviour.
  • Not a paywall. Hitting 429 means slow down; it does not unlock by upgrading the plan. The per-agent limit is what the operator chooses to set on their Agent; the IP/user budgets are platform-wide protection, not plan-tiered features.
  • Not configurable per-route by the customer. Operators configure the per-agent limit; the IP, user, and auth limits are Adjudon-internal defences and not exposed for editing.

Performance posture

The shared limiters add a single in-memory increment per request: by default they use the express-rate-limit MemoryStore, which is the standard choice for single-instance Node.js deployments and whose reset interval is bounded. The per-agent limiter adds one Map.get/Map.set pair plus an O(active-agents) prune every five minutes. None of these layers touch external storage; rate-limit checks fit inside the published p95 < 25 ms budget on POST /traces with a comfortable margin.

See also

  • Idempotency — the sibling layer that absorbs the retries a 429 triggers
  • Performance SLOs — the latency budget these layers respect
  • Agents API — where policies.maxRequestsPerMinute is configured per agent
  • Error Codes — the RATE_LIMIT_EXCEEDED code in the broader taxonomy
  • POST /traces — the most-throttled hot path