Rate Limits
Adjudon runs four independent rate-limit layers on every request, each with its own purpose; there is no single "rate limit" number to quote. The per-IP API limiter protects the platform from a single misconfigured customer. The per-user dashboard limiter protects an org from a single misbehaving browser tab. The per-agent ingestion limiter is configured by the operator on each agent's policy. And the auth-endpoint limiter protects login from credential stuffing.
This page explains each layer, what triggers a 429, what the SDK should do when one fires, and the small list of routes that carry their own dedicated limiter on top of the shared ones. There is no separately documented integration surface: rate limits are observed at runtime, not configured via API, with one exception — per-agent limits are configurable via the Agents API.
The four layers
Every public request passes through at least two of these in order: the IP-keyed apiLimiter always runs; the user-keyed userApiLimiter runs on JWT-authenticated routes; the per-agent limiter runs on POST /traces when authenticated with an agent API key; and the IP-keyed authLimiter runs only on /api/v1/auth/* routes.
| Layer | Window | Limit | Key | Where it runs |
|---|---|---|---|---|
| apiLimiter | 15 min | 1,000 | client IP | every /api/v1/* route |
| userApiLimiter | 15 min | 600 | user:<jwt.id> (falls back to IP) | dashboard / JWT-authenticated routes |
| authLimiter | 15 min | 100 | client IP | /api/v1/auth/* only |
| agentRateLimit | 1 min | configurable per agent | agent ObjectId | POST /traces when req.agent is set |
Numbers are exact, lifted from backend/server.js:201-261 and backend/middleware/agentRateLimit.js:9.
A single SDK retry never trips a limiter on its own; a tight loop or a runaway worker does. The 600/user/15min budget on the dashboard side translates to roughly 40 req/min sustained, which is generous for hand-driven UI work and tight on accidental polling loops.
The apiLimiter and userApiLimiter stack on JWT-authenticated
dashboard routes — the IP limiter still applies, the user
limiter applies on top. A single user behind a shared NAT can
exhaust the IP budget without ever hitting their own user budget;
the API will return 429 RATE_LIMIT_EXCEEDED and the response
honours the same envelope as every other error.
What 429 looks like
Every limiter returns the same JSON shape via apiResponse.fail():
```json
{
  "success": false,
  "error": "Too many requests from this IP, please try again after 15 minutes",
  "code": "RATE_LIMIT_EXCEEDED"
}
```
The error string varies per limiter (auth says "authentication
attempts"; the per-agent limiter includes the configured limit and
the 60-second window in a data envelope). The code is always
RATE_LIMIT_EXCEEDED. Treat the code as the machine-readable
contract; the string is for log diagnostics.
The standard RateLimit-* headers (RateLimit-Limit,
RateLimit-Remaining, RateLimit-Reset) ride along on every
response per the express-rate-limit defaults. Reading them is the
correct way to plan retry timing client-side.
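Client-side, the wait can be derived from those headers in a couple of lines. A minimal sketch assuming Node-style lowercased header keys; retryDelayMs and the 60-second fallback are illustrative choices, not part of the SDK:

```javascript
// Sketch: compute how long to wait from the standard RateLimit-* headers.
// express-rate-limit's RateLimit-Reset carries the number of seconds until
// the current window resets; Node lowercases incoming header names.
function retryDelayMs(headers) {
  const reset = Number(headers['ratelimit-reset']);
  if (Number.isFinite(reset) && reset >= 0) {
    return reset * 1000; // header gives seconds remaining; convert to ms
  }
  return 60000; // assumption: no usable header, wait a conservative minute
}
```

With a 429 response whose RateLimit-Reset header reads "12", this yields a 12-second pause; with no header it falls back to the conservative one-minute wait.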
The per-agent limiter
agentRateLimit is the only configurable limiter. It runs after
requireApiKey on the trace-ingestion path and reads
agent.policies.maxRequestsPerMinute from the Agent record. If the
field is null, 0, or negative, the limiter short-circuits and
the request passes through — per-agent throttling is opt-in.
Setting it to 120 enforces 120 requests/min per agent, with a
fixed-window counter that resets every 60 seconds.
┌─────────────────────────────────────────────────────────────┐
│ agentRateLimit — fixed-window counter │
├─────────────────────────────────────────────────────────────┤
│ │
│ per-agent map: agentId → { count, windowStart } │
│ │
│ on request: │
│ if (now - windowStart >= 60s) │
│ { count = 0; windowStart = now } │
│ count += 1 │
│ if (count > limit) → 429 RATE_LIMIT_EXCEEDED │
│ │
│ periodic prune every 5 min → │
│ drop entries whose windowStart > 60s ago │
│ │
│ storage: in-process Map() — NOT shared across server │
│ instances (acceptable: per-instance budget is the │
│ minimum effective rate) │
└─────────────────────────────────────────────────────────────┘
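The diagram's logic fits in a few lines of JavaScript. This is an illustrative re-creation of the semantics shown above, not the actual middleware; checkAgentLimit, WINDOW_MS, and the explicit now parameter are hypothetical names:

```javascript
// Fixed-window counter keyed by agent id, mirroring the diagram above.
const WINDOW_MS = 60000;
const counters = new Map(); // agentId → { count, windowStart }

function checkAgentLimit(agentId, limit, now = Date.now()) {
  if (!limit || limit <= 0) return true; // null/0/negative: throttling is opt-in
  let entry = counters.get(agentId);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    entry = { count: 0, windowStart: now }; // window elapsed: start a fresh one
    counters.set(agentId, entry);
  }
  entry.count += 1;
  return entry.count <= limit; // false → respond 429 RATE_LIMIT_EXCEEDED
}
```

A limit of 3 admits three calls inside a 60-second window, rejects the fourth, and admits again once the window rolls over; a limit of 0 disables throttling entirely.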
The in-process Map() is deliberate. A Redis-backed limiter would
add a network hop to the trace-ingestion p95 budget; the in-process
counter is microseconds. The trade-off is that with multiple
backend instances behind the load-balancer, an agent can in theory
exceed maxRequestsPerMinute by a factor of N (instances). For
typical Adjudon deployments (1-3 backend instances on Fly.io
Frankfurt) this is a known approximation; for stricter enforcement
contact support to switch the deployment to a centralised store.
Dedicated limiters on a few routes
Three routes carry an additional limiter on top of the shared ones:

- CPI compliance reports (/api/v1/cpi/report) — capped at 10/hour per org via cpiRateLimit because PDF generation is expensive. Past that, the dashboard recommends Scheduled Reports for higher volume.
- Onboarding endpoints (/api/v1/onboarding/*) — capped via onboardingRateLimit to defend the public-facing wizard flows from form-spam. The exact numbers are documented inline in the middleware.
- Demo-request form (public, unauthenticated) — capped at 5/hour per IP; authenticated admin requests bypass the cap.
These dedicated limiters return the same 429 RATE_LIMIT_EXCEEDED
envelope.
What clients should do
When a 429 returns, the SDK and any direct HTTP client should:
- Read RateLimit-Reset and wait that long before retrying.
- If RateLimit-Reset is absent, fall back to exponential backoff: 1s, 2s, 4s, 8s, 16s, capped at 60 seconds.
- Do not retry indefinitely. A trace that has been retried five times with backoff and still returns 429 is signalling that the per-agent limit is too low for the workload — raise it via the Agents API or break up the workload.
- Respect idempotency. Retries against the same trace payload share the auto-generated key; the Idempotency layer collapses duplicates into a single ingestion.
Adjudon's official SDKs implement this loop natively; direct HTTP clients (curl scripts, hand-rolled integrations) should mirror it.
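The retry policy above condenses into a single schedule function. A sketch only — backoffMs is a hypothetical helper, attempts are zero-indexed, and null signals that the client should stop retrying:

```javascript
// Sketch of the documented retry schedule: honour RateLimit-Reset when
// present, otherwise back off exponentially (1s, 2s, 4s, 8s, 16s), cap every
// wait at 60 s, and give up after five attempts.
function backoffMs(attempt, resetHeader) {
  if (attempt >= 5) return null; // stop: raise the limit or split the workload
  const resetSeconds = Number(resetHeader);
  if (Number.isFinite(resetSeconds) && resetSeconds > 0) {
    return Math.min(resetSeconds * 1000, 60000); // server told us when to retry
  }
  return Math.min(1000 * 2 ** attempt, 60000); // exponential fallback, capped
}
```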
What this is NOT
- Not a unified limiter. Different routes have different budgets; quoting "the rate limit" without specifying which layer is wrong. Read all four layers above.
- Not a token bucket today. The shared limiters are fixed-window counters via express-rate-limit; the per-agent limiter is a fixed-window counter on an in-process Map. Bursts that fit inside a single window go through; bursts that span window boundaries benefit from the reset. This is documented exactly because some customers expect bucket-style smoothing — that is a roadmap item, not a current behaviour.
- Not a paywall. Hitting 429 means slow down; it does not unlock by upgrading the plan. The per-agent limit is what the operator chooses to set on their Agent; the IP/user budgets are platform-wide protection, not plan-tiered features.
- Not configurable per-route by the customer. Operators configure the per-agent limit; the IP, user, and auth limits are Adjudon-internal defences and not exposed for editing.
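The window-boundary behaviour behind the "not a token bucket" caveat is easy to demonstrate with a toy counter. fixedWindow is an illustrative stand-in, not the production limiter:

```javascript
// Toy fixed-window counter: a client that spends its budget at the end of one
// window and again at the start of the next briefly exceeds the nominal rate.
function fixedWindow(limit, windowMs) {
  let count = 0;
  let windowStart = -Infinity;
  return function allow(now) {
    if (now - windowStart >= windowMs) {
      count = 0;         // window elapsed: reset the counter
      windowStart = now; // start a new window at this request
    }
    count += 1;
    return count <= limit;
  };
}
```

With a limit of 2/min, requests at t=59.999s, t=60.000s, and t=60.001s all pass — three admissions in two milliseconds of wall time — because the counter resets at the window boundary. A token bucket would smooth that burst; the current limiters do not.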
Performance posture
The shared limiters add a single in-memory increment per request: express-rate-limit's default MemoryStore for single-instance Node.js deployments, whose reset interval is bounded. The per-agent limiter adds one Map.get/Map.set pair plus an O(active-agents) prune every five minutes. None of these layers touches external storage; rate-limit checks fit inside the published p95 < 25 ms budget on POST /traces with a comfortable margin.
See also
- Idempotency — the sibling layer that absorbs the retries a 429 triggers
- Performance SLOs — the latency budget these layers respect
- Agents API — where policies.maxRequestsPerMinute is configured per agent
- Error Codes — the RATE_LIMIT_EXCEEDED code in the broader taxonomy
- POST /traces — the most-throttled hot path