Performance & SLOs
A latency budget is a published number. A latency contract is the
same number with a regulator-readable consequence attached. Adjudon
publishes both for POST /traces — the only hot-path
endpoint where every customer agent waits on a synchronous response
— and the small set of SLAs that govern uptime and fail-open
behaviour. This page lays out the full set of performance
commitments, where they apply, and where they explicitly do not.
The latency contract is a Cardinal Rule in the codebase
(CLAUDE.md § Performance-Verträge); it is what every change
to the trace-ingestion pipeline is measured against.
The trace-ingestion latency contract
POST /traces resolves synchronously: PII scrub, Confidence Engine
triangulation, Policy Engine evaluation, durable persistence, then
return. Anything that has to flow through the agent's blocking loop
happens inside this budget; anything that can fire-and-forget
(webhook dispatch, alert engine, audit-log write, vector index,
hash-chain append) runs after the response is sent.
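The shape of that split can be sketched in a few lines. This is an illustration only, not the real Adjudon internals: every name here (scrubPii, triangulate, evaluatePolicies, persist, afterResponse) is an assumption, and each step is a trivial stand-in for the component it is labelled with.

```typescript
type Trace = { id: string; payload: string };
type Verdict = { traceId: string; allowed: boolean };

// --- synchronous steps: everything inside the latency budget ---
const scrubPii = (t: Trace): Trace =>
  ({ ...t, payload: t.payload.replace(/\b\d{9,}\b/g, "[redacted]") });
const triangulate = (_t: Trace): number => 0.9;               // Confidence Engine stand-in
const evaluatePolicies = (_t: Trace, conf: number): boolean =>
  conf >= 0.5;                                                // Policy Engine stand-in
const persist = (_t: Trace): void => {};                      // the single durable write

// --- fire-and-forget steps: scheduled so they run after return ---
const afterResponse = (t: Trace): void => {
  queueMicrotask(() => {
    // webhook dispatch, alert engine, vector index, hash-chain append:
    // a failure here never touches the synchronous response
    void t;
  });
};

function handlePostTraces(t: Trace): Verdict {
  const clean = scrubPii(t);
  const confidence = triangulate(clean);
  const allowed = evaluatePolicies(clean, confidence);
  persist(clean);
  const verdict: Verdict = { traceId: clean.id, allowed };
  afterResponse(clean); // not awaited; the budget ends at the return below
  return verdict;
}
```

The point of the sketch is the ordering: everything before the `return` is inside the budget, everything in `afterResponse` is not.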
| Percentile | Target |
|---|---|
| p50 | < 10 ms |
| p95 | < 25 ms |
| p99 | < 45 ms |
The numbers are end-to-end from request arrival to response
emission, measured in the Frankfurt eu-central-1 region. They
include the Confidence Engine's three-pillar triangulation and the
Policy Engine's Policy.find().sort('priority').lean() against
the compound index. They do not include round-trip from the
customer's data centre to Frankfurt — if your agent runs in
us-east-1, the wall-clock latency adds the trans-Atlantic RTT on
top of the 25 ms p95.
If a change to the trace path raises the measured p95, the change does not ship. This is a hard gate, not a guideline.
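Using the nearest-rank definition of a percentile, that gate can be sketched as a simple check over a batch of observed latencies. The sampling and metrics plumbing are assumed; only the thresholds come from the table above.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are <= it.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// Thresholds from the latency-contract table (milliseconds).
const BUDGET_MS = { p50: 10, p95: 25, p99: 45 };

function withinBudget(samplesMs: number[]): boolean {
  return (
    percentile(samplesMs, 50) < BUDGET_MS.p50 &&
    percentile(samplesMs, 95) < BUDGET_MS.p95 &&
    percentile(samplesMs, 99) < BUDGET_MS.p99
  );
}
```

A change that fails `withinBudget` on the measured p95 does not ship, per the gate above.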
What is allowed inside the budget
Every line of code that runs synchronously inside POST /traces
respects three rules:
- No external HTTP call — no Slack post, no webhook send, no Stripe meter event, no OpenAI embedding read on the hot path. The Confidence Engine's vector-similarity step runs against the org's local vector memory; if that step's embedding service times out the historical pillar falls back to 0.5 and the trace continues. (See traces-and-confidence.)
- No blocking DB write without a timeout. Audit-log writes are async; durable trace persistence is the only synchronous write and it touches a single collection on a single index.
- Anything > 5 ms must be measurable. If a sub-step accumulates more than 5 ms it has to be observable in the metrics; otherwise the budget cannot be defended when something regresses.
Everything that does not fit those rules runs after the response. The Hash Chain append, the alert engine, the webhook dispatch, and the vector index update are all fire-and-forget. A failure in any of them never delays or fails the customer-visible response.
The fail-open guarantee
The most important SLO is not a percentile. It is the contractual fail-open posture in § 5 of the Adjudon ToS:
If Adjudon is unavailable, the SDK pipeline falls back to pass-through — your agents will never be blocked by our availability.
If api.adjudon.com is down, returns a 5xx, or exceeds the SDK
client timeout, the SDK's default behaviour is to record the
decision locally for later reconciliation and let the customer's
agent proceed. Adjudon never becomes the reason a regulated
agent's transaction fails. This is what makes the strict-gate
posture defensible: the audit-trail is strict; the dependency is
not.
The corollary: a customer who wants hard-fail behaviour (e.g.
because their internal compliance posture says "if the audit
layer is down, do not transact") configures the SDK with
failOpen: false. The default is fail-open and is what the
contract above describes.
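The two postures reduce to a single branch in the SDK's error path. This is a sketch of the behaviour the contract describes, not the SDK's actual API: the `failOpen` flag name comes from the text above, but `onIngestError` and the result shape are illustrative assumptions.

```typescript
type SdkConfig = { failOpen: boolean }; // default posture: fail-open

type IngestResult =
  | { kind: "pass-through"; reconcileLater: true } // Adjudon unreachable; agent proceeds
  | { kind: "hard-fail" };                         // failOpen: false posture

// Called when api.adjudon.com is down, returns a 5xx, or exceeds
// the client timeout.
function onIngestError(config: SdkConfig): IngestResult {
  if (config.failOpen) {
    // Record the decision locally for later reconciliation and
    // let the customer's agent proceed.
    return { kind: "pass-through", reconcileLater: true };
  }
  // Strict posture: the agent's transaction does not proceed.
  return { kind: "hard-fail" };
}
```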
Uptime SLAs by plan
| Plan | Availability | Status |
|---|---|---|
| Sandbox | — | No SLA; best-effort on the hard-blocked tier |
| Scale | 99.9 % | Live |
| Governance | 99.9 % | Live |
| Enterprise | 99.99 % | Roadmap (target Q4 2026) |
| Custom | 99.99 % | Roadmap (target Q4 2026) |
The 99.9 % SLA on Scale and Governance is measured monthly on
the public api.adjudon.com endpoint, excludes scheduled
maintenance windows announced ≥ 7 days in advance, and is
honoured today. The 99.99 % SLA on Enterprise and Custom is
on the published roadmap and is not contractually live yet
(backend/data/legal/en/tos.md:30 reads "on roadmap"). Customers
on Enterprise plans receive 99.9 % until the higher tier ships;
the upgrade carries no rate-card change.
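For intuition, a monthly availability percentage converts directly into an allowed-downtime budget: over a 30-day month (43,200 minutes), 99.9 % permits roughly 43.2 minutes of downtime and 99.99 % roughly 4.32 minutes. A minimal helper (the function name is illustrative):

```typescript
// Allowed downtime in minutes for a monthly availability SLA.
// Scheduled maintenance announced >= 7 days ahead is excluded from
// the SLA measurement, so it is not part of this budget.
function allowedDowntimeMinutes(availabilityPct: number, daysInMonth = 30): number {
  const totalMinutes = daysInMonth * 24 * 60; // 43,200 for a 30-day month
  return totalMinutes * (1 - availabilityPct / 100);
}
```

The order-of-magnitude gap between the two tiers is the practical difference: 99.99 % leaves no room for a single slow incident response.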
The health endpoint
GET /health is the operational read for liveness and is held to
its own latency contract:
GET /health must always answer in < 100 ms regardless of database load.
This is the endpoint your load balancer reads, your status-page
poller reads, and your on-call alert reads. It bypasses every
rate-limit layer and returns a { status, version, region }
envelope. It does not return any Adjudon data — it is a
liveness probe, not an introspection endpoint.
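The envelope shape named above can be pinned down as a type. The field values here are illustrative stand-ins; only the `{ status, version, region }` shape comes from the text.

```typescript
type HealthEnvelope = { status: "ok"; version: string; region: string };

// Liveness handler sketch: no database read, no org data, so the
// < 100 ms contract holds regardless of database load.
function healthHandler(): HealthEnvelope {
  return { status: "ok", version: "1.0.0", region: "eu-central-1" };
}
```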
Frontend SLOs
The Adjudon dashboard at app.adjudon.com is held to two
front-end performance budgets:
| Metric | Target |
|---|---|
| First Contentful Paint | < 1.5 s |
| Time to Interactive | < 3 s |
These are measured at p95 on the dashboard's auth + workspace selection load, on a fresh session, with a cold cache. The budgets shape three implementation rules: lazy-load every page, virtualise lists with > 1,000 items (Review Queue, Audit Log), and run audit exports as background jobs rather than blocking HTTP requests.
What this is NOT
- Not a per-trace correctness guarantee. The latency budget protects the customer's agent loop from Adjudon's dependencies. Whether the policy engine returned the right verdict on a given trace is governed by the Policies & Review contract, not the latency contract.
- Not a multi-region distribution. Adjudon is single-region Frankfurt by design; the latency contract is for clients with EU connectivity. Customers in the Americas or APAC see trans-continental RTT on top — the contract is for the Adjudon stack only, not the network in front of it.
- Not a synchronous webhook delivery promise. Webhooks are fire-and-forget on a durable retry queue (1m / 5m / 30m / 2h / 8h). The synchronous trace response says only that the trace was ingested; downstream notifications are best-effort with durable retry.
- Not measured on every endpoint. The 10/25/45 ms numbers apply to POST /traces. Every other endpoint has its own latency profile — CRUD reads are typically sub-50 ms p95; CPI report generation is bounded but not budgeted; PDF exports run as background jobs.
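The webhook retry ladder quoted above (1m / 5m / 30m / 2h / 8h) can be expressed as a schedule lookup. What happens after the last attempt is an assumption here (give up and return null); the real queue's exhaustion behaviour is not described in this section.

```typescript
// Durable retry ladder for webhook delivery, in milliseconds:
// 1m, 5m, 30m, 2h, 8h.
const RETRY_DELAYS_MS = [60_000, 300_000, 1_800_000, 7_200_000, 28_800_000];

// attempt 0 = first retry after the initial delivery failed.
// Returns null once the ladder is exhausted (assumed behaviour).
function nextRetryDelayMs(attempt: number): number | null {
  return attempt < RETRY_DELAYS_MS.length ? RETRY_DELAYS_MS[attempt] : null;
}
```

None of this runs inside the POST /traces response: the synchronous reply says only that the trace was ingested, and the ladder is walked by the background queue.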
Regulator mapping
| Regulator surface | What this concept satisfies |
|---|---|
| EU AI Act Art. 17(1)(g) | Robustness — documented latency budgets and uptime SLAs are part of the post-market monitoring evidence |
| DORA Art. 6 | ICT risk management framework — published SLAs and fail-open semantics are required documentation for the regulated entity's risk register |
| ISO 42001 § 8.4 | Operational planning & control — the budget rules above are operational controls under ISO 42001 |
| GDPR Art. 32 | Security of processing — availability is one of the three CIA pillars; the SLA documents the availability commitment |
See also
- Idempotency — the layer that protects the latency budget from retry storms
- Rate Limits — the sibling protection layer on the same hot path
- Sub-Processors — the Frankfurt-only data-residency constraint these SLOs run inside
- Versioning — the deprecation policy that protects the contract across releases
- POST /traces — the endpoint the latency contract describes