Policies & Human Review

The Confidence Engine produces a number; the Policy Engine produces a verdict. A trace can score 0.95 and still be blocked because it matches a deny policy. A trace can score 0.42 and still resolve without human review because no policy fired and no review queue is configured. Confidence is what the trace looks like; policy is what the operator's compliance posture allows. This page explains how the Policy Engine evaluates a trace at ingestion time, how a verdict becomes either a 403 ADJ_BLOCKED_BY_POLICY or a 202 hold-for-review with a freshly-created ReviewItem, and where the line sits between the engine's authority and the reviewer's.

The integration surface is documented at POST /policies and POST /reviews; the input score this engine consumes is the one defined in traces-and-confidence.

What a policy is

A policy is a named, ordered, enabled rule that matches a trace by its fields and emits one or more actions. The schema (Policy) captures name, description, enabled, priority, an array of conditions, and an array of actions. Conditions match on a field (e.g. confidenceScore, status, agentId, humanOverride) using one of five operators: equals, contains, greater_than, less_than, regex. Multiple conditions chain using AND / OR logical operators in document order; the result is a single boolean — either the policy fires or it does not.
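The matching rule can be sketched in TypeScript as follows. This is a minimal illustration, not the engine's code: the `Condition` field names (`field`, `operator`, `value`, `logicalOperator`), the strict left-to-right fold with no operator precedence, and the choice that an empty condition list never fires are all assumptions. The built-in `RegExp` stands in for the regex operator here.

```typescript
// Hypothetical shapes; only the five operator names come from the text above.
type Operator = "equals" | "contains" | "greater_than" | "less_than" | "regex";

interface Condition {
  field: string;
  operator: Operator;
  value: string | number | boolean;
  // Assumed: how this condition joins the running result of the ones before it.
  logicalOperator?: "AND" | "OR";
}

type Trace = Record<string, unknown>;

function matchCondition(trace: Trace, c: Condition): boolean {
  const actual = trace[c.field];
  switch (c.operator) {
    case "equals":
      return actual === c.value;
    case "contains":
      return typeof actual === "string" && actual.indexOf(String(c.value)) !== -1;
    case "greater_than":
      return typeof actual === "number" && typeof c.value === "number" && actual > c.value;
    case "less_than":
      return typeof actual === "number" && typeof c.value === "number" && actual < c.value;
    case "regex":
      // Built-in RegExp used as a stand-in for illustration only.
      return typeof actual === "string" && new RegExp(String(c.value)).test(actual);
    default:
      return false;
  }
}

// Fold the conditions left to right into a single boolean.
function policyFires(trace: Trace, conditions: Condition[]): boolean {
  const [first, ...rest] = conditions;
  if (first === undefined) return false; // assumed: no conditions means no match
  let result = matchCondition(trace, first);
  for (const c of rest) {
    const next = matchCondition(trace, c);
    result = c.logicalOperator === "OR" ? result || next : result && next;
  }
  return result;
}
```

Under this fold, a chain such as A AND B OR C is read as (A AND B) OR C, which is one plausible reading of "document order".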

Actions are typed: block, flag_for_review, notify, approve. A policy can declare more than one action; the engine reads each action independently and accumulates verdicts across all matched policies before resolving.

Confidence-score scale auto-normalisation

confidenceScore is canonical 0–1 (float). For back-compat with older SDKs and policies that pass 0–100 (integer percent), the engine auto-normalises any confidenceScore value greater than 1 by dividing it by 100, on both the policy condition value and the trace value. So a policy "block when confidenceScore < 70" is equivalent to "block when confidenceScore < 0.7". Values above 1 trigger a development-mode console warning; pass values on the 0–1 scale to suppress it.
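A short sketch of that normalisation, assuming it is applied to both sides of a comparison before the operator runs. The helper names `normaliseConfidence` and `isBelow` are hypothetical.

```typescript
// Values above 1 are treated as integer percent and scaled to the 0–1 range.
// A value of exactly 1 is left unchanged, since the rule is "greater than 1".
function normaliseConfidence(value: number): number {
  return value > 1 ? value / 100 : value;
}

// Hypothetical helper: normalise the trace score and the policy threshold, then compare.
function isBelow(traceScore: number, threshold: number): boolean {
  return normaliseConfidence(traceScore) < normaliseConfidence(threshold);
}
```

With this rule, `isBelow(0.65, 70)` and `isBelow(65, 0.7)` give the same answer, which is what makes the two policy spellings equivalent.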

A policy is workspace-scoped by default but the engine also matches org-wide policies (workspace null) against every trace. The compound index (organizationId, workspaceId, enabled) is what makes this evaluable inside the POST /traces p95 budget.

The evaluation order: block > flag > notify > approve

The Policy Engine evaluates every matching policy and then resolves to a single winning action. The priority is fixed:

   ┌─────────────────────────────────────────────────────────┐
   │                Policy Engine — verdict                   │
   ├─────────────────────────────────────────────────────────┤
   │                                                          │
   │   ┌───────────────────────────────────────────┐          │
   │   │  1.  ANY matched policy with action       │          │
   │   │      "block"          → BLOCK   (403)     │          │
   │   ├───────────────────────────────────────────┤          │
   │   │  2.  ELSE any matched policy with action  │          │
   │   │      "flag_for_review"→ HOLD    (202)     │          │
   │   ├───────────────────────────────────────────┤          │
   │   │  3.  ELSE any matched policy with action  │          │
   │   │      "notify"         → ALLOW + alert     │          │
   │   ├───────────────────────────────────────────┤          │
   │   │  4.  ELSE                                 │          │
   │   │                       → ALLOW   (201)     │          │
   │   └───────────────────────────────────────────┘          │
   │                                                          │
   │     priority = .sort('priority') ascending               │
   │     ties broken by document insertion order              │
   └─────────────────────────────────────────────────────────┘

A single block verdict among ten matched policies wins; the other nine matches are ignored for resolution but their matchCount and lastMatched fields are still updated. There is no AND-of-actions mode and no probabilistic mixing; the highest-severity action across all matched policies is the verdict, full stop.
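The resolution step can be sketched as a scan down the fixed severity ladder. The `MatchedPolicy` shape and the `resolveVerdict` name are assumptions for illustration; the bookkeeping of matchCount and lastMatched is left out.

```typescript
type Action = "block" | "flag_for_review" | "notify" | "approve";

// Fixed severity ladder from the text: block > flag > notify > approve.
const SEVERITY: Action[] = ["block", "flag_for_review", "notify", "approve"];

// Hypothetical shape: a policy whose conditions already evaluated to true.
interface MatchedPolicy {
  name: string;
  actions: Action[];
}

// Returns the highest-severity action declared by any matched policy,
// or null when no policy matched at all.
function resolveVerdict(matched: MatchedPolicy[]): Action | null {
  for (const action of SEVERITY) {
    if (matched.some((p) => p.actions.indexOf(action) !== -1)) return action;
  }
  return null;
}
```

Because the scan stops at the first rung that any policy declares, nine notify matches and one block match still resolve to block.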

Policies are sorted by priority ascending (Policy.find().sort('priority')). Within a priority tier, document insertion order resolves ties — a deterministic but non-semantic ordering. If you need stable resolution under a tie, write distinct priorities; the engine does not surface "two policies matched at priority 5" as ambiguous.

What happens when a verdict fires

The verdict is read at the end of the trace-ingestion pipeline, after PII scrubbing, after Confidence Engine triangulation, and before durable persistence:

block → the trace is saved with status: 'blocked', the hash chain appends, the alert engine fires, the trace.blocked webhook event dispatches, and the SDK returns 403 ADJ_BLOCKED_BY_POLICY with the matched policy's reason. The customer's agent receives a hard refusal; no decision flows to production.
flag_for_review → the trace is saved with status: 'flagged', a new ReviewItem is created from the trace (priority derived from confidence band, 24-hour SLA deadline by default), the trace.hold_for_review webhook event dispatches, and the SDK returns 202 with allowed: false, action: 'hold_for_review', reviewId. The customer's agent must wait for the human verdict.
notify → the trace is saved as success, allowed to flow, and the configured notify channel (Slack / email / webhook) receives a side-band alert. The decision goes through.
(no match) → the Confidence Engine's suggested status takes effect: success → saved & allowed; flagged / escalated → saved, ReviewItem auto-created on confidence alone (no policy required), SDK returns 202.

The order matters: a block policy can override a high-confidence trace, and a low-confidence trace can be auto-routed to review even without a policy.
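The four branches above can be summarised as a mapping from verdict to outcome. This is a sketch under stated assumptions: the `IngestOutcome` shape is hypothetical, the 201 for notify is inferred from the plain-allow case because the text gives no status code for it, and `allowed: false` on the no-match 202 path is carried over from the flag_for_review response.

```typescript
type Verdict = "block" | "flag_for_review" | "notify" | "none";
type SuggestedStatus = "success" | "flagged" | "escalated";

// Hypothetical summary of what the ingestion pipeline does for one trace.
interface IngestOutcome {
  httpStatus: 201 | 202 | 403;
  traceStatus: "blocked" | "flagged" | "escalated" | "success";
  allowed: boolean;
  createsReviewItem: boolean;
}

function toOutcome(verdict: Verdict, suggested: SuggestedStatus): IngestOutcome {
  switch (verdict) {
    case "block":
      return { httpStatus: 403, traceStatus: "blocked", allowed: false, createsReviewItem: false };
    case "flag_for_review":
      return { httpStatus: 202, traceStatus: "flagged", allowed: false, createsReviewItem: true };
    case "notify":
      // Assumed status code: the text only says the decision goes through.
      return { httpStatus: 201, traceStatus: "success", allowed: true, createsReviewItem: false };
    default:
      // No policy matched: the Confidence Engine's suggested status takes effect.
      if (suggested === "success") {
        return { httpStatus: 201, traceStatus: "success", allowed: true, createsReviewItem: false };
      }
      return { httpStatus: 202, traceStatus: suggested, allowed: false, createsReviewItem: true };
  }
}
```

Side effects such as the hash-chain append, alerts, and webhook dispatch are deliberately omitted; the sketch only covers what the SDK caller observes.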

The Review Queue

A ReviewItem is the unit of human work. Each item carries the trace's _id reference, an action string (the triggering condition or the policy reason), the confidence percentage at creation time, a derived priority (critical / high / medium / low), a status (pending → approved / rejected / escalated), the input context, and a slaDeadline set to now + 24 hours. Priority is derived deterministically:

Trace status / confidence	Derived priority
`status: 'escalated'`	`critical`
confidence < 65%	`critical`
confidence < 75%	`high`
confidence < 85%	`medium`
else	`low`
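The table translates directly into a small function. In this sketch the function names are hypothetical, and confidence is taken on the 0–100 percent scale the table uses.

```typescript
type Priority = "critical" | "high" | "medium" | "low";

// Rows are checked top to bottom, exactly as in the table.
function derivePriority(status: string, confidencePct: number): Priority {
  if (status === "escalated") return "critical";
  if (confidencePct < 65) return "critical";
  if (confidencePct < 75) return "high";
  if (confidencePct < 85) return "medium";
  return "low";
}

// Default SLA from the text: 24 hours after the item is created.
function slaDeadline(createdAt: Date): Date {
  return new Date(createdAt.getTime() + 24 * 60 * 60 * 1000);
}
```

Note that the boundaries are exclusive: a trace at exactly 65% lands in high, not critical.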

A reviewer with the reviewer or higher role opens the queue, picks an item, and resolves it — approve the agent's decision, reject it (the agent is told to fail closed), escalate it to a senior reviewer, or override it with a different decision. The verdict is appended to the audit log and to the SHA-256 hash chain alongside the original trace.

The 500-item hard cap on GET /review-queue is intentional: a queue that grows unbounded is no longer reviewable. If you have more than 500 pending items, your policy thresholds are too tight or your reviewer roster is too thin — both signals the dashboard surfaces directly on the queue header.

Auto-Approval: closing the loop

A trace that lands in the queue, gets approved by a human, and is followed by 49 more nearly-identical traces that get approved by humans is, eventually, evidence that a human is no longer needed for that pattern. The Auto-Approval Engine watches for that evidence: ≥50 observations, ≥95% approval rate, 4-eyes sign-off. Once active, a matching flag_for_review verdict resolves directly to approved without entering the queue.

Auto-Approval can never override a block. Three things guarantee this: the schema enforces it (the AutoApprovalDecision.priorStatus enum is flagged or escalated, not blocked); the engine reads the trace's pre-resolution status; and a block verdict short-circuits the entire pipeline before auto-approval runs. See auto-approval for the maturation mechanics.
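The eligibility thresholds can be sketched as a single predicate. The `PatternStats` shape and the reading of "4-eyes" as two distinct reviewer sign-offs are assumptions; the 90-day re-validation and the pattern-matching step itself are out of scope here.

```typescript
type PriorStatus = "flagged" | "escalated" | "blocked";

// Hypothetical aggregate of human verdicts observed for one trace pattern.
interface PatternStats {
  observations: number;
  approvals: number;
  signOffReviewerIds: string[];
}

const MIN_OBSERVATIONS = 50;
const MIN_APPROVAL_PERCENT = 95;

function isAutoApprovalEligible(stats: PatternStats, priorStatus: PriorStatus): boolean {
  // A blocked trace is never a candidate, whatever the statistics say.
  if (priorStatus === "blocked") return false;
  if (stats.observations < MIN_OBSERVATIONS) return false;
  // Integer comparison avoids floating-point rounding at the 95% boundary.
  if (stats.approvals * 100 < stats.observations * MIN_APPROVAL_PERCENT) return false;
  // Assumed meaning of 4-eyes: at least two distinct reviewers signed off.
  const distinct = stats.signOffReviewerIds.filter((id, i, all) => all.indexOf(id) === i);
  return distinct.length >= 2;
}
```

For example, 48 approvals out of 50 observations is 96% and passes the rate check, while 47 out of 50 is 94% and does not.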

What this is NOT

Not a runtime LLM firewall. Adjudon does not scan prompt payloads for jailbreak strings or run a secondary LLM in the audit path. The Policy Engine matches structured trace fields (confidenceScore, status, agentId, humanOverride, arbitrary metadata) using deterministic operators — no LLM, no embedding, no learned classifier in the gate.
Not soft enforcement. A block verdict returns HTTP 403 synchronously, before the agent's decision is ever durable; there is no eventually-consistent block.
Not a substitute for human review. The maturation thresholds for Auto-Approval are deliberately conservative (≥50 / ≥95% / 4-eyes / 90-day re-validation). The default policy posture is human-in-the-loop; Auto-Approval is opt-in.
Not configurable severity ordering. block > flag > notify > approve is fixed. Operators cannot rewrite the priority ladder; this is a feature, not a limitation. A regulator reading the audit log needs the same priority semantics on every customer's account.
Full regex via RE2 engine. The regex operator compiles policy patterns through Google's RE2 engine, which supports the standard character classes (\d, \w, \s), quantifiers (*, +, {n,m}), anchors (^, $), groups, alternation (|), and the four flags i / m / s / u. The engine guarantees linear-time matching (no catastrophic backtracking), so admin-supplied patterns like (a+)+b cannot DoS the trace ingestion path. Patterns may be written as /pattern/flags or as a bare pattern string; both forms are accepted. Stateful flags (g, y) are silently dropped because .test() is single-shot.
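The two accepted pattern forms and the flag filtering can be sketched as below. The helper names are hypothetical, and the sketch compiles with JavaScript's built-in `RegExp` purely as a stand-in: the built-in engine backtracks, so it does not carry the linear-time guarantee described above. Keeping only i, m, s, and u means g and y fall away, along with any other flag, which is an assumption about flags the text does not mention.

```typescript
const ALLOWED_FLAGS = ["i", "m", "s", "u"];

// Splits "/pattern/flags" into its parts; anything else is a bare pattern.
function parsePattern(input: string): { source: string; flags: string } {
  const m = /^\/([\s\S]*)\/([a-z]*)$/.exec(input);
  if (m === null) return { source: input, flags: "" };
  const source = m[1] ?? "";
  const rawFlags = m[2] ?? "";
  // Keep each allowed flag at most once; stateful flags are dropped silently.
  const flags = ALLOWED_FLAGS.filter((f) => rawFlags.indexOf(f) !== -1).join("");
  return { source, flags };
}

// Stand-in compile step using the built-in engine, not RE2.
function compilePolicyRegex(input: string): RegExp {
  const { source, flags } = parsePattern(input);
  return new RegExp(source, flags);
}
```

A pattern written as `/^agent-\d+$/gi` therefore compiles with only the i flag, and a bare string such as `refund` compiles with no flags.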

Regulator mapping

Regulator surface	What this concept satisfies
EU AI Act Art. 14	Human oversight — the `flag_for_review` verdict + ReviewItem auto-creation IS the Art. 14 oversight surface
GDPR Art. 22(3)	Right not to be subject to solely-automated decisions — a flag interrupts automated execution and requires human verdict before the decision lands
EU AI Act Art. 13	Transparency — every `block` includes the matched-policy `reason` in the SDK response; the audit log preserves the verdict history
ISO 42001 §6.4.4	Operational controls for AI-system decisions — the policy catalogue is the "operational control" auditors map evidence against
BaFin MaRisk AT 4.3.4	Outsourced-AI risk controls — policies are the documented control points required for AI-driven decisioning at regulated entities

Performance posture

Policy evaluation runs synchronously inside POST /traces and is budgeted under the 25 ms p95 latency contract. The compound index (organizationId, workspaceId, enabled) keeps the policy fetch sub-millisecond at typical org sizes (< 200 active policies); policy bodies are evaluated in-process with no external calls. Auto-Approval pattern matching, ReviewItem creation, webhook dispatch, and audit-log appending all run after the synchronous verdict is computed; the customer's SDK does not wait on them.

If a policy match throws (malformed condition, regex compile failure, mixed-type comparison), the engine returns false for that condition and continues; an unparseable policy never blocks the customer's agent. The gate is strict, the dependency is not.
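The fail-open behaviour amounts to wrapping each condition check so that an exception becomes a non-match. A minimal sketch, with hypothetical helper names:

```typescript
// Runs one condition check; any exception is treated as "did not match".
function safeMatch(evaluate: () => boolean): boolean {
  try {
    return evaluate();
  } catch {
    return false;
  }
}

// Evaluates every condition in turn; a throwing condition does not stop the rest.
function evaluateAll(conditions: Array<() => boolean>): boolean[] {
  return conditions.map((c) => safeMatch(c));
}
```

For instance, a condition built on `new RegExp("(")` throws at compile time, so it contributes false while the conditions after it are still evaluated.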
