
Policies & Human Review

The Confidence Engine produces a number; the Policy Engine produces a verdict. A trace can score 0.95 and still be blocked because it matches a deny policy. A trace can score 0.42 and still resolve without human review because no policy fired and no review queue is configured. Confidence is what the trace looks like; policy is what the operator's compliance posture allows. This page explains how the Policy Engine evaluates a trace at ingestion time, how a verdict becomes either a 403 ADJ_BLOCKED_BY_POLICY or a 202 hold-for-review with a freshly created ReviewItem, and where the line sits between the engine's authority and the reviewer's.

The integration surface is documented at POST /policies and POST /reviews; the input score this engine consumes is the one defined in traces-and-confidence.

What a policy is

A policy is a named, ordered, enabled rule that matches a trace by its fields and emits one or more actions. The schema (Policy) captures name, description, enabled, priority, an array of conditions, and an array of actions. Conditions match on a field (e.g. confidenceScore, status, agentId, humanOverride) using one of five operators: equals, contains, greater_than, less_than, regex. Multiple conditions chain using AND / OR logical operators in document order; the result is a single boolean — either the policy fires or it does not.
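The matching rules above can be sketched as a small evaluator. This is an illustrative sketch rather than the engine's source: the condition shape (`field`, `operator`, `value`, `logicalOperator`), the function names, and the choice to treat a missing field or an empty condition list as "no match" are all assumptions. The left-to-right fold is one plausible reading of "chain in document order".

```javascript
// Illustrative only: `logicalOperator` and the helper names are hypothetical.
const escapeForRegex = (s) => String(s).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

const OPERATORS = {
  equals: (actual, expected) => actual === expected,
  contains: (actual, expected) => String(actual).includes(String(expected)),
  greater_than: (actual, expected) => Number(actual) > Number(expected),
  less_than: (actual, expected) => Number(actual) < Number(expected),
  // The pattern is escaped before compilation, so it matches literally.
  regex: (actual, expected) => new RegExp(escapeForRegex(expected)).test(String(actual)),
};

function matchCondition(trace, condition) {
  const test = OPERATORS[condition.operator];
  const actual = trace[condition.field];
  // Assumption: unknown operators and missing fields never match.
  if (!test || actual === undefined || actual === null) return false;
  return test(actual, condition.value);
}

// Fold the conditions left to right; each condition's logicalOperator
// joins it to the result accumulated so far.
function policyFires(trace, policy) {
  return policy.conditions.reduce((result, condition, index) => {
    const matched = matchCondition(trace, condition);
    if (index === 0) return matched;
    return condition.logicalOperator === 'OR' ? result || matched : result && matched;
  }, false);
}

const examplePolicy = {
  conditions: [
    { field: 'confidenceScore', operator: 'less_than', value: 0.5 },
    { field: 'agentId', operator: 'equals', value: 'billing-bot', logicalOperator: 'AND' },
    { field: 'humanOverride', operator: 'equals', value: true, logicalOperator: 'OR' },
  ],
};

policyFires({ confidenceScore: 0.42, agentId: 'billing-bot', humanOverride: false }, examplePolicy); // true
policyFires({ confidenceScore: 0.95, agentId: 'billing-bot', humanOverride: false }, examplePolicy); // false
```

Whatever the chaining rule, the result is a single boolean per policy; which action wins is decided later, across all policies that fired.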

Actions are typed: block, flag_for_review, notify, approve. A policy can declare more than one action; the engine reads each action independently and accumulates verdicts across all matched policies before resolving.

A policy is workspace-scoped by default but the engine also matches org-wide policies (workspace null) against every trace. The compound index (organizationId, workspaceId, enabled) is what makes this evaluable inside the POST /traces p95 budget.
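As a rough illustration of the scoping rule, the selection can be modelled in memory as below. The function name and the sample documents are hypothetical; the field names come from the compound index named above. In a Mongoose-backed service the equivalent query would plausibly look like `Policy.find({ organizationId, enabled: true, workspaceId: { $in: [workspaceId, null] } }).sort('priority')`, but that exact query is an assumption, not something this page confirms.

```javascript
// Select the policies that apply to one trace: enabled, same organization,
// and either scoped to the trace's workspace or org-wide (workspaceId null).
function applicablePolicies(policies, organizationId, workspaceId) {
  return policies
    .filter((p) => p.enabled && p.organizationId === organizationId)
    .filter((p) => p.workspaceId === null || p.workspaceId === workspaceId)
    .sort((a, b) => a.priority - b.priority); // ascending priority
}

// Hypothetical sample data.
const samplePolicies = [
  { name: 'ws-a-flag', organizationId: 'org1', workspaceId: 'wsA', enabled: true, priority: 20 },
  { name: 'org-wide-block', organizationId: 'org1', workspaceId: null, enabled: true, priority: 10 },
  { name: 'ws-b-notify', organizationId: 'org1', workspaceId: 'wsB', enabled: true, priority: 5 },
  { name: 'disabled', organizationId: 'org1', workspaceId: 'wsA', enabled: false, priority: 1 },
];

const selected = applicablePolicies(samplePolicies, 'org1', 'wsA').map((p) => p.name);
// selected is ['org-wide-block', 'ws-a-flag']
```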

The evaluation order: block > flag > notify > approve

The Policy Engine evaluates every matching policy and then resolves to a single winning action. The priority is fixed:

┌─────────────────────────────────────────────────────────┐
│  Policy Engine: verdict                                 │
├─────────────────────────────────────────────────────────┤
│                                                         │
│      ┌───────────────────────────────────────────┐      │
│      │ 1. ANY matched policy with action         │      │
│      │    "block"           → BLOCK (403)        │      │
│      ├───────────────────────────────────────────┤      │
│      │ 2. ELSE any matched policy with action    │      │
│      │    "flag_for_review" → HOLD (202)         │      │
│      ├───────────────────────────────────────────┤      │
│      │ 3. ELSE any matched policy with action    │      │
│      │    "notify"          → ALLOW + alert      │      │
│      ├───────────────────────────────────────────┤      │
│      │ 4. ELSE                                   │      │
│      │                      → ALLOW (201)        │      │
│      └───────────────────────────────────────────┘      │
│                                                         │
│  priority = .sort('priority') ascending                 │
│  ties broken by document insertion order                │
└─────────────────────────────────────────────────────────┘

A single block verdict among ten matched policies wins; the other nine matches are ignored for resolution but their matchCount and lastMatched fields are still updated. There is no AND-of-actions mode and no probabilistic mixing; the highest-severity action across all matched policies is the verdict, full stop.

Policies are sorted by priority ascending (Policy.find().sort('priority')). Within a priority tier, document insertion order resolves ties — a deterministic but non-semantic ordering. If you need stable resolution under a tie, write distinct priorities; the engine does not surface "two policies matched at priority 5" as ambiguous.
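The resolution rule can be sketched in a few lines. This is a minimal sketch under stated assumptions: the policy shape (`name`, `priority`, `actions: [{ type }]`), the function name, and the return shape are hypothetical, and the input is assumed to contain only policies whose conditions already matched.

```javascript
// Severity ladder from the text: block > flag_for_review > notify > approve.
const SEVERITY = ['block', 'flag_for_review', 'notify', 'approve'];

function resolveVerdict(matchedPolicies) {
  // Ascending priority. Array.prototype.sort is stable, so policies with
  // equal priority keep their input (insertion) order.
  const ordered = [...matchedPolicies].sort((a, b) => a.priority - b.priority);
  for (const action of SEVERITY) {
    const winner = ordered.find((p) => p.actions.some((a) => a.type === action));
    if (winner) return { action, policy: winner.name };
  }
  // No policy matched: the Confidence Engine's suggested status applies.
  return null;
}

const verdict = resolveVerdict([
  { name: 'notify-low-score', priority: 1, actions: [{ type: 'notify' }] },
  { name: 'flag-overrides', priority: 2, actions: [{ type: 'flag_for_review' }, { type: 'notify' }] },
  { name: 'deny-agent', priority: 9, actions: [{ type: 'block' }] },
]);
// verdict is { action: 'block', policy: 'deny-agent' }
```

Note that priority only decides which policy is credited with the winning action; it never lets a lower-severity action beat a higher-severity one.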

What happens when a verdict fires

The verdict is read at the end of the trace-ingestion pipeline, after PII scrubbing, after Confidence Engine triangulation, and before durable persistence:

  • block → the trace is saved with status: 'blocked', the hash chain appends, the alert engine fires, the trace.blocked webhook event dispatches, and the SDK returns 403 ADJ_BLOCKED_BY_POLICY with the matched policy's reason. The customer's agent receives a hard refusal; no decision flows to production.
  • flag_for_review → the trace is saved with status: 'flagged', a new ReviewItem is created from the trace (priority derived from confidence band, 24-hour SLA deadline by default), the trace.hold_for_review webhook event dispatches, and the SDK returns 202 with allowed: false, action: 'hold_for_review', reviewId. The customer's agent must wait for the human verdict.
  • notify → the trace is saved as success, allowed to flow, and the configured notify channel (Slack / email / webhook) receives a side-band alert. The decision goes through.
  • (no match) → the Confidence Engine's suggested status takes effect: success → saved & allowed; flagged / escalated → saved, ReviewItem auto-created on confidence alone (no policy required), SDK returns 202.

The order matters: a block policy can override a high-confidence trace, and a low-confidence trace can be auto-routed to review even without a policy.
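The outcomes listed above can be summarised as a mapping from verdict to SDK-facing response. The sketch below is an assumption-laden illustration: the function name, the `verdict.reason` field, the `allowed: true` body, the 201 status for `notify` (the page only pairs 201 with the plain allow case), and the reuse of the hold-for-review body for confidence-only holds are not confirmed by this page. How an `approve` action is handled is not described here either, so the sketch treats it like "no match".

```javascript
function toSdkResponse(verdict, suggestedStatus, reviewId = null) {
  const hold = { allowed: false, action: 'hold_for_review', reviewId };
  switch (verdict ? verdict.action : null) {
    case 'block':
      return {
        httpStatus: 403,
        traceStatus: 'blocked',
        body: { code: 'ADJ_BLOCKED_BY_POLICY', reason: verdict.reason },
      };
    case 'flag_for_review':
      return { httpStatus: 202, traceStatus: 'flagged', body: hold };
    case 'notify':
      // Assumption: same status code as a plain allow; the alert is side-band.
      return { httpStatus: 201, traceStatus: 'success', body: { allowed: true } };
    default:
      // No policy verdict: the Confidence Engine's suggested status decides.
      if (suggestedStatus === 'flagged' || suggestedStatus === 'escalated') {
        return { httpStatus: 202, traceStatus: suggestedStatus, body: hold };
      }
      return { httpStatus: 201, traceStatus: 'success', body: { allowed: true } };
  }
}

toSdkResponse({ action: 'block', reason: 'agent on deny list' }, 'success').httpStatus; // 403
toSdkResponse(null, 'escalated', 'rev_123').httpStatus; // 202
```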

The Review Queue

A ReviewItem is the unit of human work. Each item carries the trace's _id reference, an action string (the triggering condition or the policy reason), the confidence percentage at creation time, a derived priority (critical / high / medium / low), a status (pending / approved / rejected / escalated), the input context, and a slaDeadline set to now + 24 hours. Priority is derived deterministically:

| Trace status / confidence | Derived priority |
| --- | --- |
| status: 'escalated' | critical |
| confidence < 65% | critical |
| confidence < 75% | high |
| confidence < 85% | medium |
| else | low |
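The mapping above translates directly into a pure function. The function names are hypothetical, and the sketch assumes confidence is passed as a percentage (0 to 100) as in the table; the 24-hour deadline is the default stated earlier.

```javascript
// Derive a ReviewItem priority from trace status and confidence percentage.
// Rules are checked top to bottom, so the first matching band wins.
function derivePriority(status, confidencePercent) {
  if (status === 'escalated') return 'critical';
  if (confidencePercent < 65) return 'critical';
  if (confidencePercent < 75) return 'high';
  if (confidencePercent < 85) return 'medium';
  return 'low';
}

// Default SLA deadline: creation time plus 24 hours.
function slaDeadline(now = new Date()) {
  return new Date(now.getTime() + 24 * 60 * 60 * 1000);
}

derivePriority('flagged', 72);   // 'high'
derivePriority('escalated', 99); // 'critical'
```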

A reviewer with the reviewer or higher role opens the queue, picks an item, and resolves it — approve the agent's decision, reject it (the agent is told to fail closed), escalate it to a senior reviewer, or override it with a different decision. The verdict is appended to the audit log and to the SHA-256 hash chain alongside the original trace.

The 500-item hard cap on GET /review-queue is intentional: a queue that grows unbounded is no longer reviewable. If you have more than 500 pending items, your policy thresholds are too tight or your reviewer roster is too thin — both signals the dashboard surfaces directly on the queue header.

Auto-Approval: closing the loop

A trace that lands in the queue, gets approved by a human, and is followed by 49 more nearly-identical traces that get approved by humans is, eventually, evidence that a human is no longer needed for that pattern. The Auto-Approval Engine watches for that evidence: ≥50 observations, ≥95% approval rate, 4-eyes sign-off. Once active, a matching flag_for_review verdict resolves directly to approved without entering the queue.

Auto-Approval can never override a block. The schema enforces it (AutoApprovalDecision.priorStatus enum is flagged or escalated, not blocked); the engine reads the trace's pre-resolution status; a block verdict short-circuits the entire pipeline before auto-approval runs. See auto-approval for the maturation mechanics.
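The thresholds and the block guard can be expressed as a small eligibility check. This is a sketch under assumptions: the stats shape (`approved`, `rejected`, `signedOffBy`), the function name, counting observations as approved plus rejected, and modelling the 4-eyes sign-off as two distinct approvers are illustrative choices, not the engine's documented behaviour.

```javascript
const MIN_OBSERVATIONS = 50;
const MIN_APPROVAL_RATE = 0.95;
// Mirrors the priorStatus enum described above: 'blocked' is never eligible.
const ELIGIBLE_PRIOR_STATUSES = ['flagged', 'escalated'];

function canAutoApprove(priorStatus, stats) {
  if (!ELIGIBLE_PRIOR_STATUSES.includes(priorStatus)) return false;

  const observations = stats.approved + stats.rejected;
  if (observations < MIN_OBSERVATIONS) return false;

  const approvalRate = stats.approved / observations;
  // 4-eyes principle: at least two distinct people signed off.
  const distinctSignOffs = new Set(stats.signedOffBy).size;
  return approvalRate >= MIN_APPROVAL_RATE && distinctSignOffs >= 2;
}

const matureStats = { approved: 50, rejected: 0, signedOffBy: ['alice', 'bob'] };
canAutoApprove('flagged', matureStats); // true
canAutoApprove('blocked', matureStats); // false
```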

What this is NOT

  • Not a runtime LLM firewall. Adjudon does not scan prompt payloads for jailbreak strings or run a secondary LLM in the audit path. The Policy Engine matches structured trace fields (confidenceScore, status, agentId, humanOverride, arbitrary metadata) using deterministic operators — no LLM, no embedding, no learned classifier in the gate.
  • Not soft enforcement. A block verdict returns HTTP 403 synchronously, before the agent's decision is ever durable; there is no eventually-consistent block.
  • Not a substitute for human review. The maturation thresholds for Auto-Approval are deliberately conservative (≥50 / ≥95% / 4-eyes / 90-day re-validation). The default policy posture is human-in-the-loop; Auto-Approval is opt-in.
  • Not configurable severity ordering. block > flag > notify > approve is fixed. Operators cannot rewrite the priority ladder; this is a feature, not a limitation. A regulator reading the audit log needs the same priority semantics on every customer's account.
  • Not a regex engine. Operators can write regex conditions but the engine escapes the pattern before compilation (escapeRegex() at policyEngine.js:11); ReDoS payloads in policy values are defanged. Treat regex conditions as literal-string matches with metacharacters disabled.
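To make the last point concrete, here is what escaping before compilation does to a pattern. The source of `escapeRegex()` is not shown on this page, so the body below is an assumption: it uses the widely used metacharacter-escaping idiom, and `regexConditionMatches` is a hypothetical helper.

```javascript
// Assumed implementation: prefix every regex metacharacter with a backslash.
function escapeRegex(pattern) {
  return pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: a "regex" condition after escaping is a literal match.
function regexConditionMatches(value, pattern) {
  return new RegExp(escapeRegex(pattern)).test(value);
}

regexConditionMatches('agent-42', 'agent-.*'); // false: ".*" is no longer a wildcard
regexConditionMatches('agent-.*', 'agent-.*'); // true: literal match only

// A classic catastrophic-backtracking pattern becomes a harmless literal.
regexConditionMatches('aaaaaaaaaaaaaaaaaaaaaaaa!', '(a+)+$'); // false
```

In practice this means a policy author who wants wildcard behaviour needs a different operator or several conditions; the regex operator will not provide it.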

Regulator mapping

| Regulator surface | What this concept satisfies |
| --- | --- |
| EU AI Act Art. 14 | Human oversight: the flag_for_review verdict + ReviewItem auto-creation IS the Art. 14 oversight surface |
| GDPR Art. 22(3) | Right not to be subject to solely-automated decisions: a flag interrupts automated execution and requires human verdict before the decision lands |
| EU AI Act Art. 13 | Transparency: every block includes the matched-policy reason in the SDK response; the audit log preserves the verdict history |
| ISO 42001 §6.4.4 | Operational controls for AI-system decisions: the policy catalogue is the "operational control" auditors map evidence against |
| BaFin MaRisk AT 4.3.4 | Outsourced-AI risk controls: policies are the documented control points required for AI-driven decisioning at regulated entities |

Performance posture

Policy evaluation runs synchronously inside POST /traces and is budgeted under the 25 ms p95 latency contract. The compound index (organizationId, workspaceId, enabled) keeps the policy fetch sub-millisecond at typical org sizes (< 200 active policies); policy bodies are evaluated in-process with no external calls. Auto-Approval pattern matching, ReviewItem creation, webhook dispatch, and audit-log appending all run after the synchronous verdict is computed; the customer's SDK does not wait on them.

If a policy match throws (malformed condition, regex compile failure, mixed-type comparison), the engine returns false for that condition and continues; an unparseable policy never blocks the customer's agent. The gate is strict, the dependency is not.
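The fail-safe behaviour described above amounts to wrapping each condition test in a guard that converts any exception into "did not match". The names below (`safeMatch`, `strictGreaterThan`, `rawRegexMatch`) are hypothetical and exist only to demonstrate the pattern.

```javascript
// Run one condition test; any exception counts as "condition is false".
function safeMatch(testFn, trace, condition) {
  try {
    return Boolean(testFn(trace, condition));
  } catch (err) {
    return false;
  }
}

// Example test that throws on a mixed-type comparison.
function strictGreaterThan(trace, condition) {
  const actual = trace[condition.field];
  if (typeof actual !== typeof condition.value) {
    throw new TypeError(`mixed-type comparison on ${condition.field}`);
  }
  return actual > condition.value;
}

// Example test that throws when the pattern does not compile.
function rawRegexMatch(trace, condition) {
  return new RegExp(condition.value).test(trace[condition.field]);
}

safeMatch(strictGreaterThan, { confidenceScore: 0.9 }, { field: 'confidenceScore', value: 0.5 });    // true
safeMatch(strictGreaterThan, { confidenceScore: 'high' }, { field: 'confidenceScore', value: 0.5 }); // false
safeMatch(rawRegexMatch, { agentId: 'agent-1' }, { field: 'agentId', value: '(' });                  // false
```

The trade-off is that a broken policy fails open for that condition, which is consistent with the stated goal that an unparseable policy never blocks the customer's agent.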

See also