Policies & Human Review
The Confidence Engine produces a number; the Policy Engine produces
a verdict. A trace can score 0.95 and still be blocked because it
matches a deny policy. A trace can score 0.42 and still resolve
without human review because no policy fired and no review queue is
configured. Confidence is what the trace looks like; policy is what
the operator's compliance posture allows. This page explains how the
Policy Engine evaluates a trace at ingestion time, how a verdict
becomes either a 403 ADJ_BLOCKED_BY_POLICY or a 202 hold-for-review
with a freshly created ReviewItem, and where the line sits
between the engine's authority and the reviewer's.
The integration surface is documented at POST /policies and POST /reviews; the input score this engine consumes is the one defined in Traces & Confidence.
What a policy is
A policy is a named, ordered, enabled rule that matches a trace by
its fields and emits one or more actions. The schema (Policy)
captures name, description, enabled, priority, an array of
conditions, and an array of actions. Conditions match on a
field (e.g. confidenceScore, status, agentId,
humanOverride) using one of five operators: equals, contains,
greater_than, less_than, regex. Multiple conditions chain
using AND / OR logical operators in document order; the result
is a single boolean — either the policy fires or it does not.
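The matching rule can be sketched as a left-to-right fold over the conditions. This is a minimal sketch, not the engine's code, and it rests on a few assumptions:

- The condition field names (field, operator, value, logicalOperator) are hypothetical.
- confidenceScore is a number from 0 to 1, as in the introduction.
- "In document order" means strict left-to-right folding with no AND-over-OR precedence.

The regex operator escapes its pattern first, following the escapeRegex() behaviour described under What this is NOT.

```typescript
// Hypothetical condition shape; only the five operator names come from the docs.
type Operator = "equals" | "contains" | "greater_than" | "less_than" | "regex";

interface Condition {
  field: string;
  operator: Operator;
  value: string | number | boolean;
  logicalOperator?: "AND" | "OR"; // joins this condition to the running result
}

type Trace = { [field: string]: unknown };

// Escape metacharacters so a "regex" condition is a literal-string match.
function escapeRegex(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function matchCondition(trace: Trace, c: Condition): boolean {
  const actual = trace[c.field];
  switch (c.operator) {
    case "equals":
      return actual === c.value;
    case "contains":
      return typeof actual === "string" && actual.indexOf(String(c.value)) !== -1;
    case "greater_than":
      return typeof actual === "number" && typeof c.value === "number" && actual > c.value;
    case "less_than":
      return typeof actual === "number" && typeof c.value === "number" && actual < c.value;
    case "regex":
      return typeof actual === "string" && new RegExp(escapeRegex(String(c.value))).test(actual);
    default:
      return false;
  }
}

// Fold left to right in document order: "A OR B AND C" reads as "(A OR B) AND C".
// Assumption: a policy with no conditions never fires.
function policyFires(trace: Trace, conditions: Condition[]): boolean {
  let result: boolean | null = null;
  for (const c of conditions) {
    const hit = matchCondition(trace, c);
    if (result === null) {
      result = hit;
    } else {
      result = c.logicalOperator === "OR" ? result || hit : result && hit;
    }
  }
  return result === true;
}
```

If the real engine applies operator precedence instead, only the fold in policyFires() changes. The per-operator checks stay the same.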
Actions are typed: block, flag_for_review, notify, approve.
A policy can declare more than one action; the engine reads each
action independently and accumulates verdicts across all matched
policies before resolving.
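A minimal sketch of the accumulation step follows. It uses the Policy fields listed above and omits the condition shape. Every action of every matched policy is collected independently, and nothing is resolved yet. AccumulatedAction and accumulateActions() are hypothetical names.

```typescript
type Action = "block" | "flag_for_review" | "notify" | "approve";

interface Policy {
  name: string;
  description: string;
  enabled: boolean;
  priority: number;
  conditions: unknown[]; // condition shape omitted in this sketch
  actions: Action[];
}

interface AccumulatedAction {
  policy: string;
  action: Action;
}

// Read each action of each matched policy on its own; resolution happens later.
function accumulateActions(matched: Policy[]): AccumulatedAction[] {
  const out: AccumulatedAction[] = [];
  for (const p of matched) {
    for (const action of p.actions) {
      out.push({ policy: p.name, action });
    }
  }
  return out;
}
```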
A policy is workspace-scoped by default, but the engine also matches
org-wide policies (workspaceId: null) against every trace. The
compound index (organizationId, workspaceId, enabled) is what
makes this evaluable inside the POST /traces p95 budget.
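The scoping rule can be mirrored in memory as shown below. This is a sketch, not the production query. StoredPolicy and applicablePolicies() are hypothetical, and the field names simply follow the compound index. Against MongoDB, a filter of roughly the form { organizationId, enabled: true, workspaceId: { $in: [workspaceId, null] } } would express the same rule over the indexed fields.

```typescript
interface StoredPolicy {
  name: string;
  organizationId: string;
  workspaceId: string | null; // null marks an org-wide policy
  enabled: boolean;
  priority: number;
}

// Enabled policies of the trace's organization that are either scoped to the
// trace's workspace or org-wide.
function applicablePolicies(all: StoredPolicy[], organizationId: string, workspaceId: string): StoredPolicy[] {
  return all.filter(
    (p) =>
      p.enabled &&
      p.organizationId === organizationId &&
      (p.workspaceId === workspaceId || p.workspaceId === null),
  );
}
```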
The evaluation order: block > flag > notify > approve
The Policy Engine evaluates every matching policy and then resolves to a single winning action. The priority is fixed:
Policy Engine verdict:

1. ANY matched policy with action "block" → BLOCK (403)
2. ELSE any matched policy with action "flag_for_review" → HOLD (202)
3. ELSE any matched policy with action "notify" → ALLOW + alert
4. ELSE → ALLOW (201)

Evaluation order: priority ascending (.sort('priority')); ties broken by document insertion order.
A single block verdict among ten matched policies wins; the other
nine matches are ignored for resolution but their matchCount and
lastMatched fields are still updated. There is no AND-of-actions
mode and no probabilistic mixing; the highest-severity action
across all matched policies is the verdict, full stop.
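A sketch of the resolution rule follows. It assumes the input is the flat list of actions gathered from all matched policies. The Verdict shape is hypothetical, and the HTTP codes come from the verdict ladder. ALLOW_AND_ALERT is assumed to share the 201 of a plain allow, because the ladder gives no code for it. Only the policy verdict is shown. When nothing matches, the Confidence Engine's suggested status takes over, as described in the next section.

```typescript
type Action = "block" | "flag_for_review" | "notify" | "approve";

interface Verdict {
  outcome: "BLOCK" | "HOLD" | "ALLOW_AND_ALERT" | "ALLOW";
  httpStatus: 403 | 202 | 201;
}

// Fixed ladder, checked from most to least severe. The first rung with any
// matching action wins; every other match is ignored for resolution.
function resolveVerdict(actions: Action[]): Verdict {
  if (actions.indexOf("block") !== -1) return { outcome: "BLOCK", httpStatus: 403 };
  if (actions.indexOf("flag_for_review") !== -1) return { outcome: "HOLD", httpStatus: 202 };
  if (actions.indexOf("notify") !== -1) return { outcome: "ALLOW_AND_ALERT", httpStatus: 201 }; // 201 assumed
  return { outcome: "ALLOW", httpStatus: 201 }; // "approve" or no match at all
}
```

Because the ladder is a fixed sequence of checks rather than a sort over configurable weights, an operator has nothing to reorder.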
Policies are sorted by priority ascending (Policy.find().sort('priority')).
Within a priority tier, document insertion order resolves ties —
a deterministic but non-semantic ordering. If you need stable
resolution under a tie, write distinct priorities; the engine does
not surface "two policies matched at priority 5" as ambiguous.
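One way to keep the ordering deterministic, regardless of how the store returns equal sort keys, is to make the tie-break part of the sort key. The sketch below assumes a hypothetical insertion sequence field (seq) on each policy. The engine itself, as stated above, relies on Policy.find().sort('priority') and insertion order.

```typescript
interface RankedPolicy {
  name: string;
  priority: number;
  seq: number; // hypothetical insertion sequence, increasing with creation order
}

// Priority ascending, then insertion order: deterministic even when two
// policies share a priority tier. slice() keeps the input array untouched.
function evaluationOrder(policies: RankedPolicy[]): RankedPolicy[] {
  return policies.slice().sort((a, b) => a.priority - b.priority || a.seq - b.seq);
}
```

Distinct priorities remain the clearer fix. The secondary key only guarantees determinism, not meaning.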
What happens when a verdict fires
The verdict is read at the end of the trace-ingestion pipeline, after PII scrubbing, after Confidence Engine triangulation, and before durable persistence:
- block → the trace is saved with status: 'blocked', the hash chain appends, the alert engine fires, the trace.blocked webhook event dispatches, and the SDK returns 403 ADJ_BLOCKED_BY_POLICY with the matched policy's reason. The customer's agent receives a hard refusal; no decision flows to production.
- flag_for_review → the trace is saved with status: 'flagged', a new ReviewItem is created from the trace (priority derived from confidence band, 24-hour SLA deadline by default), the trace.hold_for_review webhook event dispatches, and the SDK returns 202 with allowed: false, action: 'hold_for_review', reviewId. The customer's agent must wait for the human verdict.
- notify → the trace is saved as success, allowed to flow, and the configured notify channel (Slack / email / webhook) receives a side-band alert. The decision goes through.
- (no match) → the Confidence Engine's suggested status takes effect:
  - success → saved & allowed;
  - flagged / escalated → saved, ReviewItem auto-created on confidence alone (no policy required), SDK returns 202.
The order matters: a block policy can override a high-confidence
trace, and a low-confidence trace can be auto-routed to review even
without a policy.
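The branches above can be condensed into one function. This is a sketch only, with these assumptions:

- PolicyVerdict, Outcome and ingestionOutcome() are hypothetical names.
- "none" stands for no policy match.
- The 201 for notify and for plain success is taken from the ALLOW (201) rung.
- The event field is set only where this page names a trace.* webhook event.

```typescript
type PolicyVerdict = "block" | "flag_for_review" | "notify" | "none";
type SuggestedStatus = "success" | "flagged" | "escalated";

interface Outcome {
  savedStatus: "blocked" | "flagged" | "escalated" | "success";
  httpStatus: 403 | 202 | 201;
  createsReviewItem: boolean;
  event: string | null; // webhook event named on this page, if any
}

function ingestionOutcome(verdict: PolicyVerdict, suggested: SuggestedStatus): Outcome {
  switch (verdict) {
    case "block": // wins regardless of how high the confidence is
      return { savedStatus: "blocked", httpStatus: 403, createsReviewItem: false, event: "trace.blocked" };
    case "flag_for_review":
      return { savedStatus: "flagged", httpStatus: 202, createsReviewItem: true, event: "trace.hold_for_review" };
    case "notify": // allowed; the notify channel receives a side-band alert
      return { savedStatus: "success", httpStatus: 201, createsReviewItem: false, event: null };
    default:
      // No policy matched: the Confidence Engine's suggested status decides.
      if (suggested === "success") {
        return { savedStatus: "success", httpStatus: 201, createsReviewItem: false, event: null };
      }
      return { savedStatus: suggested, httpStatus: 202, createsReviewItem: true, event: null };
  }
}
```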
The Review Queue
A ReviewItem is the unit of human work. Each item carries the
trace's _id reference, an action string (the triggering condition
or the policy reason), the confidence percentage at creation time,
a derived priority (critical / high / medium / low), a status
(pending → approved / rejected / escalated), the input
context, and a slaDeadline set to now + 24 hours. Priority is
derived deterministically; rows are checked top to bottom and the first match wins:
| Trace status / confidence | Derived priority |
|---|---|
| status: 'escalated' | critical |
| confidence < 65% | critical |
| confidence < 75% | high |
| confidence < 85% | medium |
| else | low |
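Because the table reads top-down with first match winning, it translates directly into a chain of checks. This sketch assumes confidence is stored as a percentage (0 to 100), as the table implies. derivePriority(), slaDeadline() and DEFAULT_SLA_MS are hypothetical names.

```typescript
type ReviewPriority = "critical" | "high" | "medium" | "low";

// Rows are checked top to bottom; the first matching row wins.
function derivePriority(status: string, confidencePct: number): ReviewPriority {
  if (status === "escalated") return "critical";
  if (confidencePct < 65) return "critical";
  if (confidencePct < 75) return "high";
  if (confidencePct < 85) return "medium";
  return "low";
}

const DEFAULT_SLA_MS = 24 * 60 * 60 * 1000; // 24 hours

function slaDeadline(createdAt: Date): Date {
  return new Date(createdAt.getTime() + DEFAULT_SLA_MS);
}
```

Each boundary value falls into the lower-severity band, so 65% is high rather than critical.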
A reviewer with the reviewer or higher role opens the queue, picks
an item, and resolves it — approve the agent's decision,
reject it (the agent is told to fail closed), escalate it to a
senior reviewer, or override it with a different decision. The
verdict is appended to the audit log and to the SHA-256 hash chain
alongside the original trace.
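Below is a minimal sketch of the status transitions. It makes two loudly labeled assumptions: an escalated item can still be approved or rejected by the senior reviewer, and approved and rejected are final. The override path is not modeled, because this page does not say which status it records. resolveReview() and the Resolution type are hypothetical. Role checks, the audit-log write and the hash-chain append are omitted.

```typescript
type ReviewStatus = "pending" | "approved" | "rejected" | "escalated";
type Resolution = "approve" | "reject" | "escalate";

const RESULTING_STATUS: Record<Resolution, ReviewStatus> = {
  approve: "approved",
  reject: "rejected", // the agent is told to fail closed
  escalate: "escalated", // handed to a senior reviewer
};

function resolveReview(current: ReviewStatus, resolution: Resolution): ReviewStatus {
  if (current === "pending") return RESULTING_STATUS[resolution];
  // Assumption: the senior reviewer can approve or reject an escalated item.
  if (current === "escalated" && resolution !== "escalate") return RESULTING_STATUS[resolution];
  // Assumption: approved and rejected are final.
  throw new Error("cannot " + resolution + " a review that is already " + current);
}
```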
The 500-item hard cap on GET /review-queue is intentional: a
queue that grows unbounded is no longer reviewable. If you have
more than 500 pending items, your policy thresholds are too tight
or your reviewer roster is too thin; the dashboard surfaces both
signals directly on the queue header.
Auto-Approval: closing the loop
A trace that lands in the queue, gets approved by a human, and is
followed by 49 more nearly-identical traces that get approved by
humans is, eventually, evidence that a human is no longer needed
for that pattern. The Auto-Approval Engine watches for that
evidence: ≥50 observations, ≥95% approval rate, 4-eyes sign-off.
Once active, a matching flag_for_review verdict resolves directly
to approved without entering the queue.
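The activation gate can be sketched as a pure check over a pattern's history. PatternStats and isReadyForAutoApproval() are hypothetical names, and 4-eyes sign-off is read as two distinct reviewers. The 90-day re-validation mentioned later on this page is left out. The approval-rate test uses integer arithmetic, so exactly 95% passes without floating-point surprises.

```typescript
interface PatternStats {
  observations: number; // traces seen for this pattern
  approvals: number; // of those, how many a human approved
  signOffBy: string[]; // reviewer ids that signed off on activation
}

function isReadyForAutoApproval(s: PatternStats): boolean {
  if (s.observations < 50) return false; // >= 50 observations
  if (s.approvals * 100 < s.observations * 95) return false; // >= 95% approval rate
  // 4-eyes: at least two distinct reviewers signed off.
  const distinct: string[] = [];
  for (const id of s.signOffBy) {
    if (distinct.indexOf(id) === -1) distinct.push(id);
  }
  return distinct.length >= 2;
}
```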
Auto-Approval can never override a block. The schema enforces
it (AutoApprovalDecision.priorStatus enum is flagged or
escalated, not blocked); the engine reads the trace's pre-
resolution status; a block verdict short-circuits the entire
pipeline before auto-approval runs. See Auto-Approval for the
maturation mechanics.
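This sketch shows the ordering guarantee in miniature. The block check runs before auto-approval is consulted, and eligibility is limited to the flagged and escalated prior statuses. finalStatus() and its boolean patternActive flag are hypothetical simplifications of the pattern-matching step.

```typescript
type PriorStatus = "blocked" | "flagged" | "escalated" | "success";

function finalStatus(prior: PriorStatus, patternActive: boolean): PriorStatus | "approved" {
  // 1. A block short-circuits the pipeline; auto-approval never sees it.
  if (prior === "blocked") return "blocked";
  // 2. Only held traces are eligible, mirroring the priorStatus enum.
  const eligible = prior === "flagged" || prior === "escalated";
  // 3. A mature, active pattern resolves the hold without entering the queue.
  return eligible && patternActive ? "approved" : prior;
}
```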
What this is NOT
- Not a runtime LLM firewall. Adjudon does not scan prompt payloads for jailbreak strings or run a secondary LLM in the audit path. The Policy Engine matches structured trace fields (confidenceScore, status, agentId, humanOverride, arbitrary metadata) using deterministic operators: no LLM, no embedding, no learned classifier in the gate.
- Not soft enforcement. A block verdict returns HTTP 403 synchronously, before the agent's decision is ever durable; there is no eventually-consistent block.
- Not a substitute for human review. The maturation thresholds for Auto-Approval are deliberately conservative (≥50 / ≥95% / 4-eyes / 90-day re-validation). The default policy posture is human-in-the-loop; Auto-Approval is opt-in.
- Not configurable severity ordering. block > flag > notify > approve is fixed. Operators cannot rewrite the priority ladder; this is a feature, not a limitation. A regulator reading the audit log needs the same priority semantics on every customer's account.
- Not a regex engine. Operators can write regex conditions, but the engine escapes the pattern before compilation (escapeRegex() at policyEngine.js:11); ReDoS payloads in policy values are defanged. Treat regex conditions as literal-string matches with metacharacters disabled.
Regulator mapping
| Regulator surface | What this concept satisfies |
|---|---|
| EU AI Act Art. 14 | Human oversight — the flag_for_review verdict + ReviewItem auto-creation IS the Art. 14 oversight surface |
| GDPR Art. 22(3) | Right not to be subject to solely-automated decisions — a flag interrupts automated execution and requires human verdict before the decision lands |
| EU AI Act Art. 13 | Transparency — every block includes the matched-policy reason in the SDK response; the audit log preserves the verdict history |
| ISO 42001 §6.4.4 | Operational controls for AI-system decisions — the policy catalogue is the "operational control" auditors map evidence against |
| BaFin MaRisk AT 4.3.4 | Outsourced-AI risk controls — policies are the documented control points required for AI-driven decisioning at regulated entities |
Performance posture
Policy evaluation runs synchronously inside POST /traces and is
budgeted under the 25 ms p95 latency contract. The compound index
(organizationId, workspaceId, enabled) keeps the policy fetch
sub-millisecond at typical org sizes (< 200 active policies);
policy bodies are evaluated in-process with no external calls.
Auto-Approval pattern matching, ReviewItem creation, webhook
dispatch, and audit-log appending all run after the synchronous
verdict is computed; the customer's SDK does not wait on them.
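This sketch shows the split between the synchronous verdict and the post-verdict work. The deferred-task list and drain() are hypothetical stand-ins for whatever queue the service actually uses. The point is only that the response value exists before any side effect has run.

```typescript
type Task = () => void;

interface IngestResult {
  httpStatus: number;
  deferred: Task[]; // runs after the response has been handed back
}

function ingest(httpStatus: number, log: string[]): IngestResult {
  // The verdict is already known; everything below is queued, not awaited.
  const deferred: Task[] = [
    () => { log.push("auto-approval pattern match"); },
    () => { log.push("ReviewItem creation"); },
    () => { log.push("webhook dispatch"); },
    () => { log.push("audit-log append"); },
  ];
  return { httpStatus, deferred };
}

function drain(tasks: Task[]): void {
  for (const task of tasks) task();
}
```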
If a policy match throws (malformed condition, regex compile
failure, mixed-type comparison), the engine returns false for
that condition and continues; an unparseable policy never blocks
the customer's agent. The gate is strict, the dependency is not.
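The fail-safe rule applies per condition and can be sketched as a wrapper. safeCondition() and the Evaluator type are hypothetical. "Mixed-type comparison" is read here as an ordered comparison (greater_than / less_than) whose operands have different JavaScript types.

```typescript
type Evaluator = (actual: unknown, expected: unknown) => boolean;

// A condition that throws, or compares values of different types, counts as
// "no match" instead of failing the request.
function safeCondition(evaluate: Evaluator, actual: unknown, expected: unknown, ordered: boolean): boolean {
  if (ordered && typeof actual !== typeof expected) return false;
  try {
    return evaluate(actual, expected);
  } catch {
    return false;
  }
}
```

Returning false rather than true is what makes a broken policy simply never fire, while well-formed policies keep their full effect.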
See also
- Policies API — the CRUD surface for the rules this engine evaluates
- Reviews API — the human-review queue this engine feeds
- Auto-Approval — the pattern-maturation feedback loop that closes review work
- Traces & Confidence — the score this engine reads from
- POST /traces — the ingestion surface that wraps the whole pipeline