Traces & Confidence Scoring
A model's self-reported confidence is the cheapest number to ship and the easiest one to lie about. Every LLM produces a softmax. Every classifier produces a probability. None of those numbers, on their own, tell a regulator whether the agent's decision held up against the next-best alternative or against what the same agent decided last Tuesday on the same input. That gap — between a model's self-report and a defensible score — is what the Adjudon Confidence Engine fills.
This page explains how the engine triangulates a trace's confidence score from three independent signals, what each signal is grounded in, what the score is allowed to do downstream (block? flag? auto-approve?), and where the boundary sits between Adjudon's score and the customer's own model. It is a concept page, not an API reference; the integration surface is documented at POST /traces and the runtime gate at concepts/policies-and-review.
What a "trace" is
A trace is the smallest auditable unit Adjudon stores: one decision
an agent made, snapshotted at the moment it was made. The schema
(DecisionTrace) captures the input that triggered the decision,
the output the agent emitted, the rationale the agent (or the
operator's policy) attached, the alternatives the model also
considered, and a stack of telemetry — SDK and runtime
fingerprint, ML-BOM digest, EU AI Act Art. 12 envelope, hash-chain
back-reference. Every paid plan retains traces for 90 days by
default; Governance and above can extend retention up to 365 days,
and Enterprise up to 3,650 days (roughly ten years), which covers
the BaFin seven-year retention regime.
A trace exists per agent invocation. It is not a model log, not a prompt log, and not a system log. It is the row a regulator points to when they ask "show me decision X by AI system Y on date Z, with the inputs that produced it and the alternative it almost picked instead." That request is the practical form of the transparency duty in EU AI Act Article 13 ("Transparency and provision of information to deployers"); it is why the trace exists.
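As a rough illustration of the shape described above, a trace payload might look like the following Python dictionary. Only `triggeringCondition`, `inputContext`, `outputDecision.confidenceScore`, `alternatives`, `mlBomReference`, and `art12` are names that appear on this page; every other key, every value, and the `top_alternative` helper are hypothetical placeholders, not the DecisionTrace schema.

```python
# Hypothetical trace payload for illustration; NOT the authoritative schema.
example_trace = {
    "triggeringCondition": "refund_request_received",
    "inputContext": {"orderId": "A-1001", "amountEur": 42.0},
    "outputDecision": {"action": "approve_refund", "confidenceScore": 0.91},
    "rationale": "Amount below the auto-refund limit",  # key name assumed
    "alternatives": [
        # the per-alternative "confidence" key is an assumption
        {"action": "escalate_to_agent", "confidence": 0.55},
        {"action": "reject_refund", "confidence": 0.12},
    ],
    "mlBomReference": "sha256:<digest>",  # placeholder digest
    "art12": {},                          # envelope contents not shown here
    "previousHash": "<hash>",             # key name assumed
}


def top_alternative(trace):
    """Return the highest-confidence alternative, or None if none were sent."""
    alternatives = trace.get("alternatives") or []
    return max(alternatives, key=lambda alt: alt["confidence"], default=None)


runner_up = top_alternative(example_trace)
print(runner_up["action"])
```

The runner-up is the piece of the payload that the variance pillar reads, which is why it is worth submitting alternatives with the trace.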
The Confidence Engine: three independent pillars
The Engine produces a single triangulated score in [0, 1] and a
suggested resolution status. The score is the weighted sum of three
pillars; the suggested status is a discrete band derived from the
score plus optional warning flags.
┌─────────────────────────────────────────────────────────┐
│ Confidence Engine 2.0 │
├─────────────────────────────────────────────────────────┤
│ │
│ Pillar 1 — Base Model Probability weight 40% │
│ Pillar 2 — Variance vs. Next-Best weight 30% │
│ Pillar 3 — Historical Precedent weight 30% │
│ │
│ finalScore = 0.4·base + 0.3·var + 0.3·hist │
│ │
├─────────────────────────────────────────────────────────┤
│ <0.4 → escalated │
│ <0.7 OR flags → flagged │
│ ≥0.7 no flags → success │
└─────────────────────────────────────────────────────────┘
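The weighted sum in the diagram above can be sketched in a few lines of Python. The weights come from this page; the function and constant names are illustrative assumptions, not the Engine's own code.

```python
# Pillar weights as stated on this page.
WEIGHTS = {"base": 0.4, "variance": 0.3, "historical": 0.3}


def final_score(base, variance, historical):
    """Weighted sum of the three pillar scores, each expected in [0, 1]."""
    return (
        WEIGHTS["base"] * base
        + WEIGHTS["variance"] * variance
        + WEIGHTS["historical"] * historical
    )


# A confident model, a clear runner-up gap, and clean precedent.
strong = final_score(base=0.90, variance=0.95, historical=1.0)

# A model that reports nothing, with both other pillars on their defaults.
neutral = final_score(base=0.5, variance=0.8, historical=0.5)

print(round(strong, 3), round(neutral, 3))
```

The first case works out to 0.36 + 0.285 + 0.30 = 0.945. The second, with every pillar on the fallback value this page documents, works out to 0.20 + 0.24 + 0.15 = 0.59, which lands below the 0.7 line.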
Pillar 1 — Base Model Probability (weight 40%)
The agent's own self-reported confidence, parsed from
outputDecision.confidenceScore or, failing that, from the top-level
confidence field. If the SDK omits both, the Engine assumes
0.5 and proceeds. This is the only pillar the model controls. By
itself it is not enough; the regulator question is not "what does
the model think" but "how does the model's claim hold up under
adversarial reading."
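The lookup order for the base pillar can be sketched as follows. The field names and the 0.5 default are from this page; the function name, the dictionary representation of a trace, and the clamping of out-of-range values into [0, 1] are assumptions made for the example.

```python
DEFAULT_BASE = 0.5  # used when the SDK sends neither field


def base_probability(trace):
    """Read the self-reported confidence, preferring the nested field."""
    decision = trace.get("outputDecision") or {}
    candidates = (decision.get("confidenceScore"), trace.get("confidence"))
    for value in candidates:
        # Accept real numbers only; bool is excluded because it subclasses int.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return min(1.0, max(0.0, float(value)))  # clamping is an assumption
    return DEFAULT_BASE


print(base_probability({"outputDecision": {"confidenceScore": 0.91}}))
print(base_probability({"confidence": 0.66}))
print(base_probability({}))
```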
Pillar 2 — Variance vs. Next-Best Alternative (weight 30%)
If the agent submitted alternatives, the Engine reads the top
alternative's confidence and asks: how much daylight is there
between the chosen decision and the runner-up? The formula is
min(1.0, 0.5 + delta · 1.5) where delta = max(0, base − topAlt).
A 30-point gap (delta = 0.30) maps to a near-perfect variance
score of 0.95. A toss-up (delta = 0.02) maps to a barely-passing 0.53. If
the agent submits no alternatives, the Engine returns the neutral
default 0.8 and assumes a single clear path.
The reason this pillar earns 30% of the weight: a 95% prediction beating a 92% runner-up is a statistically different decision from a 95% prediction beating a 60% runner-up. The first is a coin-flip disguised as confidence; the second is real. Sampling-temperature artefacts, prompt-injection ambiguity, and tool-selection toss-ups all show up here.
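The variance formula and the two cases just described can be checked with a short Python sketch. The formula and the 0.8 default are from this page; the function name, and the choice to treat the highest-confidence alternative as the "top" one, are assumptions.

```python
NO_ALTERNATIVES_DEFAULT = 0.8  # neutral default when no alternatives are sent


def variance_score(base, alternative_confidences):
    """min(1.0, 0.5 + delta * 1.5), where delta = max(0, base - topAlt)."""
    if not alternative_confidences:
        return NO_ALTERNATIVES_DEFAULT
    top_alt = max(alternative_confidences)  # assumed meaning of "top alternative"
    delta = max(0.0, base - top_alt)
    return min(1.0, 0.5 + delta * 1.5)


close_call = variance_score(0.95, [0.92])   # 3-point gap
clear_win = variance_score(0.95, [0.60])    # 35-point gap
print(round(close_call, 3), clear_win)
```

The close call scores about 0.545 and the clear win saturates at 1.0, even though the model reported the same 95% in both cases. Because delta is never negative, this formula cannot return less than 0.5 whenever alternatives are present.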
Pillar 3 — Historical Precedent (weight 30%)
The Engine embeds the trace's triggeringCondition and
inputContext into a vector, runs a similarity search against the
org's vector memory (top-3, min-score 0.7), and asks: of the past
similar decisions, how many resolved without human override and
without being flagged? That ratio becomes the historical score. If
the search finds no precedent above the 0.7 floor, the Engine
returns 0.6 and emits the NOVEL_SITUATION flag. If the embedding
service is unavailable or returns empty, the Engine returns 0.5 and
the trace continues; the gate is strict, the dependency is not.
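The precedent ratio and its two fallbacks can be sketched as below. The top-3 limit, the 0.7 floor, the 0.6 and 0.5 fallbacks, and the NOVEL_SITUATION flag are from this page. The shape of a search hit (`similarity`, `overridden`, `flagged`) and the convention that `None` stands for an unavailable embedding service are assumptions made for the example.

```python
def historical_score(matches, min_score=0.7, top_k=3):
    """Return (score, flags) for the historical pillar.

    matches: list of hit dicts, or None when the embedding service
    was unavailable or returned nothing usable (assumed convention).
    """
    if matches is None:
        return 0.5, []  # neutral fallback; the trace continues

    # Keep hits at or above the similarity floor, best first, at most top_k.
    precedents = sorted(
        (m for m in matches if m["similarity"] >= min_score),
        key=lambda m: m["similarity"],
        reverse=True,
    )[:top_k]

    if not precedents:
        return 0.6, ["NOVEL_SITUATION"]

    clean = sum(1 for m in precedents if not m["overridden"] and not m["flagged"])
    return clean / len(precedents), []


hits = [
    {"similarity": 0.91, "overridden": False, "flagged": False},
    {"similarity": 0.85, "overridden": True, "flagged": False},
    {"similarity": 0.72, "overridden": False, "flagged": False},
    {"similarity": 0.40, "overridden": False, "flagged": False},  # below floor
]
score, flags = historical_score(hits)
print(round(score, 3), flags)
```

Two of the three qualifying precedents resolved cleanly, so the pillar scores roughly 0.667; the fourth hit is ignored because it falls below the floor.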
The vector memory backend is OpenAI Embeddings, the only non-EU sub-processor in the Adjudon stack, with EU Standard Contractual Clauses on file and the trace payload digested before it is sent. Customers who reject the OpenAI dependency get a constant 0.5 on this pillar; the engine still runs.
The status ladder
After computing finalScore, the Engine sets two warning flags and
suggests a resolution status:
| Condition | Result |
|---|---|
| finalScore < 0.6 | LOW_CONFIDENCE flag added |
| pillars.variance < 0.3 | HIGH_AMBIGUITY flag added |
| finalScore < 0.4 | suggestedStatus = 'escalated' |
| finalScore < 0.7 OR flags.length > 0 | suggestedStatus = 'flagged' |
| else | suggestedStatus = 'success' |
The suggested status is not the final status. The Policy Engine
runs immediately after with full authority to overrule it: a
high-confidence trace can still be blocked if it hits a deny
policy, and a low-confidence one can still resolve approved if a
matured ApprovalPattern matches. Confidence is a signal; policy
is the gate. See policies-and-review for the ordering and
auto-approval for the maturation mechanics.
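Read top to bottom, the status ladder amounts to the following Python sketch. The thresholds, flag names, and status strings are from the table; the function name and the assumption that flags raised earlier (such as NOVEL_SITUATION) count toward the "flags present" check are illustrative. The sketch implements the table as written. Note that under the variance formula given on this page the variance pillar never drops below 0.5, so the `variance < 0.3` condition would not fire from that formula alone.

```python
def suggest_status(final_score, variance, prior_flags=()):
    """Return (suggested_status, flags) following the ladder in the table."""
    flags = list(prior_flags)  # e.g. NOVEL_SITUATION from the historical pillar

    if final_score < 0.6:
        flags.append("LOW_CONFIDENCE")
    if variance < 0.3:
        flags.append("HIGH_AMBIGUITY")

    if final_score < 0.4:
        status = "escalated"
    elif final_score < 0.7 or flags:
        status = "flagged"
    else:
        status = "success"
    return status, flags


print(suggest_status(0.35, 0.8))
print(suggest_status(0.65, 0.8))
print(suggest_status(0.85, 0.9))
print(suggest_status(0.85, 0.9, ["NOVEL_SITUATION"]))
```

The last call shows why flags matter independently of the score: a 0.85 trace with a NOVEL_SITUATION flag is still suggested as flagged rather than success.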
What this is NOT
- Not the model's self-report. The base pillar earns 40% of the weight, not 100%; a model can never auto-approve itself out of human review.
- Not a probability of correctness. The score is a defensibility index, not a calibrated probability. A 0.92 means three independent signals agreed; it does not mean the decision is correct nine times out of ten.
- Not a runtime firewall. The Engine assigns a score and suggests a status; the Policy Engine and Review Queue are what block, hold, or release. Confidence is read; policy is write.
- Not a model-explainability tool. The Engine does not surface feature importances, attention weights, or counterfactuals. SHAP-style explainability is the customer's responsibility upstream; Adjudon evaluates the decision the model already made.
- Not blockchain. The score is anchored into the SHA-256 Decision Hash Chain alongside the trace; the chain is tamper-evident, append-only, and verifiable via three commands. It is not a distributed ledger.
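To illustrate what tamper-evident and append-only mean without a distributed ledger, here is a generic SHA-256 hash chain in Python. It sketches the general technique only: the record layout, the genesis value, the JSON canonicalisation, and the function names are assumptions made for this example and do not describe Adjudon's actual chain format or its verification commands.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first link


def link_hash(prev_hash, record):
    """Hash the previous link together with a canonical form of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(records):
    """Return one hash per record, each depending on all earlier records."""
    hashes, prev = [], GENESIS
    for record in records:
        prev = link_hash(prev, record)
        hashes.append(prev)
    return hashes


def verify_chain(records, hashes):
    """Recompute the chain and compare it with the stored hashes."""
    return len(records) == len(hashes) and build_chain(records) == hashes


records = [
    {"traceId": "t-1", "finalScore": 0.945},
    {"traceId": "t-2", "finalScore": 0.59},
]
stored = build_chain(records)
print(verify_chain(records, stored))

# Editing an earlier score after the fact breaks verification.
tampered = [dict(records[0], finalScore=0.99), records[1]]
print(verify_chain(tampered, stored))
```

Because each link folds in the previous hash, changing any stored score invalidates every later link, which is what makes the record tamper-evident on a single, centrally held chain.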
Regulator mapping
| Regulator surface | What this concept satisfies |
|---|---|
| EU AI Act Art. 13 | Transparency & provision of information — the trace IS the record a deployer hands a market-surveillance authority |
| EU AI Act Art. 14 | Human oversight — the suggested-status ladder is what routes a sub-0.7 decision to the Review Queue |
| EU AI Act Art. 12 | Logging — the art12 envelope on every trace is the future-proof Art. 12 surface (input/output digest, model info, governance context) |
| GDPR Art. 22 | Solely-automated-decision protection — a flagged or escalated status creates the human-in-the-loop trigger Art. 22(3) requires |
| ISO 42001 | Per-clause traceability — every trace ships with the mlBomReference digest that maps to ISO 42001 §6 model-management evidence |
Failure modes and fail-open posture
The Engine has three external dependencies: the trace payload itself, the vector-memory similarity search, and the embedding service. Cardinal Rule: if any dependency is slow or unavailable, the Engine must not block trace ingestion — the customer's agent will keep running with or without an Adjudon score, and a missing score is recoverable downstream while a 503 is not. The fallbacks are:
- No alternatives in payload → variance pillar = 0.8 (neutral good).
- No vector-memory results above 0.7 → historical pillar = 0.6, NOVEL_SITUATION flag set.
- Embedding service down or returns null → historical pillar = 0.5 (neutral); the trace continues.
- Engine throws → the trace is saved with the model's self-reported confidenceScore (or 0.5 default); the Policy Engine still runs.
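The last fallback in the list, where the Engine itself throws, can be sketched as a fail-open wrapper. The fallback to the self-reported score with a 0.5 default is from this page; the wrapper name, the `engine` callable, and the `source` label it returns are hypothetical.

```python
def score_fail_open(trace, engine):
    """Return (score, source) and never raise, so ingestion is not blocked."""
    try:
        return engine(trace), "engine"
    except Exception:
        # Fail open: fall back to the model's own number, then to 0.5.
        decision = trace.get("outputDecision") or {}
        value = decision.get("confidenceScore")
        if value is None:
            value = trace.get("confidence")
        return (0.5 if value is None else value), "self_reported"


def broken_engine(trace):
    raise RuntimeError("vector memory unreachable")


print(score_fail_open({"outputDecision": {"confidenceScore": 0.88}}, broken_engine))
print(score_fail_open({}, broken_engine))
print(score_fail_open({}, lambda trace: 0.73))
```

Returning the source alongside the score is a design choice for the example: it lets downstream consumers tell a triangulated score from a bare self-report.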
The published p50/p95/p99 latency budget for POST /traces (10 ms /
25 ms / 45 ms) is measured end-to-end, including this engine. If a
trace's confidence path exceeds the budget, the historical pillar
times out first — the budget protects the customer's agent
from Adjudon's dependencies, not the other way round.
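One way to let the historical pillar time out first is to run its lookup under a deadline and substitute a neutral value when the deadline passes. The sketch below uses Python's standard `concurrent.futures`; the function name, the shared pool, the budget value, and the choice of 0.5 as the timeout result (borrowed from the embedding-unavailable fallback) are all assumptions, not the Engine's implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared pool so a slow lookup does not block the caller on shutdown.
_POOL = ThreadPoolExecutor(max_workers=4)


def historical_with_budget(lookup, budget_seconds, neutral=0.5):
    """Run lookup() but give up and return `neutral` once the budget is spent."""
    future = _POOL.submit(lookup)
    try:
        return future.result(timeout=budget_seconds)
    except FutureTimeout:
        future.cancel()  # no effect if already running; the result is ignored
        return neutral
    except Exception:
        return neutral   # a failing dependency also degrades to neutral


def slow_lookup():
    time.sleep(0.2)      # simulates a slow similarity search
    return 0.9


def fast_lookup():
    return 0.9


print(historical_with_budget(slow_lookup, budget_seconds=0.01))
print(historical_with_budget(fast_lookup, budget_seconds=0.5))
```

The slow lookup is abandoned and scored as neutral, while the fast one returns its real value; in both cases the caller gets an answer within its budget.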
See also
- POST /traces: the integration surface that produces the trace
- Policies & Review: the runtime gate that consumes the score
- Hash Chain: how the score is anchored for tamper-evident retrieval
- Auto-Approval: what matured pattern-matching does to a flagged trace
- Sub-Processors: the OpenAI Embeddings exception and the SCC posture