
Traces & Confidence Scoring

A model's self-reported confidence is the cheapest number to ship and the easiest one to lie about. Every LLM produces a softmax. Every classifier produces a probability. None of those numbers, on their own, tell a regulator whether the agent's decision held up against the next-best alternative or against what the same agent decided last Tuesday on the same input. That gap — between a model's self-report and a defensible score — is what the Adjudon Confidence Engine fills.

This page explains how the engine triangulates a trace's confidence score from three independent signals, what each signal is grounded in, what the score is allowed to do downstream (block? flag? auto-approve?), and where the boundary sits between Adjudon's score and the customer's own model. It is a concept page, not an API reference; the integration surface is documented at POST /traces and the runtime gate at concepts/policies-and-review.

What a "trace" is

A trace is the smallest auditable unit Adjudon stores: one decision an agent made, snapshotted at the moment it was made. The schema (DecisionTrace) captures the input that triggered the decision, the output the agent emitted, the rationale the agent (or the operator's policy) attached, the alternatives the model also considered, and a stack of telemetry (SDK and runtime fingerprint, ML-BOM digest, EU AI Act Art. 12 envelope, hash-chain back-reference). Every paid plan retains traces for 90 days by default; Governance and above can extend retention to up to 365 days, and Enterprise to up to 3,650 days (roughly ten years), which covers the BaFin seven-year retention regime.
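The schema described above can be pictured as a small record type. This is a minimal Python sketch, not the real DecisionTrace schema: it reuses only the field names that appear on this page (triggeringCondition, inputContext, outputDecision, alternatives, mlBomReference, art12), keeps them in camelCase to mirror the payload, and treats rationale, prevHash, and the Alternative shape as hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Alternative:
    """Hypothetical shape of one option the model considered but did not pick."""
    decision: str
    confidence: float


@dataclass
class DecisionTrace:
    """Trimmed-down, illustrative trace record; not the real DecisionTrace schema."""
    triggeringCondition: str            # what prompted the agent to decide
    inputContext: Dict[str, Any]        # input snapshot at decision time
    outputDecision: Dict[str, Any]      # what the agent emitted; may carry confidenceScore
    rationale: Optional[str] = None     # hypothetical: agent- or policy-supplied reasoning
    alternatives: List[Alternative] = field(default_factory=list)
    mlBomReference: Optional[str] = None                  # ML-BOM digest
    art12: Dict[str, Any] = field(default_factory=dict)   # EU AI Act Art. 12 envelope
    prevHash: Optional[str] = None      # hypothetical name for the hash-chain back-reference


# Illustrative example values only.
trace = DecisionTrace(
    triggeringCondition="refund_request",
    inputContext={"order_id": "A-1001", "amount_eur": 120},
    outputDecision={"decision": "approve", "confidenceScore": 0.91},
    alternatives=[Alternative(decision="deny", confidence=0.40)],
)
```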

A trace exists per agent invocation. It is not a model log, a prompt log, or a system log. It is the row a regulator points to when they ask "show me decision X by AI system Y on date Z, with the inputs that produced it and the alternative it almost picked instead." That request is the practical form of EU AI Act Article 13 ("Transparency and provision of information to deployers"), and it is why the trace exists.

The Confidence Engine: three independent pillars

The Engine produces a single triangulated score in [0, 1] and a suggested resolution status. The score is the weighted sum of three pillars; the suggested status is a discrete band derived from the score plus optional warning flags.

┌─────────────────────────────────────────────────────────┐
│ Confidence Engine 2.0                                   │
├─────────────────────────────────────────────────────────┤
│                                                         │
│ Pillar 1: Base Model Probability     weight 40%         │
│ Pillar 2: Variance vs. Next-Best     weight 30%         │
│ Pillar 3: Historical Precedent       weight 30%         │
│                                                         │
│ finalScore = 0.4·base + 0.3·var + 0.3·hist              │
│                                                         │
├─────────────────────────────────────────────────────────┤
│ < 0.4                     → escalated                   │
│ < 0.7, or any flag set    → flagged                     │
│ ≥ 0.7 and no flags        → success                     │
└─────────────────────────────────────────────────────────┘
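The weighting in the diagram is a plain convex combination: the weights sum to 1, so three pillars in [0, 1] always yield a finalScore in [0, 1]. A minimal Python sketch, assuming each pillar has already been computed; the function name and the range check are illustrative, not Adjudon's code.

```python
# Pillar weights as given in the diagram; they sum to 1.0.
WEIGHTS = {"base": 0.4, "variance": 0.3, "historical": 0.3}


def final_score(base: float, variance: float, historical: float) -> float:
    """Weighted sum of the three pillars; each pillar must lie in [0, 1]."""
    pillars = {"base": base, "variance": variance, "historical": historical}
    for name, value in pillars.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} pillar out of range: {value}")
    return sum(WEIGHTS[name] * value for name, value in pillars.items())
```

For example, final_score(0.9, 0.95, 1.0) is 0.36 + 0.285 + 0.3, about 0.945. A perfect self-report with the other two pillars at zero, final_score(1.0, 0.0, 0.0), reaches only 0.4.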

Pillar 1 — Base Model Probability (weight 40%)

The agent's own self-reported confidence, parsed from outputDecision.confidenceScore or, falling back, from the top-level confidence field. If the SDK omits both, the Engine assumes 0.5 and proceeds. This is the only pillar the model controls. By itself it is not enough; the regulator question is not "what does the model think" but "how does the model's claim hold up under adversarial reading."
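The fallback order for Pillar 1 can be sketched as follows. The payload is modelled as a plain dict and base_pillar is a hypothetical name; the explicit None checks keep a legitimate self-reported 0.0 instead of replacing it with the 0.5 default.

```python
from typing import Any, Mapping

DEFAULT_BASE = 0.5  # assumed when the SDK omits both confidence fields


def base_pillar(payload: Mapping[str, Any]) -> float:
    """Pillar 1: outputDecision.confidenceScore, else top-level confidence, else 0.5."""
    output = payload.get("outputDecision") or {}
    value = output.get("confidenceScore")
    if value is None:
        value = payload.get("confidence")  # top-level fallback
    if value is None:
        return DEFAULT_BASE
    return float(value)
```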

Pillar 2 — Variance vs. Next-Best Alternative (weight 30%)

If the agent submitted alternatives, the Engine reads the top alternative's confidence and asks: how much daylight is there between the chosen decision and the runner-up? The formula is min(1.0, 0.5 + delta · 1.5) where delta = max(0, base − topAlt). A 30-point gap (delta = 0.30) maps to 0.95, and any gap of one-third or more saturates at 1.0. A toss-up (delta = 0.02) maps to a barely passing 0.53. If the agent submits no alternatives, the Engine returns the neutral default 0.8 and assumes a single clear path.
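A direct Python transcription of the variance formula, as a sketch. It assumes "top alternative" means the alternative with the highest confidence; variance_pillar and its parameters are hypothetical names.

```python
from typing import Optional, Sequence

NO_ALTERNATIVES_DEFAULT = 0.8  # neutral default when no alternatives are submitted


def variance_pillar(base: float, alt_confidences: Optional[Sequence[float]]) -> float:
    """Pillar 2: min(1.0, 0.5 + delta * 1.5) with delta = max(0, base - topAlt)."""
    if not alt_confidences:
        return NO_ALTERNATIVES_DEFAULT
    top_alt = max(alt_confidences)  # assumption: runner-up = highest-confidence alternative
    delta = max(0.0, base - top_alt)
    return min(1.0, 0.5 + delta * 1.5)
```

Applied to the 95% vs. 92% and 95% vs. 60% cases discussed on this page, variance_pillar(0.95, [0.92]) is about 0.545, while variance_pillar(0.95, [0.60]) saturates at 1.0. A runner-up that outscores the chosen decision gives delta = 0 and the floor value 0.5.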

The reason this pillar earns 30% of the weight: a 95% prediction beating a 92% runner-up is a statistically different decision from a 95% prediction beating a 60% runner-up. The first is a coin-flip disguised as confidence; the second is real. Sampling-temperature artefacts, prompt-injection ambiguity, and tool-selection toss-ups all show up here.

Pillar 3 — Historical Precedent (weight 30%)

The Engine embeds the trace's triggeringCondition and inputContext into a vector, runs a similarity search against the org's vector memory (top-3 results, minimum similarity 0.7), and asks: of the similar past decisions, how many resolved without a human override and without being flagged? That ratio becomes the historical score. If the search finds no precedent above the 0.7 floor, the Engine returns 0.6 and emits the NOVEL_SITUATION flag. If the embedding service is unavailable or returns an empty result, the Engine returns 0.5 and the trace continues: the similarity floor is strict, but the dependency on the embedding service is not.
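A sketch of the Pillar 3 decision logic, with the embedding call and vector search hidden behind a caller-supplied search function. Everything here is an assumption for illustration: the Precedent fields, the convention that search returns None when the embedding service gives nothing back, and the use of >= for the 0.7 floor.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Precedent:
    """Hypothetical shape of one vector-memory hit."""
    similarity: float
    human_override: bool
    flagged: bool


TOP_K = 3
MIN_SIMILARITY = 0.7
NOVEL_SCORE = 0.6        # no precedent above the floor
UNAVAILABLE_SCORE = 0.5  # embedding service down or empty

# Hypothetical signature: (query text, k) -> hits, or None if embedding failed.
SearchFn = Callable[[str, int], Optional[Sequence[Precedent]]]


def historical_pillar(search: SearchFn, query: str) -> Tuple[float, List[str]]:
    """Pillar 3: share of similar past decisions with no override and no flag."""
    try:
        hits = search(query, TOP_K)
    except Exception:
        return UNAVAILABLE_SCORE, []  # fail open: neutral score, trace continues
    if hits is None:
        return UNAVAILABLE_SCORE, []
    similar = [h for h in list(hits)[:TOP_K] if h.similarity >= MIN_SIMILARITY]
    if not similar:
        return NOVEL_SCORE, ["NOVEL_SITUATION"]
    clean = sum(1 for h in similar if not h.human_override and not h.flagged)
    return clean / len(similar), []
```

A customer who opts out of the OpenAI dependency could be modelled as a search function that always returns None, which yields the constant 0.5 this page describes.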

The embedding service behind the vector memory is OpenAI Embeddings, the only non-EU sub-processor in the Adjudon stack. EU Standard Contractual Clauses are on file, and the trace payload is digested before it is sent. Customers who reject the OpenAI dependency get a constant 0.5 on this pillar; the engine still runs.

The status ladder

After computing finalScore, the Engine checks two warning-flag conditions and then suggests a resolution status. The status rules are evaluated top to bottom, and the first match wins:

| Condition | Result |
| --- | --- |
| finalScore < 0.6 | LOW_CONFIDENCE flag added |
| pillars.variance < 0.3 | HIGH_AMBIGUITY flag added |
| finalScore < 0.4 | suggestedStatus = 'escalated' |
| finalScore < 0.7 OR flags.length > 0 | suggestedStatus = 'flagged' |
| else | suggestedStatus = 'success' |
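The ladder translates into a few lines of Python. This sketch assumes the NOVEL_SITUATION flag from Pillar 3 joins the same flags list as LOW_CONFIDENCE and HIGH_AMBIGUITY; suggest_status is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def suggest_status(final_score: float, variance: float,
                   upstream_flags: Sequence[str] = ()) -> Tuple[str, List[str]]:
    """Apply the flag rules, then the status rules top to bottom (first match wins)."""
    flags = list(upstream_flags)  # e.g. ["NOVEL_SITUATION"] from Pillar 3 (assumption)
    if final_score < 0.6:
        flags.append("LOW_CONFIDENCE")
    if variance < 0.3:
        flags.append("HIGH_AMBIGUITY")

    if final_score < 0.4:
        status = "escalated"
    elif final_score < 0.7 or flags:
        status = "flagged"
    else:
        status = "success"
    return status, flags
```

For instance, suggest_status(0.65, 0.8) returns ("flagged", []): no flag fires, but the score is under 0.7. With the Pillar 2 formula as stated, the variance pillar never drops below 0.5, so the 0.3 HIGH_AMBIGUITY threshold cannot be reached from that formula alone.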

The suggested status is not the final status. The Policy Engine runs immediately afterwards with full authority to overrule it: a high-confidence trace can still be blocked if it hits a deny policy, and a low-confidence one can still resolve as approved if a matured ApprovalPattern matches. Confidence is a signal; policy is the gate. See policies-and-review for the ordering and auto-approval for the maturation mechanics.
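The hand-off can be sketched as a function that takes the Engine's suggestion and the outcome of policy evaluation. Everything here is hypothetical: the boolean inputs stand in for the real Policy Engine and ApprovalPattern matching, the "blocked" and "approved" labels are taken from the prose, and letting a deny policy take precedence over a matching pattern is an assumption; policies-and-review documents the actual ordering.

```python
def resolve_status(suggested: str, hits_deny_policy: bool,
                   matches_approval_pattern: bool) -> str:
    """Policy runs after the Confidence Engine and may overrule its suggestion."""
    if hits_deny_policy:
        return "blocked"    # assumption: a deny policy wins even over 'success'
    if matches_approval_pattern:
        return "approved"   # a matured pattern can release a flagged trace
    return suggested        # otherwise the Engine's suggestion stands
```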

What this is NOT

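  • Not the model's self-report. The base pillar carries 40% of the weight, not 100%, so even a perfect self-reported 1.0 contributes only 0.4 to finalScore, well below the 0.7 success threshold; a model cannot talk itself out of human review on its own confidence alone.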
  • Not a probability of correctness. The score is a defensibility index, not a calibrated probability. A 0.92 means three independent signals agreed; it does not mean the decision is correct nine times out of ten.
  • Not a runtime firewall. The Engine assigns a score and suggests a status; the Policy Engine and Review Queue are what block, hold, or release. Confidence is read; policy is write.
  • Not a model-explainability tool. The Engine does not surface feature importances, attention weights, or counterfactuals. SHAP-style explainability is the customer's responsibility upstream; Adjudon evaluates the decision the model already made.
  • Not blockchain. The score is anchored into the SHA-256 Decision Hash Chain alongside the trace; the chain is tamper-evident, append-only, and verifiable via three commands. It is not a distributed ledger.
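To make "tamper-evident, append-only" concrete, here is a generic SHA-256 hash chain in Python using only the standard library. It is a sketch of the technique, not Adjudon's entry format or its verification commands; the entry field names and the all-zero genesis value are assumptions.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumed back-reference for the first entry


def entry_hash(prev_hash: str, record: Dict[str, Any]) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the record."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: List[Dict[str, Any]], record: Dict[str, Any]) -> None:
    """Append-only: each new entry commits to the hash of the one before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "record": record, "hash": entry_hash(prev, record)})


def verify(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any edited record or broken back-reference fails."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Editing any stored record changes its recomputed hash, so verify reports the chain as broken; detecting this needs no distributed consensus, which is the difference from a blockchain.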

Regulator mapping

| Regulator surface | What this concept satisfies |
| --- | --- |
| EU AI Act Art. 13 | Transparency and provision of information to deployers: the trace IS the record a deployer hands a market-surveillance authority |
| EU AI Act Art. 14 | Human oversight: the suggested-status ladder is what routes a sub-0.7 decision to the Review Queue |
| EU AI Act Art. 12 | Record-keeping (logging): the art12 envelope on every trace is the future-proof Art. 12 surface (input/output digest, model info, governance context) |
| GDPR Art. 22 | Solely-automated-decision protection: a flagged or escalated status creates the human-in-the-loop trigger Art. 22(3) requires |
| ISO 42001 | Per-clause traceability: every trace ships with the mlBomReference digest that maps to ISO 42001 §6 model-management evidence |

Failure modes and fail-open posture

The Engine has three external dependencies: the trace payload itself, the vector-memory similarity search, and the embedding service. Cardinal Rule: if any dependency is slow or unavailable, the Engine must not block trace ingestion — the customer's agent will keep running with or without an Adjudon score, and a missing score is recoverable downstream while a 503 is not. The fallbacks are:

  • No alternatives in payload → variance pillar = 0.8 (neutral good).
  • No vector-memory results above 0.7 → historical pillar = 0.6, NOVEL_SITUATION flag set.
  • Embedding service down or returns null → historical pillar = 0.5 (neutral); the trace continues.
  • Engine throws → the trace is saved with the model's self-reported confidenceScore (or the 0.5 default); the Policy Engine still runs.
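The last fallback amounts to wrapping the whole engine in a catch-all. A minimal sketch, assuming the engine is any callable that takes the payload and returns a score; score_fail_open and the payload-as-dict model are hypothetical.

```python
from typing import Any, Callable, Mapping

Engine = Callable[[Mapping[str, Any]], float]


def score_fail_open(payload: Mapping[str, Any], engine: Engine) -> float:
    """Return the engine's score, or fall back so ingestion is never blocked."""
    try:
        return engine(payload)
    except Exception:
        # Fallback: the model's self-reported confidenceScore, else 0.5.
        reported = (payload.get("outputDecision") or {}).get("confidenceScore")
        return 0.5 if reported is None else float(reported)
```

Catching the broad Exception is deliberate in this sketch: under the Cardinal Rule, any scoring failure should degrade the score, never the ingestion path.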

The published p50/p95/p99 latency budget for POST /traces (10 ms / 25 ms / 45 ms) is measured end-to-end, including this engine. If a trace's confidence path exceeds the budget, the historical pillar times out first — the budget protects the customer's agent from Adjudon's dependencies, not the other way round.

See also

  • POST /traces — the integration surface that produces the trace
  • Policies & Review — the runtime gate that consumes the score
  • Hash Chain — how the score is anchored for tamper-evident retrieval
  • Auto-Approval — what matured pattern-matching does to a flagged trace
  • Sub-Processors — the OpenAI Embeddings exception and the SCC posture