Tracing the OpenAI SDK

Goal

Trace every chat.completions.create() call from the OpenAI Python SDK through Adjudon's audit layer with a thin wrapper that adds two lines around the call site. The OpenAI SDK keeps working exactly as before; the trace emerges from the wrapper.

Status

Adjudon does not ship a dedicated adjudon-openai adapter package today. The pattern below uses the core adjudon package — manual wrap rather than callback-driven. A LangChain-style auto-instrument wrapper is on the roadmap; the manual wrap is what production customers run today and the shape will not change when the wrapper ships.

You'll need

  • An Adjudon Sandbox plan (or above)
  • An adj_test_* agent API key
  • An OpenAI API key
  • Python 3.9+ with openai and adjudon installed
pip install openai adjudon
export ADJUDON_API_KEY="adj_test_..."
export OPENAI_API_KEY="sk-..."

Code

openai_traced.py
import os
from openai import OpenAI
from adjudon import Adjudon

client = OpenAI()  # named `client` so it doesn't shadow the `openai` module
adjudon = Adjudon(
    api_key=os.environ["ADJUDON_API_KEY"],
    agent_id="customer-support-bot",
)

def traced_chat(messages: list[dict], model: str = "gpt-4o-mini") -> str:
    """Wrap one chat.completions.create call in an Adjudon trace."""
    completion = client.chat.completions.create(
        model=model,
        messages=messages,
    )
    answer = completion.choices[0].message.content

    # Placeholder confidence. In production, derive this from the logprobs
    # distribution or a downstream classifier (see "What just happened").
    confidence = 0.85

    trace = adjudon.trace(
        input_context={
            "prompt": messages[-1]["content"],
            "systemPrompt": next(
                (m["content"] for m in messages if m["role"] == "system"), None
            ),
            "model": model,
        },
        output_decision={
            "action": answer,
            "confidence": confidence,
        },
        metadata={
            "llmProvider": "openai",
            "responseModel": completion.model,
            "tokensInput": completion.usage.prompt_tokens,
            "tokensOutput": completion.usage.completion_tokens,
            "finishReason": completion.choices[0].finish_reason,
        },
    )

    if trace.status == "blocked":
        raise RuntimeError(f"Blocked by policy: {trace.id}")
    return answer

# ── Use it ──────────────────────────────────────────────────────────────
reply = traced_chat([
    {"role": "system", "content": "You are a refund-policy assistant."},
    {"role": "user", "content": "I want a refund for order #12345."},
])
print(reply)

Run it:

python openai_traced.py
# → "Sure — for order #12345, our policy allows a full refund within 30 days. ..."

What just happened

Two HTTP requests fired: one to OpenAI for the completion, one to Adjudon for the trace. The trace carries the user prompt, the system prompt, the model name, the response text, and the token-usage telemetry from the OpenAI response. The Confidence Engine ran the Three-Pillar triangulation (your confidence: 0.85 becomes the base pillar; variance and historical pillars compute automatically); the Policy Engine evaluated any active deny rules; the response status is what your code acts on.

If you derive confidence from OpenAI logprobs (pass logprobs=True, top_logprobs=5 on the completion call), the token probabilities are a more honest base than the constant 0.85 — and the confidence triangulation rewards a well-calibrated base score with a higher final score.
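
A minimal sketch of that path, assuming the openai v1 SDK; logprob_confidence is a hypothetical helper, and mean token probability (rather than only the first token's) is one heuristic among several:

import math

def logprob_confidence(completion) -> float:
    """Mean token probability of the response, as a rough base confidence.

    Hypothetical helper: assumes the call was made with logprobs=True so
    completion.choices[0].logprobs.content is populated.
    """
    token_logprobs = completion.choices[0].logprobs.content
    if not token_logprobs:
        return 0.5  # neutral fallback when logprobs are absent
    mean = sum(t.logprob for t in token_logprobs) / len(token_logprobs)
    return math.exp(mean)  # geometric mean of per-token probabilities

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "I want a refund for order #12345."}],
    logprobs=True,
    top_logprobs=5,
)
confidence = logprob_confidence(completion)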

What gets recorded on each trace

The wrapper above pulls a deliberately small set of fields out of the OpenAI response. Each one earns its place:

Field                        Why it's there
prompt, systemPrompt         The two semantically distinct inputs the regulator wants to see
model                        The model identifier the request asked for
responseModel                OpenAI's completion.model field; can differ from the requested model if OpenAI does weighted routing or silent rerouting
tokensInput, tokensOutput    The OpenAI usage telemetry needed for cost reconciliation
finishReason                 stop, length, tool_calls, or content_filter — the last is its own audit signal

The metadata.finishReason: 'content_filter' case is worth capturing explicitly: an OpenAI content-policy filter that truncated the response is itself a compliance-relevant event, and the audit trail should record it without the operator having to dig into OpenAI's response shape later.
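
One way to surface it, sketched under the assumption that you extend the wrapper's metadata dict; the contentFiltered key is illustrative, not part of Adjudon's documented schema:

def filter_flags(completion) -> dict:
    """Extra metadata entries flagging an OpenAI content-policy truncation."""
    finish_reason = completion.choices[0].finish_reason
    flags = {"finishReason": finish_reason}
    if finish_reason == "content_filter":
        # The filter firing is itself a compliance-relevant event.
        flags["contentFiltered"] = True
    return flags

Merge the result into the wrapper's metadata dict with {**filter_flags(completion)} alongside the other fields.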

Edge cases worth handling

  • openai.RateLimitError: catch and either retry with back-off or trace the failure with metadata.error: true, metadata.errorType: 'rate-limit'. The trace is the evidence the agent attempted the call. (A sketch follows this list.)
  • openai.APIError (5xx from OpenAI): same pattern. Adjudon's audit posture is "every decision attempt is a trace"; a failed downstream call is a kind of decision (the decision not to proceed).
  • Empty completion (completion.choices[0].message.content is None): trace with outputDecision.action: '' and metadata.emptyResponse: true. The Confidence Engine treats it as a low-confidence event; the Review Queue catches it.
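
A sketch of the failure path, reusing traced_chat and adjudon from above; the detail key is illustrative:

from typing import Optional

import openai

def traced_chat_safe(messages: list[dict], model: str = "gpt-4o-mini") -> Optional[str]:
    """Call traced_chat, tracing the attempt even when OpenAI refuses it."""
    try:
        return traced_chat(messages, model=model)
    except (openai.RateLimitError, openai.APIError) as exc:
        error_type = "rate-limit" if isinstance(exc, openai.RateLimitError) else "api-error"
        adjudon.trace(
            input_context={"prompt": messages[-1]["content"], "model": model},
            output_decision={"action": "", "confidence": 0.0},
            metadata={
                "llmProvider": "openai",
                "error": True,
                "errorType": error_type,
                "detail": str(exc),  # illustrative field
            },
        )
        return None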

Why a wrapper, not a monkey-patch

Patching openai.chat.completions.create globally is tempting — one import and every call is traced. The downside is that the patch wraps every call: prompt-engineering test scripts, offline evaluation pipelines, and internal benchmarks all hit the trace endpoint and inflate metered usage. The explicit-wrapper pattern keeps the call sites visible: every traced call is traced by intention.

For high-volume call sites where the wrapper is repetitive, factor it into a single helper module the team imports; the wrapper logic stays in one place without taking the all-or-nothing patching trade-off.
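
For instance, with tracing.py as a hypothetical module name:

# tracing.py: the wrapper from this recipe lives here, once
# (traced_chat body exactly as shown above).

# any_service.py: the call site stays two lines.
from tracing import traced_chat

reply = traced_chat([{"role": "user", "content": "I want a refund for order #12345."}])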

Going further

  • OpenTelemetry path. If your team already runs an OTel collector with the OpenAI auto-instrumentation, point the collector's OTLP exporter at Adjudon — the OpenTelemetry recipe covers the zero-import alternative.
  • Tool-call traces. When the completion includes tool_calls, capture them in outputDecision.toolCalls[] on the trace; the Multi-step agents recipe shows the schema.
  • Streaming completions. For stream=True calls, emit the trace once after the stream closes, with the assembled response in outputDecision.action and the streaming duration in metadata.streamDurationMs. Do not trace per-chunk — one trace per logical decision is the rule. (A sketch follows this list.)
  • Async OpenAI client. Swap from openai import OpenAI for from openai import AsyncOpenAI and adjudon.trace for adjudon.atrace, and await both calls; everything else is identical.
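
A sketch of the streaming case, assuming the same client and adjudon objects and the placeholder confidence from the wrapper above:

import time

def traced_chat_stream(messages: list[dict], model: str = "gpt-4o-mini") -> str:
    """Stream the completion, then emit exactly one trace for the decision."""
    start = time.monotonic()
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,
    )
    parts = []
    for chunk in stream:
        # The final chunk carries no content delta; guard against both cases.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    answer = "".join(parts)

    adjudon.trace(
        input_context={"prompt": messages[-1]["content"], "model": model},
        output_decision={"action": answer, "confidence": 0.85},  # placeholder
        metadata={
            "llmProvider": "openai",
            "streamDurationMs": int((time.monotonic() - start) * 1000),
        },
    )
    return answer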

Privacy posture

The OpenAI call sends the prompt to OpenAI; the Adjudon trace sends the prompt to Adjudon. Both surfaces carry the customer data, but Adjudon runs the standard PII scrubber on the trace payload before persistence and on every downstream read. The OpenAI side is governed by your OpenAI data-processing agreement — the two surfaces are independent, and your privacy posture must account for both. For organisations that route AI prompts to a private deployment (Azure OpenAI in EU regions, self-hosted models), the trace shape on Adjudon's side is identical; only the upstream call changes.

See also