# Tracing the OpenAI SDK

## Goal
Trace every chat.completions.create() call from the OpenAI
Python SDK through Adjudon's audit layer with a thin wrapper
that adds two lines around the call site. The OpenAI SDK keeps
working exactly as before; the trace emerges from the wrapper.
Adjudon does not ship a dedicated adjudon-openai adapter
package today. The pattern below uses the core adjudon
package — manual wrap rather than callback-driven. A
LangChain-style auto-instrument wrapper is on the roadmap; the
manual wrap is what production customers run today and the
shape will not change when the wrapper ships.
## You'll need
- An Adjudon Sandbox plan (or above)
- An `adj_test_*` agent API key
- An OpenAI API key
- Python 3.9+ with `openai` and `adjudon` installed
```bash
pip install openai adjudon
export ADJUDON_API_KEY="adj_test_..."
export OPENAI_API_KEY="sk-..."
```
## Code

```python
import os

from openai import OpenAI
from adjudon import Adjudon

openai = OpenAI()
adjudon = Adjudon(
    api_key=os.environ["ADJUDON_API_KEY"],
    agent_id="customer-support-bot",
)


def traced_chat(messages: list[dict], model: str = "gpt-4o-mini") -> str:
    """Wrap one chat.completions.create call in an Adjudon trace."""
    completion = openai.chat.completions.create(
        model=model,
        messages=messages,
    )
    answer = completion.choices[0].message.content

    # Placeholder confidence. In production, derive this from the logprobs
    # distribution or a downstream classifier (see below).
    confidence = 0.85

    trace = adjudon.trace(
        input_context={
            "prompt": messages[-1]["content"],
            "systemPrompt": next(
                (m["content"] for m in messages if m["role"] == "system"), None
            ),
            "model": model,
        },
        output_decision={
            "action": answer,
            "confidence": confidence,
        },
        metadata={
            "llmProvider": "openai",
            "responseModel": completion.model,
            "tokensInput": completion.usage.prompt_tokens,
            "tokensOutput": completion.usage.completion_tokens,
            "finishReason": completion.choices[0].finish_reason,
        },
    )

    if trace.status == "blocked":
        raise RuntimeError(f"Blocked by policy: {trace.id}")
    return answer


# ── Use it ──────────────────────────────────────────────────────────────
reply = traced_chat([
    {"role": "system", "content": "You are a refund-policy assistant."},
    {"role": "user", "content": "I want a refund for order #12345."},
])
print(reply)
```
Run it:
```bash
python openai_traced.py
# → "Sure — for order #12345, our policy allows a full refund within 30 days. ..."
```
## What just happened
Two HTTP requests fired: one to OpenAI for the completion, one
to Adjudon for the trace. The trace carries the user prompt,
the system prompt, the model name, the response text, and the
token-usage telemetry from the OpenAI response. The
Confidence Engine ran the
Three-Pillar triangulation (your confidence: 0.85 becomes
the base pillar; variance and historical pillars compute
automatically); the
Policy Engine evaluated any
active deny rules; the response status is what your code
acts on.
If you derive confidence from OpenAI logprobs (pass `logprobs=True, top_logprobs=5` on the completion call), the top-token probability is a more honest base than the constant 0.85, and the confidence triangulation rewards a well-calibrated base score with a higher final score.
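A minimal sketch of that logprobs-derived base score, reusing the `openai` client from the code above. The mean sampled-token probability is one reasonable aggregation, not the only calibration choice, and the 0.5 fallback for responses without logprobs is an assumption:

```python
import math

def logprob_confidence(completion, fallback: float = 0.5) -> float:
    """Mean probability of each sampled token across the response."""
    logprobs = completion.choices[0].logprobs
    if logprobs is None or not logprobs.content:
        return fallback  # endpoint returned no logprobs; fall back to neutral
    probs = [math.exp(token.logprob) for token in logprobs.content]
    return sum(probs) / len(probs)

# Inside traced_chat, request logprobs and replace the constant 0.85:
completion = openai.chat.completions.create(
    model=model,
    messages=messages,
    logprobs=True,
    top_logprobs=5,
)
confidence = logprob_confidence(completion)
```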
## What gets recorded on each trace
The wrapper above pulls a deliberately small set of fields out of the OpenAI response. Each one earns its place:
| Field | Why it's there |
|---|---|
| `prompt`, `systemPrompt` | The two semantically distinct inputs the regulator wants to see |
| `model` | The model identifier the wrapper requested |
| `responseModel` | OpenAI's `completion.model` field; can differ from the requested model under weighted routing or silent rerouting |
| `tokensInput`, `tokensOutput` | The OpenAI usage telemetry needed for cost reconciliation |
| `finishReason` | `stop`, `length`, `tool_calls`, or `content_filter`; the last is its own audit signal |
The `metadata.finishReason: 'content_filter'` case is worth capturing explicitly: an OpenAI content-policy filter that truncated the response is itself a compliance-relevant event, and the audit trail should record it without the operator having to dig into OpenAI's response shape later.
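One way to surface it at trace time, extending the `metadata` dict from the wrapper above; the `contentFiltered` key is an illustrative convention, not a required schema field:

```python
finish_reason = completion.choices[0].finish_reason
metadata = {
    "llmProvider": "openai",
    "responseModel": completion.model,
    "tokensInput": completion.usage.prompt_tokens,
    "tokensOutput": completion.usage.completion_tokens,
    "finishReason": finish_reason,
}
if finish_reason == "content_filter":
    # Illustrative key (assumption): flags the filtered response for later queries.
    metadata["contentFiltered"] = True
```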
## Edge cases worth handling

- `openai.RateLimitError`: catch and either retry with back-off or trace the failure with `metadata.error: true`, `metadata.errorType: 'rate-limit'` (a sketch follows this list). The trace is the evidence the agent attempted the call.
- `openai.APIError` (5xx from OpenAI): same pattern. Adjudon's audit posture is "every decision attempt is a trace"; a failed downstream call is a kind of decision (the decision not to proceed).
- Empty completion (`completion.choices[0].message.content` is `None`): trace with `outputDecision.action: ''` and `metadata.emptyResponse: true`. The Confidence Engine treats it as a low-confidence event; the Review Queue catches it.
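A sketch of the retry-then-trace pattern under those conventions. `RateLimitError` and `APIError` are the real SDK exception classes; the retry budget and the `'api-error'` errorType string are assumptions:

```python
import time

from openai import APIError, RateLimitError

def traced_chat_resilient(messages: list[dict], model: str = "gpt-4o-mini",
                          retries: int = 3) -> str:
    """traced_chat with back-off on rate limits and failure traces on errors."""
    for attempt in range(retries):
        try:
            return traced_chat(messages, model=model)
        except RateLimitError:
            if attempt == retries - 1:
                _trace_failure(messages, model, "rate-limit")
                raise
            time.sleep(2 ** attempt)  # exponential back-off before the next try
        except APIError:
            _trace_failure(messages, model, "api-error")  # errorType string is an assumption
            raise

def _trace_failure(messages: list[dict], model: str, error_type: str) -> None:
    # The failed attempt is still a decision; record it as evidence.
    adjudon.trace(
        input_context={"prompt": messages[-1]["content"], "model": model},
        output_decision={"action": "", "confidence": 0.0},
        metadata={"llmProvider": "openai", "error": True, "errorType": error_type},
    )
```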
## Why a wrapper, not a monkey-patch

Patching `openai.chat.completions.create` globally is tempting: one import and every call is traced. The downside is that the patch wraps every call: a system-prompt-engineering test script, an offline evaluation pipeline, an internal benchmark all hit the trace endpoint and inflate metered usage. The explicit-wrapper pattern keeps the call sites visible: every traced call is traced by intention.
For high-volume call sites where the wrapper is repetitive, factor it into a single helper module the team imports (sketched below); the wrapper logic stays in one place without taking the all-or-nothing patching trade-off.
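A minimal sketch of that layout; the `tracing.py` module name is hypothetical:

```python
# tracing.py (hypothetical module name) -- the one place the wrapper lives.
# It owns the OpenAI and Adjudon clients and exposes traced_chat unchanged.

# Elsewhere in the codebase, every traced call site stays explicit:
from tracing import traced_chat

reply = traced_chat([{"role": "user", "content": "I want a refund for order #12345."}])
```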
## Going further

- OpenTelemetry path. If your team already runs an OTel collector with the OpenAI auto-instrumentation, point the collector's OTLP exporter at Adjudon; the OpenTelemetry recipe covers the zero-import alternative.
- Tool-call traces. When the completion includes `tool_calls`, capture them in `outputDecision.toolCalls[]` on the trace; the Multi-step agents recipe shows the schema.
- Streaming completions. For `stream=True` calls, emit the trace once after the stream closes, with the assembled response in `outputDecision.action` and the streaming duration in `metadata.streamDurationMs` (a sketch follows this list). Do not trace per-chunk; one trace per logical decision is the rule.
- Async OpenAI client. Swap `from openai import OpenAI` for `from openai import AsyncOpenAI` and `adjudon.trace` for `adjudon.atrace`; everything else is identical.
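A minimal sketch of the streaming variant, reusing the `openai` and `adjudon` clients from the main example; the constant confidence carries the same placeholder caveat as above:

```python
import time

def traced_chat_stream(messages: list[dict], model: str = "gpt-4o-mini") -> str:
    """Stream the completion, then emit exactly one trace for the whole response."""
    start = time.monotonic()
    parts = []
    stream = openai.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        # Some chunks (e.g. a final usage chunk) carry no choices or content.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    answer = "".join(parts)

    adjudon.trace(
        input_context={"prompt": messages[-1]["content"], "model": model},
        output_decision={"action": answer, "confidence": 0.85},  # placeholder, as above
        metadata={
            "llmProvider": "openai",
            "streamDurationMs": int((time.monotonic() - start) * 1000),
        },
    )
    return answer
```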
## Privacy posture
The OpenAI call sends the prompt to OpenAI; the Adjudon trace sends the prompt to Adjudon. Both surfaces carry the customer data, but Adjudon runs the standard PII scrubber on the trace payload before persistence and on every downstream read. The OpenAI side is governed by your OpenAI data-processing agreement — the two surfaces are independent, and your privacy posture must account for both. For organisations that route AI prompts to a private deployment (Azure OpenAI in EU regions, self-hosted models), the trace shape on Adjudon's side is identical; only the upstream call changes.
## See also
- Anthropic tracing — the same pattern for Claude
- OpenTelemetry — the collector-driven alternative to manual wrapping
- Multi-step agents — capturing tool-use traces
- Traces & Confidence — how the base-pillar confidence feeds the engine