# Tracing the Anthropic SDK
## Goal

Auto-instrument the Anthropic Python SDK so every `messages.create()` call — including tool-use turns — produces an Adjudon trace without touching the call sites. The Anthropic client keeps working as before; the trace emerges from a one-line wrap at construction time.
Unlike the OpenAI recipe (which uses manual wrapping), Adjudon ships a dedicated `adjudon-anthropic-tools` package that patches the client's `messages.create` in place. Tool-use blocks in the response are extracted into `outputDecision.toolCalls[]` automatically.
## You'll need

- An Adjudon Sandbox plan (or above)
- An `adj_test_*` agent API key
- An Anthropic API key
- Python 3.9+ with `anthropic` and `adjudon-anthropic-tools`
```bash
pip install anthropic adjudon-anthropic-tools
export ADJUDON_API_KEY="adj_test_..."
export ANTHROPIC_API_KEY="sk-ant-..."
```
## Code

```python
import os

import anthropic
from adjudon_anthropic_tools import wrap_anthropic

# One-line wrap — every messages.create() is now traced.
client = wrap_anthropic(
    anthropic.Anthropic(),
    api_key=os.environ["ADJUDON_API_KEY"],
    agent_id="research-agent",
)

# Use the client exactly as before.
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What does GDPR Article 22 cover?"},
    ],
    tools=[
        {
            "name": "lookup_regulation",
            "description": "Look up a specific GDPR article",
            "input_schema": {
                "type": "object",
                "properties": {"article": {"type": "string"}},
                "required": ["article"],
            },
        }
    ],
)

# The Anthropic response is unchanged — work with it as you always would.
for block in response.content:
    if block.type == "text":
        print(block.text)
    elif block.type == "tool_use":
        print(f"Tool call: {block.name}({block.input})")
```
Run it:

```bash
python anthropic_traced.py
# → "GDPR Article 22 covers automated individual decision-making..."
# → Tool call: lookup_regulation({'article': '22'})
```
## What just happened

The `wrap_anthropic` call patched the Anthropic client's `messages.create` method in place. The next time your code called `client.messages.create(...)`, the wrapper:

- Started a wall-clock timer.
- Forwarded the call to the original Anthropic SDK.
- On response, extracted the prompt from `kwargs.messages`, the text completion from any `text` blocks, and the tool-use parameters from any `tool_use` blocks.
- Submitted one Adjudon trace with:
  - `inputContext.prompt`: the last user message
  - `inputContext.model`: the requested Claude model
  - `outputDecision.action`: the assembled text response
  - `outputDecision.toolCalls[]`: each `tool_use` block as `{ tool, args }`
  - `metadata.llmProvider`: `'anthropic'`
  - `metadata.tokensInput`, `metadata.tokensOutput`, `metadata.stopReason`, `metadata.durationMs`
The original response object is returned unchanged — the adapter never mutates Anthropic's reply. Your code reads `response.content` exactly the way the Anthropic docs describe it, and the trace is submitted as a side effect on the return path.
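Conceptually, the patch looks like this (a simplified sketch, not the package's actual source; the real wrapper also handles sampling, errors, and trace submission):

```python
import functools
import time

def wrap_anthropic(client, *, api_key, agent_id, **options):
    """Simplified sketch of the in-place patch."""
    original = client.messages.create

    @functools.wraps(original)
    def traced_create(*args, **kwargs):
        start = time.monotonic()
        response = original(*args, **kwargs)  # forward to the real SDK
        duration_ms = int((time.monotonic() - start) * 1000)
        # duration_ms feeds metadata.durationMs on the submitted trace.
        # ...extract text / tool_use blocks from response.content,
        # assemble the trace payload, and submit it to Adjudon here...
        return response  # Anthropic's reply is never mutated

    client.messages.create = traced_create
    return client
```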
## What gets traced when

The wrapper captures the same trace shape on every call, regardless of the response variant. A response with one text block produces a trace with `outputDecision.action` set to the text and `outputDecision.toolCalls[]` empty. A response with one or more `tool_use` blocks produces a trace with `outputDecision.action` set to whatever text accompanied the tool call (often empty) and `outputDecision.toolCalls[]` populated with one entry per tool block. A pure-`tool_use` response (the Claude model deciding "I need to call a tool before I answer") still produces a trace; the auditor reads it as "the agent decided to consult external data before answering."
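For concreteness, the tool-use turn from the recipe above might yield a trace shaped like this (field names from the list in the previous section; token counts and duration are hypothetical):

```python
trace = {
    "inputContext": {
        "prompt": "What does GDPR Article 22 cover?",
        "model": "claude-3-5-sonnet-20241022",
    },
    "outputDecision": {
        "action": "",  # often empty when the model goes straight to a tool
        "toolCalls": [
            {"tool": "lookup_regulation", "args": {"article": "22"}},
        ],
    },
    "metadata": {
        "llmProvider": "anthropic",
        "tokensInput": 412,    # hypothetical
        "tokensOutput": 58,    # hypothetical
        "stopReason": "tool_use",
        "durationMs": 930,     # hypothetical
    },
}
```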
The `metadata.stopReason` field captures Anthropic's `stop_reason` directly: `end_turn`, `max_tokens`, `stop_sequence`, or `tool_use`. Tracking this is essential for compliance — a `max_tokens` truncation is a different audit signal than a clean `end_turn`, and the regulator wants to see when a response ran out of budget.
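The same signal is available at the call site, since `stop_reason` is a standard field on the Anthropic `Message` object, so you can react locally as well as in the audit trail:

```python
# React to a truncation at the call site; the trace records it either way.
if response.stop_reason == "max_tokens":
    print("Truncated response: raise max_tokens or shorten the prompt.")
```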
## Tool-use coverage

Anthropic's tool-use protocol is a multi-turn dance: the model emits a `tool_use` block; your code runs the tool; your code sends the result back via a `tool_result` block in the next `messages.create` call. Each turn in the dance produces its own trace under the same agent ID. To group the turns into one auditable conversation, pass an explicit `metadata.conversationId` on each call:
```python
import uuid

conv_id = f"conv-{uuid.uuid4()}"

# First turn — model emits tool_use
resp1 = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Look up GDPR Art. 22"}],
    tools=[...],
    metadata={"conversationId": conv_id},
)

# Second turn — your code returned the tool result, model emits final answer
resp2 = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Look up GDPR Art. 22"},
        {"role": "assistant", "content": resp1.content},
        {"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": "...", "content": "..."},
        ]},
    ],
    tools=[...],  # pass the same tool definitions on the follow-up turn
    metadata={"conversationId": conv_id},
)
```
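In a real run, the `tool_use_id` and result content in the second call come from the first response. A minimal sketch of the glue between the turns, assuming a `run_lookup` function you implement yourself:

```python
# Find the tool_use block the model emitted in turn one.
tool_block = next(b for b in resp1.content if b.type == "tool_use")
result_text = run_lookup(**tool_block.input)  # your own tool implementation

# This dict replaces the elided tool_result message in the second turn.
tool_result_message = {
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": tool_block.id,  # ties the result back to the request
        "content": result_text,
    }],
}
```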
The dashboard's Decision Log groups by `conversationId`; both turns surface as one expandable thread.
## Configuration

The wrapper accepts the same options as the LangChain handler:

| Option | Default | Description |
|---|---|---|
| `sample_rate` | `1.0` | Fraction of calls to trace; lower it for high-volume dev |
| `raise_on_block` | `False` | Convert a policy block verdict into `AdjudonBlockedException` instead of returning passthrough |
| `metadata` | `{}` | Default metadata merged onto every trace |
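For example (option names as in the table; the import path for `AdjudonBlockedException` is an assumption):

```python
import os

import anthropic
from adjudon_anthropic_tools import AdjudonBlockedException, wrap_anthropic

client = wrap_anthropic(
    anthropic.Anthropic(),
    api_key=os.environ["ADJUDON_API_KEY"],
    agent_id="research-agent",
    sample_rate=1.0,              # trace every call (the default)
    raise_on_block=True,          # raise instead of returning a passthrough
    metadata={"env": "staging"},  # merged onto every trace
)

try:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Summarise GDPR Article 22."}],
    )
except AdjudonBlockedException:
    # The policy verdict was a block; decide how to degrade gracefully.
    response = None
```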
## Privacy posture

Same dual-surface concern as the OpenAI recipe: the `messages.create` call sends the prompt to Anthropic's API, and the wrapper sends a parallel trace to Adjudon. Each surface is governed by its own data-processing agreement — your Anthropic DPA on the upstream side, the Adjudon DPA at adjudon.com/legal/dpa on the audit-layer side. The PII scrubber runs on the trace payload before persistence; the audit posture on Adjudon's side is the same regardless of whether the upstream is the public Anthropic API, `AnthropicBedrock` on AWS, or `AnthropicVertex` on GCP.
## Going further

- Async client. Swap `wrap_anthropic` for `wrap_async_anthropic` and the rest of the recipe is identical.
- Streaming. For `client.messages.stream(...)`, trace once after the stream closes, with the assembled response. The package's roadmap includes auto-stream support; today streaming calls bypass the wrapper. The practical workaround is to call `adjudon.trace(...)` manually after the stream's final-chunk handler fires, with the re-assembled text as `outputDecision.action` and the streaming wall time in `metadata.streamDurationMs` (see the sketch after this list). This matches the streaming pattern documented in the OpenAI recipe.
- Sample rate in production. The default `sample_rate=1.0` is correct for compliance — every Claude decision is audited. Reduce it only in internal benchmark or prompt-engineering scripts where the audit trail is not the point.
- Vertex AI / Bedrock. The wrapper patches `messages.create` on whatever Anthropic-shaped client you pass in — the same call pattern works against `AnthropicVertex` and `AnthropicBedrock` clients with no code change.
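A sketch of the streaming workaround, assuming the base `adjudon` package exposes a manual `trace(...)` entry point whose keyword arguments mirror the trace fields used above:

```python
import time

import adjudon  # assumed: base package with a manual trace() entry point
import anthropic

client = anthropic.Anthropic()  # streaming bypasses the wrapper today
prompt = "What does GDPR Article 22 cover?"
start = time.monotonic()

chunks = []
with client.messages.stream(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
) as stream:
    for text in stream.text_stream:
        chunks.append(text)

# Trace once, after the stream closes, with the re-assembled text.
adjudon.trace(  # signature assumed; see the Python SDK reference
    agent_id="research-agent",
    input_context={"prompt": prompt, "model": "claude-3-5-sonnet-20241022"},
    output_decision={"action": "".join(chunks)},
    metadata={"streamDurationMs": int((time.monotonic() - start) * 1000)},
)
```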
## See also
- OpenAI tracing — the manual-wrap pattern for OpenAI
- Multi-step agents — the tool-use trace shape this adapter produces
- Python SDK — the parent package family
- Traces & Confidence — how each traced Anthropic call is scored