Chapter 3. Messages, Turns, and the Transcript

Previously: we built a forty-line loop against a mock provider and watched it break five ways. Break 5 — tool output overwhelming the transcript — hinted that the transcript was doing too much work as a pile of dicts. It's time to give it some structure, and at the same time plug in real providers.

A message is not a dict. A message is a typed record with a role, ordered content blocks, a provenance, a token cost, and a creation timestamp. A dict can hold all of those, but it can also hold none of them — and a loop that treats {"role": "user", "content": "..."} and {"role": "user", "content": [{"type": "text", "text": "..."}]} as interchangeable will sooner or later send the wrong shape to the wrong provider and get a 400 back.

By the end of this chapter, three things are true of your harness:

  1. The transcript is a typed data structure, not a list of dicts.
  2. It translates cleanly to and from at least three provider formats (Anthropic, OpenAI, and a generic OSS adapter).
  3. The loop from Chapter 2 still works, unchanged in logic, but now routed through the adapter layer.

Anthropic
{
  "role": "assistant",
  "content": [
    {"type": "tool_use",
     "id": "t_1",
     "name": "calc",
     "input": {"expr": "2+2"}}
  ]
}
OpenAI
{
  "role": "assistant",
  "tool_calls": [
    {"id": "t_1",
     "type": "function",
     "function": {
       "name": "calc",
       "arguments": "{\"expr\":\"2+2\"}"}}
  ]
}
Internal Transcript (ToolCall block)
ToolCall(id="t_1", name="calc", args={"expr": "2+2"})

Two wire shapes, one internal Transcript — the adapter seam absorbs the difference.

3.1 The Problem With Dicts

Three failure modes, all of which I've shipped to production at least once.

Shape drift. Anthropic uses content blocks: [{"type": "text", "text": "..."}]. OpenAI uses plain strings for user messages and structured objects for assistant tool calls. A dict-based transcript has to guess which shape is live at any moment. The guesses are usually right. The times they're wrong, you find out in the worst possible way: a parser that silently accepts malformed input, a model that hallucinates because a tool result came back empty, a test that passes locally and fails in production.
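To make shape drift concrete, here is a sketch: `text_of` is a hypothetical helper of the kind a dict-based transcript accretes. Every call site needs this guessing logic; a typed transcript does it exactly once, at the adapter boundary.

```python
def text_of(message: dict) -> str:
    # Guess which wire shape is live: a plain string (OpenAI-style
    # user messages) or a list of content blocks (Anthropic-style).
    content = message["content"]
    if isinstance(content, str):
        return content
    return "".join(b["text"] for b in content if b.get("type") == "text")

a = {"role": "user", "content": "hello"}
b = {"role": "user", "content": [{"type": "text", "text": "hello"}]}
assert text_of(a) == text_of(b) == "hello"
```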

Role confusion. Is a tool result a user message (Anthropic's convention) or a tool message (OpenAI's)? If your code picks one and hard-codes it, you've locked yourself to that provider. If your code picks the wrong one, you've made your own life harder for no benefit.

Opaque accounting. To track how much context you've consumed, you need to know what each message is. A list of dicts makes you re-parse every message each time you ask; a typed list exposes the structure up front, so the accountant pattern-matches on block kinds instead of guessing.

We're going to fix all three in the same file, and it's going to cost us about eighty lines of dataclasses plus a thin Transcript wrapper.


3.2 The Canonical Shape

Here are the types we'll use for the rest of the book — worth memorizing, because they show up in every chapter from this point on, and the vocabulary ("blocks", "role", "the ToolCall block", "the ReasoningBlock's metadata") recurs without reintroduction.

# src/harness/messages.py
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Literal
from uuid import uuid4


Role = Literal["user", "assistant", "system"]


@dataclass(frozen=True)
class TextBlock:
    text: str
    kind: Literal["text"] = "text"


@dataclass(frozen=True)
class ToolCall:
    id: str
    name: str
    args: dict
    kind: Literal["tool_call"] = "tool_call"


@dataclass(frozen=True)
class ToolResult:
    call_id: str
    content: str
    is_error: bool = False
    kind: Literal["tool_result"] = "tool_result"


@dataclass(frozen=True)
class ReasoningBlock:
    """Model-internal reasoning ("thinking" on Anthropic, "reasoning" on OpenAI).

    Emitted by reasoning-enabled providers before the final answer or tool
    call. `metadata` holds vendor-specific fields (notably Anthropic's
    opaque `signature`) that the adapter needs to round-trip.
    """
    text: str
    metadata: dict = field(default_factory=dict)
    kind: Literal["reasoning"] = "reasoning"


Block = TextBlock | ToolCall | ToolResult | ReasoningBlock


@dataclass
class Message:
    role: Role
    blocks: list[Block]
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    id: str = field(default_factory=lambda: str(uuid4()))

    @classmethod
    def user_text(cls, text: str) -> "Message":
        return cls(role="user", blocks=[TextBlock(text)])

    @classmethod
    def assistant_text(cls, text: str, *,
                       reasoning: ReasoningBlock | None = None) -> "Message":
        blocks: list[Block] = []
        if reasoning is not None:
            blocks.append(reasoning)
        blocks.append(TextBlock(text))
        return cls(role="assistant", blocks=blocks)

    @classmethod
    def assistant_tool_call(cls, call: ToolCall, *,
                            reasoning: ReasoningBlock | None = None) -> "Message":
        blocks: list[Block] = []
        if reasoning is not None:
            blocks.append(reasoning)
        blocks.append(call)
        return cls(role="assistant", blocks=blocks)

    @classmethod
    def tool_result(cls, result: ToolResult) -> "Message":
        # conventionally attached to the "user" role;
        # the adapter remaps this for providers that use "tool".
        return cls(role="user", blocks=[result])

Five things to notice.

Block is a union of four kinds. Everything a message can contain — plain text, a tool call the assistant wants to make, a result from a tool call, and an optional reasoning trace — is one of those four. The loop never has to ask "what shape is this?"; it pattern-matches on the kind discriminant.

All four block types are frozen. Messages are immutable once created. If you want to modify one, you replace it. This prevents a whole class of bugs where code downstream of the loop mutates a message and the next turn sends different content than was rendered.

Tool results live in a user-role message. This matches Anthropic's convention and is straightforward to remap for OpenAI. The role is a transport detail; the block type is the semantics.

ReasoningBlock rides alongside text or tool calls on an assistant turn. The factory methods assistant_text(..., reasoning=...) and assistant_tool_call(..., reasoning=...) take an optional ReasoningBlock that gets placed before the primary block. That's why the blocks list can have two entries on a reasoning-enabled turn: the reasoning trace first, then the tool call or final answer. More on reasoning below.

Each message gets a UUID and a timestamp. These cost nothing at creation time and save hours when debugging. If your compaction policy later drops a message, the UUID tells you which one.

A closer look at ReasoningBlock

Reasoning models — Anthropic's Extended Thinking, OpenAI's o-series and gpt-5 reasoning models, DeepSeek R1, and several others — produce a chain-of-thought trace before the final answer or tool call. Anthropic calls these blocks thinking, OpenAI calls them reasoning, and the underlying mechanism is the same. The technique traces back to Wei et al.'s 2022 paper "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models," which demonstrated that explicit step-by-step reasoning in the model's output substantially improved performance on arithmetic and commonsense tasks. The 2024–2025 generation of reasoning models turned that prompt-engineering trick into a dedicated internal phase: the model now produces a reasoning trace before its visible answer as a matter of course, and providers expose the trace as a first-class block type rather than asking callers to parse it out of free-form text. We name our internal type after the broader industry term rather than either vendor's.

Three things to know about reasoning tokens before we write any adapter code.

They're billed as output tokens. A task that used to cost 200 output tokens may now cost 2,000 — most of it reasoning. Chapter 7's accountant counts reasoning text against the "history" component when it ends up in the transcript; Chapter 20's budget enforcer sees reasoning as output spend.

They're usually not fed back to the model. Reasoning is the model's private scratchpad. OpenAI's Responses API keeps reasoning server-side and replays it via previous_response_id; if you don't use that flow (and we don't — our architecture is stateless), the model starts fresh each turn and previous reasoning is dropped from input. Anthropic will accept reasoning round-tripped in messages, but only with the opaque signature field attached and only when extended thinking stays enabled. The metadata dict on ReasoningBlock is where the signature lives.

Anthropic requires round-trip when thinking + tools are on. If you enable extended thinking and the model uses a tool, the next request must include the assistant's thinking block (with signature) alongside the tool_use block. Drop the thinking, and the API rejects the request. The Anthropic adapter (a few pages down) handles this: with thinking on, reasoning stays in the transcript and serializes back out; with thinking off, nothing is ever generated.
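At the wire level, the requirement looks like the following sketch of a replayed assistant turn (all values are placeholders):

```python
# The thinking block, signature attached, precedes the tool_use block
# in the replayed assistant message. Drop the thinking block (or its
# signature) and the API rejects the request.
assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "thinking", "thinking": "...", "signature": "opaque-sig"},
        {"type": "tool_use", "id": "t_1", "name": "calc",
         "input": {"expr": "2+2"}},
    ],
}
assert assistant_turn["content"][0]["type"] == "thinking"
assert "signature" in assistant_turn["content"][0]
```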

The matching change on the ProviderResponse side is an extra field that carries whatever reasoning the provider emitted this turn:

# src/harness/providers/base.py (preview — full definition in §3.3)

@dataclass(frozen=True)
class ProviderResponse:
    text: str | None = None
    tool_call_id: str | None = None
    tool_name: str | None = None
    tool_args: dict | None = None
    reasoning_text: str | None = None   # the trace, if any
    input_tokens: int = 0
    output_tokens: int = 0
    reasoning_tokens: int = 0           # subset of output_tokens, broken out

The loop doesn't branch on reasoning — it dispatches on tool call vs. text as before. Reasoning shows up on the response so adapters can persist it, the accountant can count it, and observability (Chapter 18) can surface it.

One factory method on Message ties this together. The loop calls it every turn; adapters decide on translation whether to round-trip the reasoning. Add it inside the Message class — alongside assistant_text, assistant_tool_call, and the rest — not as a module-level function:

# src/harness/messages.py (add to the Message class)

class Message:
    # ... role, blocks, created_at, id, the four existing factory methods ...

    @classmethod
    def from_assistant_response(cls, response) -> "Message":
        """Build an assistant Message from a ProviderResponse.

        Reasoning (if emitted) comes first as a ReasoningBlock; the
        primary output (text or tool call) follows. Vendor-specific
        metadata (OpenAI's encrypted reasoning items, Anthropic's thinking
        signature) is merged into `ReasoningBlock.metadata` so adapters
        can round-trip reasoning on the next turn.
        """
        blocks: list[Block] = []
        has_reasoning = (
            bool(response.reasoning_text)
            or bool(getattr(response, "reasoning_metadata", None))
        )
        if has_reasoning:
            meta: dict = {"provider_tokens": response.reasoning_tokens}
            meta.update(getattr(response, "reasoning_metadata", None) or {})
            blocks.append(ReasoningBlock(
                text=response.reasoning_text or "",
                metadata=meta,
            ))
        if response.is_tool_call:
            # One tool call per ProviderResponse in this chapter; Chapter 5's
            # streaming accumulator widens this to a batch.
            blocks.append(ToolCall(id=response.tool_call_id,
                                   name=response.tool_name,
                                   args=dict(response.tool_args or {})))
        else:
            blocks.append(TextBlock(text=response.text or ""))
        return cls(role="assistant", blocks=blocks)

That's the whole data model. §3.4's adapters translate each block type to and from the two vendor wire formats; §3.5's loop uses from_assistant_response every turn.

The Transcript is a thin wrapper:

# src/harness/messages.py (continued)

@dataclass
class Transcript:
    messages: list[Message] = field(default_factory=list)
    system: str | None = None

    def append(self, message: Message) -> None:
        self.messages.append(message)

    def extend(self, messages: list[Message]) -> None:
        self.messages.extend(messages)

    def last(self) -> Message | None:
        return self.messages[-1] if self.messages else None

    def __len__(self) -> int:
        return len(self.messages)

System prompts are separate from the messages list. Every provider handles them slightly differently — Anthropic takes it as a top-level parameter, OpenAI as the first message in the list, some OSS models bake it into a template. Keeping it apart means the adapters decide how to inject it.
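Here is a dict-level sketch of the placements; the field names are each vendor's own, the values are placeholders, and the Responses `instructions` variant is the one our OpenAI adapter uses in §3.4:

```python
system = "Answer in one sentence."
history = [{"role": "user", "content": "hi"}]

# Anthropic: top-level `system` parameter, never a message.
anthropic_kwargs = {"system": system, "messages": history}

# OpenAI Chat Completions: a leading system message in the list.
chat_messages = [{"role": "system", "content": system}, *history]

# OpenAI Responses: top-level `instructions` parameter.
responses_kwargs = {"instructions": system, "input": history}

assert "system" not in [m["role"] for m in anthropic_kwargs["messages"]]
assert chat_messages[0]["role"] == "system"
```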


3.3 The Provider Protocol, Upgraded

Now that we have typed messages, the Provider protocol can use them directly. Replace the old base.py:

# src/harness/providers/base.py
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol

from ..messages import Transcript


@dataclass(frozen=True)
class ProviderResponse:
    """A provider's response to one complete() call.

    Exactly one of (text, tool_call) is set. `reasoning_text` is
    orthogonal — it may accompany either a text answer or a tool call
    when the provider is configured to emit reasoning. `reasoning_metadata`
    holds vendor-specific replay data (OpenAI's encrypted reasoning items,
    Anthropic's thinking signature) that `Message.from_assistant_response`
    folds into the `ReasoningBlock.metadata` so the adapter can round-trip
    reasoning on the next turn.
    """
    text: str | None = None
    tool_call_id: str | None = None
    tool_name: str | None = None
    tool_args: dict | None = None
    reasoning_text: str | None = None
    reasoning_metadata: dict = field(default_factory=dict)
    input_tokens: int = 0
    output_tokens: int = 0
    reasoning_tokens: int = 0  # subset of output_tokens, broken out for accounting

    @property
    def is_tool_call(self) -> bool:
        return self.tool_name is not None

    @property
    def is_final(self) -> bool:
        return self.text is not None and self.tool_name is None


class Provider(Protocol):
    name: str

    def complete(self, transcript: Transcript, tools: list[dict]) -> ProviderResponse:
        ...

Three additions since Chapter 2. First, input_tokens and output_tokens — the provider knows what it cost; we want that visible at the protocol level so Chapter 7's accountant doesn't have to estimate. Second, reasoning_text / reasoning_tokens / reasoning_metadata — see §3.2's discussion; every adapter populates these when the model emits a trace, and zero-fills them otherwise. Third, name as a discriminator, so logs and traces can identify which provider served a given response.


3.4 The Adapters

Three adapters. Keep them small; the job of an adapter is to translate, not to decide.

The Anthropic adapter

# src/harness/providers/anthropic.py
from __future__ import annotations

import os
from typing import Any

from ..messages import (
    Block, Message, ReasoningBlock, TextBlock, ToolCall, ToolResult, Transcript,
)
from .base import Provider, ProviderResponse


class AnthropicProvider(Provider):
    name = "anthropic"

    def __init__(self, model: str = "claude-sonnet-4-6",
                 client: Any | None = None,
                 enable_thinking: bool = False,
                 thinking_budget_tokens: int = 2000,
                 max_tokens: int = 4096) -> None:
        self.model = model
        self.enable_thinking = enable_thinking
        self.thinking_budget_tokens = thinking_budget_tokens
        self.max_tokens = max_tokens
        if client is None:
            # Import the specific symbol (not `import anthropic`) so there's no
            # ambiguity with this module's own name, `harness.providers.anthropic`.
            from anthropic import Anthropic  # external SDK
            client = Anthropic()
        self._client = client

    def complete(self, transcript: Transcript, tools: list[dict]) -> ProviderResponse:
        kwargs: dict[str, Any] = {
            "model": self.model,
            "max_tokens": self.max_tokens,
            "messages": [_to_anthropic(m, self.enable_thinking)
                          for m in transcript.messages],
            "tools": tools,
        }
        if transcript.system:
            kwargs["system"] = transcript.system
        if self.enable_thinking:
            kwargs["thinking"] = {
                "type": "enabled",
                "budget_tokens": self.thinking_budget_tokens,
            }
        # Parallel tool use stays on (Anthropic's default). This chapter's
        # adapter surfaces the first tool_use; Chapter 5's streaming
        # `accumulate` collects the full batch, and the loop dispatches the
        # calls sequentially in arrival order.

        raw = self._client.messages.create(**kwargs)
        return _from_anthropic(raw)


def _to_anthropic(message: Message, keep_reasoning: bool) -> dict:
    # Drop ReasoningBlocks when thinking isn't enabled — the API rejects
    # `thinking` blocks without the feature turned on. With thinking on,
    # reasoning (including its signature) must round-trip.
    content: list[dict] = []
    for block in message.blocks:
        if isinstance(block, ReasoningBlock) and not keep_reasoning:
            continue
        content.append(_block_to_anthropic(block))
    return {"role": message.role, "content": content}


def _block_to_anthropic(block: Block) -> dict:
    match block:
        case TextBlock(text=t):
            return {"type": "text", "text": t}
        case ToolCall(id=i, name=n, args=a):
            return {"type": "tool_use", "id": i, "name": n, "input": a}
        case ToolResult(call_id=i, content=c, is_error=err):
            return {"type": "tool_result", "tool_use_id": i,
                    "content": c, "is_error": err}
        case ReasoningBlock(text=t, metadata=meta):
            out: dict[str, Any] = {"type": "thinking", "thinking": t}
            if (sig := meta.get("signature")) is not None:
                out["signature"] = sig  # required on round-trip
            return out
        case _:
            # A match statement falls through silently; raise so a new
            # block type can't slip past translation unnoticed.
            raise TypeError(f"unhandled block type: {block!r}")


def _from_anthropic(raw: Any) -> ProviderResponse:
    # Gather any thinking trace first — it may accompany either a tool_use
    # or a text answer, and we want to preserve it on ProviderResponse so
    # the loop's `Message.from_assistant_response` puts it in the transcript.
    # The opaque signature rides along in reasoning_metadata; Anthropic
    # requires it when the thinking block is round-tripped.
    thinking_texts: list[str] = []
    reasoning_metadata: dict = {}
    for b in raw.content:
        if b.type == "thinking":
            thinking_texts.append(b.thinking)
            if getattr(b, "signature", None):
                reasoning_metadata["signature"] = b.signature
    reasoning_text = "\n".join(thinking_texts) if thinking_texts else None

    for block in raw.content:
        if block.type == "tool_use":
            return ProviderResponse(
                tool_call_id=block.id,
                tool_name=block.name,
                tool_args=dict(block.input),
                reasoning_text=reasoning_text,
                reasoning_metadata=reasoning_metadata,
                input_tokens=raw.usage.input_tokens,
                output_tokens=raw.usage.output_tokens,
            )

    # No tool call → concatenate text blocks for the final answer.
    texts = [b.text for b in raw.content if b.type == "text"]
    return ProviderResponse(
        text="\n".join(texts),
        reasoning_text=reasoning_text,
        reasoning_metadata=reasoning_metadata,
        input_tokens=raw.usage.input_tokens,
        output_tokens=raw.usage.output_tokens,
    )

The match statement in _block_to_anthropic is the pattern we'll use throughout the book for discriminating blocks. It covers every block kind, so adding a new block type and forgetting a case surfaces as an immediate failure rather than silent data loss. Notice that ReasoningBlock is a first-class case alongside text/tool_use/tool_result.

_from_anthropic does three passes over the response content — thinking first (so we can attach it regardless of which primary path fires), then tool_use, then falling back to text. This mirrors what Chapter 5's streaming version does; the only difference is that streaming emits ReasoningDelta events as they arrive rather than collecting them at the end.

One Anthropic-specific wrinkle the _block_to_anthropic case covers: a thinking block round-tripped on a subsequent turn must carry its opaque signature field. The signature lands in ReasoningBlock.metadata when the response is captured; passing it back is how the API verifies the reasoning hasn't been tampered with. Miss the signature and the API rejects the request.

The OpenAI adapter

A word on which OpenAI API to target, and a short detour through how we got here. OpenAI introduced function calling as a first-class feature in Chat Completions in June 2023, establishing the pattern of tool calls as typed output blocks that every major vendor has since adopted. Before that release, tool use in production systems was a prompt-engineering exercise: you'd ask the model to emit JSON in a particular format, then parse free-form text looking for it, and a substantial fraction of what people called "tool-use failures" were actually parser failures — the model hadn't produced invalid output, the downstream code had simply misread it. The typed-block approach — picked up by Anthropic in 2024 and now the de facto standard across the industry — is the protocol-level shift that makes this book's ToolCall and ToolResult types possible in the first place.

OpenAI currently ships two APIs on top of that foundation. Chat Completions (client.chat.completions.create) is the 2023-era surface. Responses (client.responses.create, introduced in 2025) is the newer one, and OpenAI now actively recommends Responses for agentic use — it is stateful, supports built-in tools like web_search and code_interpreter, and powers OpenAI's own Agents SDK. Chat Completions remains available but is no longer the preferred surface for new work.

We use Responses, for two reasons worth naming.

First, vendor direction. When the platform owner says "this is the supported agentic surface going forward," a book teaching agent harnesses that ignores the recommendation ages badly. Chat Completions will keep working, but new capabilities — structured outputs on tool calls, built-in tools, the richer streaming event vocabulary — are shipping on Responses first.

Second, coverage is closing fast on the OSS side. vLLM and Ollama both speak Responses now, and more open-source servers ship support each quarter. Our LocalProvider — the subclass pointing the OpenAI client at a local endpoint — works against any server that implements /v1/responses. If you hit a server that only exposes /v1/chat/completions, add a sibling OpenAIChatCompletionsProvider against the same Provider protocol; that's exactly what the adapter seam is for, and it's the kind of change that touches one file and nothing else.

The Responses API is a little more verbose than Chat Completions — input items are typed (function_call, function_call_output, message) rather than role-tagged strings — but the typing absorbs ambiguity that the Chat Completions shape papered over. Tool calls and tool results become first-class input items instead of array-nested dict keys, which is the same philosophical move our Transcript made, and the adapter has correspondingly less translation work to do.

# src/harness/providers/openai.py
from __future__ import annotations

import json
from typing import Any, Literal

from ..messages import (
    Message, ReasoningBlock, TextBlock, ToolCall, ToolResult, Transcript,
)
from .base import Provider, ProviderResponse


ReasoningEffort = Literal["minimal", "low", "medium", "high"]


class OpenAIProvider(Provider):
    name = "openai"

    def __init__(self, model: str = "gpt-5", client: Any | None = None,
                 reasoning_effort: ReasoningEffort | None = None) -> None:
        self.model = model
        self.reasoning_effort = reasoning_effort
        if client is None:
            # Import the specific symbol (not `import openai`) so there's no
            # ambiguity with this module's own name, `harness.providers.openai`.
            from openai import OpenAI  # external SDK
            client = OpenAI()
        self._client = client

    def complete(self, transcript: Transcript, tools: list[dict]) -> ProviderResponse:
        input_items: list[dict] = []
        for m in transcript.messages:
            input_items.extend(_to_responses_input(m))

        responses_tools = [_tool_to_responses(t) for t in tools] if tools else None
        kwargs: dict[str, Any] = {"model": self.model, "input": input_items}
        if transcript.system:
            kwargs["instructions"] = transcript.system  # system prompt, top-level
        if responses_tools:
            kwargs["tools"] = responses_tools
            # Parallel tool calls stay on (Responses default). Chapter 5's
            # `accumulate` handles the batch.
        if self.reasoning_effort is not None:
            kwargs["reasoning"] = {"effort": self.reasoning_effort}
            # Ask Responses for the encrypted reasoning blob so we can replay
            # it across turns without `previous_response_id`. We run stateless
            # — `store=False` opts out of server-side conversation storage.
            kwargs["include"] = ["reasoning.encrypted_content"]
            kwargs["store"] = False

        raw = self._client.responses.create(**kwargs)
        return _from_responses(raw)


def _tool_to_responses(tool: dict) -> dict:
    # Our canonical tool shape is Anthropic-flavoured: {name, description, input_schema}.
    # Responses flattens function tools: {type, name, description, parameters}.
    return {
        "type": "function",
        "name": tool["name"],
        "description": tool.get("description", ""),
        "parameters": tool.get("input_schema", tool.get("parameters", {})),
    }


def _to_responses_input(message: Message) -> list[dict]:
    # Tool results become function_call_output items (no role — typed directly).
    if any(isinstance(b, ToolResult) for b in message.blocks):
        return [
            {"type": "function_call_output", "call_id": b.call_id, "output": b.content}
            for b in message.blocks if isinstance(b, ToolResult)
        ]

    # Reasoning items get replayed to Responses so chain-of-thought carries
    # across turns. We stashed the opaque `id` + `encrypted_content` in
    # metadata on the way in; if the metadata is missing (e.g. the
    # ReasoningBlock came from Anthropic, or reasoning wasn't enabled on
    # the provider that produced it), we skip — Responses won't accept a
    # raw text reasoning item.
    items: list[dict] = []
    for b in message.blocks:
        if isinstance(b, ReasoningBlock):
            for spec in b.metadata.get("openai_items") or []:
                item: dict[str, Any] = {
                    "type": "reasoning",
                    "summary": spec.get("summary") or [],
                }
                if rid := spec.get("id"):
                    item["id"] = rid
                if enc := spec.get("encrypted_content"):
                    item["encrypted_content"] = enc
                items.append(item)

    # Assistant tool calls become function_call items.
    if any(isinstance(b, ToolCall) for b in message.blocks):
        for b in message.blocks:
            if isinstance(b, ToolCall):
                items.append({
                    "type": "function_call",
                    "call_id": b.id,
                    "name": b.name,
                    "arguments": json.dumps(b.args),
                })
        return items

    # Plain text keeps its role/content shape. Any reasoning items
    # collected above go first, so the trace precedes the answer.
    text = "\n".join(b.text for b in message.blocks if isinstance(b, TextBlock))
    items.append({"role": message.role, "content": text})
    return items


def _from_responses(raw: Any) -> ProviderResponse:
    # Extract reasoning first so it attaches to whichever primary output fires.
    # Two things to collect from each reasoning item: the summary text (for
    # humans / logs) and the opaque id + encrypted_content (for replay on
    # the next turn via `_to_responses_input`).
    reasoning_parts: list[str] = []
    openai_items: list[dict] = []
    for item in raw.output:
        if item.type == "reasoning":
            for summary in getattr(item, "summary", []) or []:
                text = getattr(summary, "text", None)
                if text:
                    reasoning_parts.append(text)
            openai_items.append({
                "id": getattr(item, "id", "") or "",
                "encrypted_content": getattr(item, "encrypted_content", "") or "",
                "summary": [],  # send back empty; summaries aren't required on replay
            })
    reasoning_text = "\n".join(reasoning_parts) if reasoning_parts else None
    reasoning_metadata = {"openai_items": openai_items} if openai_items else {}

    # Responses breaks reasoning tokens out under usage.output_tokens_details.
    details = getattr(raw.usage, "output_tokens_details", None)
    reasoning_tokens = int(getattr(details, "reasoning_tokens", 0) or 0) if details else 0

    # If the model issued a tool call, return the first one.
    for item in raw.output:
        if item.type == "function_call":
            return ProviderResponse(
                tool_call_id=item.call_id,
                tool_name=item.name,
                tool_args=json.loads(item.arguments),
                reasoning_text=reasoning_text,
                reasoning_metadata=reasoning_metadata,
                input_tokens=raw.usage.input_tokens,
                output_tokens=raw.usage.output_tokens,
                reasoning_tokens=reasoning_tokens,
            )

    # Otherwise, concatenate the output_text blocks from any message items.
    texts: list[str] = []
    for item in raw.output:
        if item.type == "message":
            for block in item.content:
                if block.type == "output_text":
                    texts.append(block.text)
    return ProviderResponse(
        text="\n".join(texts),
        reasoning_text=reasoning_text,
        reasoning_metadata=reasoning_metadata,
        input_tokens=raw.usage.input_tokens,
        output_tokens=raw.usage.output_tokens,
        reasoning_tokens=reasoning_tokens,
    )

Same shape as _from_anthropic: reasoning comes first, then the primary path branches on tool call vs. text. The difference is how the two APIs surface the count — Anthropic rolls reasoning tokens into usage.output_tokens, OpenAI breaks them out under usage.output_tokens_details.reasoning_tokens. Our reasoning_tokens field captures whichever the provider exposes, which lets Chapter 20's router and Chapter 18's traces show the breakdown without caring which vendor produced it.

One more difference: _from_responses captures the opaque id and encrypted_content from every reasoning item into reasoning_metadata.openai_items. Those fields are what make stateless reasoning replay possible — the next turn's _to_responses_input emits a matching {type: "reasoning", id, encrypted_content} item so the model sees its own chain-of-thought from the previous turn. Without this, reasoning models effectively "forget" their thinking each turn. The include=["reasoning.encrypted_content"] request flag and store=False (we saw both up in complete()) are what make the encrypted blob show up in the first place; together they give us chain-of-thought continuity without relying on OpenAI's server-side previous_response_id conversation storage. Anthropic handles the same round-trip via signature on thinking blocks; our harness-internal ReasoningBlock.metadata dict holds whichever convention applies.

Notice _to_responses_input still returns a list, not one item. One of our Message objects can expand into multiple Responses input items — a function_call followed by a function_call_output on the next turn, or two function calls from a single assistant turn. The adapter absorbs the asymmetry; the rest of the harness never sees it.
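The fan-out is easy to see with stand-in types. One assistant message carrying two tool calls becomes two function_call input items; the field names below mirror the shapes used in this chapter, reduced to what the example needs:

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:        # stand-in for the harness's typed block
    id: str
    name: str
    args: dict


def tool_calls_to_items(calls: list[ToolCall]) -> list[dict]:
    # Responses wants arguments as a JSON string, and one item per call.
    return [
        {"type": "function_call",
         "call_id": call.id,
         "name": call.name,
         "arguments": json.dumps(call.args)}
        for call in calls
    ]


# One message, two input items.
items = tool_calls_to_items([
    ToolCall("t_1", "calc", {"expr": "2+2"}),
    ToolCall("t_2", "calc", {"expr": "3*3"}),
])
```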

Also notice the translation is almost mechanical. Transcript already distinguishes ToolCall from ToolResult as typed block subclasses; Responses already distinguishes function_call from function_call_output as typed input items. Both sides agree that tool calls and tool results are different things with different shapes — all the adapter does is rename the fields.

The OSS adapter

For open-source models served through a local endpoint (llama.cpp, vLLM, Ollama), we use OpenAI-compatible mode — almost every serious local server supports it. The adapter is little more than a subclass that points the SDK client at a different base URL:

# src/harness/providers/local.py
from __future__ import annotations

from .openai import OpenAIProvider


class LocalProvider(OpenAIProvider):
    name = "local"

    def __init__(self, model: str = "llama-3.1-8b-instruct",
                 base_url: str = "http://localhost:8000/v1") -> None:
        # Import the specific symbol so there's no ambiguity with the sibling
        # module `harness.providers.openai`.
        from openai import OpenAI  # external SDK
        client = OpenAI(base_url=base_url, api_key="not-needed")
        super().__init__(model=model, client=client)

LocalProvider inherits all the OpenAI translation. This works for any server that speaks OpenAI's chat-completions protocol, which is the de facto OSS standard.

Turning reasoning on

§3.2 introduced ReasoningBlock as a first-class block type and both adapter snippets above already translate it. What we haven't shown is how a caller turns reasoning on. Both knobs are plain constructor arguments; the rest of the harness is unaffected.

from harness.providers.anthropic import AnthropicProvider
from harness.providers.openai import OpenAIProvider

anthropic = AnthropicProvider(
    enable_thinking=True,
    thinking_budget_tokens=4000,
    max_tokens=16000,     # must be larger than the thinking budget
)

openai = OpenAIProvider(
    reasoning_effort="medium",   # "minimal" | "low" | "medium" | "high"
)

The two providers agree on the outcome: reasoning tokens stream, accumulate into ProviderResponse.reasoning_text, the loop's Message.from_assistant_response puts them in the transcript as a ReasoningBlock, and the adapter decides whether to round-trip them on the way back out. Same loop code, reasoning-agnostic. That's the adapter seam doing its job.


3.5 Updating the Loop

The Chapter 2 loop used raw dicts. Now it uses Transcript and typed messages. The logic is identical; the types tighten:

# src/harness/agent.py
from __future__ import annotations

from typing import Callable

from .messages import Message, TextBlock, ToolCall, ToolResult, Transcript
from .providers.base import Provider


MAX_ITERATIONS = 20


def run(
    provider: Provider,
    tools: dict[str, Callable[..., str]],
    tool_schemas: list[dict],
    user_message: str,
    system: str | None = None,
) -> str:
    transcript = Transcript(system=system)
    transcript.append(Message.user_text(user_message))

    for _ in range(MAX_ITERATIONS):
        response = provider.complete(transcript, tool_schemas)

        if response.is_final:
            # from_assistant_response preserves reasoning (if any) alongside
            # the final text as a single assistant Message. With reasoning off
            # it's equivalent to Message.assistant_text(response.text).
            transcript.append(Message.from_assistant_response(response))
            return response.text or ""

        # Same story on the tool-call branch: reasoning rides with the
        # ToolCall blocks in one assistant message.
        transcript.append(Message.from_assistant_response(response))

        # Dispatch each call in arrival order. One tool_result message per
        # call; Chapter 5 keeps the same loop shape with the registry.
        for ref in response.tool_calls:
            try:
                result_text = tools[ref.name](**ref.args)
                result = ToolResult(call_id=ref.id, content=result_text)
            except KeyError:
                result = ToolResult(call_id=ref.id,
                                    content=f"unknown tool: {ref.name}",
                                    is_error=True)
            except Exception as e:
                result = ToolResult(call_id=ref.id, content=str(e),
                                    is_error=True)
            transcript.append(Message.tool_result(result))

    raise RuntimeError(f"agent did not finish in {MAX_ITERATIONS} iterations")

The refactor earns three things:

  1. The transcript now has a system field — Anthropic will pass it at the top level, OpenAI as the first message, your OSS provider however it wants. The loop doesn't care.
  2. Message.from_assistant_response(response) is the one-liner that persists both the primary output (text or tool call) and any ReasoningBlock the provider emitted, in a single assistant Message — the loop stays reasoning-agnostic.
  3. The try/except around tool dispatch addresses Break 1 and Break 3 from Chapter 2 in a minimal way — the loop no longer crashes on unknown tools or exceptions; it returns a structured error to the model and lets it recover. This is a preview; Chapter 6 does it properly with schema validation and dedup.


3.6 Swapping Providers

The payoff. The Chapter 2 mock still works — it uses Transcript now, but the shape of the call hasn't changed:

# src/harness/providers/mock.py (updated)
from ..messages import Transcript
from .base import ProviderResponse  # the Provider protocol is satisfied structurally


class MockProvider:
    name = "mock"

    def __init__(self, responses: list[ProviderResponse]) -> None:
        self._responses = list(responses)
        self._index = 0

    def complete(self, transcript: Transcript, tools: list[dict]) -> ProviderResponse:
        if self._index >= len(self._responses):
            raise RuntimeError("mock ran out of responses")
        response = self._responses[self._index]
        self._index += 1
        return response
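One property worth a quick check: when the script is exhausted, the mock fails loudly instead of letting the loop spin. A self-contained demonstration, repeating the class so the snippet runs standalone and using strings as stand-ins for ProviderResponse:

```python
class MockProvider:
    name = "mock"

    def __init__(self, responses: list) -> None:
        self._responses = list(responses)
        self._index = 0

    def complete(self, transcript, tools):
        if self._index >= len(self._responses):
            raise RuntimeError("mock ran out of responses")
        response = self._responses[self._index]
        self._index += 1
        return response


mock = MockProvider(["final answer"])
mock.complete(None, [])        # consumes the only scripted response
try:
    mock.complete(None, [])    # one turn too many: fails immediately
except RuntimeError as e:
    assert "ran out" in str(e)
```

A loop that takes one more turn than your test scripted dies with a clear message rather than hanging, which is exactly what you want from test infrastructure.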

The real providers drop in behind the same interface:

# examples/ch03_real_provider.py
import os
import sys

from harness.agent import run
from harness.providers.anthropic import AnthropicProvider
from harness.providers.openai import OpenAIProvider
from harness.providers.local import LocalProvider


def calc(expression: str) -> str:
    # Toy evaluator for the demo. An empty __builtins__ narrows the attack
    # surface but is not a real sandbox; don't ship this.
    return str(eval(expression, {"__builtins__": {}}, {}))


tool_schemas = [{
    "name": "calc",
    "description": "Evaluate a Python arithmetic expression.",
    "input_schema": {
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"],
    },
}]
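Worth noting: the schema above is in Anthropic's shape (name, description, input_schema at the top level), so the OpenAI adapter has to rewrap it on the way out. A sketch of that rewrap, assuming the flattened function-tool shape the Responses API uses (type, name, description, parameters at the top level):

```python
def to_openai_tool(schema: dict) -> dict:
    """Rewrap an Anthropic-shaped tool schema for the OpenAI Responses API.
    The JSON Schema body passes through untouched; only the envelope changes."""
    return {
        "type": "function",
        "name": schema["name"],
        "description": schema["description"],
        "parameters": schema["input_schema"],   # same JSON Schema, new key
    }
```

Like the message translation, it's a field rename: both sides agree on what a tool schema is, they just disagree on where to put it.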


# Choose the provider once. The rest of the script doesn't care which one.
provider_name = os.environ.get("PROVIDER", "anthropic")
required_env = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "local": None,  # local servers don't need a key
}
env_var = required_env.get(provider_name)
if env_var and not os.environ.get(env_var):
    sys.exit(
        f"error: PROVIDER={provider_name} requires {env_var}. "
        f"Set it and re-run. For the local provider, use PROVIDER=local."
    )

providers = {
    "anthropic": AnthropicProvider,
    "openai": OpenAIProvider,
    "local": LocalProvider,
}
if provider_name not in providers:
    sys.exit(f"error: unknown PROVIDER={provider_name!r}; "
             f"expected one of {sorted(providers)}")
provider = providers[provider_name]()

answer = run(
    provider=provider,
    tools={"calc": calc},
    tool_schemas=tool_schemas,
    user_message="What is 17 * 23, minus 100?",
)
print(answer)

Before running it, set the API keys. The Anthropic SDK reads ANTHROPIC_API_KEY from the environment; the OpenAI SDK reads OPENAI_API_KEY. If the Anthropic key is missing, the SDK raises TypeError: Could not resolve authentication method … deep inside its HTTP layer; the OpenAI SDK fails earlier, at client construction. Either way the stack trace is long enough that it's worth learning to recognise on sight.

export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...

The LocalProvider path needs no key — it points at an OpenAI-compatible local server (llama.cpp, vLLM, Ollama, LM Studio), and the constructor already passes api_key="not-needed", which the server ignores.

Run it three times:

PROVIDER=anthropic uv run examples/ch03_real_provider.py
PROVIDER=openai    uv run examples/ch03_real_provider.py
PROVIDER=local     uv run examples/ch03_real_provider.py  # assumes local endpoint

Three different models, same loop, same tool, same transcript type, no code change. That's the seam paying for itself.

Commit:

git add -A && git commit -m "ch03: typed transcript + Anthropic/OpenAI/local adapters"
git tag ch03-transcript

3.7 Why Tool Schemas Are Still Dicts

You may have noticed that tool_schemas is a list[dict]. That's deliberate, and temporary. Chapter 4 introduces a Tool class that owns its schema, its callable, its side-effect declaration, and its validator. At that point the adapter stops taking schemas as dicts and takes them as typed objects.

Why wait? Because the shape of a JSON schema is not complicated enough to earn its own abstraction yet, and every abstraction you add before its pain has been felt is debt. Chapter 4's Tool is motivated by the fact that two breaks from Chapter 2 (unknown tool, schema mismatch) are still being handled ad hoc in the loop's try/except. We fix them properly when we have the right shape to hang the fix on.


3.8 Try It Yourself

  1. Write a fourth adapter. Pick a provider we haven't covered — Gemini, Cohere, AWS Bedrock, Together AI, Groq. Read its docs. Write an adapter implementing the Provider protocol. Run the calculator example against it. How many lines did you need? Which parts were trivial? Which were friction?
  2. Break the translation deliberately. Mutate a TextBlock after creating it. What happens? (Hint: it's frozen, so the attempt raises.) Now write a subtly wrong OpenAI translator that forgets to serialize tool_calls arguments as JSON strings. Run the example. Observe the provider's error. Note what the error told you and what it didn't.
  3. Add tracing to the adapter layer. Before and after each complete() call, log the input token count, the output token count, and the wall-clock duration. You've just built a minimal version of what Chapter 18 will formalize as observability. Keep it — we'll replace it with real OpenTelemetry spans later.