Previously: compaction. Older tool results get masked; the prefix gets summarized when masking isn't enough. The transcript survives long sessions. But anything the agent wanted to keep verbatim is at the mercy of the compactor.
A compactor is a janitor. It throws out what doesn't look important. That's fine for tool results and stale discussion, but sometimes the agent produces something it knows to be important — a plan, a discovery, a decision it wants to stand by. Those things should not be in the context window. They should be in durable storage the agent can read from on demand, structured so that compaction cannot touch them.
This chapter introduces the scratchpad pattern. The agent gets scratchpad_write, scratchpad_read, and scratchpad_list tools backed by a filesystem directory. Context contains pointers (keys) the agent remembers; content lives on disk. The pattern has several well-known instances under different names. Wang et al.'s 2023 "Voyager: An Open-Ended Embodied Agent with Large Language Models" called it a skill library: a persistent repository of learned action sequences the agent could retrieve and reuse across episodes. Park et al.'s 2023 "Generative Agents: Interactive Simulacra of Human Behavior" used a memory stream with explicit retrieval to give simulated agents continuity across time. Claude Code calls it CLAUDE.md; Anthropic's multi-agent research system uses it for plans that must survive context truncation. The names vary, but the pattern is the same: non-trivial agents need state that lives outside the context window and survives the compactor.
By the end of this chapter, your agent can work across sessions. The scratchpad file is on disk; tomorrow's agent reads what today's agent wrote.
A tempting thought: why bother with external state when models support 200K or 1M tokens? Just write the plan into the conversation and let the compactor leave it alone.
Two reasons that doesn't work.
The compactor can't distinguish "important plan" from "verbose tool output" without being told. You can teach it ("preserve things tagged X"), but now your taxonomy of things is part of the compactor's concern, and every new kind of important state is a compactor change. An external scratchpad pushes that concern out to where the agent lives: the tool interface.
Context doesn't survive process death. If the harness crashes, or the user comes back tomorrow, the context is gone. The scratchpad file is still on disk. Chapter 21 builds durable checkpointing for the full session state; the scratchpad is its precursor, and a cheaper pattern that covers most everyday use cases by itself.
There's a third reason, less fundamental but real: cost. Content in the scratchpad doesn't eat tokens on every turn; content in the context does. A 2,000-token plan that's relevant to three turns out of thirty costs 2,000 × 30 = 60,000 tokens if it sits in context for the whole session, and 2,000 × 27 = 54,000 of those are waste. Read from the scratchpad on the three turns that need it, the same plan costs 2,000 × 3 = 6,000 tokens, assuming the read result is masked or compacted away once it has been used. That's an order-of-magnitude saving for anything the agent doesn't need every turn.
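The arithmetic is easy to check in a few lines. This is a back-of-the-envelope sketch, not a real token accountant: the function names are made up for illustration, and it assumes the scratchpad read result leaves the context after the turn that used it. It also ignores prompt caching and the small overhead of the tool call itself.

```python
def in_context_cost(plan_tokens: int, total_turns: int) -> int:
    """Tokens spent if the plan sits in the context on every turn."""
    return plan_tokens * total_turns


def scratchpad_cost(plan_tokens: int, turns_needed: int) -> int:
    """Tokens spent if the plan is read from the scratchpad only when needed."""
    return plan_tokens * turns_needed


plan_tokens, total_turns, turns_needed = 2_000, 30, 3

resident = in_context_cost(plan_tokens, total_turns)    # plan always in context
on_demand = scratchpad_cost(plan_tokens, turns_needed)  # plan read three times
wasted = resident - on_demand                           # tokens spent on idle turns

print(f"resident={resident} on_demand={on_demand} wasted={wasted}")
print(f"ratio={resident // on_demand}x")
```

The ratio is simply total turns divided by turns needed, so the saving grows with session length and shrinks for content the agent consults on most turns.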
Three tools: write, read, list. A thin layer of discipline around a directory.
# src/harness/tools/scratchpad.py
from __future__ import annotations

from pathlib import Path

from .base import Tool
from .decorator import tool


class Scratchpad:
    """Durable per-session key-value store, exposed to the agent as tools."""

    def __init__(self, root: Path | str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Key validation: allow ASCII alphanumerics, dash, underscore only.
        # (str.isalnum() alone would also accept non-ASCII letters and digits.)
        safe = "".join(
            c for c in key if (c.isascii() and c.isalnum()) or c in "-_"
        )
        if safe != key:
            raise ValueError(f"invalid key {key!r}: use [A-Za-z0-9_-]+")
        if not safe:
            raise ValueError("key cannot be empty")
        return self.root / f"{safe}.txt"

    def write(self, key: str, content: str) -> str:
        path = self._path(key)
        path.write_text(content, encoding="utf-8")
        return f"wrote {len(content)} chars to scratchpad[{key}]"

    def read(self, key: str) -> str:
        path = self._path(key)
        if not path.exists():
            raise KeyError(f"scratchpad[{key}] not found")
        return path.read_text(encoding="utf-8")

    def list(self) -> list[str]:
        return sorted(p.stem for p in self.root.glob("*.txt"))

    def as_tools(self) -> list[Tool]:
        pad = self  # captured by the closures below

        @tool(side_effects={"write"})
        def scratchpad_write(key: str, content: str) -> str:
            """Store a value in the scratchpad under the given key.

            key: alphanumeric, dashes, underscores only. No slashes, dots.
            content: any string; overwrites existing value for this key.

            Side effects: writes one file to the scratchpad directory.

            Use this for: plans, discovered facts, decisions that should
            survive the context window. Write once, read on demand.
            """
            return pad.write(key, content)

        @tool(side_effects={"read"})
        def scratchpad_read(key: str) -> str:
            """Retrieve a value from the scratchpad.

            key: the key used when writing.

            Returns the stored content, or an error if not found.
            Side effects: reads one file.
            """
            return pad.read(key)

        @tool(side_effects={"read"})
        def scratchpad_list() -> str:
            """List keys currently in the scratchpad.

            Returns a newline-separated list of keys.
            Side effects: reads the scratchpad directory.

            Use this at the start of a session to discover what prior
            agents (or you, in a past turn) have stored.
            """
            keys = pad.list()
            return "\n".join(keys) if keys else "(empty)"

        return [scratchpad_write, scratchpad_read, scratchpad_list]
Three design points.
Keys are validated, not encoded. A key containing a slash, such as "../../etc/evil", would let the agent write outside the scratchpad directory: a path traversal by accident. Dots are banned as well, so ".." can never appear in a key. We reject invalid keys with a clear error rather than silently rewriting them. The model learns the convention quickly and settles on keys like plan-a, findings-1, schema-cache.
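To see what the validation buys, here is a small sketch that compares a strict key check with an unchecked path join. The helpers is_valid_key and unchecked_path are hypothetical, written only for this illustration, and the regular expression is one way of expressing the [A-Za-z0-9_-]+ rule. PurePosixPath is used so the result is the same on any operating system.

```python
import posixpath
import re
from pathlib import PurePosixPath

# Same character set the scratchpad advertises in its error message.
KEY_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_key(key: str) -> bool:
    """True only if the whole key consists of allowed characters."""
    return KEY_PATTERN.fullmatch(key) is not None


def unchecked_path(root: PurePosixPath, key: str) -> PurePosixPath:
    """What the path would be if the key were used without validation."""
    return root / f"{key}.txt"


root = PurePosixPath("/tmp/scratchpad")

for key in ["plan-a", "findings_1", "../../etc/evil", "notes.v2", ""]:
    print(f"{key!r:20} valid={is_valid_key(key)}")

# Without validation, the joined path climbs out of the scratchpad root.
escaped = unchecked_path(root, "../../etc/evil")
resolved = posixpath.normpath(str(escaped))
print(resolved)
```

The last line prints a path under /etc rather than under /tmp/scratchpad, which is exactly the accident the key check prevents.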
Content is always a string. The scratchpad doesn't know about types. If the agent stores JSON, it stores a JSON string; on read, it gets a JSON string back. Type discipline is the agent's job, not the scratchpad's.
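A round trip through a string-only store looks like this. The dict-backed write and read functions below are stand-ins invented for the sketch (the real Scratchpad writes files), but the type discipline is the same: serialize before writing, parse after reading.

```python
import json

# Stand-in for the scratchpad: a plain str -> str mapping (assumption for
# illustration; the chapter's Scratchpad stores files on disk instead).
_store: dict[str, str] = {}


def write(key: str, content: str) -> None:
    if not isinstance(content, str):
        raise TypeError("scratchpad content must be a string")
    _store[key] = content


def read(key: str) -> str:
    return _store[key]


findings = {"os": "Linux", "cpu_cores": 8, "disks": ["/dev/sda", "/dev/sdb"]}

# The caller serializes; the store only ever sees text.
write("findings-os", json.dumps(findings, indent=2))

raw = read("findings-os")    # still a string
restored = json.loads(raw)   # the caller parses it back

print(type(raw).__name__, restored["cpu_cores"])
```

Passing the dict directly to write would raise TypeError here, which makes the boundary explicit instead of letting a stringified Python repr slip into storage.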
The three tools are created together via as_tools(). This is a pattern we'll reuse in the MCP chapter: a stateful component exposes itself to the model as a bundle of closure-captured tools, not as individual top-level functions. It keeps the Scratchpad instance private to the tool bundle and out of the global namespace.
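Stripped of the harness's @tool decorator, the closure-bundle pattern can be sketched with a toy component. Counter and its tool names are invented for this example; the point is only that each call to as_tools() returns functions bound to one instance, with no module-level state.

```python
from typing import Callable


class Counter:
    """A small stateful component, standing in for Scratchpad."""

    def __init__(self) -> None:
        self._value = 0

    def as_tools(self) -> list[Callable[..., str]]:
        counter = self  # captured by the closures below

        def counter_increment(amount: int = 1) -> str:
            """Add `amount` to the counter and report the new value."""
            counter._value += amount
            return f"counter is now {counter._value}"

        def counter_read() -> str:
            """Return the current counter value as a string."""
            return str(counter._value)

        return [counter_increment, counter_read]


# Two bundles, two independent pieces of state.
a_tools = Counter().as_tools()
b_tools = Counter().as_tools()
a_increment, a_read = a_tools
b_increment, b_read = b_tools

a_increment(5)
print(a_read(), b_read())
```

Because the instance is reachable only through the returned functions, two bundles can coexist in one process (for example, one scratchpad per agent) without interfering with each other.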
The agent needs to know the scratchpad exists and what it's for. Good scratchpad use is a system-prompt decision, not a tool-description decision alone. A tool description says "what this tool does"; the system prompt says "when to reach for it."
You have access to a scratchpad — a durable key-value store that survives
the context window. Use it whenever you discover or decide something you
expect to need more than two turns later.
Examples of what to write to the scratchpad:
- Plans you commit to. If you decide on a 5-step approach, write it to
"plan" immediately. Read it back when you're unsure of your next step.
- Findings from expensive tools. If you ran a 10-second database query,
store the result in "query-result-1" so you don't have to re-run it.
- Constraints the user has expressed. "No changes to production" goes to
"constraints" and stays there for the session.
- Decisions you don't want to revisit. "Using port 8081 because 8080 is
taken" goes to "port-decision".
Call scratchpad_list() at the start of a session to see what's already
stored. Call scratchpad_read(key) to retrieve values you remember writing.
Call scratchpad_write(key, content) to persist. Use short keys:
"plan", "constraints", "query-result-1".
The scratchpad is durable. What you write here will be readable by future
sessions (including yourself, tomorrow). Treat it like a shared notebook.
This isn't decorative. The difference between an agent that uses the scratchpad well and one that doesn't is mostly the system prompt. Without these instructions, most models will write to it occasionally but not systematically. With them, they start every complex session by writing a plan to plan and reading it whenever they feel lost.
The scratchpad's value shows up on long sessions with expensive tool calls. Let's write one.
# examples/ch09_investigation.py
import asyncio
from pathlib import Path

from harness.agent import arun
from harness.context.accountant import ContextAccountant
from harness.context.compactor import Compactor
from harness.providers.anthropic import AnthropicProvider
from harness.tools.registry import ToolRegistry
from harness.tools.scratchpad import Scratchpad
from harness.tools.std import read_file, bash, calc

SYSTEM = """\
You are an investigative agent. You have a scratchpad, a durable key-value
store that survives the context window. Use it whenever you discover or
decide something you expect to need more than two turns later.

[... full scratchpad system prompt from 9.3 ...]

At the start of every session, call scratchpad_list() to see what's there.
"""


async def main() -> None:
    provider = AnthropicProvider()
    pad = Scratchpad(root=Path(".scratchpad"))
    registry = ToolRegistry(
        tools=[calc, read_file, bash] + pad.as_tools()
    )
    accountant = ContextAccountant()
    compactor = Compactor(accountant, provider)

    await arun(
        provider=provider,
        registry=registry,
        accountant=accountant,
        compactor=compactor,
        system=SYSTEM,
        user_message=(
            "Investigate this machine. First, make a plan in the scratchpad. "
            "Then carry it out: learn about the OS, CPU, memory, disk, and "
            "recent activity. Record your findings in the scratchpad as you "
            "go. When done, synthesize a 200-word report."
        ),
    )


if __name__ == "__main__":
    asyncio.run(main())
Run it and watch the scratchpad directory:
ls .scratchpad/
# plan.txt findings-os.txt findings-cpu.txt report-draft.txt
The model has partitioned its work. The plan is stored separately from the findings. Each finding has its own key. Compaction can fire freely during this run — the tool results can be masked, the older turns can be summarized — and the agent can still reach its plan and findings on demand.
The second-run test is even more telling:
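# examples/ch09_followup.py
# Same harness, different user message, same scratchpad directory.
# This fragment runs inside an async main(), with provider, registry,
# and SYSTEM set up exactly as in ch09_investigation.py.
    await arun(
        provider=provider,
        registry=registry,
        system=SYSTEM,
        user_message="What did you learn yesterday? Briefly summarize.",
    )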
This session starts fresh, with no context and no history, but the agent calls scratchpad_list(), sees the keys plan, findings-os, and so on, reads them back with scratchpad_read, and reconstructs what the previous agent did. That's persistent agent memory, built out of a directory and three tools.
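The cross-session behavior doesn't need a model to demonstrate. The sketch below uses MiniPad, a trimmed stand-in for Scratchpad written for this example (no key validation, and a keys() method in place of list()), and a temporary directory so it cleans up after itself. Two separate instances pointed at the same directory play the roles of today's and tomorrow's sessions.

```python
import tempfile
from pathlib import Path


class MiniPad:
    """Trimmed stand-in for Scratchpad; illustration only, no key validation."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, key: str, content: str) -> None:
        (self.root / f"{key}.txt").write_text(content, encoding="utf-8")

    def read(self, key: str) -> str:
        return (self.root / f"{key}.txt").read_text(encoding="utf-8")

    def keys(self) -> list[str]:
        return sorted(p.stem for p in self.root.glob("*.txt"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / ".scratchpad"

    # Session 1: write a plan and a finding, then discard the object.
    today = MiniPad(root)
    today.write("plan", "1. check OS\n2. check CPU")
    today.write("findings-os", "kernel and distro recorded here")
    del today

    # Session 2: a brand-new object with no in-memory state.
    tomorrow = MiniPad(root)
    keys = tomorrow.keys()
    plan = tomorrow.read("plan")

print(keys)
print(plan)
```

Nothing links the two objects except the directory, which is the whole point: the state lives on disk, not in the process.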
Claude Code uses a specific convention worth mentioning because it's publicly documented and informs good scratchpad practice more broadly. A file called CLAUDE.md in the project root is included automatically in the agent's system prompt every session. It's a standing instruction: "these are durable rules for this project."
The convention:
- CLAUDE.md: project-wide rules ("we use pytest; tests live in tests/; the build uses uv").
- CLAUDE.md in subdirectories: rules specific to that subdirectory ("this code is generated; don't edit directly").
- # Compact Instructions section: "when compacting, preserve the fact that we use uv, never node."

This is a different flavor of scratchpad: the agent doesn't write to it, the human does. It's static, not dynamic. But it fills the same role (persistent context that survives compaction), and the mechanism is identical.
Our scratchpad supports both roles. The human can pre-populate files before a session starts; the agent reads them in via scratchpad_list and scratchpad_read. Adding a persistent-rules.txt and instructing the agent to read it first gives you a CLAUDE.md-style mechanism without any special casing.
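Pre-populating the scratchpad is just writing a file before the session starts. The sketch below relies on the on-disk layout from Section 9.2 (one <key>.txt file per key under the root); seed_rules is a hypothetical helper, and the rule text is only an example.

```python
import tempfile
from pathlib import Path

RULES = """\
- We use pytest; tests live in tests/.
- The build uses uv.
"""


def seed_rules(root: Path, rules: str, key: str = "persistent-rules") -> Path:
    """Write human-authored rules where the scratchpad will find them."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{key}.txt"
    path.write_text(rules, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / ".scratchpad"
    seeded = seed_rules(root, RULES)

    # This mirrors what scratchpad_list would report to the agent.
    visible_keys = sorted(p.stem for p in root.glob("*.txt"))
    content = seeded.read_text(encoding="utf-8")

print(visible_keys)
print(content)
```

In a real project you would point root at the same directory the harness uses and add one line to the system prompt telling the agent to read persistent-rules first.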
A scratchpad written to by multiple agents at once has the classic shared-state problem. Two sub-agents writing to the same key at the same moment — one wins, one loses, and neither knows. Chapter 17 tackles this head-on with a lease system.
For this chapter, the scratchpad is single-agent. If you run multiple agents in parallel, either give each a different root directory or don't let them write to the same keys. The convention that saves you: keys should include the agent's role or ID for namespacing. plan is fine for one agent; plan-investigator, plan-writer is the start of what Chapter 17 will formalize.
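The namespacing convention can be captured in a pair of small helpers. Both functions are hypothetical, written for this sketch rather than taken from the harness; they follow the base-then-role order of plan-investigator and plan-writer and reuse the scratchpad's allowed character set.

```python
import re

# Each part must be non-empty and use only the scratchpad's allowed characters.
_PART = re.compile(r"[A-Za-z0-9_-]+")


def namespaced_key(base: str, agent_id: str) -> str:
    """Build a key like 'plan-writer' from a base name and an agent role or ID."""
    for part in (base, agent_id):
        if _PART.fullmatch(part) is None:
            raise ValueError(f"invalid key part {part!r}: use [A-Za-z0-9_-]+")
    return f"{base}-{agent_id}"


def keys_for_agent(keys: list[str], agent_id: str) -> list[str]:
    """Filter a key listing down to the keys owned by one agent."""
    suffix = f"-{agent_id}"
    return sorted(k for k in keys if k.endswith(suffix))


all_keys = [
    namespaced_key("plan", "investigator"),
    namespaced_key("plan", "writer"),
    namespaced_key("draft", "writer"),
]
writer_keys = keys_for_agent(all_keys, "writer")

print(all_keys)
print(writer_keys)
```

Suffix matching is deliberately naive: an agent called ghost-writer would also match the writer suffix. Separate root directories avoid that ambiguity entirely, which is why they are the safer default until a real coordination mechanism exists.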
A filesystem directory is the simplest scratchpad that could possibly work. Production systems often want more: SQLite for indexing, Redis for speed, Postgres for transactionality, S3 for durability across machines.
The scratchpad interface in Section 9.2 doesn't depend on the backing store. You can swap in a SQLite implementation in about thirty lines:
# src/harness/tools/scratchpad_sqlite.py
from __future__ import annotations

import sqlite3


class SqliteScratchpad:
    def __init__(self, db_path: str) -> None:
        self._conn = sqlite3.connect(db_path)
        self._conn.execute("""
            CREATE TABLE IF NOT EXISTS scratchpad (
                key TEXT PRIMARY KEY,
                content TEXT NOT NULL,
                updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
        """)

    def write(self, key: str, content: str) -> str:
        # Parameterized query: the key is never interpolated into SQL.
        self._conn.execute(
            "INSERT OR REPLACE INTO scratchpad (key, content) VALUES (?, ?)",
            (key, content),
        )
        self._conn.commit()
        return f"wrote {len(content)} chars to scratchpad[{key}]"

    def read(self, key: str) -> str:
        row = self._conn.execute(
            "SELECT content FROM scratchpad WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(f"scratchpad[{key}] not found")
        return row[0]

    def list(self) -> list[str]:
        # Needed by scratchpad_list in as_tools(); same contract as the
        # filesystem version: sorted keys.
        rows = self._conn.execute(
            "SELECT key FROM scratchpad ORDER BY key"
        ).fetchall()
        return [row[0] for row in rows]

    # ... same as_tools() as before
We don't swap in SQLite in the main harness because the filesystem version is enough for the book's scenarios, and it has the pedagogical advantage that you can cat a scratchpad file and read what the agent wrote. Chapter 21 revisits persistent state when we build full session checkpointing; at that point, SQLite or Postgres earns its keep.
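One way to make the "interface doesn't depend on the backing store" claim explicit is a structural type. The sketch below is an assumption about how you might write it, not part of the harness: ScratchpadStore, InMemoryScratchpad, and remember_plan are all names invented here. Any class with matching write, read, and list methods satisfies the protocol, including the filesystem and SQLite versions.

```python
from typing import Protocol


class ScratchpadStore(Protocol):
    """The three operations every backend must provide."""

    def write(self, key: str, content: str) -> str: ...
    def read(self, key: str) -> str: ...
    def list(self) -> list[str]: ...


class InMemoryScratchpad:
    """Dict-backed backend, handy for unit tests. Not durable."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def write(self, key: str, content: str) -> str:
        self._data[key] = content
        return f"wrote {len(content)} chars to scratchpad[{key}]"

    def read(self, key: str) -> str:
        if key not in self._data:
            raise KeyError(f"scratchpad[{key}] not found")
        return self._data[key]

    def list(self) -> list[str]:
        return sorted(self._data)


def remember_plan(store: ScratchpadStore, plan: str) -> str:
    """Code written against the protocol works with any backend."""
    store.write("plan", plan)
    return store.read("plan")


store = InMemoryScratchpad()
echoed = remember_plan(store, "1. inspect disk")
stored_keys = store.list()

print(echoed, stored_keys)
```

An in-memory backend is obviously not durable, so it defeats the purpose in production, but it makes tests of scratchpad-using code fast and free of filesystem side effects.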
git add -A && git commit -m "ch09: scratchpad — durable external state for the agent"
git tag ch09-scratchpad
Call pad.write("../../etc/evil", "oops") and confirm it raises ValueError. Then remove the validation in _path and confirm the key is no longer rejected: the write now targets a path outside the scratchpad directory. That short _path function is a small but real security boundary.