Chapter 17. Parallelism and Shared State

Previously: structured plans with evidence-backed completion. Sub-agents still run sequentially. The payoff for sub-agents comes from running them in parallel — the Anthropic multi-agent finding of 90%+ improvement over single-agent baselines rests on parallelism plus independent context windows, not sub-agents in series.

Three problems emerge when sub-agents run concurrently.

Write conflicts. Two sub-agents both decide to write to the same file at the same moment. One wins, one loses, neither knows. The MAST paper found that coordination breakdowns accounted for 36.9% of multi-agent failures — the largest single category — and shared-state corruption was a recurring mechanism.

Hallucinated consensus. A sub-agent invents a fact. Another sub-agent, asked to verify, reads the first's output and confirms it. The orchestrator treats "confirmed" facts as ground truth. The system produces confident wrong answers because verification closed a loop with no external ground truth at the bottom.

Duplication. Two sub-agents, given overlapping objectives, do similar work. The parent pays twice. Anthropic's multi-agent research post names this explicitly: "research the semiconductor shortage" given to two sub-agents produces two redundant research runs.

This chapter addresses all three with concrete mechanisms: a lease-based write-ownership system, a verification discipline that grounds claims in external tool calls rather than peer output, and a coordinator that narrows sub-agent objectives to prevent overlap.

[Diagram: three sub-agents contend for one resource. Sub-agent A holds the lease; sub-agent B waits and retries; sub-agent C gets a conflict. The LeaseManager broker enforces one holder per resource — here scratchpad/shared.md, exclusive writes only.]

Three sub-agents, one resource, one lease. The harness serializes writes so concurrent agents can't corrupt shared state.

17.1 Why LLMs Don't Have a Concurrency Model

LLMs are stateless. Each call is independent; there is no shared memory across calls except what the caller puts in the context. When two sub-agents run in parallel, each is a separate sequence of LLM calls with its own transcript, and they have no native way to know about each other's work.

Production systems with concurrent state discipline — databases, distributed systems, actor frameworks — solve this with locks, transactions, or immutability. LLMs don't participate in any of that. The agent isn't the thing holding the lock; the agent is running inside something else that holds the lock on its behalf. This is the distributed-systems observation Leslie Lamport formalized in "Time, Clocks, and the Ordering of Events in a Distributed System" (Communications of the ACM, 1978): coordination between independent processes is not a problem the processes themselves can solve — it requires a mediator outside the set of participants, and that mediator's job is to impose an ordering the participants can agree on. Substitute "sub-agent" for "process" and the conclusion is the same.

So the harness has to be that something else. It brokers access to shared resources; it serializes conflicting writes; it exposes clean, consistent reads. The agents stay stateless at the protocol level; coordination lives in the harness.


17.2 The Resource Lease

# src/harness/coordination/lease.py
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from uuid import uuid4


@dataclass
class Lease:
    resource: str
    holder: str            # agent or sub-agent ID
    token: str             # unique lease token; required for ops on this resource
    expires_at: datetime


class LeaseConflict(Exception):
    pass


@dataclass
class LeaseManager:
    """Mediates exclusive access to named resources across concurrent agents."""

    _leases: dict[str, Lease] = field(default_factory=dict)
    _lock: asyncio.Lock = field(default_factory=asyncio.Lock)

    async def acquire(
        self, resource: str, holder: str, ttl: timedelta = timedelta(seconds=60)
    ) -> Lease:
        async with self._lock:
            existing = self._leases.get(resource)
            if existing is not None:
                if existing.expires_at > datetime.now(timezone.utc):
                    raise LeaseConflict(
                        f"resource {resource!r} held by {existing.holder!r}"
                    )
                # expired — reap
                del self._leases[resource]
            lease = Lease(
                resource=resource,
                holder=holder,
                token=str(uuid4()),
                expires_at=datetime.now(timezone.utc) + ttl,
            )
            self._leases[resource] = lease
            return lease

    async def release(self, lease: Lease) -> None:
        async with self._lock:
            existing = self._leases.get(lease.resource)
            if existing and existing.token == lease.token:
                del self._leases[lease.resource]

    async def renew(self, lease: Lease, ttl: timedelta = timedelta(seconds=60)) -> Lease:
        async with self._lock:
            existing = self._leases.get(lease.resource)
            if not existing or existing.token != lease.token:
                raise LeaseConflict("lease no longer valid")
            new_lease = Lease(
                resource=lease.resource, holder=lease.holder, token=lease.token,
                expires_at=datetime.now(timezone.utc) + ttl,
            )
            self._leases[lease.resource] = new_lease
            return new_lease

    async def check(self, resource: str, token: str) -> bool:
        async with self._lock:
            existing = self._leases.get(resource)
            return (existing is not None
                    and existing.token == token
                    and existing.expires_at > datetime.now(timezone.utc))

Three properties earned.

In-process and async-safe. The lease manager uses an asyncio.Lock around its internal state. Multiple coroutines in the same event loop can call acquire, and they serialize correctly.

TTL-bounded. A crashed sub-agent cannot hold a lease forever. When the TTL expires, the next acquire reaps the stale lease. This matches the Postgres advisory-lock pattern — if you want distributed leases, you'd swap the backing store to Postgres or Redis without changing the interface.

Token-based. The lease's token is what future operations on the resource check against. A sub-agent that forgot to release a lease but continued working with stale state gets rejected on the next write attempt.


17.3 Write-gated File Tools

File tools now go through the lease manager. The sub-agent must acquire a lease before editing, and the tool checks the lease on every call.

Because LeaseManager.acquire and LeaseManager.check are async def (they wait on an asyncio.Lock internally), the tools that wrap them have to be async too. We use @async_tool from §13.3 and let the registry's adispatch path await them. Dropping an asyncio.run(...) into a sync tool body would raise RuntimeError: asyncio.run() cannot be called from a running event loop — the agent loop is already running when the tool is dispatched, as §13.3 spells out.

# src/harness/tools/files_leased.py
from __future__ import annotations

from datetime import timedelta
from pathlib import Path

from ..coordination.lease import Lease, LeaseManager
from .base import Tool
from .decorator import async_tool


def leased_file_tools(mgr: LeaseManager, holder: str) -> list[Tool]:

    @async_tool(side_effects={"write"})
    async def acquire_file_lease(path: str, ttl_seconds: int = 60) -> str:
        """Acquire an exclusive write-lease on a file.

        path: the file you intend to modify.
        ttl_seconds: how long to hold the lease before auto-expiry.

        Returns a lease token to include in subsequent edit calls.
        If another agent holds a lease on the same file, returns an
        error; wait and retry, or choose a different approach.
        """
        try:
            lease = await mgr.acquire(
                path, holder, ttl=timedelta(seconds=ttl_seconds)
            )
            return f"token={lease.token} (expires in {ttl_seconds}s)"
        except Exception as e:
            return f"could not acquire lease: {e}"

    @async_tool(side_effects={"write"})
    async def edit_lines_leased(
        path: str, start_line: int, end_line: int,
        replacement: str, lease_token: str,
    ) -> str:
        """Replace a line range, verifying a lease token for the file.

        lease_token: obtained from acquire_file_lease. Required.
        Other args: see edit_lines.
        """
        ok = await mgr.check(path, lease_token)
        if not ok:
            return f"edit rejected: lease for {path} is invalid or expired"
        from .files import edit_lines
        return edit_lines.run(path=path, start_line=start_line,
                               end_line=end_line, replacement=replacement)

    return [acquire_file_lease, edit_lines_leased]

Concrete protocol: sub-agent acquires a lease, uses the token on every subsequent edit to the same file, releases when done (or the TTL expires). If two sub-agents both try to edit /workspace/schema.sql, only one gets the lease; the other's acquire_file_lease call returns an error, at which point the agent either waits, works on something else, or coordinates with the parent.

For read-only access we don't need leases. Reading a file while another agent writes to it is not a correctness problem here — at worst, the reader sees a stale version, which is a freshness issue, not a corruption issue.
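What the losing agent does next can be mechanized rather than left to the model. A sketch of a jittered-backoff retry helper (acquire_with_retry is hypothetical, not a harness API) that a parent or a tool wrapper could use instead of giving up on the first conflict:

```python
import asyncio
import random


# Hypothetical helper: retry an acquire-style coroutine with jittered
# exponential backoff instead of failing outright on a lease conflict.
async def acquire_with_retry(acquire, attempts: int = 5, base_delay: float = 0.2):
    for attempt in range(attempts):
        try:
            return await acquire()
        except Exception:
            if attempt == attempts - 1:
                raise
            # backoff: 0.2s, 0.4s, 0.8s, ... each scaled by 0.5x-1.5x jitter
            delay = base_delay * (2 ** attempt) * (0.5 + random.random())
            await asyncio.sleep(delay)


async def demo() -> None:
    attempts = {"n": 0}

    async def flaky_acquire() -> str:
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise RuntimeError("resource held by another agent")
        return "lease-token"

    token = await acquire_with_retry(flaky_acquire, base_delay=0.01)
    print(token, "after", attempts["n"], "attempts")  # → lease-token after 3 attempts


asyncio.run(demo())
```

Against the §17.2 interface, the callable would be something like `lambda: mgr.acquire("/workspace/schema.sql", "agent-b")`.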


17.4 Parallel Sub-agent Spawning

The spawner from Chapter 15 gets a parallel version:

# src/harness/subagents/parallel.py
from __future__ import annotations

import asyncio
from dataclasses import dataclass

from .spawner import SubagentSpawner
from .subagent import SubagentResult, SubagentSpec


@dataclass
class ParallelSpawner:
    inner: SubagentSpawner

    async def spawn_all(
        self, specs: list[SubagentSpec], justification: str = "",
    ) -> list[SubagentResult]:
        """Run multiple sub-agents concurrently; wait for all; return results."""
        tasks = [
            asyncio.create_task(self.inner.spawn(spec, justification=justification))
            for spec in specs
        ]
        return await asyncio.gather(*tasks, return_exceptions=False)

And a parent-facing tool. Same reasoning as §17.3: ParallelSpawner.spawn_all is async def, so the tool wrapping it has to be @async_tool + async def — a sync tool calling asyncio.run(...) from inside the agent loop would crash.

# src/harness/subagents/parallel_tool.py
from ..tools.base import Tool
from ..tools.decorator import async_tool
from .parallel import ParallelSpawner
from .subagent import SubagentSpec


def spawn_parallel_tool(spawner: ParallelSpawner) -> Tool:

    @async_tool(side_effects={"mutate"})
    async def spawn_parallel_subagents(
        objectives: list[str],
        output_format: str,
        tools_allowed: list[str],
        justification: str,
    ) -> str:
        """Spawn multiple sub-agents concurrently.

        objectives: list of distinct, non-overlapping objectives. Each
                    sub-agent handles one. Do not pass the same objective
                    twice.
        output_format: format ALL sub-agents use; the parent synthesizes
                       across parallel results, so they must be comparable.
        tools_allowed: same list for all sub-agents.
        justification: why parallel is better than sequential here.
                       Required.

        Returns a newline-separated, indexed list of sub-agent summaries.
        Do not use this for tasks where one sub-agent's output is input
        to the next — those need sequential spawn_subagent.
        """
        if not justification:
            return "error: justification required"
        if len(objectives) < 2:
            return "error: use spawn_subagent for a single objective"
        if len(set(objectives)) != len(objectives):
            return "error: objectives must be distinct (no duplicates)"

        specs = [
            SubagentSpec(
                objective=obj, output_format=output_format,
                tools_allowed=tools_allowed,
            )
            for obj in objectives
        ]
        results = await spawner.spawn_all(specs, justification)
        lines = []
        for i, r in enumerate(results, start=1):
            lines.append(f"--- sub-agent {i} ---")
            if r.error:
                lines.append(f"error: {r.error}")
            else:
                lines.append(r.summary)
        return "\n".join(lines)

    return spawn_parallel_subagents

The tool enforces the things Anthropic's multi-agent writeup warned about: non-overlapping objectives (distinct strings), same output format for comparability, justification required, minimum count to avoid misuse. The parent that wants to run one sub-agent uses spawn_subagent; the parent that wants parallelism uses spawn_parallel_subagents. Two different tools, two different default behaviors, one clean contract each.
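The wall-clock payoff can be sanity-checked without any LLM in the loop: asyncio.gather finishes in roughly the slowest task's time, not the sum. A toy sketch (fake_subagent just sleeps; real sub-agent runs are dominated by LLM latency, which parallelizes the same way):

```python
import asyncio
import time


# Toy stand-in for a sub-agent run: sleep for the "objective's" duration.
async def fake_subagent(duration: float) -> str:
    await asyncio.sleep(duration)
    return f"done in {duration}s"


async def main() -> None:
    durations = [0.1, 0.2, 0.3, 0.4]
    start = time.perf_counter()
    await asyncio.gather(*(fake_subagent(d) for d in durations))
    parallel = time.perf_counter() - start
    # Parallel wall-clock tracks the slowest task (~0.4s), not the sum (1.0s).
    print(f"parallel: {parallel:.2f}s vs sequential ~{sum(durations):.1f}s")
    assert parallel < sum(durations)


asyncio.run(main())
```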


17.5 Grounding Verification in External Truth

The hallucinated-consensus problem: if two sub-agents verify each other's claims, neither grounds anything against reality.

The fix is a verification discipline, not a new primitive: verification postconditions require an external tool call as evidence. A sub-agent can't claim "I checked and the file exists" unless its evidence string references a concrete tool result. The orchestrator's synthesis can't claim consensus unless it records which external source confirmed each fact.

We encode this in the system prompt for sub-agents:

When you claim a fact as true, your evidence must reference an external
source: a tool call you just made, a file you just read, an API response
you just received. Do not cite another sub-agent's summary as evidence —
sub-agents can be wrong. If you cannot ground a fact externally, say
"unconfirmed" in your output rather than asserting the fact.

And we add a verify_fact_externally tool that the agent calls when it wants to assert a fact confidently:

@tool(side_effects={"read"})
def verify_fact_externally(claim: str, tool_used: str, tool_result: str) -> str:
    """Record that a claim has been externally verified.

    claim: what you are asserting.
    tool_used: name of the tool whose output grounds this claim.
    tool_result: quoted excerpt from the tool's output that supports the
                 claim. Must be a direct quote, not a paraphrase.

    Returns a verification receipt you include in your final summary.
    """
    return (f"[verified: {claim}; grounded in {tool_used} output: "
            f"{tool_result[:200]}]")

This looks bureaucratic. It is, deliberately: the friction is what stops the agent from rubber-stamping, because quoting a tool's output verbatim requires actually having that output. In practice, agents that use this tool produce visibly different, and more accurate, outputs than agents that don't.

A model-capability floor. There's a failure mode below this whole discipline that system-prompt phrasing alone cannot fix: a sub-agent told to run uptime will emit a textual narrative of what uptime would produce, without ever dispatching the bash tool. iterations_used=1, zero tool calls in the transcript, a confident-sounding summary containing Linux-flavored output even on macOS, or /proc/meminfo fields in response to a vm_stat objective. The coordinator then synthesizes across these fabricated outputs and produces a "machine state report" that is entirely fiction, presented confidently.

This happens consistently with open models in the 7–12B parameter range (Gemma, Llama-class) and occasionally with larger models under ambiguous prompts. Frontier models (Opus, Sonnet 4.5-class) comply with implicit "use your tools" expectations; local models need the expectation made explicit.

Two defenses that work: (a) §15.2's _default_subagent_system already injects a hard imperative — "You MUST call at least one tool from your allowed list before producing your final answer. Do not describe what you would do — do it" — whenever tools_allowed is non-empty; (b) in your eval harness (Chapter 19), add an assertion that fails any sub-agent run where iterations_used == 1 and the transcript contains zero tool calls. Combined, these shift the symptom from silent hallucinated output to a visible missing tool call, which is debuggable. The verify_fact_externally tool above raises the ceiling on what a competent model will do; the capability floor is about whether the model is competent enough in the first place.
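The eval-harness assertion can be sketched with stand-in types (FakeRun and assert_used_tools are illustrative; the real Chapter 19 result and transcript shapes may differ):

```python
from dataclasses import dataclass, field


# Illustrative stand-in for a sub-agent run record; the harness's real
# SubagentResult/transcript shapes may differ.
@dataclass
class FakeRun:
    iterations_used: int
    transcript: list[dict] = field(default_factory=list)  # role-tagged messages


def assert_used_tools(run: FakeRun) -> None:
    tool_calls = [m for m in run.transcript if m.get("role") == "tool"]
    if run.iterations_used == 1 and not tool_calls:
        raise AssertionError(
            "sub-agent answered in one iteration with zero tool calls — "
            "likely narrated output instead of running the command")


# A run that fabricated its output trips the check:
bad = FakeRun(iterations_used=1,
              transcript=[{"role": "assistant", "content": "load average: 0.42 ..."}])
try:
    assert_used_tools(bad)
except AssertionError as e:
    print(f"caught: {e}")
```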


17.6 A Parallel Research Scenario

# examples/ch17_parallel.py
import asyncio
from pathlib import Path

from harness.agent import arun
from harness.coordination.lease import LeaseManager
from harness.providers.anthropic import AnthropicProvider
from harness.subagents.parallel import ParallelSpawner
from harness.subagents.parallel_tool import spawn_parallel_tool
from harness.subagents.spawn_tool import spawn_tool
from harness.subagents.spawner import SubagentSpawner
from harness.tools.scratchpad import Scratchpad
from harness.tools.selector import ToolCatalog
from harness.tools.std import STANDARD_TOOLS


SYSTEM = """\
You are a research coordinator. When a user question decomposes into
distinct, non-overlapping sub-questions, use spawn_parallel_subagents to
run them concurrently. When steps must be sequential, use spawn_subagent.

Always require concrete external grounding: each sub-agent should ground
its claims in specific tool outputs, not in other sub-agents' output.
"""


async def main() -> None:
    provider = AnthropicProvider()
    lease_mgr = LeaseManager()
    pad = Scratchpad(root=Path(".scratchpad"))

    catalog = ToolCatalog(tools=STANDARD_TOOLS + pad.as_tools())
    sub_spawner = SubagentSpawner(provider=provider, catalog=catalog)
    par_spawner = ParallelSpawner(inner=sub_spawner)

    coordinator_catalog = ToolCatalog(tools=(
        STANDARD_TOOLS + pad.as_tools() +
        [spawn_tool(sub_spawner), spawn_parallel_tool(par_spawner)]
    ))

    await arun(
        provider=provider,
        catalog=coordinator_catalog,
        system=SYSTEM,
        user_message=(
            "Investigate this machine's resource state, concurrently. "
            "I need four things: CPU load average, memory utilization, "
            "disk utilization on /, and the three largest directories "
            "under /var. Each sub-agent should use bash; each should "
            "ground its answer in a specific bash command's output. "
            "Synthesize the four answers into one paragraph."
        ),
    )


asyncio.run(main())

Run it. Four sub-agents start in parallel. Each is constrained to bash, each runs a specific small command, each returns a structured summary with tool-output grounding. Total wall-clock time: roughly the time of the slowest sub-agent, not the sum of all four. Total context in the parent: four compact summaries.


17.7 What We Haven't Solved

Two limitations worth naming.

Lease starvation. A greedy sub-agent holds a lease for its full 60-second TTL even when it's waiting on a slow tool. Other agents block. Fix: shorter TTLs with automatic renewal when progress is being made. We didn't build it; the interface supports it via renew.
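The renewal fix can be sketched as a keepalive wrapper (hold_lease_while is hypothetical; renew stands in for a coroutine wrapping LeaseManager.renew from §17.2):

```python
import asyncio


# Sketch of automatic renewal: a background task re-ups the lease at
# half-TTL intervals while the real work runs, so the TTL can be short
# without expiring mid-task.
async def hold_lease_while(work, renew, ttl_seconds: float = 10.0):
    async def keepalive() -> None:
        while True:
            await asyncio.sleep(ttl_seconds / 2)  # renew before the lease lapses
            await renew()

    task = asyncio.create_task(keepalive())
    try:
        return await work()
    finally:
        task.cancel()  # stop renewing; caller releases or lets the TTL lapse


async def demo() -> None:
    renewals = 0

    async def renew() -> None:
        nonlocal renewals
        renewals += 1

    async def work() -> str:
        await asyncio.sleep(0.25)  # simulated slow tool call
        return "done"

    result = await hold_lease_while(work, renew, ttl_seconds=0.1)
    print(result, "renewals:", renewals)  # several renewals during the 0.25s task


asyncio.run(demo())
```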

Cross-resource deadlock. Agent A holds lease on X, waits for lease on Y. Agent B holds lease on Y, waits for lease on X. Classic deadlock. The lease manager has no deadlock detection. For the book's scenarios — small number of concurrent agents, clear resource hierarchies — this doesn't bite. Production systems with richer coordination need either ordered acquisition (always acquire leases in alphabetical order) or a full deadlock detector. That's out of scope here.
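Ordered acquisition is small enough to sketch (acquire_all_ordered is illustrative; acquire stands in for LeaseManager.acquire from §17.2). If every agent takes its leases in the same canonical order, the circular wait that causes deadlock cannot form:

```python
import asyncio


# Deadlock avoidance by ordered acquisition: sort resource names and
# acquire in that order, so no agent can hold Y while waiting on X.
async def acquire_all_ordered(acquire, holder: str, resources: list[str]) -> list:
    leases = []
    for resource in sorted(resources):  # one global order for everyone
        leases.append(await acquire(resource, holder))
    return leases


async def demo() -> None:
    order: list[str] = []

    async def fake_acquire(resource: str, holder: str) -> str:
        order.append(resource)
        return f"lease:{resource}:{holder}"

    # Agent A asks for (Y, X); agent B asks for (X, Y). Both end up
    # acquiring X before Y regardless of how they phrased the request.
    await acquire_all_ordered(fake_acquire, "agent-a", ["Y", "X"])
    await acquire_all_ordered(fake_acquire, "agent-b", ["X", "Y"])
    print(order)  # → ['X', 'Y', 'X', 'Y']


asyncio.run(demo())
```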


17.8 Commit

git add -A && git commit -m "ch17: lease manager, parallel spawning, grounded verification"
git tag ch17-parallelism

17.9 Try It Yourself

  1. Cause a conflict. Write two sub-agents that both try to write to /workspace/shared.txt. Run them with spawn_parallel_subagents. Observe the lease contention. Does the losing sub-agent retry? If it doesn't, what would you add?
  2. Induce a hallucinated consensus. Run two sub-agents, one with instructions to invent a plausible fact, the other with instructions to verify claims. See what happens without the external-grounding requirement. Re-run with the grounding requirement enforced. Does the second sub-agent catch the invention?
  3. Time the parallelism. Measure wall-clock for the parallel scenario above vs. a sequential rewrite that uses four sequential spawn_subagent calls. How much faster is parallel? Is it worth the coordination complexity?