Multi-Agent Orchestration Patterns: When One Claude Session Isn't Enough

13 min read | TunerLabs Engineering | May 29, 2026

Four production patterns for composing AI agents - Planner/Executor/Critic, Fan-out/Fan-in, Scoped Pipeline, and Specialist+Generalist. When to spawn vs extend, how to keep the blast radius small, and a cheat sheet for picking the simplest topology that ships.

One Agent Isn't Always Wrong. Two Agents Isn't Always Better.

Most teams reach for five agents when one would have shipped. Multi-agent topology looks impressive on an architecture deck. It also fails in three new ways the single-agent version never could: orchestrator bugs, message-format drift, and partial failures where some agents succeed and others halt mid-flight.

This guide lays out the four orchestration patterns we use in production at TunerLabs - ranked in order of cost. Start at the top. Climb the ladder only when the pattern below provably can't handle the work. Never before.

The patterns covered:

1. Planner - Executor - Critic - the three-role loop

2. Fan-out / Fan-in - parallel sub-agents

3. Scoped Pipeline - sequential, scoped tools

4. Specialist + Generalist - scoped expertise on demand

Plus the two cross-cutting concerns - communication and blast-radius isolation - that every pattern depends on, and a decision cheat sheet for picking the right topology in under a minute.


The Decision Rule: Spawn or Extend?

Before any pattern, the first decision: do you actually need a new agent, or should you just extend the current session with another turn?

Adding a turn to an existing agent costs almost nothing beyond the tokens for that turn. Spawning a new one costs context setup, tool wiring, message-passing scaffolding, and one more place to debug when something breaks. The default is extend. Three triggers say spawn:

Trigger 1: Different Tools (Tool Isolation)

The new task needs a tool surface the current agent shouldn't have: read-only navigation for a reviewer, say, or production database write access. Spawn so the permissions stay scoped to the task that needs them instead of bleeding into every later turn.

Trigger 2: Different Context (Memory Hygiene)

The current context is full of noise the next task doesn't need. A fresh agent starts clean. The original agent keeps its focus on its own thread without context pollution.

Trigger 3: Real Parallelism (Concurrency)

Two independent tasks can actually run side-by-side. Three Lambdas to migrate, no shared state. Spawn three agents, fan in the results. This is the only trigger that delivers wall-clock speedup - the other two are about isolation, not speed.
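
The three triggers can be read as a small decision function. The sketch below is illustrative only: the `Task` fields and the `should_spawn` name are hypothetical, not part of any harness, and the flags would in practice come from a human or planner judgment call.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of the next piece of work."""
    needs_different_tools: bool = False   # Trigger 1: tool isolation
    needs_clean_context: bool = False     # Trigger 2: memory hygiene
    independent_slices: int = 1           # Trigger 3: real parallelism


def should_spawn(task: Task) -> tuple[bool, str]:
    """Return (spawn?, reason). Extending the session is the default."""
    if task.needs_different_tools:
        return True, "tool isolation"
    if task.needs_clean_context:
        return True, "memory hygiene"
    if task.independent_slices > 1:
        return True, "concurrency"
    return False, "extend the current session"
```

If none of the three flags is set, the function falls through to the default, which mirrors the rule that spawning has to be justified and extending does not.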

The Trap

Multi-agent diagrams look impressive in architecture decks. They also fail in three new ways: orchestrator bugs (the coordinator splits, routes, or aggregates work incorrectly), message-format drift (agents start emitting outputs the next stage can't parse), and partial failures (two of three slices succeed, one fails - what now?). If a single agent with a clear scope works, ship that.


Pattern 01: Planner - Executor - Critic

The three-agent loop. One thinks. One does. One checks. This is the most-used pattern in our harness and the default unless you have a specific reason to choose something else.

How It Works

PLANNER (read-only)         CRITIC (read-only)
decompose - spec            grades - gates - loops
       |                          ^
       v                          |
              EXECUTOR (full access)
              writes - runs - ships
                    |
                    v
          IF OK -> merge
          IF NOT OK -> retry with feedback
          IF stuck -> escalate

The Planner reads the problem and produces a structured spec: what needs to change, in what order, with what success criteria. It writes no code.

The Executor takes the spec, makes the changes, runs tests, and reports back. It has the full tool surface - file edits, shell commands, network calls - because it is the only agent that does mutating work.

The Critic reads the executor's output against the planner's spec. It grades. It either approves the merge, rejects with specific feedback for another loop, or escalates to a human if the loop has run too many times without convergence.
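
The control flow of the loop described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: `run_loop`, `Verdict`, and the three callables are hypothetical stand-ins for real agent calls, and the loop cap of three is an arbitrary example value.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    approved: bool
    feedback: str = ""


def run_loop(
    plan: Callable[[str], str],
    execute: Callable[[str, str], str],
    critique: Callable[[str, str], Verdict],
    problem: str,
    max_loops: int = 3,
) -> tuple[str, str]:
    """Return (status, last_output); status is 'merge' or 'escalate'."""
    spec = plan(problem)                   # planner: read-only, produces the spec
    feedback, output = "", ""
    for _ in range(max_loops):
        output = execute(spec, feedback)   # executor: the only mutating role
        verdict = critique(spec, output)   # critic: grades output against the spec
        if verdict.approved:
            return "merge", output
        feedback = verdict.feedback        # retry with specific feedback
    return "escalate", output              # no convergence: hand to a human
```

The escalation path is part of the loop itself rather than an afterthought, so a run that never converges ends in a defined state.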

Best For

Most agent work. This is the default. It maps directly to the standard four agents we ship in the TunerLabs harness for production engineering tasks.

The Trade-off

Three context windows mean roughly three times the token cost, and you have to watch for critics that drift toward agreement. An unprimed critic will rubber-stamp the executor after a few iterations because it has been reading the executor's reasoning. Pin the critic with explicit invariants from the spec - "the test must pass," "the migration must be reversible," "the response shape must match the schema" - so it stays grounded.
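
One way to pin a critic is to express the invariants as deterministic checks that run regardless of what the critic has read. The sketch below assumes the executor returns a report dictionary; the field names `tests_passed`, `down_migration`, and `schema_valid` are hypothetical.

```python
from typing import Callable

# Each invariant is a named, deterministic predicate over the executor's report.
Invariant = tuple[str, Callable[[dict], bool]]

INVARIANTS: list[Invariant] = [
    ("the test must pass", lambda r: r.get("tests_passed") is True),
    ("the migration must be reversible", lambda r: bool(r.get("down_migration"))),
    ("the response shape must match the schema", lambda r: r.get("schema_valid") is True),
]


def failed_invariants(report: dict) -> list[str]:
    """Return the names of the invariants the report violates."""
    return [name for name, check in INVARIANTS if not check(report)]
```

A non-empty result can be treated as an automatic rejection, whatever the critic's own grading says.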

Why This Wins

Mutation lives in one agent. Two agents are read-only. The blast radius is the executor's tool surface, no wider. If the executor goes wrong - deletes the wrong file, writes a broken migration, calls the wrong API - the critic catches it before merge. The pattern is a built-in safety layer for autonomous work.


Pattern 02: Fan-out / Fan-in (Parallel Sub-Agents)

When the work is embarrassingly parallel - three slices, no shared state - this is the only pattern that gives you real wall-clock speedup. Three agents, three slices, one merge step.

How It Works

ORCHESTRATOR
split - fan-out
  |       |       |
  v       v       v
AGENT A  AGENT B  AGENT C
migrate  migrate  migrate
user-svc auth-svc billing-svc
  |       |       |
  +---+---+---+---+
          |
          v
       MERGE
   aggregate - review
          |
          v
   PR (one or three)

The orchestrator splits the work into independent units, spawns one agent per unit, waits for all of them to finish, and then aggregates the results - typically as one bundled PR or three sibling PRs depending on the review preference.

Best For

Migration sweeps. Three Lambdas with no shared state become one afternoon, not three days. Codemods across N similar files. Service migrations where each service is self-contained. Embarrassingly parallel work compounds across the pod - this is the pattern that turns a one-engineer week into a one-afternoon job.

The Trade-off

Partial failure is the killer. Two slices succeed, one fails - what now? Decide ahead of time: ship two PRs and re-queue the third, or roll back all three and start over. The orchestrator needs to know its policy before the fan-out begins, not after the failure lands. We have seen teams discover their failure policy by accident, mid-migration, and lose half a day untangling the partial state.
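
A sketch of a fan-out where the failure policy is an explicit argument, chosen before any agent starts. It assumes each agent can be modelled as a blocking function call; `fan_out`, `Policy`, and `run_agent` are hypothetical names, and threads stand in for whatever spawning mechanism is actually used.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import Callable


class Policy(Enum):
    SHIP_PARTIAL = "ship successes, re-queue failures"
    ROLLBACK_ALL = "roll back everything"


def fan_out(slices: list[str], run_agent: Callable[[str], str], policy: Policy) -> dict:
    """Run one agent per slice in parallel, then apply the pre-declared policy."""
    with ThreadPoolExecutor(max_workers=max(1, len(slices))) as pool:
        futures = {name: pool.submit(run_agent, name) for name in slices}
    # Leaving the `with` block waits for every future to finish.
    done, failed = [], []
    for name, fut in futures.items():
        (done if fut.exception() is None else failed).append(name)
    if failed and policy is Policy.ROLLBACK_ALL:
        return {"ship": [], "requeue": [], "rollback": sorted(slices)}
    return {"ship": sorted(done), "requeue": sorted(failed), "rollback": []}
```

Because `policy` is a required parameter, the caller cannot start the fan-out without having made the decision.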

The Pre-Req

Fan-out only works when slices are truly independent. No shared file. No shared schema migration. No shared config. If two agents touch the same file, you have recreated the merge-conflict problem at agent scale - except now you have three confused agents emitting conflicting diffs instead of three humans on a call sorting it out.

Verify isolation in the work-package spec before fanning out. If you cannot prove independence on paper, the parallel run will surface it as a runtime bug.
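
Proving independence on paper can be as simple as intersecting the paths each slice declares it will touch. The sketch below assumes every work package lists its files up front; `shared_paths` and the example paths are hypothetical.

```python
from itertools import combinations


def shared_paths(work_packages: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Map each pair of slices to the paths both declare; empty means independent."""
    conflicts = {}
    for a, b in combinations(sorted(work_packages), 2):
        overlap = work_packages[a] & work_packages[b]
        if overlap:
            conflicts[(a, b)] = overlap
    return conflicts
```

An empty result is a necessary condition for fanning out, not a sufficient one: it only covers the paths that were declared.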


Pattern 03: Scoped Pipeline

Each stage runs one agent with the smallest tool set it needs. The output of one stage is the input of the next. The pipeline is the structure; each stage is a small, well-scoped agent.

How It Works

EXTRACT    ->  TRANSLATE        ->  TEST            ->  DEPLOY
read-only      scoped write         scoped write        validate
parse          code transforms      generate - run      staging only
classify

Each stage gets only the tools it needs and only the context it needs. Extract reads files but never writes. Translate writes code but does not touch infra. Test generates and runs tests but does not ship anywhere. Deploy touches infra but only against staging, never prod.

The handoff between stages is typed - usually a JSON or markdown block with a defined schema - so the next stage knows exactly what to expect.
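
The pipeline structure can be sketched as a list of stages, each declaring its tool set and a function from one handoff to the next. This is a minimal sketch, not a harness API: `Stage`, `StageError`, `run_pipeline`, and the tool names in the usage are all hypothetical, and the handoff is a plain dictionary for brevity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Stage:
    name: str
    tools: frozenset[str]            # smallest tool set this stage needs
    run: Callable[[dict], dict]      # takes the previous handoff, returns the next


class StageError(Exception):
    """Raised when a stage fails; the pipeline halts and waits for a human."""


def run_pipeline(stages: list[Stage], payload: dict) -> dict:
    """Feed each stage's output to the next; stop at the first failure."""
    for stage in stages:
        try:
            payload = stage.run(payload)
        except Exception as exc:
            raise StageError(f"{stage.name} failed: {exc}") from exc
    return payload
```

Keeping `tools` on the stage definition means the topology itself documents what each stage is allowed to touch.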

Best For

Predictable transformations. One Lambda to one Function App. A document to a structured record to a database row to a notification email. The stages are stable enough to encode as a pipeline rather than a freeform conversation.

If you can write down the stages without thinking hard about the input, this is the pattern. If you cannot, you probably want the loop pattern instead.

The Trade-off

Less adaptive than the loop pattern. If a stage hits something the next stage cannot handle - an edge case the schema does not cover, a code path that needs a different translation - the pipeline cannot reroute. It errors out and waits for a human. Use loop for novelty, pipeline for repetition.

Why Scoping Works

Four small blast radii beat one big one. If the extract stage has a bug, it can corrupt parsed output but it can never accidentally deploy to prod. If the translate stage has a bug, it can produce broken code but it can never publish a notification. The scoping isolates failure to the stage where it happened.

This is the pattern of choice for any work that touches multiple systems. It is also the pattern that auditors love, because every stage's tool surface is documented in the topology itself.


Pattern 04: Specialist + Generalist

A generalist agent runs the work. When it hits a domain that needs depth, it delegates to a specialist with the right context. The specialist returns an answer, the generalist resumes.

How It Works

GENERALIST
migrator agent runs the spec
       |
       +-> SPECIALIST - IAM       (translate IAM to MSI)
       |
       +-> SPECIALIST - SCHEMA    (DDB to Cosmos partition strategy)
       |
       +-> SPECIALIST - SECURITY  (review the diff)
       |
RESUMED generalist continues with answer

The generalist is in the driver's seat. It owns the overall task and the conversation context. The specialists are called on-demand: a focused question goes in, a focused answer comes back, the generalist's context absorbs only the answer - not the specialist's reasoning trace.

Best For

Tasks where 80% is generic but 20% needs deep domain context. Security review during a migration. Schema design during a service build. Threat modeling during an API change. The generalist can handle the breadth; the specialist provides the depth without bloating the generalist's context window.

The Trade-off

Handoff loss. The specialist sees only what the generalist passes - not the whole conversation. Pass too little, the specialist guesses based on partial information. Pass too much, you have lost the point of scoping and the specialist is just another generalist with a different system prompt.

The art of this pattern is in the handoff payload: enough context for the specialist to be useful, little enough to keep its context window focused.
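
The handoff discipline can be sketched as a delegate call that sends only a question plus selected facts and keeps only the answer. The `Generalist` class, the `delegate` method, and the reply shape with `answer` and `trace` keys are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Generalist:
    context: list[str] = field(default_factory=list)

    def delegate(self, specialist: Callable[[str, dict], dict], question: str, facts: dict) -> str:
        """Send a focused question plus selected facts; keep only the answer."""
        reply = specialist(question, facts)   # the specialist never sees self.context
        answer = reply["answer"]              # the reasoning trace is dropped here
        self.context.append(f"{question} -> {answer}")
        return answer
```

The `facts` argument is where the trade-off lives: it is the only channel through which the specialist learns anything about the wider task.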

Specialists as Skills, Not Agents

In Claude Code, most "specialists" are skills, not agents - loaded on demand, no separate context window, no inter-agent message passing. A skill is a markdown file that gets injected into the active agent's context when invoked.

Reach for a real sub-agent only when you need genuine isolation: read-only tools for security review where the reviewer should not be able to write code, separate token budget for a long-running translation task that would otherwise blow up the parent's context, or a fresh context window for a task that benefits from no prior conversation noise.

Otherwise, a skill is cheaper, faster, and easier to debug. The default is skill. Sub-agent is the exception.


Cross-Cutting Concern 1: Communication

Every pattern shares two cross-cutting concerns: communication and blast-radius isolation. Get these wrong and the prettiest topology fails.

Structured Handoff (The Default)

Typed JSON or markdown blocks at agent boundaries. Schema-validated. The output of one agent is the input of the next, and both sides know exactly what shape it has.

The schema does two jobs: it forces the producer to emit complete, valid output, and it lets the consumer fail loudly when something drifts. Without a schema, you discover format drift at runtime, in production, three weeks after deploy.
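
A boundary check that fails loudly might look like the sketch below. It uses only the standard library and a deliberately tiny schema format (field name to required type); `SPEC_SCHEMA`, `HandoffError`, and `parse_handoff` are hypothetical names, and a real system would more likely use a dedicated schema library.

```python
import json

# Hypothetical handoff schema: field name -> required Python type.
SPEC_SCHEMA = {"task_id": str, "files": list, "success_criteria": list}


class HandoffError(ValueError):
    """Raised at the boundary so drift fails loudly instead of silently."""


def parse_handoff(raw: str, schema: dict) -> dict:
    """Parse a JSON handoff and reject missing, extra, or mistyped fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise HandoffError("handoff must be a JSON object")
    missing = sorted(set(schema) - set(data))
    extra = sorted(set(data) - set(schema))
    wrong = sorted(k for k, t in schema.items() if k in data and not isinstance(data[k], t))
    if missing or extra or wrong:
        raise HandoffError(f"missing={missing} extra={extra} wrong_type={wrong}")
    return data
```

Rejecting extra fields as well as missing ones is a design choice: it surfaces drift on the day a producer changes its output rather than weeks later.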

Shared Scratchpad

A file or doc all agents read and write. Useful for accumulating state across stages - notes, partial results, decisions made. Easy to corrupt: two agents writing to the same scratchpad at the same time produce a garbled mess.

Use the scratchpad pattern only when stages run sequentially (one writer at a time) or when the scratchpad has clear sections owned by specific agents.
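
Section ownership can be enforced rather than agreed on. The sketch below is an in-memory illustration with hypothetical names (`Scratchpad`, and the agent and section names in any usage); it does not address concurrent access to a file on disk.

```python
class Scratchpad:
    """Shared notes where each section has exactly one owning agent."""

    def __init__(self, owners: dict[str, str]):
        self._owners = dict(owners)                      # section -> agent allowed to write
        self._sections = {name: [] for name in owners}

    def write(self, agent: str, section: str, note: str) -> None:
        """Append a note, but only if this agent owns the section."""
        if self._owners.get(section) != agent:
            raise PermissionError(f"{agent} does not own section {section!r}")
        self._sections[section].append(note)

    def read(self, section: str) -> list[str]:
        """Any agent may read; a copy is returned so readers cannot mutate it."""
        return list(self._sections[section])
```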

Message Bus

Heavy. Real queues, real durability, real ordering guarantees. Reach for it when you have five or more agents and real concurrency, not before. Most teams that adopt a message bus for a three-agent system are paying infrastructure cost for theoretical scale they will never reach.

The Principle

The default is structured handoff. Climb to scratchpad when stages need accumulating state. Climb to message bus only when you have outgrown both. Do not start at the top of this ladder.


Cross-Cutting Concern 2: Blast-Radius Isolation

What each agent can touch matters more than what each agent can do.

Tool Scoping

Each agent gets the smallest tool set that lets it work. Read-only by default. Mutation tools - file write, shell exec, network POST - added only when the agent's role requires them, and only for the duration of the role.

A planner does not need write tools. A critic does not need write tools. Only the executor does. If your planner has a write tool, you have leaked blast radius into a role that should not have it.
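
The planner and critic rule can be checked mechanically against a role-to-tools map. All tool and role names below are hypothetical placeholders; the point of the sketch is the audit function, not the specific grants.

```python
READ_TOOLS = frozenset({"read_file", "search"})
MUTATION_TOOLS = frozenset({"write_file", "shell_exec", "http_post"})

# Hypothetical role -> tool grants for the Planner - Executor - Critic loop.
ROLE_TOOLS = {
    "planner": READ_TOOLS,
    "critic": READ_TOOLS,
    "executor": READ_TOOLS | MUTATION_TOOLS,
}
ROLES_ALLOWED_TO_MUTATE = {"executor"}


def leaked_blast_radius(role_tools: dict) -> dict:
    """Return mutation tools granted to roles that should be read-only."""
    return {
        role: sorted(tools & MUTATION_TOOLS)
        for role, tools in role_tools.items()
        if role not in ROLES_ALLOWED_TO_MUTATE and tools & MUTATION_TOOLS
    }
```

Running a check like this whenever the topology changes turns the read-only default into something that is verified rather than assumed.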

Path Scoping

Write access limited to declared directories via settings.json deny rules. The executor can write to src/, not to .env. The deploy stage can write to infrastructure/staging/, not to infrastructure/prod/. Path scoping turns a careful policy into an enforced one.
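
The policy itself can be sketched as a path check in which deny rules win over allow rules. This illustrates the logic only and is not the settings.json rule syntax; `WRITE_ALLOW`, `WRITE_DENY`, and `may_write` are hypothetical, and paths are assumed to be repo-relative POSIX paths.

```python
from pathlib import PurePosixPath

# Hypothetical per-role policy; deny rules win over allow rules.
WRITE_ALLOW = {"executor": ["src"], "deploy": ["infrastructure/staging"]}
WRITE_DENY = [".env", "infrastructure/prod"]


def _inside(path: PurePosixPath, root: str) -> bool:
    """True if path is root itself or lies somewhere beneath it."""
    root_path = PurePosixPath(root)
    return path == root_path or root_path in path.parents


def may_write(role: str, target: str) -> bool:
    path = PurePosixPath(target)
    if path.is_absolute() or ".." in path.parts:
        return False    # refuse anything that could escape the repo
    if any(_inside(path, denied) for denied in WRITE_DENY):
        return False
    return any(_inside(path, root) for root in WRITE_ALLOW.get(role, []))
```

A role with no entry in the allow map can write nowhere, so the check fails closed by construction.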

Time Scoping

Token budget caps and wall-clock timeouts. An agent that hits the limit halts, never overruns. This is the cheapest safety mechanism in the stack and the one teams forget to set until an agent runs for 40 minutes burning tokens on a loop.
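
A budget guard of this kind takes only a few lines. The sketch below assumes the harness can report token usage after each model turn; `Budget`, `BudgetExceeded`, and `charge` are hypothetical names, and the clock is injectable so the guard can be tested without waiting.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when an agent hits its token cap or wall-clock timeout."""


class Budget:
    """Caps tokens and wall-clock time; the agent halts when either is hit."""

    def __init__(self, max_tokens: int, max_seconds: float, clock=time.monotonic):
        self.max_tokens, self.max_seconds = max_tokens, max_seconds
        self._clock = clock
        self._start = clock()
        self.tokens_used = 0

    def charge(self, tokens: int) -> None:
        """Call after every model turn; raises instead of letting the run continue."""
        self.tokens_used += tokens
        if self.tokens_used > self.max_tokens:
            raise BudgetExceeded(f"token cap {self.max_tokens} exceeded")
        if self._clock() - self._start > self.max_seconds:
            raise BudgetExceeded(f"timeout after {self.max_seconds}s")
```

Note that this checks limits between turns; a single turn that hangs would still need a timeout on the underlying call.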

The Principle

Every agent in your topology should fail closed, not open. If communication breaks, the pipeline halts. If isolation breaks, nothing escapes the scope. Design for failure first; the success path takes care of itself.

Prod Access Is the Edge Case

At least one agent in your topology will eventually need write access to something irreversible: production database, customer-facing API, prod deploy. Make that agent a security specialist with the smallest possible tool surface, scoped credentials issued just-in-time, and a separate audit log that lives outside the agent's own context.

Never give prod write to a generalist. Never give prod write to an agent whose tool surface includes anything other than the prod write operation.


The Cheat Sheet: Which Pattern?

Start at the top. The first row that fits is your pick. Do not reach for complex when simple ships.

If | Use
--- | ---
The work fits in one focused session | One agent
Output needs grading before merge | Planner - Executor - Critic
You have N independent slices | Fan-out / Fan-in
The transformation is stable and repeated | Scoped Pipeline
One step needs deep domain expertise | Specialist + Generalist
Prod write or irreversible loss is involved | Scoped specialist only
You are combining two patterns above | Stop. Simplify first.

Composing Patterns (Cautiously)

These four patterns compose. A Planner-Executor-Critic loop can use a Pipeline as its Executor. A Fan-out can spawn N Planner-Executor-Critic loops, one per slice. A Specialist + Generalist topology can sit inside any of the others.

Composition works when each layer has a clear job and a clear boundary. It fails when the topology becomes a flow chart no human can hold in their head.

The rule we use: if you cannot draw the topology on a whiteboard in 30 seconds, redesign it. Multi-agent systems that work in production are the ones an engineer can debug at 3 AM. Multi-agent systems that fail in production are the ones that needed a diagram to explain.


Common Failure Modes

Five failures we see repeatedly in client multi-agent systems, in order of frequency:

1. Orchestrator bugs. The coordinator itself has bugs. It splits work incorrectly, fails to handle a stage's error, or aggregates results in a way that loses information. The agents are fine; the conductor is broken.

2. Message-format drift. Stage A starts emitting an extra field. Stage B silently ignores it. Three weeks later, a field gets renamed somewhere in the chain, nothing rejects the change, and Stage C breaks because the field it depended on is gone. Schema validation at every boundary catches each change where it is introduced.

3. Partial failures with no policy. Two slices succeed, one fails, and the orchestrator has no rule for what to do. Engineers manually patch the partial state, often badly. Decide the policy before fan-out, not during the post-mortem.

4. Context bloat in the generalist. The Specialist + Generalist pattern leaks specialist context back into the generalist over time. The generalist's context window fills with domain-specific noise, its accuracy drops, and the team blames the model. Trim the handoff payload in both directions.

5. Critic agreement drift. The critic starts agreeing with the executor after a few loops, because it has been reading the executor's reasoning and absorbing its frame. Pin the critic with explicit invariants from the spec - and re-instantiate it fresh between loops if drift persists.


The Bottom Line

Multi-agent is a tool, not a default. Each new agent in your topology adds context, latency, blast radius, and one more place a chain can fail. Pick the simplest pattern that solves your problem.

The patterns above are ranked in order of cost. Start with one agent. Climb to Planner-Executor-Critic when you need grading before merge. Climb to Fan-out when you have N truly independent slices. Climb to Pipeline when the transformation is stable enough to encode. Climb to Specialist + Generalist when one step needs depth the generalist cannot provide.

Climb only when the pattern below provably cannot handle the work. Never before.

The Quick-Start Checklist

  • Audit your current agent design: how many agents do you actually have, and what is each one's job?
  • For each agent, ask: would extending the prior session work instead?
  • For each agent, ask: what is its tool surface, and does it include anything beyond what its role needs?
  • For each handoff, ask: is the schema typed and validated, or are you trusting freeform text?
  • For each parallel fan-out, ask: what is the partial-failure policy?
  • For the system as a whole, ask: can an engineer draw this on a whiteboard in 30 seconds?

If any of those answers is missing or uncomfortable, you have work to do before adding the next agent.


Topics: multi-agent systems, agent orchestration, Claude Code, AI architecture, agent design patterns, LLM engineering, Planner Executor Critic, fan-out fan-in, agent pipelines, harness engineering