Multi-Agent Systems: Patterns That Work

Explore multi-agent systems patterns that work in production. Learn orchestrator-worker, parallel fan-out, and hierarchical designs with real trade-offs.

May 31, 2026

Multi-agent systems get a lot of conference-talk enthusiasm and a lot of production-incident silence. The idea is straightforward: instead of one agent trying to do everything, you decompose the problem into specialized agents that collaborate. The practice is harder. Coordination bugs, context bleed between agents, and cascading failures make multi-agent systems genuinely difficult to operate. They're a poor fit for problems that a single well-prompted agent could handle.

This article focuses on the patterns that actually hold up. We've built and maintained multi-agent systems for customer support automation, data-pipeline orchestration, and agentic development tooling at Laxaar. These patterns come from that work, not from reading papers.

The goal here is practical: give you a decision framework and enough working detail to evaluate whether multi-agent coordination is the right call for your problem.

When multi-agent makes sense

Multi-agent systems add coordination overhead. That overhead is only worth it when the problem genuinely can't fit in a single agent's context window, or when parallel execution is a hard requirement.

Good fits:

  • Tasks with truly independent subtasks that can run concurrently (e.g., researching five companies simultaneously)
  • Workflows that need specialized expertise in different domains (e.g., a code-review agent plus a security-audit agent plus a documentation agent)
  • Long-horizon tasks where a single context window would overflow before completion

Poor fits:

  • Tasks where the steps are sequential and interdependent — you'll serialize everything anyway
  • Problems a well-structured single agent with good tools can handle
  • Anything where you haven't yet built and evaluated the single-agent version

Our honest recommendation: build the single-agent version first. If it fails because of context limits or because it can't parallelize, then reach for multi-agent. Jumping straight to multi-agent because it sounds impressive is one of the most reliable ways to ship something slow, fragile, and expensive.

Orchestrator-worker pattern

The orchestrator-worker pattern is the multi-agent design that holds up most reliably in production. An orchestrator agent receives the top-level goal, breaks it into subtasks, dispatches each subtask to a worker agent, and synthesizes the results.

The orchestrator doesn't execute tasks itself — it plans and coordinates. Workers are narrow specialists: they receive a well-scoped task, complete it, and return a result. This separation makes each component easier to test and replace independently.

import asyncio
import json

import anthropic

# Async client, so the worker calls below can actually overlap under asyncio.gather.
client = anthropic.AsyncAnthropic()

async def worker_agent(task: str, context: str) -> str:
    """A specialist worker that completes a single scoped task."""
    response = await client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=2048,
        system="You are a specialist agent. Complete the assigned task precisely and return a structured result.",
        messages=[
            {"role": "user", "content": f"Context: {context}\n\nTask: {task}"}
        ]
    )
    return response.content[0].text

async def orchestrator_agent(goal: str) -> str:
    """Orchestrator that plans subtasks and coordinates workers."""
    # Step 1: Plan subtasks
    plan_response = await client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        system=(
            "You are an orchestrator. Given a goal, output a JSON list of subtasks. "
            "Each subtask must be independently completable. "
            "Output only the JSON, with no surrounding text. "
            'Format: [{"task": "...", "context": "..."}]'
        ),
        messages=[{"role": "user", "content": f"Goal: {goal}"}]
    )

    # Assumes the reply is bare JSON; json.loads raises if the model adds prose or code fences.
    subtasks = json.loads(plan_response.content[0].text)

    # Step 2: Dispatch workers in parallel
    worker_tasks = [
        worker_agent(s["task"], s["context"])
        for s in subtasks
    ]
    results = await asyncio.gather(*worker_tasks)

    # Step 3: Synthesize
    synthesis_input = "\n\n".join(
        f"Subtask: {s['task']}\nResult: {r}"
        for s, r in zip(subtasks, results)
    )
    final = await client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        system="Synthesize the subtask results into a coherent final answer.",
        messages=[{"role": "user", "content": synthesis_input}]
    )
    return final.content[0].text

The key design choice here: workers use Claude Haiku for speed and cost; the orchestrator uses Claude Sonnet for planning quality. You don't need the same model everywhere. Match model capability to the reasoning demand at each step.
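One way to keep that choice explicit is a small role-to-model table instead of model strings scattered through the code. This is a sketch under assumptions: `MODEL_BY_ROLE` and `model_for` are hypothetical names, and the model IDs are simply the ones used in this article.

```python
# Hypothetical role-to-model table; the model IDs are the ones used in this article.
MODEL_BY_ROLE = {
    "orchestrator": "claude-sonnet-4-5",  # planning and synthesis
    "worker": "claude-haiku-4-5",         # narrow, high-volume subtasks
}


def model_for(role: str) -> str:
    """Return the model configured for a role, failing loudly on unknown roles."""
    try:
        return MODEL_BY_ROLE[role]
    except KeyError:
        raise ValueError(f"No model configured for role {role!r}") from None
```

Swapping a model for one role then becomes a one-line config change rather than a search through call sites.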

Parallel fan-out

Parallel fan-out is the simplest multi-agent pattern: take a list of independent inputs, process each with a separate agent instance, collect results. No orchestrator needed — a plain async dispatcher handles it.

This pattern fits well when you're doing the same operation on many items (summarizing 50 documents, running code review on 20 PRs, extracting structured data from 100 customer emails). The agents don't communicate with each other; they only report back to the dispatcher.

import asyncio

import anthropic

# Async client: with the sync client, each call would block the event loop
# and the items would be processed one at a time.
client = anthropic.AsyncAnthropic()

async def process_item(item: str, semaphore: asyncio.Semaphore) -> dict:
    # The semaphore caps how many requests are in flight at once.
    async with semaphore:
        response = await client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=512,
            messages=[
                {"role": "user", "content": f"Summarize in 2 sentences: {item}"}
            ]
        )
        return {"input": item, "summary": response.content[0].text}

async def fan_out(items: list[str], max_concurrent: int = 5) -> list[dict]:
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [process_item(item, semaphore) for item in items]
    # gather returns results in the same order as the input items.
    return await asyncio.gather(*tasks)

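The semaphore is not optional. Without a concurrency cap, a large batch fires every request at once, and you'll hit API rate limits and pay for retry overhead. Keep in mind that the semaphore bounds in-flight requests, not requests per minute, so derive max_concurrent from your API tier's requests-per-minute limit and your typical request latency, not from what feels fast.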
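To see what the semaphore actually bounds, the sketch below swaps the API call for `asyncio.sleep` and records the peak number of calls in flight. `run_capped` and `fake_call` are illustrative names, not part of any SDK.

```python
import asyncio


async def run_capped(n_items: int, max_concurrent: int) -> int:
    """Run n_items fake calls under a semaphore and return the peak number in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def fake_call(i: int) -> int:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the model request
            in_flight -= 1
            return i

    await asyncio.gather(*(fake_call(i) for i in range(n_items)))
    return peak
```

Calling `asyncio.run(run_capped(n_items=20, max_concurrent=5))` never reports more than five calls in flight, however many items are queued.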

Hierarchical agents

Hierarchical agents extend the orchestrator-worker pattern by adding another layer: a top-level orchestrator delegates to sub-orchestrators, each of which manages its own worker pool. This is appropriate for genuinely complex workflows where different domains need separate coordination logic.

Here's where to use hierarchical agents vs. a flat orchestrator-worker:

| Dimension               | Flat orchestrator-worker       | Hierarchical                   |
|-------------------------|--------------------------------|--------------------------------|
| Task domains            | One or two domains             | Three or more distinct domains |
| Worker specialization   | Similar workers                | Specialist sub-teams           |
| Coordination complexity | Manageable in one orchestrator | Needs domain-specific routing  |
| Debug complexity        | Low                            | High                           |
| Latency                 | Lower                          | Higher (more hops)             |
| Best for                | Most production workflows      | Large-scale agentic pipelines  |

Our opinion: hierarchical agents are overused. Most teams reach for them before they've exhausted what a well-tuned flat orchestrator can do. The added debug complexity is real. Every extra coordination layer is another place for a task to get lost, a result to be misrouted, or a timeout to cascade.

If you do need hierarchical coordination, LangGraph's graph-based state machine handles the topology well and makes the data flow inspectable. For simpler orchestration, a typed dataclass passed between async functions is often enough and far easier to test.
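As a rough illustration of the simpler option, here is a typed dataclass handed from one async step to the next. Everything in it is hypothetical: `PipelineState`, `plan`, `execute`, and `run` are made-up names, and the string splitting stands in for real model calls.

```python
import asyncio
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PipelineState:
    """Hypothetical state object handed from one step to the next."""
    goal: str
    subtasks: tuple[str, ...] = ()
    results: tuple[str, ...] = ()


async def plan(state: PipelineState) -> PipelineState:
    # Stand-in for an orchestrator call: split the goal on " and ".
    return replace(state, subtasks=tuple(state.goal.split(" and ")))


async def execute(state: PipelineState) -> PipelineState:
    async def work(task: str) -> str:
        await asyncio.sleep(0)  # stand-in for a worker call
        return f"done: {task}"

    results = await asyncio.gather(*(work(t) for t in state.subtasks))
    return replace(state, results=tuple(results))


async def run(goal: str) -> PipelineState:
    """Thread the state through each step in order."""
    state = PipelineState(goal=goal)
    for step in (plan, execute):
        state = await step(state)
    return state
```

Because the state is frozen and every step returns a new copy, each step can be unit-tested by constructing an input state and comparing the output state, with no shared mutable object to set up.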

State and communication

State management is where multi-agent systems most often fall apart in practice. The core question: where does shared state live, and who's allowed to write to it?

Three patterns for inter-agent communication:

Message passing. Agents communicate through an explicit message queue (Redis, RabbitMQ, or an in-process asyncio queue). Each agent reads from its input queue and writes to an output queue. Decoupled, auditable, and easy to replay — but adds infrastructure.

Shared state store. Agents read and write from a central store (a database, a Redis hash, or a dict in a single-process system). Simpler to set up, but you need to handle concurrent writes carefully. Use optimistic locking or designate a single writer per key.

Blackboard pattern. A shared workspace (the "blackboard") holds all intermediate results. Agents read from it, compute something, and write back. Works well when you can't predict which agent will need which result. It's essentially a structured shared state store with a clear schema.
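To make the first option concrete, here is a minimal in-process sketch of message passing with `asyncio.Queue`. It assumes a single worker and uses a `None` sentinel to signal the end of input; `worker`, `run_pipeline`, and the uppercase "result" are placeholders for a real agent and its model call.

```python
import asyncio


async def worker(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Read tasks from inbox until a None sentinel arrives, writing results to outbox."""
    while True:
        task = await inbox.get()
        if task is None:  # sentinel: no more work
            break
        # Stand-in for a model call.
        await outbox.put({"task": task, "result": task.upper()})


async def run_pipeline(tasks: list[str]) -> list[dict]:
    inbox: asyncio.Queue = asyncio.Queue()
    outbox: asyncio.Queue = asyncio.Queue()
    consumer = asyncio.create_task(worker(inbox, outbox))
    for t in tasks:
        await inbox.put(t)
    await inbox.put(None)  # tell the worker to stop
    await consumer
    # Drain the output queue in FIFO order.
    return [outbox.get_nowait() for _ in range(outbox.qsize())]
```

The worker never sees who produced a task or who consumes the result, which is the decoupling that makes the messages easy to log and replay.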

For most production workloads, Laxaar uses a typed shared state store — usually a Pydantic model persisted to Redis or a lightweight database. It's inspectable, writable from any agent, and survives process restarts. Message queues are worth the overhead when you need strict ordering guarantees or when agents run in separate processes.
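The optimistic-locking idea for a shared store can be sketched in memory as a versioned compare-and-set. This is an illustration, not Laxaar's implementation: `VersionedStore`, `Entry`, and `VersionConflict` are hypothetical names, and a Redis-backed version would need the version check and the write to be atomic (for example with a WATCH/MULTI transaction).

```python
from dataclasses import dataclass
from typing import Any


class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


@dataclass
class Entry:
    value: Any
    version: int


class VersionedStore:
    """In-memory stand-in for a shared store with compare-and-set writes."""

    def __init__(self) -> None:
        self._data: dict[str, Entry] = {}

    def read(self, key: str) -> Entry:
        # Unknown keys read as version 0 with no value.
        return self._data.get(key, Entry(value=None, version=0))

    def write(self, key: str, value: Any, expected_version: int) -> int:
        """Write only if the key is still at expected_version; return the new version."""
        current = self.read(key)
        if current.version != expected_version:
            raise VersionConflict(
                f"{key}: expected v{expected_version}, found v{current.version}"
            )
        self._data[key] = Entry(value=value, version=current.version + 1)
        return current.version + 1
```

An agent reads an entry, computes its update, and writes back with the version it read. If another agent wrote in between, the write fails and the agent re-reads instead of silently overwriting.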

The agentic coding workflows article covers how these state patterns apply specifically to software development pipelines, which have their own coordination quirks.

Multi-agent failure modes

Single-agent failures are bad. Multi-agent failures are bad and confusing, because the error might originate in worker A but surface in the orchestrator's output as a nonsensical synthesis.

Coordination deadlocks. Agent A waits for a result from Agent B, which is waiting for a result from Agent A. Fix: always set timeouts on inter-agent calls, and design task graphs to be acyclic.
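The acyclic requirement can be checked before any agent runs. The sketch below uses the standard library's `graphlib`; `dispatch_order` and the task names in the usage note are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def dispatch_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which tasks can run, or raise ValueError on a cycle.

    `dependencies` maps each task to the set of tasks it waits on.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # The second element of exc.args is the list of nodes forming the cycle.
        raise ValueError(f"Task graph has a cycle: {exc.args[1]}") from exc
```

For example, `dispatch_order({"synthesize": {"research", "review"}, "review": {"research"}, "research": set()})` yields research, then review, then synthesize, while `{"a": {"b"}, "b": {"a"}}` is rejected before anything is dispatched.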

Result poisoning. One worker returns a hallucinated or malformed result, and the orchestrator treats it as ground truth. Fix: validate worker outputs against a schema before feeding them to the synthesizer. Pydantic does this cleanly.
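A sketch of that validation step follows. To stay dependency-free it uses only the standard library; a Pydantic model would replace the manual checks. The `WorkerResult` schema and its three fields are assumptions made for this example, not a format the article prescribes.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerResult:
    """Hypothetical schema for what a worker must return."""
    task: str
    answer: str
    confidence: float


def parse_worker_result(raw: str) -> WorkerResult:
    """Parse and validate a worker's raw JSON output; raise ValueError if it does not fit."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Worker output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Worker output must be a JSON object")
    missing = {"task", "answer", "confidence"} - data.keys()
    if missing:
        raise ValueError(f"Worker output is missing fields: {sorted(missing)}")
    if not isinstance(data["task"], str) or not isinstance(data["answer"], str):
        raise ValueError("'task' and 'answer' must be strings")
    conf = data["confidence"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        raise ValueError("'confidence' must be a number between 0 and 1")
    return WorkerResult(task=data["task"], answer=data["answer"], confidence=float(conf))
```

Schema validation catches malformed output, not a well-formed hallucination, so it is a first gate rather than a complete defense.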

Context contamination. Workers share context they shouldn't, causing one agent's errors to influence another's reasoning. Fix: scope each worker's context to only what it needs. Don't pass the full conversation history to workers that only need a single task description.

Cascade failures. A single worker failure causes the orchestrator to wait indefinitely or to synthesize with a missing result. Fix: set per-worker timeouts and handle failures explicitly — either skip the failed result, retry once, or surface the error to the caller.
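The timeout-and-retry handling described here might look like the sketch below. `call_with_timeout` and `gather_with_failures` are hypothetical helpers; the single-retry default and the choice to return `None` for a failed worker are assumptions, and a real system would also log each failure.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def call_with_timeout(
    worker: Callable[[], Awaitable[str]],
    timeout_s: float,
    retries: int = 1,
) -> Optional[str]:
    """Run a worker with a timeout, retrying up to `retries` times; None if every attempt fails."""
    for _ in range(retries + 1):
        try:
            return await asyncio.wait_for(worker(), timeout=timeout_s)
        except Exception:
            # Covers timeouts and worker errors alike; task cancellation is not swallowed.
            continue
    return None


async def gather_with_failures(
    workers: list[Callable[[], Awaitable[str]]],
    timeout_s: float,
) -> tuple[list[str], int]:
    """Return (successful results, number of failed workers) so the caller decides what to do."""
    outcomes = await asyncio.gather(*(call_with_timeout(w, timeout_s) for w in workers))
    ok = [r for r in outcomes if r is not None]
    return ok, len(outcomes) - len(ok)
```

Returning the failure count alongside the results keeps the decision explicit: the orchestrator can synthesize from what it has, or refuse and surface the error if too many workers failed.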

Cost explosions. Parallel workers burn tokens simultaneously. A fan-out of 20 workers running Opus-class models can cost more than you expect in seconds. Fix: set hard token limits per worker, use cheaper models for structured subtasks, and add a cost circuit-breaker at the orchestrator level.
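A cost circuit-breaker can be as small as a shared counter that raises once a run passes its budget. The sketch below is one possible shape: `TokenBudget` and `BudgetExceeded` are hypothetical names, and it counts tokens rather than currency to avoid hard-coding prices.

```python
class BudgetExceeded(Exception):
    """Raised when recorded usage passes the configured token budget."""


class TokenBudget:
    """Hypothetical run-level budget shared by the orchestrator and its workers."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one call's usage and trip the breaker if the budget is exhausted."""
        self.used += input_tokens + output_tokens
        if self.used > self.max_tokens:
            raise BudgetExceeded(
                f"Used {self.used} tokens, budget is {self.max_tokens}"
            )

    @property
    def remaining(self) -> int:
        return max(self.max_tokens - self.used, 0)
```

With the Anthropic SDK, the counts could come from `response.usage.input_tokens` and `response.usage.output_tokens` after each call. Because usage is recorded after a call returns, requests already in flight can overshoot the budget, so checking `remaining` before dispatching each new worker tightens the bound.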

For a structured look at how to catch these failures before they hit production, the evaluating AI agents article covers eval frameworks and failure-injection testing for agent systems.

Laxaar has run post-mortems on production multi-agent failures. The most common root cause isn't the model quality — it's missing output validation and missing timeouts. Both are fixable with a few dozen lines of code. If you'd like a review of your multi-agent architecture, our AI agent experts are available for architecture consultations.

Frequently Asked Questions

How many agents is too many in a multi-agent system?

There's no universal limit, but complexity grows faster than agent count. Systems with more than 5–7 agents typically need dedicated tooling for observability and testing. If you're adding agents because the system is hard to extend, that's a design smell — the problem is probably insufficient task decomposition, not insufficient agents.

Can different agents in the same system use different LLMs?

Yes, and this is often the right call. An orchestrator that needs complex planning might use Claude Sonnet or Opus, while high-volume workers doing structured extraction work well with Claude Haiku. Mix models based on the reasoning demand and cost budget of each role.

How do I test a multi-agent system?

Test each agent in isolation first, with mocked inputs and expected outputs. Then test agent pairs, then the full system. Failure injection, meaning deliberately feeding bad outputs from one agent to the next, is especially important and often skipped. Use record-replay of model responses, or deterministic seeds where your provider supports them, to make tests repeatable.
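Isolation testing is easiest when the model call is injected rather than imported. The sketch below assumes that design: `summarize_agent`, `ModelCall`, and the two fakes are hypothetical, and the second test is a small failure-injection case.

```python
import asyncio
from typing import Awaitable, Callable

ModelCall = Callable[[str], Awaitable[str]]


async def summarize_agent(text: str, call_model: ModelCall) -> str:
    """Hypothetical agent under test: rejects empty model output instead of passing it on."""
    reply = await call_model(f"Summarize in 2 sentences: {text}")
    if not reply.strip():
        raise ValueError("Model returned an empty summary")
    return reply.strip()


async def fake_good(prompt: str) -> str:
    return "  A short summary.  "


async def fake_empty(prompt: str) -> str:
    # Failure injection: simulate a bad output from the model.
    return "   "


def test_returns_trimmed_summary() -> None:
    assert asyncio.run(summarize_agent("long text", fake_good)) == "A short summary."


def test_rejects_empty_output() -> None:
    try:
        asyncio.run(summarize_agent("long text", fake_empty))
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty model output")
```

The same fakes can be replaced with recorded real responses once the agent's behavior on synthetic inputs is pinned down.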

What's the difference between multi-agent systems and microservices?

They're conceptually similar — distributed, specialized components communicating through defined interfaces — but agents make LLM calls and produce probabilistic outputs. That makes them harder to test than deterministic microservices. The coordination patterns borrow from distributed systems, but the failure modes are different.

Should I build my own multi-agent framework or use an existing one?

Use an existing one for coordination infrastructure (LangGraph, AutoGen, CrewAI) and write your own agent logic. Don't reinvent task queuing, retry logic, or graph state management. Do own your agent prompts, tool definitions, and evaluation harness — those are where your differentiation lives.

Multi-agent systems are worth the complexity when the problem genuinely requires them. If you're not sure which architecture fits your use case, the Laxaar team offers architecture reviews and full-stack agent development. Reach out through our contact page and we'll help you find the right pattern for your workload.
