Multi-Agent Systems: Patterns That Work

Three of our last five production incidents involved multi-agent systems. Not because the models failed, but because the coordination layer did. The idea is straightforward enough: instead of one agent trying to do everything, you decompose the problem into specialized agents that collaborate. What's harder is the practice. Coordination bugs, context bleed between agents, and cascading failures make multi-agent systems genuinely difficult to operate. They're a poor fit for problems a single well-prompted agent could handle.

This article focuses on the patterns that actually hold up. We've built and maintained multi-agent systems for customer support automation, data-pipeline orchestration, and agentic development tooling at Laxaar. These patterns come from that work, not from reading papers.

The goal is practical: give you a decision framework and enough working detail to evaluate whether multi-agent coordination is the right call for your problem.

What you'll learn

When multi-agent systems are (and aren't) worth the complexity
Orchestrator-worker: the pattern that ships most reliably
Parallel fan-out for independent subtasks
Hierarchical agents for complex workflows
State management and inter-agent communication
Failure modes unique to multi-agent systems

When multi-agent makes sense

Multi-agent systems add coordination overhead. That overhead is only worth it when the problem genuinely can't fit in a single agent's context window, or when parallel execution is a hard requirement.

Good fits:

Tasks with truly independent subtasks that can run concurrently (e.g., researching five companies simultaneously)
Workflows that need specialized expertise in different domains (e.g., a code-review agent plus a security-audit agent plus a documentation agent)
Long-horizon tasks where a single context window would overflow before completion

Poor fits:

Tasks where the steps are sequential and interdependent (you'll serialize everything anyway)
Problems a well-structured single agent with good tools can handle
Anything where you haven't yet built and evaluated the single-agent version

Our honest recommendation: build the single-agent version first. If it fails because of context limits or because it can't parallelize, then reach for multi-agent. Jumping straight to multi-agent because it sounds impressive is one of the most reliable ways to ship something slow, fragile, and expensive.

Orchestrator-worker pattern

The orchestrator-worker pattern is the most production-reliable multi-agent design. An orchestrator agent receives the top-level goal, breaks it into subtasks, dispatches each subtask to a worker agent, and synthesizes the results.

The orchestrator doesn't execute tasks itself. It plans and coordinates. Workers are narrow specialists: they receive a well-scoped task, complete it, and return a result. This separation makes each component easier to test and replace independently.

import asyncio
import json

import anthropic

# Use the async client. With the synchronous anthropic.Anthropic() client, each
# create() call blocks the event loop, so asyncio.gather would run workers one by one.
client = anthropic.AsyncAnthropic()

async def worker_agent(task: str, context: str) -> str:
    """A specialist worker that completes a single scoped task."""
    response = await client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=2048,
        system="You are a specialist agent. Complete the assigned task precisely and return a structured result.",
        messages=[
            {"role": "user", "content": f"Context: {context}\n\nTask: {task}"}
        ],
    )
    return response.content[0].text

async def orchestrator_agent(goal: str) -> str:
    """Orchestrator that plans subtasks and coordinates workers."""
    # Step 1: Plan subtasks
    plan_response = await client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        system=(
            "You are an orchestrator. Given a goal, output a JSON list of subtasks "
            "and nothing else. "
            "Each subtask must be independently completable. "
            'Format: [{"task": "...", "context": "..."}]'
        ),
        messages=[{"role": "user", "content": f"Goal: {goal}"}],
    )
    # json.loads raises json.JSONDecodeError if the planner wraps the JSON in prose;
    # validate or retry here in production.
    subtasks = json.loads(plan_response.content[0].text)

    # Step 2: Dispatch workers concurrently
    results = await asyncio.gather(
        *(worker_agent(s["task"], s["context"]) for s in subtasks)
    )

    # Step 3: Synthesize
    synthesis_input = "\n\n".join(
        f"Subtask: {s['task']}\nResult: {r}"
        for s, r in zip(subtasks, results)
    )
    final = await client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        system="Synthesize the subtask results into a coherent final answer.",
        messages=[{"role": "user", "content": synthesis_input}],
    )
    return final.content[0].text

# Usage: print(asyncio.run(orchestrator_agent("your goal here")))

The key design choice here: workers use Claude Haiku for speed and cost; the orchestrator uses Claude Sonnet for planning quality. You don't need the same model everywhere. Match model capability to the reasoning demand at each step.

Parallel fan-out

Parallel fan-out is the simplest multi-agent pattern: take a list of independent inputs, process each with a separate agent instance, collect results. No orchestrator needed. A plain async dispatcher handles it.

This pattern fits well when you're doing the same operation on many items (summarizing 50 documents, running code review on 20 PRs, extracting structured data from 100 customer emails). The agents don't communicate with each other; they only report back to the dispatcher.

import asyncio
import anthropic

# Async client so requests actually overlap; the sync client would block the event loop.
client = anthropic.AsyncAnthropic()

async def process_item(item: str, semaphore: asyncio.Semaphore) -> dict:
    # At most max_concurrent requests are in flight at any moment.
    async with semaphore:
        response = await client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=512,
            messages=[
                {"role": "user", "content": f"Summarize in 2 sentences: {item}"}
            ],
        )
        return {"input": item, "summary": response.content[0].text}

async def fan_out(items: list[str], max_concurrent: int = 5) -> list[dict]:
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = [process_item(item, semaphore) for item in items]
    # gather returns results in the same order as the input items
    return await asyncio.gather(*tasks)

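The semaphore is not optional. Without a concurrency cap, a large batch fires every request at once, trips the API's rate limits, and you pay for the retries. Keep in mind that a semaphore limits requests in flight, not requests per minute: set max_concurrent from your API tier's rate limits and your typical request latency, not from what feels fast.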
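Because the semaphore caps requests in flight rather than requests per minute, the two numbers are linked through latency. One rough way to connect them is Little's law (requests in flight ≈ request rate × average latency). This is a sketch under stated assumptions: you know your tier's requests-per-minute limit and your observed average latency, and the function name and the 0.8 headroom factor are illustrative choices, not SDK settings. Tokens-per-minute limits can bind first when prompts are long, so check those too.

```python
import math


def max_concurrency(rpm_limit: int, avg_latency_s: float, headroom: float = 0.8) -> int:
    """Hypothetical helper: size a fan-out semaphore from a requests-per-minute limit.

    Little's law: requests in flight = arrival rate x time each request spends in flight.
    `headroom` keeps steady-state traffic below the limit so retries still fit.
    """
    if rpm_limit <= 0 or avg_latency_s <= 0:
        raise ValueError("rpm_limit and avg_latency_s must be positive")
    requests_per_second = rpm_limit / 60
    in_flight = requests_per_second * avg_latency_s * headroom
    # Always allow at least one request in flight.
    return max(1, math.floor(in_flight))


if __name__ == "__main__":
    # A 50 RPM limit with ~4 s average latency supports about 2 concurrent requests.
    print(max_concurrency(50, 4.0))
```

The result can be passed straight to fan_out(items, max_concurrent=...). Re-measure latency after changing models or max_tokens, since both shift it.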

Hierarchical agents

Hierarchical agents extend the orchestrator-worker pattern by adding another layer: a top-level orchestrator delegates to sub-orchestrators, each of which manages its own worker pool. This is appropriate for genuinely complex workflows where different domains need separate coordination logic.

Here's where to use hierarchical agents vs. a flat orchestrator-worker:

Dimension	Flat orchestrator-worker	Hierarchical
Task domains	One or two domains	Three or more distinct domains
Worker specialization	Similar workers	Specialist sub-teams
Coordination complexity	Manageable in one orchestrator	Needs domain-specific routing
Debug complexity	Low	High
Latency	Lower	Higher (more hops)
Best for	Most production workflows	Large-scale agentic pipelines

Our opinion: hierarchical agents are overused. Most teams reach for them before they've exhausted what a well-tuned flat orchestrator can do. The added debug complexity is real. Every extra coordination layer is another place for a task to get lost, a result to be misrouted, or a timeout to cascade.

If you do need hierarchical coordination, LangGraph's graph-based state machine handles the topology well and makes the data flow inspectable. For simpler orchestration, a typed dataclass passed between async functions is often enough and far easier to test.
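To make "a typed dataclass passed between async functions" concrete, here is a minimal sketch of a two-level hierarchy: a top-level orchestrator routes shared state through per-domain sub-orchestrators, and each sub-orchestrator fans out to its own workers. All names (WorkflowState, research_team, review_team, the stub workers) are hypothetical, and the workers are plain async stubs standing in for model calls so the control flow can be tested without an API key.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class WorkflowState:
    """Hypothetical typed state passed down and back up the hierarchy."""
    goal: str
    research_notes: list[str] = field(default_factory=list)
    review_findings: list[str] = field(default_factory=list)


# Stub workers: in a real system each would wrap an LLM call.
async def research_worker(topic: str) -> str:
    await asyncio.sleep(0)  # placeholder for an API call
    return f"notes on {topic}"


async def review_worker(section: str) -> str:
    await asyncio.sleep(0)  # placeholder for an API call
    return f"review of {section}"


async def research_team(state: WorkflowState, topics: list[str]) -> WorkflowState:
    """Sub-orchestrator for the research domain: fans out to its own workers."""
    state.research_notes = list(await asyncio.gather(*(research_worker(t) for t in topics)))
    return state


async def review_team(state: WorkflowState) -> WorkflowState:
    """Sub-orchestrator for the review domain: reviews whatever research produced."""
    state.review_findings = list(
        await asyncio.gather(*(review_worker(n) for n in state.research_notes))
    )
    return state


async def top_level_orchestrator(goal: str, topics: list[str]) -> WorkflowState:
    """Top level: routes the shared state through each domain team in order."""
    state = WorkflowState(goal=goal)
    state = await research_team(state, topics)
    state = await review_team(state)
    return state


if __name__ == "__main__":
    final = asyncio.run(top_level_orchestrator("compare vendors", ["pricing", "security"]))
    print(final.review_findings)
```

Because every team takes and returns the same dataclass, you can unit-test review_team on its own by constructing a WorkflowState by hand, which is much of why this is easier to test than a full graph framework.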

State and communication

State management is where multi-agent systems most often fall apart in practice. The core question: where does shared state live, and who's allowed to write to it?

Three patterns for inter-agent communication:

Message passing. Agents communicate through an explicit message queue (Redis, RabbitMQ, or an in-process asyncio queue). Each agent reads from its input queue and writes to an output queue. Decoupled, auditable, and easy to replay. The tradeoff: it adds infrastructure.
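For the in-process variant, asyncio.Queue gives you the input-queue/output-queue shape described above. This sketch assumes two stub agents chained by queues and a None sentinel to signal shutdown; the agent names and message fields are illustrative. Moving to Redis or RabbitMQ changes the transport, not the structure.

```python
import asyncio

SENTINEL = None  # end-of-stream marker passed down the pipeline


async def extractor_agent(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Reads raw documents and writes extracted fields (stand-in for an LLM call)."""
    while (msg := await inbox.get()) is not SENTINEL:
        first_line = msg["text"].split("\n")[0]
        await outbox.put({"doc_id": msg["doc_id"], "title": first_line})
    await outbox.put(SENTINEL)  # propagate shutdown to the next agent


async def summarizer_agent(inbox: asyncio.Queue, results: list[str]) -> None:
    """Reads extracted fields and records a one-line summary (stand-in for an LLM call)."""
    while (msg := await inbox.get()) is not SENTINEL:
        results.append(f"{msg['doc_id']}: {msg['title']}")


async def run_pipeline(docs: list[dict]) -> list[str]:
    raw: asyncio.Queue = asyncio.Queue()
    extracted: asyncio.Queue = asyncio.Queue()
    results: list[str] = []
    agents = [
        asyncio.create_task(extractor_agent(raw, extracted)),
        asyncio.create_task(summarizer_agent(extracted, results)),
    ]
    for doc in docs:
        await raw.put(doc)  # a real system might also log each message here for replay
    await raw.put(SENTINEL)
    await asyncio.gather(*agents)
    return results


if __name__ == "__main__":
    docs = [{"doc_id": 1, "text": "Refund request\nPlease advise"}, {"doc_id": 2, "text": "Login issue"}]
    print(asyncio.run(run_pipeline(docs)))
```

Every message crosses a put() call, so that is the single place to add logging or persistence if you want the audit trail and replay ability.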

Shared state store. Agents read and write from a central store (a database, a Redis hash, or a dict in a single-process system). Simpler to set up, but you need to handle concurrent writes carefully. Use optimistic locking or designate a single writer per key.
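Optimistic locking is usually implemented as compare-and-set on a version number: read the value and its version, compute, and write only if the version hasn't changed in the meantime. In this sketch an in-memory dict guarded by an asyncio.Lock stands in for the store, and VersionedStore, ConflictError, and append_finding are hypothetical names. In Redis you would typically get the same effect with WATCH/MULTI/EXEC, and in SQL with an UPDATE that checks the expected version in its WHERE clause.

```python
import asyncio
from typing import Any


class ConflictError(Exception):
    """Raised when another writer updated the key after we read it."""


class VersionedStore:
    """Hypothetical in-process stand-in for a shared state store with per-key versions."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[int, Any]] = {}
        self._lock = asyncio.Lock()  # makes each read and each compare-and-set atomic

    async def read(self, key: str) -> tuple[int, Any]:
        async with self._lock:
            return self._data.get(key, (0, None))

    async def write(self, key: str, value: Any, expected_version: int) -> int:
        """Write only if nobody else has written since we read expected_version."""
        async with self._lock:
            current, _ = self._data.get(key, (0, None))
            if current != expected_version:
                raise ConflictError(f"{key}: expected v{expected_version}, found v{current}")
            self._data[key] = (current + 1, value)
            return current + 1


async def append_finding(store: VersionedStore, finding: str, retries: int = 3) -> None:
    """The usual optimistic-locking loop: read, modify, conditional write, retry on conflict."""
    for _ in range(retries):
        version, findings = await store.read("findings")
        try:
            await store.write("findings", (findings or []) + [finding], version)
            return
        except ConflictError:
            continue  # another agent wrote first; re-read and try again
    raise ConflictError(f"could not append {finding!r} after {retries} attempts")


async def demo() -> tuple[int, Any]:
    store = VersionedStore()
    # Three agents append concurrently; each write is checked against the version it read.
    await asyncio.gather(*(append_finding(store, f) for f in ["a", "b", "c"]))
    return await store.read("findings")
```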

Blackboard pattern. A shared workspace (the "blackboard") holds all intermediate results. Agents read from it, compute something, and write back. Works well when you can't predict which agent will need which result. It's essentially a structured shared state store with a clear schema.
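What separates a blackboard from a plain shared store is the control loop: instead of a fixed call order, each agent declares which keys it needs and which key it produces, and a scheduler runs whichever agents are ready. This is a minimal sketch with lambda stubs in place of model calls; the agent table, key names, and run_blackboard are hypothetical.

```python
from typing import Callable

# Each hypothetical agent: (keys it needs, key it produces, function of the board)
Agent = tuple[frozenset[str], str, Callable[[dict], str]]

AGENTS: list[Agent] = [
    (frozenset({"ticket"}), "category",
     lambda bb: "billing" if "invoice" in bb["ticket"] else "general"),
    (frozenset({"ticket", "category"}), "draft_reply",
     lambda bb: f"[{bb['category']}] Re: {bb['ticket']}"),
    (frozenset({"draft_reply"}), "review",
     lambda bb: "ok" if bb["draft_reply"] else "empty"),
]


def run_blackboard(initial: dict, agents: list[Agent]) -> dict:
    """Run agents whose inputs are on the board until no agent can add anything new."""
    board = dict(initial)
    progress = True
    while progress:
        progress = False
        for needs, produces, fn in agents:
            if produces not in board and needs.issubset(board.keys()):
                board[produces] = fn(board)  # write the result back to the shared board
                progress = True
    return board


if __name__ == "__main__":
    print(run_blackboard({"ticket": "invoice overdue"}, AGENTS))
```

Adding an agent means appending one entry; no other agent needs to know about it, which is the main appeal when you can't predict which agent will need which result.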

For most production workloads, Laxaar uses a typed shared state store: usually a Pydantic model persisted to Redis or a lightweight database. It's inspectable, writable from any agent, and survives process restarts. Message queues are worth the overhead when you need strict ordering guarantees or when agents run in separate processes.

The agentic coding workflows article covers how these state patterns apply specifically to software development pipelines, which have their own coordination quirks.

Multi-agent failure modes

Single-agent failures are bad. Multi-agent failures are bad and confusing, because the error might originate in worker A but surface in the orchestrator's output as a nonsensical synthesis.

Coordination deadlocks. Agent A waits for a result from Agent B, which is waiting for a result from Agent A. Fix: always set timeouts on inter-agent calls, and design task graphs to be acyclic.
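Both fixes are cheap in Python. This sketch uses graphlib.TopologicalSorter (standard library, Python 3.9+) to reject a cyclic task graph before any agent runs, and asyncio.wait_for to bound each inter-agent call. The agent names and the 30-second default are illustrative assumptions.

```python
import asyncio
import graphlib
from typing import Awaitable


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order agents so each runs after the agents it depends on.

    `dependencies` maps each agent to the agents whose results it needs.
    Raises graphlib.CycleError if, say, A waits on B while B waits on A.
    """
    return list(graphlib.TopologicalSorter(dependencies).static_order())


async def call_with_timeout(call: Awaitable[str], timeout_s: float = 30.0) -> str:
    """Bound an inter-agent call so a stuck dependency raises instead of hanging forever."""
    return await asyncio.wait_for(call, timeout=timeout_s)


if __name__ == "__main__":
    print(execution_order({"synthesize": {"research", "review"}, "review": {"research"}}))
    try:
        execution_order({"agent_a": {"agent_b"}, "agent_b": {"agent_a"}})
    except graphlib.CycleError as exc:
        print("rejected cyclic task graph:", exc.args[0])
```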

Result poisoning. One worker returns a hallucinated or malformed result, and the orchestrator treats it as ground truth. Fix: validate worker outputs against a schema before feeding them to the synthesizer. Pydantic does this cleanly.
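Here is the same idea as a dependency-free sketch: parse each worker's raw text as JSON, check it against an expected shape, and keep anything that fails away from the synthesizer. WorkerResult and its fields are hypothetical. With Pydantic v2 you would declare the same fields on a BaseModel, call model_validate_json, and catch ValidationError instead of hand-writing the checks.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerResult:
    """Hypothetical schema every worker must return."""
    task_id: str
    answer: str
    confidence: float


def parse_worker_output(raw: str) -> WorkerResult:
    """Raise ValueError unless raw is JSON matching the WorkerResult shape."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("worker output must be a JSON object")
    if not isinstance(data.get("task_id"), str) or not isinstance(data.get("answer"), str):
        raise ValueError("task_id and answer must be strings")
    conf = data.get("confidence")
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        raise ValueError("confidence must be a number in [0, 1]")
    return WorkerResult(data["task_id"], data["answer"], float(conf))


def filter_valid(raw_outputs: list[str]) -> tuple[list[WorkerResult], list[str]]:
    """Split worker outputs into validated results and rejected raw strings."""
    valid: list[WorkerResult] = []
    rejected: list[str] = []
    for raw in raw_outputs:
        try:
            valid.append(parse_worker_output(raw))
        except ValueError:
            rejected.append(raw)  # never let these reach the synthesizer
    return valid, rejected
```

Logging the rejected outputs alongside their subtask makes it much easier to tell later which worker poisoned a run.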

Context contamination. Workers share context they shouldn't, causing one agent's errors to influence another's reasoning. Fix: scope each worker's context to only what it needs. Don't pass the full conversation history to workers that only need a single task description.

Cascade failures. A single worker failure causes the orchestrator to wait indefinitely or to synthesize with a missing result. Fix: set per-worker timeouts and handle failures explicitly. Skip the failed result, retry once, or surface the error to the caller.
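A minimal sketch combining these options: each worker call gets its own timeout, a failed attempt is retried once, and anything still failing comes back as an explicit None instead of silently disappearing or blocking the rest. The worker signature, the None-on-failure convention, and the defaults are assumptions for illustration; stub_worker exists only to exercise the paths.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Worker = Callable[[str], Awaitable[str]]


async def run_worker(worker: Worker, task: str, timeout_s: float = 30.0, attempts: int = 2) -> Optional[str]:
    """Return the worker's result, or None if every attempt failed or timed out.

    attempts=2 means the first try plus one retry.
    """
    for _ in range(attempts):
        try:
            return await asyncio.wait_for(worker(task), timeout=timeout_s)
        except Exception:  # includes asyncio.TimeoutError raised by wait_for
            continue
    return None


async def dispatch_all(worker: Worker, tasks: list[str], timeout_s: float = 30.0) -> dict[str, Optional[str]]:
    """Run every task with its own timeout; failures come back as explicit None entries."""
    results = await asyncio.gather(*(run_worker(worker, t, timeout_s) for t in tasks))
    return dict(zip(tasks, results))


# Demonstration stub: "slow" always hangs, "flaky" fails on its first attempt only.
_attempts: dict[str, int] = {}


async def stub_worker(task: str) -> str:
    _attempts[task] = _attempts.get(task, 0) + 1
    if task == "slow":
        await asyncio.sleep(10)
    if task == "flaky" and _attempts[task] == 1:
        raise RuntimeError("transient failure")
    return f"done: {task}"


if __name__ == "__main__":
    results = asyncio.run(dispatch_all(stub_worker, ["ok", "flaky", "slow"], timeout_s=0.05))
    failed = [t for t, r in results.items() if r is None]
    print(results, "failed:", failed)
```

Whether a None entry means "synthesize without it" or "fail the whole request" is a product decision; the point is that the orchestrator sees the gap explicitly.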

Cost explosions. Parallel workers burn tokens simultaneously. A fan-out of 20 workers running Opus-class models can cost more than you expect in seconds. Fix: set hard token limits per worker, use cheaper models for structured subtasks, and add a cost circuit-breaker at the orchestrator level.
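A cost circuit-breaker can be as simple as a shared budget that every worker checks before a call and charges after it. This sketch counts tokens rather than dollars, and TokenBudget and BudgetExceeded are hypothetical names. With the Anthropic SDK, the counts to charge would come from response.usage.input_tokens and response.usage.output_tokens.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the orchestrator-level token budget is spent."""


class TokenBudget:
    """Hypothetical circuit-breaker shared by all workers in one orchestrator run."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.used = 0

    def check(self) -> None:
        """Call before each model request; trips once the budget is exhausted."""
        if self.used >= self.max_tokens:
            raise BudgetExceeded(f"used {self.used} of {self.max_tokens} tokens")

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Call after each response with the token counts reported by the API."""
        self.used += input_tokens + output_tokens


if __name__ == "__main__":
    budget = TokenBudget(max_tokens=1_000)
    budget.check()
    budget.charge(input_tokens=400, output_tokens=300)
    budget.check()  # 700 used, still under budget
    budget.charge(input_tokens=200, output_tokens=200)
    try:
        budget.check()
    except BudgetExceeded as exc:
        print("circuit open:", exc)
```

Because the check happens before a request and the charge after it, calls already in flight can push usage past the limit. The per-worker max_tokens cap is what bounds that overshoot, which is why the two limits belong together.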

For a structured look at how to catch these failures before they hit production, the evaluating AI agents article covers eval frameworks and failure-injection testing for agent systems.

In our production post-mortems, the culprit almost never turns out to be model quality. It's missing output validation and missing timeouts, both fixable with a few dozen lines of code. If you'd like a review of your multi-agent architecture, our AI agent experts are available for architecture consultations.

Frequently Asked Questions

How many agents is too many in a multi-agent system?

There's no universal limit, but complexity grows faster than agent count. Systems with more than 5–7 agents typically need dedicated tooling for observability and testing. If you're adding agents because the system is hard to extend, that's a design smell. The problem is probably insufficient task decomposition, not insufficient agents.

Can different agents in the same system use different LLMs?

Yes, and this is often the right call. An orchestrator that needs complex planning might use Claude Sonnet or Opus, while high-volume workers doing structured extraction work well with Claude Haiku. Mix models based on the reasoning demand and cost budget of each role.

How do I test a multi-agent system?

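Test each agent in isolation first, with mocked inputs and expected outputs. Then test agent pairs, then the full system. Failure injection (deliberately feeding bad outputs from one agent to the next) is especially important and often skipped. To make tests repeatable, pin sampling parameters such as temperature and use seeds where your provider supports them; hosted model APIs generally don't guarantee identical output across calls, so record-replay (capture real responses once, replay them in tests) is usually the more dependable option.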
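One way to make agents testable in isolation is to inject the model call as a parameter, so a test can substitute a fake that returns canned or deliberately broken output. This sketch assumes a hypothetical orchestrate() that takes its worker as an argument and drops empty results; fake_worker covers the happy path and broken_worker injects a failure. No API calls are made.

```python
import asyncio
from typing import Awaitable, Callable

Worker = Callable[[str], Awaitable[str]]


async def orchestrate(subtasks: list[str], worker: Worker) -> str:
    """Hypothetical orchestrator with the worker injected, so tests can swap it out."""
    results = await asyncio.gather(*(worker(t) for t in subtasks))
    # Minimal output guard: ignore empty or whitespace-only worker results.
    usable = [r for r in results if r.strip()]
    if not usable:
        raise RuntimeError("no usable worker results")
    return " | ".join(usable)


async def fake_worker(task: str) -> str:
    """Canned, deterministic output for happy-path tests."""
    return f"result for {task}"


async def broken_worker(task: str) -> str:
    """Failure injection: simulate a worker that returns nothing useful for task 'b'."""
    return "" if task == "b" else f"result for {task}"


def test_happy_path() -> None:
    assert asyncio.run(orchestrate(["a", "b"], fake_worker)) == "result for a | result for b"


def test_injected_empty_output_is_dropped() -> None:
    assert asyncio.run(orchestrate(["a", "b"], broken_worker)) == "result for a"


if __name__ == "__main__":
    test_happy_path()
    test_injected_empty_output_is_dropped()
    print("ok")
```

In a real suite the fake would replay recorded responses instead of returning f-strings, but the injection point stays the same.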

What's the difference between multi-agent systems and microservices?

They're conceptually similar: distributed, specialized components communicating through defined interfaces. But agents make LLM calls and produce probabilistic outputs. That makes them harder to test than deterministic microservices. The coordination patterns borrow from distributed systems, but the failure modes are different.

Should I build my own multi-agent framework or use an existing one?

Use an existing one for coordination infrastructure (LangGraph, AutoGen, CrewAI) and write your own agent logic. Don't reinvent task queuing, retry logic, or graph state management. Do own your agent prompts, tool definitions, and evaluation harness. That's where your differentiation lives.

Multi-agent systems are worth the complexity when the problem genuinely requires them. If you're not sure which architecture fits your use case, the Laxaar team offers architecture reviews and full-stack agent development. Reach out through our contact page and we'll help you find the right pattern for your workload.