Why a single agent fails at scale: MAS fundamentals and three control topologies
The monolithic agent is deceptively easy to prototype — and brittle in production at any meaningful scale.
Context window ceilings: Intermediate state fills the window and reasoning quality degrades sharply.
Jack-of-all-trades dilution: One agent doing retrieval, coding, and audit does none well.
No concurrency: Sequential execution means total latency is the sum of every step.
Single point of failure: One bad model call stops the entire workflow.
Evidence: Google's Agent Bake-Off (MLflow 2026) cut processing from one hour to ten minutes (6×). AdaptOrch delivers 12–23% gains on SWE-bench when topology matches the task — larger than model choice effects.
A multi-agent system (MAS) is independent agents collaborating through defined protocols and orchestration. Each agent should be single-responsibility, tool-equipped, state-isolated, and independently replaceable.
| Topology | Pros | Cons | Best for |
|---|---|---|---|
| Centralized | Auditable, controllable | Orchestrator bottleneck | Compliance, fixed flows |
| Decentralized | Resilient, low latency | Hard to debug | Peer negotiation |
| Hierarchical | Balances both | Moderate complexity | Enterprise multi-team systems |
Six orchestration design patterns: sequential pipeline through hybrid architecture
These six patterns cover the vast majority of real production systems.
| Pattern | Core idea | Latency | Use case |
|---|---|---|---|
| ① Sequential pipeline | Strict A→B→C linear flow | Sum of steps | Content creation, compliance review |
| ② Parallel fan-out/fan-in | Concurrent workers, synthesizer merge | Max of branches | Multi-source research, risk assessment |
| ③ Hierarchical supervisor-worker | Supervisor routes to specialists | Dynamic | Coding assistants, enterprise support |
| ④ Swarm | Peer-to-peer, termination rules | Unpredictable | Code review debate (use sparingly) |
| ⑤ Blackboard | Shared workspace, precondition triggers | Async | Hours-to-days heterogeneous workflows |
| ⑥ Hybrid | Router + supervisor + pipeline + parallel | Mixed | Enterprise content platforms |
Pattern 1 — LangGraph sequential: retriever → analyzer → writer via StateGraph. Pattern 2 — Send API: Send("research_worker", task) lists dispatch true concurrency; Annotated[list, operator.add] merges results. Pattern 3 — two-tier routing: keyword fast path (<1ms) plus LLM fallback. Pattern 4 — AutoGen GroupChat: hard max_round=6 cap. Pattern 5 — blackboard: agents activate when task_status: research_done. Pattern 6 — hybrid: intent router → simple direct answer / complex report → supervisor → parallel research + quality pipeline.
builder = StateGraph(PipelineState)
builder.add_node("retriever", retrieval_agent)
builder.add_node("analyzer", analysis_agent)
builder.add_node("writer", writer_agent)
builder.add_edge(START, "retriever")
builder.add_edge("retriever", "analyzer")
builder.add_edge("analyzer", "writer")
builder.add_edge("writer", END)
pipeline = builder.compile()AdaptOrch is decisive: in multi-agent systems, how you compose agents matters more than which model runs underneath.
LangGraph vs CrewAI vs AutoGen and the MCP + A2A dual protocol layer
| Dimension | LangGraph | CrewAI | AutoGen |
|---|---|---|---|
| Model | State machine graph | Role-based crews | Conversation groups |
| State | Native | Custom needed | Limited |
| HITL | Native interrupt() | Custom needed | Supported |
| Observability | LangSmith | Limited | Azure Monitor |
| Production readiness | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Prototyping speed | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
Choose LangGraph for regulated industries, long-running state, fine HITL. CrewAI for 1–2 day prototypes and role-based pipelines. AutoGen on Microsoft/Azure for conversational debate.
2026 communication stacks on two complementary protocols (Linux Foundation Agentic AI Foundation): MCP (vertical) standardizes tool/DB/API access — write once, use everywhere. A2A (horizontal) — Google open-sourced April 2025, v1.0 early 2026 — standardizes delegation, discovery, and sync across 50+ partners including Atlassian, Salesforce, and SAP.
// /.well-known/agent.json
{
"name": "ResearchAgent",
"skills": [{"id": "web_research", "name": "Web Research"}],
"capabilities": {"streaming": true, "async": true}
}Orchestrators fetch Agent Cards, verify skills, then send JSON-RPC 2.0 message/send tasks. MCP servers expose tools via @app.list_tools() and @app.call_tool().
Six-step production rollout: persistence, HITL, circuit breaking, and observability
Minimum Runbook requirements from demo to production.
PostgreSQL checkpoints: PostgresSaver with thread_id — resume after process restarts.
Human-in-the-loop: interrupt() before high-risk actions; proceed or cancel on human decision.
Circuit breaker: CLOSED/OPEN/HALF_OPEN with failure_threshold=5, recovery_timeout=60s.
Token budget: TokenBudgetManager checks remaining budget before each agent call.
Distributed tracing: OpenTelemetry correlation_id on every agent span with tokens and status.
Handoff validation + LLM-as-Judge: schema checks, confidence thresholds (<0.7 reject), automated quality scoring on four dimensions.
human_decision = interrupt({
"proposed_action": proposed_action,
"risk_level": "HIGH",
"message": "This will modify the production database. Confirm to proceed."
})Hard caps: MAX_ITERATIONS=10, MAX_TOOL_CALLS_PER_AGENT=20, MAX_TOTAL_TOKENS=50_000. LangGraph supports interrupt_before=["high_cost_tool"].
Observability metrics, four pitfalls, decision framework, and 2026 trends
MAST analysis of 1,642 traces: system design failures 41.77%, inter-agent misalignment 36.94%, verification failures 21.30%. 57% run agents in production; only 8% finished observability — errors return HTTP 200 while dashboards stay green.
| Metric | Target | Notes |
|---|---|---|
| task_success_rate | >85% | End-to-end completion |
| e2e_latency_p95 | <30s | P95 latency |
| agent_error_rate | <5% | Per-agent errors |
| hallucination_rate | Sampled | LLM-as-Judge or human labels |
Pitfall 1 — context pollution: one agent's hallucination becomes the next agent's ground truth — validate at every handoff.
Pitfall 2 — runaway loops: retry spirals burn budgets 100× — set hard caps everywhere.
Pitfall 3 — over-engineering: eight agents for a two-step chain — start sequential; sweet spot is 3–8 agents.
Pitfall 4 — demo-to-production gap: input length limits, injection detection, PII redaction, harmful content checks.
Decision tree: sequential deps → parallelizable? no→pipeline, yes→hybrid; no deps → authority agent? yes→supervisor-worker; else long async? yes→blackboard; else ≤5 agents with termination? yes→swarm, no→refactor hierarchical.
2026 trends: federated orchestration, multimodal multi-agent, adaptive topology selection (AdaptOrch), EU AI Act audit-trail requirements.
Running LangGraph + PostgreSQL checkpoints + OpenTelemetry on a MacBook fails when the lid closes. Linux VPS loses Cursor MCP, Keychain credentials, and Xcode adjacency. Co-locating gateway and local inference on underpowered hardware causes swap thrash. For 7×24 multi-agent orchestration with stable MCP/A2A access and HITL review, KVMNODE dedicated Mac Mini M4 / M4 Pro is usually the better production host: launchd persistence, six regions, flexible daily/weekly/monthly terms. See the pricing page, order entry, and help center.