Multi-Agent AI Systems
Building Multi-Agent Systems with Azure AI Foundry
Microsoft's BRK197 session at Ignite 2025 laid out the engineering blueprint for moving from single-agent prototypes to production multi-agent orchestration. The session title says "the right way," which is bold language for a platform that merged two competing frameworks months before the conference. But the technical substance backs it up: shared state management, OpenTelemetry observability baked into the runtime, MCP toolchains for interoperability, and the A2A protocol for agent-to-agent communication. This is the session that turned agent orchestration from a whiteboard exercise into deployable architecture.
Session: BRK197 - Build multi-agent systems the right way with Azure AI Foundry
Date: Wednesday, Nov 19, 2025
Time: 10:15 AM - 11:00 AM PST
Location: Marriott Marquis, Yerba Buena Ballroom BO2
The real question: Can you orchestrate agents without orchestrating chaos?
The problem with multi-agent systems has never been building individual agents. That part is almost trivially easy with modern frameworks. The hard part is what happens when Agent A needs to hand context to Agent B, Agent B calls a tool that takes 30 seconds, Agent C is waiting on both of them, and a human needs to approve something in the middle. Add observability requirements, deployment at scale, and the need for agents built by different teams (or different companies) to interoperate, and you have a genuine engineering problem.
BRK197 addressed this head-on, walking through the Microsoft Agent Framework's approach to multi-agent orchestration with real demonstrations, not PowerPoint architectures.
The Semantic Kernel and AutoGen merger: Why it matters for orchestration
The backstory: Microsoft had two agent frameworks growing in parallel. Semantic Kernel handled model orchestration, tool calling, and context management beautifully but treated multi-agent coordination as an afterthought. AutoGen excelled at agent-to-agent communication and conversation patterns but lacked Semantic Kernel's model flexibility and enterprise integration depth. Developers were forced to choose between good single-agent capabilities and good multi-agent coordination.
The merger: The unified Microsoft Agent Framework combines both, and the session demonstrated what this means architecturally.
What Semantic Kernel contributes:
- Model orchestration with provider flexibility (swap between GPT-4, Claude, Llama without rewriting agent logic)
- Tool calling infrastructure
- Context and memory management
- Prompt engineering patterns
What AutoGen contributes:
- Native agent-to-agent communication
- Conversation patterns (round-robin, selector, Magentic-One)
- Group chat abstractions
- Multi-agent coordination primitives
The practical result: You can build a coordinator agent that routes tasks to specialised agents, each with its own tools and model configuration, communicating through standardised patterns, all in one framework. Before the merger, this required gluing two frameworks together with custom code.
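The coordinator-plus-specialists shape can be sketched in plain Python. This is a minimal illustration, not the framework's actual API -- the Agent class and handler functions here are stand-ins for the richer agent, model, and tool abstractions the unified framework provides:

```python
import asyncio

# Hypothetical minimal agent -- the real framework supplies richer Agent types
class Agent:
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler  # stands in for model + tool calls

    async def run(self, task, context):
        return await self.handler(task, context)

async def find_venue(task, context):
    return {"venue": f"Hall near {context['location']}"}

async def check_budget(task, context):
    return {"within_budget": context["budget"] >= 5000}

# The coordinator routes each subtask to the right specialist
async def coordinate(task, context, specialists):
    results = {}
    for name in task["subtasks"]:
        results[name] = await specialists[name].run(task, context)
    return results

async def main():
    specialists = {
        "venue": Agent("venue", find_venue),
        "budget": Agent("budget", check_budget),
    }
    context = {"location": "Seattle", "budget": 10000}
    return await coordinate({"subtasks": ["venue", "budget"]}, context, specialists)

result = asyncio.run(main())
print(result)
```

The point of the merger is that this routing logic, the specialists, and their model configurations all live in one framework rather than being glued across two.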
Shared state: The architecture that makes multi-agent systems possible
The problem with stateless agents: In a multi-agent system, agents need to share context. If Agent A discovers that the user is in Seattle and has a $10,000 budget, Agent B (handling venue selection) needs that information without the user repeating themselves, and without Agent A explicitly passing every detail through a message.
How shared state works in the framework:
The Microsoft Agent Framework implements shared state through a context object that all agents in an orchestration can read from and write to. This is not a shared database -- it is an in-memory state container that flows through the orchestration.
# Shared state accessible to all agents in the orchestration
shared_context = {
    "user_requirements": {
        "location": "Seattle",
        "budget": 10000,
        "attendees": 50
    },
    "venue_options": [],   # Populated by venue agent
    "logistics_plan": {},  # Populated by logistics agent
    "budget_tracking": {}  # Updated by budget agent
}
Why this matters architecturally:
Consistency: All agents see the same state. When the budget agent deducts a venue cost, the logistics agent immediately sees the remaining budget.
Conflict resolution: The framework handles concurrent state access. Two agents cannot simultaneously update the same state property without coordination.
Checkpointing: Shared state integrates with the durable agents extension. If the orchestration crashes, state is recoverable from the last checkpoint.
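The consistency and conflict-resolution guarantees can be illustrated with a plain asyncio lock -- a deliberately simplified stand-in for the framework's own coordination mechanism, which was not shown in API detail:

```python
import asyncio

# Shared state visible to all agents in the orchestration
shared_context = {"budget_remaining": 10000}
state_lock = asyncio.Lock()

async def deduct(label, amount):
    # Serialise updates so two agents cannot clobber the same property
    async with state_lock:
        if shared_context["budget_remaining"] >= amount:
            shared_context["budget_remaining"] -= amount
            return (label, True)
        return (label, False)

async def main():
    # A venue agent and a catering agent try to spend concurrently;
    # only one deduction can succeed against the shared budget
    return await asyncio.gather(
        deduct("venue", 6000),
        deduct("catering", 6000),
    )

results = asyncio.run(main())
print(shared_context["budget_remaining"], results)
```

Without the lock, both agents could read 10000, both deduct, and the budget would go negative -- exactly the class of conflict the framework's coordination exists to prevent.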
The design trade-off: Shared state creates coupling between agents. Agent B depends on Agent A having populated certain state properties. This is a deliberate architectural choice -- the alternative (pure message-passing) is more loosely coupled but makes coordination significantly harder. Microsoft chose pragmatism over purity.
Human-in-the-loop: Not just an approval button
The session demonstrated human-in-the-loop as an architectural pattern, not just a "click approve" checkpoint. Three distinct patterns emerged:
Pattern 1: Approval gates
The simplest pattern. Agent reaches a decision point, pauses execution, presents options to human, waits for approval, continues. The durable agents extension means the agent spins down during the wait (could be minutes, could be days) and resumes from checkpoint when the human responds.
# Human approval gate in multi-agent workflow
async def request_approval(context, decision):
    # Persist state so the orchestration can resume after any length of wait
    checkpoint = await context.create_checkpoint()
    await context.notify_human(
        message=f"Approve decision: {decision}",
        options=["approve", "reject", "modify"]
    )
    # Agent spins down here -- no compute cost while waiting
    response = await context.wait_for_human()
    return response
Pattern 2: Collaborative refinement
Human and agent iterate on a solution. Agent proposes, human adjusts, agent incorporates feedback and re-proposes. This is closer to pair programming than traditional approval workflows. The shared state tracks the evolution of the solution through iterations.
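The refinement loop reduces to a simple shape. In this sketch the feedback rounds are given up front; in a real system each round would come from a human interaction, and the history list is the piece that lives in shared state:

```python
# Collaborative refinement: agent proposes, human adjusts, agent re-proposes.
# propose() stands in for a model call that folds human feedback into a draft.
def propose(draft, feedback):
    if feedback:
        draft = dict(draft, **feedback)  # incorporate human adjustments
    return draft

def refine(initial, feedback_rounds):
    history = [initial]  # shared state tracks the solution's evolution
    draft = initial
    for feedback in feedback_rounds:
        draft = propose(draft, feedback)
        history.append(draft)
    return draft, history

final, history = refine(
    {"venue": "Hall A", "budget": 9000},
    [{"budget": 8000}, {"venue": "Hall B"}],  # two rounds of human feedback
)
print(final, len(history))
```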
Pattern 3: Oversight escalation
Agent operates autonomously but escalates when confidence drops below a threshold or when the task falls outside its defined scope. The key insight from the session: escalation is not failure. A well-designed agent that recognises its limitations and escalates is more valuable than one that ploughs ahead with low confidence.
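A confidence-gated escalation can be expressed in a few lines. The threshold value and the classify callable are illustrative assumptions, not part of the framework's API:

```python
CONFIDENCE_THRESHOLD = 0.7  # assumed tunable per deployment

def handle(task, classify):
    # classify stands in for a model call returning (answer, confidence)
    answer, confidence = classify(task)
    if confidence < CONFIDENCE_THRESHOLD:
        # Escalation is a first-class outcome, not a failure
        return {"status": "escalated", "reason": f"confidence {confidence:.2f}"}
    return {"status": "completed", "answer": answer}

confident = handle("routine request", lambda t: ("ok", 0.92))
uncertain = handle("ambiguous request", lambda t: ("guess", 0.41))
print(confident["status"], uncertain["status"])
```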
The architectural implication: Human-in-the-loop workflows require asynchronous agent execution. Agents cannot block on human input -- they must checkpoint, spin down, and resume. This is why the durable agents extension is not optional for production multi-agent systems. It is the foundation that makes human interaction economically viable.
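The checkpoint-and-resume cycle is the core mechanic. A toy version using a local JSON file (in production the durable agents extension writes to durable storage, not a temp directory):

```python
import json
import os
import tempfile

# Checkpoint the orchestration state, spin down, resume later
def checkpoint(state, path):
    with open(path, "w") as f:
        json.dump(state, f)

def resume(path):
    with open(path) as f:
        return json.load(f)

state = {"step": "awaiting_approval", "pending_decision": "book Hall B"}
path = os.path.join(tempfile.mkdtemp(), "orchestration.ckpt")

checkpoint(state, path)   # agent can now spin down at zero compute cost
restored = resume(path)   # minutes or days later, pick up exactly here
print(restored)
```

The economics follow directly: the agent consumes no compute between the two calls, however long the human takes to respond.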
OpenTelemetry observability: Seeing inside the black box
The problem with multi-agent observability: When one agent calls another, which calls a tool, which triggers a third agent, debugging failures requires tracing the entire execution path. Traditional logging gives you fragments. You need distributed tracing.
The approach: The Microsoft Agent Framework uses OpenTelemetry (GA, not preview) for end-to-end tracing of multi-agent workflows.
What gets traced:
- Agent invocations: Which agent was called, when, with what input
- Model calls: Which LLM was invoked, prompt sent, response received, token count, latency
- Tool executions: Which tools were called, parameters passed, results returned, execution time
- Agent-to-agent communication: Messages between agents, routing decisions, state changes
- Human-in-the-loop interactions: When humans were notified, how long they took to respond, what they decided
Why OpenTelemetry matters (versus proprietary telemetry):
OpenTelemetry is the industry standard. This means you can send agent traces to Datadog, New Relic, Jaeger, Azure Monitor, or any observability platform that speaks OpenTelemetry. You are not locked into Microsoft's monitoring stack.
{
  "trace_id": "abc123",
  "spans": [
    {
      "name": "coordinator_agent",
      "duration_ms": 4500,
      "children": [
        {
          "name": "venue_agent",
          "duration_ms": 2100,
          "attributes": {
            "model": "gpt-4",
            "tokens_used": 1847,
            "tools_called": ["web_search", "venue_database"]
          }
        },
        {
          "name": "budget_agent",
          "duration_ms": 800,
          "attributes": {
            "model": "gpt-4o-mini",
            "tokens_used": 423
          }
        }
      ]
    }
  ]
}
The operational insight: The session demonstrated clicking into individual agent spans during execution. Not after the fact -- during. You can see what an agent is doing right now, what context it is working with, which tools it has available. This transforms debugging from archaeology (digging through logs after something breaks) to live observation.
The cost visibility angle: OpenTelemetry traces include token usage per model call. When your multi-agent orchestration costs more than expected, you can trace exactly which agent is consuming tokens and optimise specifically. This is not possible with aggregate billing alone.
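Because token counts ride along as span attributes, per-agent cost attribution becomes a trace query. A stdlib sketch over a span tree shaped like the trace example above:

```python
# Sum token usage per agent by walking an exported span tree
def tokens_by_agent(span, totals=None):
    totals = {} if totals is None else totals
    tokens = span.get("attributes", {}).get("tokens_used")
    if tokens is not None:
        totals[span["name"]] = totals.get(span["name"], 0) + tokens
    for child in span.get("children", []):
        tokens_by_agent(child, totals)
    return totals

trace = {
    "name": "coordinator_agent",
    "children": [
        {"name": "venue_agent", "attributes": {"tokens_used": 1847}},
        {"name": "budget_agent", "attributes": {"tokens_used": 423}},
    ],
}
totals = tokens_by_agent(trace)
print(totals)
```

When the orchestration bill spikes, this kind of rollup tells you which agent to optimise rather than leaving you staring at an aggregate invoice.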
MCP toolchains: Agents sharing tools across boundaries
Model Context Protocol (MCP) is the integration layer that makes multi-agent systems genuinely useful. Without it, every agent needs its own tool integrations. With it, tools become shared infrastructure.
How MCP toolchains work in multi-agent systems:
Scenario: A multi-agent system for enterprise operations. One agent handles incident detection, another handles root cause analysis, a third handles remediation.
Without MCP: Each agent needs its own integration with ServiceNow, PagerDuty, GitHub, Azure Monitor. Three agents times four integrations equals twelve custom connectors.
With MCP: Each external system exposes an MCP server. All three agents connect to the same MCP servers. Three agents times one MCP layer equals three connections, regardless of how many external systems are involved.
# MCP tool configuration shared across agents
mcp_tools = MCPToolchain(
    servers=[
        MCPServer("servicenow", endpoint="mcp://servicenow.internal"),
        MCPServer("pagerduty", endpoint="mcp://pagerduty.internal"),
        MCPServer("github", endpoint="mcp://github.internal"),
        MCPServer("azuremonitor", endpoint="mcp://azuremonitor.internal"),
    ]
)

# All three agents use the same toolchain
incident_agent = Agent(name="incident_detector", tools=mcp_tools)
analysis_agent = Agent(name="root_cause_analyst", tools=mcp_tools)
remediation_agent = Agent(name="remediator", tools=mcp_tools)
The security implication: MCP toolchains centralise tool access control. Instead of configuring permissions for each agent-tool pair, you configure MCP server access policies once. An agent either has access to the ServiceNow MCP server or it does not. Granular permissions are enforced at the MCP server level, not in every agent.
The interoperability implication: MCP is not Microsoft-proprietary. Any agent framework that supports MCP can use the same tool servers. This means agents built with LangChain, CrewAI, or custom frameworks can share tools with Microsoft Agent Framework agents through the same MCP infrastructure.
A2A protocol: When agents need to talk to agents they do not control
The distinction: MCP handles agent-to-tool communication. A2A (Agent-to-Agent) handles agent-to-agent communication across trust boundaries.
Why A2A matters:
Inside your organisation, agents communicate through the framework's built-in patterns (shared state, group chat, coordinator routing). But what about agents built by partners, vendors, or other organisations?
A2A provides:
- Discovery: Agent A can discover Agent B's capabilities without prior knowledge
- Negotiation: Agents agree on communication protocols and data formats
- Delegation: Agent A can delegate a task to Agent B and receive results
- Trust boundaries: Each agent maintains its own security context
The enterprise scenario:
Your procurement agent needs to check pricing from a supplier's agent. Your agent is built on Microsoft Agent Framework. The supplier's agent is built on something else entirely. A2A provides the interoperability protocol that makes this work without tight integration.
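The discover-then-delegate flow can be sketched as follows. The field names here are simplified illustrations, not the exact A2A schema, and the delegation is faked rather than going over the network:

```python
import json

# Illustrative A2A-style exchange: discovery via a published agent card,
# then delegation of a task to the discovered agent
supplier_card = {
    "name": "supplier-pricing-agent",
    "capabilities": ["quote_pricing"],
    "endpoint": "https://supplier.example/a2a",  # hypothetical endpoint
}

def can_handle(card, capability):
    # Discovery: check advertised capabilities before delegating
    return capability in card["capabilities"]

def delegate(card, capability, payload):
    if not can_handle(card, capability):
        raise ValueError(f"{card['name']} does not advertise {capability}")
    # A real client would call card['endpoint']; we fake the supplier's reply
    return {"task": capability, "status": "completed",
            "result": {"unit_price": 12.5}}

quote = delegate(supplier_card, "quote_pricing", {"sku": "WIDGET-1", "qty": 500})
print(json.dumps(quote))
```

The value is that neither side needs to know the other's framework -- only the advertised capabilities and the protocol.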
The Activity Protocol connection: The session referenced the Activity Protocol alongside A2A. Where A2A handles inter-agent communication, the Activity Protocol standardises how agents report on task progress. Think of it as a status reporting standard: "I started task X," "I am 60% complete," "I am blocked on human input," "I completed task X with result Y." This matters when coordinating agents across systems -- you need a common language for progress reporting.
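The session did not show the wire format, but the shape of such a progress stream might look like this (field names are assumptions for illustration):

```python
# Hypothetical progress-report stream in the spirit of the Activity Protocol
events = [
    {"task": "X", "status": "started"},
    {"task": "X", "status": "in_progress", "percent": 60},
    {"task": "X", "status": "blocked", "reason": "awaiting human input"},
    {"task": "X", "status": "completed", "result": "Y"},
]

def latest_status(events, task):
    # A coordinator usually only needs the most recent report per task
    return [e for e in events if e["task"] == task][-1]

current = latest_status(events, "X")
print(current["status"])
```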
The honest question: A2A adoption is nascent. The protocol exists, but the ecosystem of A2A-compatible agents is thin. Whether A2A becomes the HTTP of agent communication or the SOAP of agent communication (technically correct but practically ignored) depends on adoption beyond Microsoft's ecosystem.
Deploying as containerised agents: The production architecture
The deployment model: Agents deploy as containers on Azure AI Foundry's infrastructure. This is significant because it maps agent deployment to existing DevOps patterns.
What containerised deployment provides:
Isolation: Each agent runs in its own container. Agent A crashing does not take down Agent B. This is table stakes for production but was not always the case with earlier agent frameworks that ran everything in-process.
Scaling: Individual agents scale independently. If the venue-finding agent receives ten times more traffic than the budget agent, only the venue agent scales up. You are not paying to scale the entire orchestration.
Versioning: Agents can be versioned independently. Deploy a new version of one agent without redeploying the entire system. Canary deployments, blue-green deployments, and rolling updates all work the way they do for any containerised workload.
CI/CD integration: Agent containers flow through standard pipelines. Build, test, scan, deploy. No special agent-specific deployment tooling required -- your existing Kubernetes or Azure Container Apps infrastructure works.
# Agent deployment manifest (conceptual)
agents:
  - name: coordinator
    image: acr.azurecr.io/agents/coordinator:v2.1
    replicas: 2
    resources:
      cpu: "1"
      memory: "2Gi"
  - name: venue-finder
    image: acr.azurecr.io/agents/venue:v1.4
    replicas: 3  # Higher replica count for heavier workload
    resources:
      cpu: "2"
      memory: "4Gi"
  - name: budget-tracker
    image: acr.azurecr.io/agents/budget:v1.1
    replicas: 1
    resources:
      cpu: "0.5"
      memory: "1Gi"
The operational reality: Containerised agents mean your platform engineering team manages agent infrastructure with the same tools and skills they use for everything else. No separate "AI ops" team needed -- agents are just containers with unusual workloads.
The orchestration patterns demonstrated
The session walked through three orchestration patterns, each suited to different multi-agent scenarios:
Pattern 1: Coordinator (hub-and-spoke)
One coordinator agent receives all requests, routes to specialised agents, aggregates responses. Simple, predictable, easy to debug. Fails when the coordinator becomes a bottleneck.
Pattern 2: Round-robin
Agents take turns contributing to a task. Each agent adds its perspective, then passes to the next. Works well for review and refinement workflows where multiple perspectives improve output quality.
Pattern 3: Selector (Magentic-One style)
An intelligent selector dynamically chooses which agent should handle the next step based on the current state. More flexible than coordinator but harder to debug because routing decisions are model-driven, not rule-driven.
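The selector's routing decision can be mocked with a scoring function standing in for the model-driven router (the scoring heuristic here is a toy assumption):

```python
# Selector orchestration sketch: pick the next agent from the current state
def select_next(state, agents):
    def score(name):
        # Toy heuristic: prefer agents whose outputs are still missing;
        # in Magentic-One-style systems a model makes this decision
        return 0 if state.get(name) else 1
    return max(agents, key=score)

state = {"venue_agent": {"venue": "Hall B"}, "budget_agent": None}
chosen = select_next(state, ["venue_agent", "budget_agent"])
print(chosen)
```

Swapping the deterministic score() for a model call is exactly what makes the selector pattern flexible -- and what makes its routing decisions harder to debug.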
The selection guidance: Start with coordinator for predictable workflows. Move to selector when you need dynamic routing. Use round-robin when all agents contribute equally to the output. Do not start with the most complex pattern because you think you will need it eventually.
What was not addressed
Several production concerns went unmentioned:
1. State management at scale
Shared state works elegantly for small orchestrations. What happens when hundreds of concurrent orchestrations share a state management layer? Memory pressure, conflict resolution overhead, and state serialisation costs were not discussed.
2. Agent versioning in running orchestrations
When you deploy a new version of one agent in a multi-agent system, what happens to in-flight orchestrations using the old version? Does the orchestration complete with old agents or hot-swap to new ones? Neither option is without risk.
3. Cross-region orchestration
The demo ran in a single region. Multi-agent systems spanning regions introduce latency, data residency concerns, and network partition scenarios. None of these were addressed.
4. Cost predictability
Multi-agent systems multiply model costs. A coordinator calling three agents, each making multiple model calls with tool use, generates token costs that are difficult to predict. The session did not address cost management or budget controls for multi-agent orchestrations.
5. Testing multi-agent systems
Unit testing individual agents is straightforward. Integration testing multi-agent orchestrations with non-deterministic model outputs, shared state, and human-in-the-loop checkpoints is not. No testing strategy was presented.
The honest assessment
BRK197 delivered the most technically substantive multi-agent architecture session at Ignite 2025.
What is genuinely useful:
Shared state with checkpointing: Solving the coordination problem without forcing pure message-passing makes building multi-agent systems pragmatically achievable. The durable agents integration means human-in-the-loop workflows do not burn compute while waiting.
OpenTelemetry as the observability standard: Not inventing proprietary telemetry is the correct architectural choice. Agent traces flowing to existing observability platforms means you debug agents with tools you already know.
MCP toolchains as shared infrastructure: Decoupling tool integration from individual agents eliminates the N-agents-times-M-tools connector problem. This is a genuine architectural improvement.
Containerised deployment model: Mapping agents to containers makes agent operations a platform engineering problem, not a novel infrastructure challenge.
What is concerning:
A2A adoption uncertainty: The protocol is technically sound but ecosystem adoption is the real question. Without broad adoption, A2A becomes a Microsoft-internal standard rather than an interoperability protocol.
Orchestration complexity hidden: The demos used three to four agents coordinating on straightforward tasks. Production multi-agent systems with dozens of agents, complex dependency chains, and failure modes across multiple layers are a different engineering challenge.
Framework maturity: The Semantic Kernel / AutoGen merger is recent. Production multi-agent systems built on this foundation are betting on framework stability that has not yet been proven by time.
The verdict
BRK197 answered the question I went in with: how do you move from single-agent demos to production multi-agent orchestration? The answer is architecturally sound -- shared state for coordination, OpenTelemetry for observability, MCP for tool integration, A2A for cross-boundary communication, and containers for deployment.
The gap is between the architecture and the operational reality. Managing multi-agent systems in production will surface problems that conference demos cannot anticipate: state management at scale, cost unpredictability, cross-region orchestration, and the testing challenge of non-deterministic distributed systems.
But the foundations are right. If you are building multi-agent systems on Azure, this is the architecture to follow. Start with the coordinator pattern, add complexity only when you need it, instrument everything with OpenTelemetry, and treat agents as containers. The framework gives you the pieces. Assembly is still your problem.
What to watch
Framework stability: Does the unified Microsoft Agent Framework hold, or does another refactoring arrive in 12 months? Track breaking changes in release notes.
A2A ecosystem growth: Monitor adoption beyond Microsoft. If Google, Amazon, and independent framework builders adopt A2A, it becomes a real standard. If not, it remains a Microsoft convention.
MCP server ecosystem: The value of MCP toolchains depends on available MCP servers. Track which enterprise tools ship MCP servers natively versus requiring custom builds.
Multi-agent cost management: Watch for Azure tooling that helps predict and control costs for multi-agent orchestrations. Without this, production budgeting remains guesswork.
Production case studies: BMW's 12x acceleration is compelling but singular. Watch for additional production multi-agent deployments that validate the architecture under different workloads and at different scales.
Related Coverage:
- Microsoft Agent Framework: The migration path nobody asked for
- AI Fleet Operations: Foundry Control Plane
- Azure SRE Agent Deep Dive
- Agent-Framework: Unified Platform for A2A Agents