Multi-Agent AI Systems

Multi-Agent MCP Application with Azure Cosmos DB

LAB518 answered a question that every multi-agent architecture eventually runs into: where does the state live? Microsoft's answer is Azure Cosmos DB, and this hands-on lab made the case by having participants build a multi-agent MCP application where Cosmos DB handled not just persistence but semantic search, agent memory, and cross-agent state coordination. The lab worked in both C# (Microsoft Agent Framework) and Python (LangGraph), which is a welcome acknowledgement that not every enterprise runs a single-language stack.

Session: LAB518 — Build a Multi-Agent MCP Application with Azure Cosmos DB
Date: Wednesday, Nov 19, 2025
Location: Moscone West, Level 3, Room 3001


The state management problem in multi-agent systems

Single-agent systems can get away with in-memory state. The agent starts, processes a request, returns a result. If state needs to persist between invocations, a simple key-value store or even a flat file works.

Multi-agent systems break this model immediately. When Agent A discovers information that Agent B needs, that state must be accessible across agent boundaries. When the orchestrator checkpoints a workflow, the entire multi-agent state — including intermediate results, tool call history, and shared context — must be durable. When a workflow resumes after failure, every agent must reconstruct its context from persisted state.

The options, roughly ordered by complexity:

In-memory shared state: Fast, simple, lost on process restart. Fine for prototypes.

Redis or equivalent: Fast, durable, but flat. No built-in semantic search, no complex queries on agent state, no native vector support (Redis vector search notwithstanding — it bolts on rather than builds in).

Relational database: Durable, queryable, but schema-rigid. Agent state is inherently schemaless — different agents produce different state shapes, and those shapes evolve as agents are updated.

Document database with vector support: Durable, schemaless, queryable, with native semantic search. This is the Cosmos DB pitch.

LAB518 did not present this as a neutral comparison. It is a Microsoft lab promoting a Microsoft product. But the technical argument for a document database with vector search as the multi-agent state store is genuinely strong, and the lab demonstrated why.


What the lab built

The lab's application was a customer service system with three specialist agents and an orchestrator:

Triage agent: Receives customer queries, classifies intent, routes to the appropriate specialist agent. State written: customer intent classification, confidence score, routing decision.

Knowledge agent: Searches a product knowledge base using semantic search to find relevant documentation. State written: search queries, retrieved documents, relevance scores.

Resolution agent: Synthesises information from the knowledge agent with customer context to generate a response. State written: resolution content, sources used, satisfaction prediction.

The orchestrator: A Magentic-One lead agent managing the workflow, checkpointing state at each step, and handling failure recovery.

All four components read from and wrote to the same Cosmos DB instance, using different containers for different state types.

# Cosmos DB as the shared state store for multi-agent coordination
from azure.cosmos import CosmosClient
from agent_framework import Agent, SharedStateStore

# Single Cosmos DB instance, multiple containers
# (endpoint and credential come from your Azure configuration,
# e.g. azure.identity.DefaultAzureCredential)
cosmos_client = CosmosClient(endpoint, credential=credential)
database = cosmos_client.get_database_client("agent_state")

state_store = SharedStateStore(
    agent_memory=database.get_container_client("agent_memory"),
    workflow_state=database.get_container_client("workflow_state"),
    knowledge_base=database.get_container_client("knowledge_vectors")
)

triage_agent = Agent(
    name="triage",
    instructions="Classify customer intent and route appropriately",
    state_store=state_store
)

Architecture: Cosmos DB as the agent state layer

The lab built a three-tier architecture: agents at the top, MCP in the middle, Cosmos DB at the bottom.

+------------------+    +------------------+    +------------------+
|  Research Agent  |    |  Analysis Agent  |    |  Summary Agent   |
+--------+---------+    +--------+---------+    +--------+---------+
         |                       |                       |
         +----------+------------+-----------+-----------+
                    |                        |
         +----------v-----------+  +---------v-----------+
         |   MCP State Server   |  |  MCP Search Server  |
         +----------+-----------+  +---------+-----------+
                    |                        |
         +----------v------------------------v-----------+
         |              Azure Cosmos DB                   |
         |  +----------+  +-----------+  +------------+  |
         |  | Agent    |  | Session   |  | Vector     |  |
         |  | Memory   |  | State     |  | Index      |  |
         +--+----------+--+-----------+--+------------+--+

Layer 1: Agents. Each agent handles a specific task — research, analysis, or summarisation. Agents do not interact with Cosmos DB directly. They interact with MCP servers that abstract the database layer.

Layer 2: MCP servers. Two MCP servers expose Cosmos DB capabilities. The State Server handles reading and writing agent memory and session state. The Search Server handles semantic queries against stored agent data.

Layer 3: Cosmos DB. Three logical containers within a Cosmos DB database. Agent Memory stores long-term agent knowledge. Session State stores per-interaction context. The Vector Index stores embeddings for semantic search.

Why MCP in the middle matters: Agents do not need to know they are using Cosmos DB. They call MCP tools named "remember," "recall," and "search." The MCP server handles serialisation, partition key management, consistency levels, and vector embedding generation. If you swap Cosmos DB for PostgreSQL with pgvector next year, agents do not change.


MCP integration: How agents connect to Cosmos DB

The lab's distinguishing feature was using MCP (Model Context Protocol) as the interface between agents and Cosmos DB, rather than having agents call the Cosmos SDK directly.

Why MCP matters here: If agents call the Cosmos SDK directly, every agent needs Cosmos-specific code — connection strings, container references, query construction, error handling. Adding a new agent means writing more Cosmos integration code. Changing the state store means rewriting every agent.

With MCP, a Cosmos DB MCP server exposes state operations as tools. Agents call tools like save_state, retrieve_state, semantic_search without knowing that Cosmos DB is the backing store.

# MCP server exposing Cosmos DB operations as tools
from agent_framework import MCPToolServer

cosmos_mcp = MCPToolServer(
    endpoint="http://localhost:9090/mcp",
    tools=[
        "save_agent_state",
        "retrieve_agent_state",
        "semantic_search",
        "get_workflow_checkpoint",
        "save_workflow_checkpoint"
    ]
)

# Agent uses MCP tools — no Cosmos SDK dependency
knowledge_agent = Agent(
    name="knowledge",
    instructions="Search knowledge base for relevant product documentation",
    mcp_servers=[cosmos_mcp],
    allowed_tools=["semantic_search", "retrieve_agent_state"]
)

The abstraction benefit is real: During the lab, participants could swap between a Cosmos DB-backed MCP server and a local SQLite-backed MCP server for testing. The agent code did not change. Only the MCP server implementation changed. This is the MCP value proposition demonstrated concretely — tool abstraction that actually works.

The abstraction cost is also real: MCP adds a network hop and serialisation overhead to every state operation. For high-frequency state updates (an agent writing state after every tool call), this latency accumulates. The lab's workload was modest enough that this was not noticeable, but production workloads with thousands of concurrent agent sessions will feel it.

The MCP server implementation:

| Tool Name       | Description              | Cosmos DB Operation            |
|-----------------|--------------------------|--------------------------------|
| remember        | Store agent memory       | create_item                    |
| recall          | Semantic memory search   | query_items with vector search |
| get_context     | Read session state       | read_item by partition key     |
| update_context  | Write session state      | upsert_item with ETag          |
| semantic_search | Find related content     | Vector similarity search       |
| aggregate       | Summarise across records | Aggregate query with GROUP BY  |

Agent memory: What agents need to remember and how

The lab implemented three types of agent memory, each stored differently in Cosmos DB.

Type 1: Conversational memory (short-term)

What it stores: The current conversation between the user and the agent system. Messages, tool call results, agent-to-agent communications for the active session.

{
  "id": "session-001-msg-042",
  "partition_key": "session-001",
  "type": "message",
  "role": "agent:researcher",
  "content": "Found 3 venues in Seattle matching criteria...",
  "timestamp": "2025-11-19T10:23:45Z",
  "metadata": {
    "model": "gpt-4",
    "tokens": 234,
    "tools_used": ["web_search"]
  }
}

How it is used: When an agent needs context about what has happened in the current interaction, it queries conversational memory. This replaces the "pass the entire message history in the prompt" pattern, which breaks when conversations exceed the context window.

Cosmos DB advantage: Partition by session ID. All messages for one session live in the same logical partition, which means reads are fast and isolated. Session cleanup is a single partition delete, not a scan-and-filter operation.
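A sketch of what the single-partition read looks like in practice. The query-builder shape is an assumption for illustration; the point is that passing `partition_key` to the azure-cosmos `query_items` call scopes the read to one logical partition.

```python
# Assumed helper shape: build the single-partition query that fetches one
# session's messages in order. Passing partition_key keeps the read inside
# that session's logical partition — no cross-partition fan-out.
def session_history_query(session_id: str) -> dict:
    return {
        "query": ("SELECT * FROM c WHERE c.type = 'message' "
                  "ORDER BY c.timestamp ASC"),
        "parameters": [],
        "partition_key": session_id,  # scopes the query to one partition
    }


q = session_history_query("session-001")
# container.query_items(**q) would then read only partition "session-001"
print(q["partition_key"])  # → session-001
```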

Type 2: Episodic memory (medium-term)

What it stores: Summaries and key decisions from previous sessions. Not the full conversation, but the distilled knowledge: "User prefers outdoor venues," "Budget was approved at $10,000," "Previous event used Venue X with positive feedback."

{
  "id": "user-42-episode-017",
  "partition_key": "user-42",
  "type": "episode",
  "summary": "Planned Q3 team event. Selected outdoor venue in Bellevue. Budget $8,500. 45 attendees. Positive post-event survey.",
  "key_decisions": [
    "Chose outdoor over indoor due to weather forecast",
    "Selected buffet over plated dinner for budget reasons"
  ],
  "entities": ["Bellevue", "outdoor venue", "Q3 event"],
  "embedding": [0.023, -0.041, 0.087],
  "timestamp": "2025-09-15T00:00:00Z"
}

How it is used: When a new session starts, agents query episodic memory for relevant past interactions. "The user is planning another event — what did they prefer last time?" This is where semantic search becomes essential. The query is not "give me episode 17" but "give me episodes about event planning for this user."

Type 3: Semantic knowledge (long-term)

What it stores: Facts, preferences, and domain knowledge that agents accumulate over time. This is the agent's evolving understanding of the user, the organisation, or the domain.

{
  "id": "knowledge-venue-seattle-001",
  "partition_key": "domain:venues",
  "type": "knowledge",
  "fact": "Outdoor venues in Seattle are risky Oct-Apr due to rain. Indoor backup recommended.",
  "confidence": 0.92,
  "source": "analysis of 15 past events",
  "embedding": [0.015, -0.033, 0.091],
  "last_updated": "2025-11-01T00:00:00Z"
}

How it is used: Agents query knowledge before making recommendations. The research agent, before suggesting outdoor venues in Seattle in November, checks knowledge and finds the rain risk. It adjusts its recommendations accordingly, without the user needing to mention weather.

The design decision that matters: Memory types are separated because they have different lifecycles. Conversational memory is deleted when a session ends. Episodic memory is retained for months. Semantic knowledge persists indefinitely. Cosmos DB's TTL (time-to-live) feature handles this automatically at the document level.
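The TTL lifecycles can be sketched as a small policy. The specific durations are assumptions, not the lab's numbers; the mechanism is real: when TTL is enabled on a container, Cosmos DB honours a per-document "ttl" field (in seconds), with -1 meaning "never expire."

```python
# Illustrative TTL policy for the three memory types. Durations are
# assumptions; Cosmos DB reads the per-document "ttl" field (seconds)
# when container TTL is enabled, and -1 means "never expire".
TTL_BY_MEMORY_TYPE = {
    "message": 24 * 3600,        # conversational: expire a day later
    "episode": 180 * 24 * 3600,  # episodic: retain roughly six months
    "knowledge": -1,             # semantic knowledge: persist indefinitely
}


def with_ttl(document: dict) -> dict:
    """Attach the lifecycle-appropriate ttl before writing to Cosmos DB."""
    return {**document, "ttl": TTL_BY_MEMORY_TYPE[document["type"]]}


doc = with_ttl({"id": "m1", "type": "message", "content": "hi"})
print(doc["ttl"])  # → 86400
```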


Semantic search: Finding meaning, not just matches

The lab dedicated significant time to Cosmos DB's vector search capabilities and how they transform agent memory from a ledger into a knowledge base.

The problem with traditional search for agents:

Traditional database queries require exact matches or keyword patterns. "Find messages containing 'Seattle venue'" works if the agent used those exact words. But what about messages that discuss "event spaces in the Pacific Northwest" or "locations near Pike Place Market"? Keyword search misses semantically related content.

How vector search works in Cosmos DB:

  1. Embedding generation: When agents store memories, the content is converted to a vector embedding — a numerical representation of the content's meaning, not just its words
  2. Vector indexing: Cosmos DB indexes these embeddings using DiskANN (Microsoft's approximate nearest neighbour algorithm), enabling fast similarity search across millions of documents
  3. Semantic queries: When an agent searches memory, the query is also embedded, and Cosmos DB finds documents whose embeddings are closest in meaning

# MCP tools for agent memory with semantic search
import uuid
from datetime import datetime, timezone

@mcp_tool("remember")
async def remember(content: str, memory_type: str, metadata: dict):
    """Store information in agent memory"""
    embedding = await generate_embedding(content)
    document = {
        "id": str(uuid.uuid4()),  # Cosmos DB requires an explicit id
        "content": content,
        "type": memory_type,
        "embedding": embedding,
        "metadata": metadata,
        "timestamp": datetime.now(timezone.utc).isoformat()  # JSON-serialisable
    }
    await cosmos_container.create_item(document)

@mcp_tool("recall")
async def recall(query: str, memory_type: str, limit: int = 5):
    """Retrieve relevant memories using semantic search"""
    query_embedding = await generate_embedding(query)
    # In the async SDK, query_items returns an async iterator — not awaited
    results = cosmos_container.query_items(
        query="""SELECT TOP @limit * FROM c
                 WHERE c.type = @type
                 ORDER BY VectorDistance(c.embedding, @embedding)""",
        parameters=[
            {"name": "@limit", "value": limit},
            {"name": "@type", "value": memory_type},
            {"name": "@embedding", "value": query_embedding}
        ]
    )
    return [item async for item in results]
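What `ORDER BY VectorDistance` is doing can be shown in miniature with brute-force cosine distance over toy embeddings (the vectors below are made up for illustration). Cosmos DB's DiskANN index makes this approximate-but-fast at scale; the ranking idea is the same.

```python
# Brute-force cosine distance over toy embeddings — a miniature of what
# VectorDistance ranks. Embedding values here are invented for illustration.
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)


memories = {
    "venues in Seattle": [0.9, 0.1, 0.0],
    "catering budget": [0.1, 0.9, 0.2],
    "event spaces in the Pacific Northwest": [0.8, 0.25, 0.1],
}
query = [0.88, 0.15, 0.02]  # stand-in embedding of "Seattle venue"

# Closest-in-meaning first, even with zero keyword overlap
ranked = sorted(memories, key=lambda k: cosine_distance(query, memories[k]))
print(ranked[0])  # → venues in Seattle
```

Note that "event spaces in the Pacific Northwest" ranks far above "catering budget" despite sharing no words with the query — the semantic-versus-keyword distinction the section describes.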

Performance observations from the lab:

Vector searches against the lab's ~500 document corpus returned in under 200ms consistently. This is competitive with dedicated vector databases like Pinecone or Weaviate for this corpus size. The question is whether it scales — Cosmos DB's vector search is relatively new, and benchmarks at enterprise scale (millions of documents, thousands of concurrent queries) are sparse.

Cosmos DB's DiskANN advantage:

The lab highlighted Cosmos DB's DiskANN-based vector indexing. Unlike pure in-memory vector indexes (which consume expensive RAM at scale), DiskANN stores the index on SSD storage with minimal accuracy trade-off. For agent systems that accumulate large memory stores over time, this means vector search costs scale with storage (cheap) rather than memory (expensive).

The honest comparison with dedicated vector databases:

Cosmos DB's vector search is good enough for many use cases, and the operational simplicity of having your document store and vector store in the same database is significant. But dedicated vector databases offer features Cosmos DB does not match: more sophisticated indexing algorithms (HNSW tuning, IVF variants), better tooling for embedding pipeline management, and more mature query optimisation for vector workloads.

The right question is not "which is better" but "do I need a separate vector database?" If your vector search requirements are semantic search over a document corpus with moderate scale, Cosmos DB's built-in vector support eliminates an entire infrastructure component. If you are building a vector-first application with complex similarity search requirements, a dedicated vector database is still the better choice.


Multi-agent coordination through shared state

The most practically valuable exercise had agents coordinating through shared Cosmos DB state rather than direct messaging.

The coordination pattern:

{
    "id": "task-42-coordination",
    "partition_key": "task-42",
    "type": "coordination",
    "status": "in_progress",
    "agents": {
        "researcher": {
            "status": "completed",
            "result_ref": "task-42-research-result",
            "completed_at": "2025-11-19T10:25:00Z"
        },
        "analyser": {
            "status": "working",
            "started_at": "2025-11-19T10:25:05Z"
        },
        "summariser": {
            "status": "waiting",
            "depends_on": ["researcher", "analyser"]
        }
    },
    "shared_context": {
        "user_requirements": "Plan event for 50 people in Seattle, $10k budget",
        "constraints_discovered": [
            "November weather risk",
            "Budget tight for catering + venue"
        ]
    }
}

How agents coordinate:

  1. Coordinator creates a coordination document in Cosmos DB
  2. Each agent reads the coordination document to understand its role and dependencies
  3. As agents complete their work, they update their status in the coordination document
  4. Agents with dependencies poll (or use change feed) to know when their prerequisites are met
  5. All agents can read and write to shared_context for passing information between steps
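The dependency check in step 4 can be sketched as a pure function over the coordination document shown above. This is an assumed implementation of the `all_dependencies_met` helper the lab's change-feed code calls: a "waiting" agent starts only once every agent it `depends_on` reports status "completed".

```python
# Sketch (assumed implementation) of the dependency check from step 4,
# evaluated against the coordination document's "agents" map.
def all_dependencies_met(coordination: dict, agent_name: str) -> bool:
    agent = coordination["agents"][agent_name]
    deps = agent.get("depends_on", [])
    return all(
        coordination["agents"][dep]["status"] == "completed" for dep in deps
    )


doc = {
    "agents": {
        "researcher": {"status": "completed"},
        "analyser": {"status": "working"},
        "summariser": {"status": "waiting",
                       "depends_on": ["researcher", "analyser"]},
    }
}
print(all_dependencies_met(doc, "summariser"))  # → False: analyser still working
```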

Why persistent coordination beats in-memory coordination:

Crash recovery: If the analysis agent crashes mid-task, the coordination document persists. When a new instance spins up, it reads the document, sees its status was "working," and resumes (or restarts) the analysis. With in-memory coordination, a crash loses everything.

Audit trail: The coordination document serves as a complete record of the multi-agent workflow. Who did what, when, in what order, and what they produced. This is not just debugging convenience — it is a compliance and audit requirement in regulated industries.

Scale-out: When traffic increases, you can run multiple instances of each agent. The coordination document, partitioned by task ID in Cosmos DB, handles concurrent access with optimistic concurrency control. Two instances of the analysis agent cannot both claim the same task — Cosmos DB's ETag-based concurrency ensures one wins and the other retries with a different task.
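The one-wins-one-retries behaviour is worth seeing in isolation. Below is an in-memory simulation of the ETag pattern, with an integer version standing in for the ETag; in Cosmos DB the same effect comes from a conditional replace (an if-match ETag on the write).

```python
# In-memory simulation of optimistic concurrency: a write succeeds only if
# the caller read the latest version. An integer stands in for the ETag.
class OptimisticStore:
    def __init__(self, doc: dict):
        self.doc, self.etag = doc, 0

    def read(self):
        return dict(self.doc), self.etag

    def replace(self, doc: dict, etag: int) -> bool:
        if etag != self.etag:          # someone wrote since our read
            return False               # caller must re-read and retry
        self.doc, self.etag = doc, self.etag + 1
        return True


store = OptimisticStore({"status": "unclaimed"})

# Two agent instances read the same version, then race to claim the task
doc_a, etag_a = store.read()
doc_b, etag_b = store.read()
print(store.replace({"status": "claimed:A"}, etag_a))  # → True, A wins
print(store.replace({"status": "claimed:B"}, etag_b))  # → False, B retries
```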

Cosmos DB change feed for reactive coordination:

# Agent subscribing to coordination updates via change feed
# (query_items_change_feed is the azure-cosmos SDK's change feed API)
async for change in cosmos_container.query_items_change_feed(
    partition_key="task-42",
    start_time=session_start
):
    if change["type"] == "coordination":
        if all_dependencies_met(change, agent_name="summariser"):
            await start_summarisation(change)

The advantage over polling: change feed delivers updates within milliseconds of the write. No wasted queries, no missed updates, no polling interval tuning. Agents react to state changes in near-real-time.


Production deployment patterns

The lab's final exercise covered deployment architecture, which is where most agent labs fall short.

The deployment topology:

  • MCP server: Deployed as an Azure Container App, exposing Cosmos DB operations as MCP tools
  • Agent runtime: Deployed as separate Azure Container Apps, one per agent type, connecting to the MCP server
  • Orchestrator: Deployed as a Durable Functions app, managing workflow state and agent coordination
  • Cosmos DB: Provisioned with autoscale throughput, partitioned by session ID for even distribution

What worked about this topology:

Each component scales independently. If the knowledge agent is the bottleneck (semantic search is expensive), you scale that container without affecting the triage or resolution agents. The MCP server is the single integration point with Cosmos DB, meaning connection pooling, retry logic, and rate limiting are centralised.

What concerned me about this topology:

The MCP server is a single point of failure. If the MCP server goes down, every agent loses access to state. The lab did not cover MCP server redundancy, health checking, or failover.

Cosmos DB throughput pricing is unpredictable at scale. Autoscale adjusts throughput based on demand, but the cost implications of thousands of concurrent agent sessions, each performing multiple state reads and writes, were not discussed. I have seen Cosmos DB bills surprise engineering teams who did not model their request unit consumption carefully.

Durable Functions for orchestration is operationally complex. Durable Functions are powerful but add significant operational overhead — replay semantics, instance management, purging completed orchestrations, monitoring stuck instances. The lab presented this as straightforward; production experience suggests otherwise.


Consistency level decisions the lab got right

The lab explicitly discussed Cosmos DB consistency levels in the context of agent state, which is a detail most demos skip:

Session consistency (recommended for agent memory): Reads within a session always see that session's writes. Agent A stores a memory, then immediately reads it back — guaranteed to see its own write. Other agents may see the write after a brief delay.

Eventual consistency (acceptable for semantic knowledge): Writes propagate eventually. Suitable for long-term knowledge that does not need to be immediately visible to all agents. Lower cost, higher throughput.

Strong consistency (required for coordination documents): All agents always see the latest state. Essential for coordination where race conditions could cause duplicate work or missed dependencies. Higher latency, higher cost, but correctness guarantees justify it.

The operational guidance: Do not use strong consistency everywhere. It is expensive and slow. Use it only where correctness requires it (coordination). Use session consistency for conversational memory. Use eventual consistency for background knowledge accumulation.
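That guidance reduces to a small lookup, sketched below. The state-type names are illustrative, but the mapping follows the lab's recommendations directly.

```python
# The lab's consistency guidance as a lookup. State-type names are
# illustrative; the level choices follow the section above.
CONSISTENCY_BY_STATE = {
    "coordination": "Strong",    # race conditions cause duplicate work
    "agent_memory": "Session",   # an agent must read its own writes
    "knowledge": "Eventual",     # background accumulation, cheapest
}


def consistency_for(state_type: str) -> str:
    # Session is a sensible default: read-your-own-writes at modest cost
    return CONSISTENCY_BY_STATE.get(state_type, "Session")


print(consistency_for("coordination"))  # → Strong
```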


Framework choice: C# versus Python

The lab offered exercises in both C# (using Microsoft Agent Framework) and Python (using LangGraph), which surfaced interesting differences.

C# (Agent Framework): Strongly typed, with compile-time checking of state schemas and MCP tool contracts. The Cosmos DB integration felt native — the SDK is mature, connection patterns are well-established, and serialisation to/from Cosmos DB documents worked seamlessly with C# POCOs.

Python (LangGraph): More flexible, faster to prototype, but the Cosmos DB integration required more manual serialisation code. LangGraph's state management model is different from Agent Framework's, and the mapping to Cosmos DB state was less natural.

The architectural insight: Both tracks connected to the same Cosmos DB instance through the same MCP servers. The MCP abstraction made the framework choice irrelevant at the data layer. Agents built in C# and agents built in Python stored and retrieved memories identically because the MCP protocol, not the framework, defines the interface.

This is the strongest argument for MCP in multi-agent systems: framework interoperability without framework coupling. Your C# team and your Python team can build agents that share state through Cosmos DB without either team knowing or caring what the other is using.


Is Cosmos DB the right choice?

The honest assessment requires considering alternatives:

Cosmos DB strengths for multi-agent state:

  • Schemaless document model fits agent state naturally
  • Built-in vector search eliminates a separate vector database for moderate workloads
  • Change feed enables reactive coordination without polling
  • Global distribution for multi-region agent deployments
  • Automatic indexing reduces operational overhead

Cosmos DB weaknesses for multi-agent state:

  • Cost is difficult to predict and easy to overshoot
  • Request Unit pricing model requires careful capacity planning
  • Vector search is newer and less battle-tested than dedicated alternatives
  • Vendor lock-in to Azure (no Cosmos DB equivalent on AWS or GCP)
  • Consistency model choices (strong vs eventual) add complexity that most agent developers will get wrong initially

When to use Cosmos DB: You are already on Azure, your agent state is schemaless and evolving, you need semantic search but not at vector-database scale, and you value operational simplicity over cost optimisation.

When to use something else: You need multi-cloud portability, your vector search requirements are complex, you need predictable costs at scale, or your team has more experience with PostgreSQL/MongoDB and values familiarity.


The verdict

LAB518 demonstrated a coherent architecture for multi-agent state management that solves real problems. The MCP abstraction layer over Cosmos DB is architecturally sound. The three-tier memory model is a design pattern worth adopting regardless of storage backend. The shared state coordination approach with change feed is production-viable.

The gaps are operational rather than architectural: cost predictability, MCP server resilience, and the learning curve of Durable Functions orchestration. These are solvable with engineering effort but were glossed over in the lab environment.

If you are building multi-agent systems on Azure, this lab's architecture is a strong starting point. If you are evaluating whether Cosmos DB should be your agent state store, the answer is "probably yes" for moderate-scale workloads with semantic search requirements, and "evaluate carefully" for everything else.


What to watch

Cosmos DB vector search GA features: DiskANN-based vector indexing is powerful but evolving. Watch for quantisation support (smaller embeddings, lower cost), hybrid search improvements (combining vector and text search), and index management tooling.

MCP state management standards: As more agent frameworks adopt MCP, watch for emerging conventions around memory tool naming, state serialisation formats, and coordination patterns. De facto standards will emerge from usage, not specification.

Agent memory cost benchmarks: Independent analysis of Cosmos DB costs for agent memory workloads — RU consumption per memory operation, storage growth rates, embedding generation costs. Without this data, capacity planning is guesswork.

Alternative state backends: PostgreSQL with pgvector, Redis with vector search, and purpose-built agent memory platforms (MemGPT, Zep) are all evolving rapidly. Cosmos DB is a strong choice today, but the landscape is shifting.

