
AI Agent Governance - Preventing Sprawl and Security Risks

The enterprise AI agent problem has quietly become an enterprise AI agent crisis. Organisations that spent the last eighteen months racing to deploy agents now face an uncomfortable question: how many agents are actually running across the business, and who is responsible for them? THR721 at Microsoft Ignite 2025 tackled this head-on, and the answers should concern every platform engineering team in production today.

Session: THR721
Date: Tuesday, Nov 18, 2025
Time: 3:15 PM - 3:45 PM PST
Location: Moscone West, Level 3, Theater E


The shadow AI problem is worse than shadow IT ever was

Shadow IT was the governance headache of the 2010s: business units spinning up SaaS subscriptions and cloud resources without IT approval. It was manageable because the blast radius was limited. A rogue Trello board or unsanctioned Dropbox account caused compliance headaches, not operational catastrophes.

Shadow AI agents are a fundamentally different problem. When a business analyst in finance builds a Copilot agent that queries customer payment data, summarises it, and emails it to external auditors, the governance failure isn't just about an unapproved tool. It's about an autonomous system with access to sensitive data, making decisions without human review, operating outside any security boundary the platform team has established.

The scale of the problem: The session cited research suggesting that enterprises adopting AI agents aggressively are deploying between 50 and 500 agents within the first year. Most organisations cannot enumerate how many agents exist, let alone describe what each one does, what data it accesses, or who built it.

This is not a theoretical risk. It is an operational reality that THR721 argued demands the same governance rigour applied to identity, network, and data management.


The agent registry: Discovery before governance

The foundational principle: You cannot govern what you cannot see. The session positioned agent registries as the first requirement for any governance programme, and the reasoning is straightforward.

An agent registry serves the same function as a CMDB for infrastructure or an application portfolio for software: it provides a single source of truth for what exists, who owns it, and what it does. But agents introduce complexities that traditional asset registries were never designed for.

What an agent registry must capture:

  • Identity: A unique, verifiable identity for each agent, tied to the creating user and owning team
  • Purpose and scope: What the agent is designed to do, what data sources it accesses, and what actions it can take
  • Permissions boundary: The explicit set of tools, APIs, and data the agent is authorised to use
  • Lifecycle state: Whether the agent is in development, testing, production, or decommissioned
  • Dependency map: What other agents, services, or data pipelines the agent depends on
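
As a concrete sketch, a minimal registry record covering these five dimensions might look like the following Python dataclass. Every field name here is illustrative, not a prescribed schema:

```python
from dataclasses import dataclass, field
from enum import Enum

class LifecycleState(Enum):
    DEVELOPMENT = "development"
    TESTING = "testing"
    PRODUCTION = "production"
    DECOMMISSIONED = "decommissioned"

@dataclass
class AgentRecord:
    agent_id: str                  # unique, verifiable identity (e.g. an Entra object ID)
    created_by: str                # the creating user
    owner_team: str                # the accountable team
    purpose: str                   # what the agent is designed to do
    data_sources: list[str]        # declared data access
    allowed_tools: list[str]       # explicit permissions boundary
    lifecycle: LifecycleState = LifecycleState.DEVELOPMENT
    depends_on: list[str] = field(default_factory=list)  # other agents, services, pipelines
```

The value of a typed record like this is that enforcement code (deployment gates, runtime guardrails) can compare declared fields against observed behaviour, which is what turns the registry from documentation into a control point.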

The Microsoft approach: Entra Agent ID was positioned as the identity backbone for agent governance. By treating agents as first-class identity objects alongside users and devices, organisations can apply existing Conditional Access policies, audit logging, and access reviews to their agent fleet.

The concept is architecturally sound. The execution challenge is discovery. How do you find agents that were built outside the governance framework? The session acknowledged that retrospective discovery of existing agents is harder than registering new ones, but didn't provide a concrete mechanism for automated discovery across an enterprise.

The critical question: Is an agent registry useful if registration is voluntary? Without enforcement at the platform level, agent registries become aspirational, not operational. The gap between "all agents should be registered" and "all agents are registered" is where governance programmes fail.


Policy enforcement: The difference between guidelines and guardrails

THR721 drew an important distinction between governance policies that advise and governance policies that enforce. Most organisations today have the former. Almost none have the latter.

Advisory governance looks like documentation: "All agents must be reviewed by the security team before production deployment." This is useful. It is also routinely ignored under delivery pressure.

Enforced governance looks like platform controls: agents that are not registered in the registry cannot access production data sources. Agents without approved safety evaluations cannot be deployed to production endpoints. Agents exceeding their declared permission boundary are automatically throttled or terminated.

The enforcement architecture presented:

Pre-deployment gates

Before an agent reaches production, policy enforcement should validate:

  • Safety evaluation results: Has the agent been tested against the organisation's risk taxonomy? Does it pass groundedness, relevance, and safety thresholds?
  • Data access review: Does the agent's declared data access match its actual configuration? Are there over-permissioned tool connections?
  • Owner attestation: Has the owning team confirmed the agent's purpose, scope, and data handling characteristics?

Runtime enforcement

Once deployed, continuous enforcement monitors:

  • Task adherence: Is the agent operating within its declared scope, or has it drifted into unapproved territory?
  • Data flow monitoring: Is the agent accessing data sources or producing outputs that fall outside its approved boundary?
  • Cost and resource consumption: Is the agent consuming resources at expected levels, or does anomalous consumption indicate misuse or malfunction?

The Foundry Control Plane integration was presented as the runtime enforcement mechanism. Guardrails configured in the control plane operate at the API level, intercepting tool calls and responses to enforce policies in real time. This is materially different from post-hoc audit: it prevents policy violations rather than detecting them after the fact.
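
A stripped-down version of that interception pattern might look like the following. The `Guardrail` class, its violation counter, and the tool names are all illustrative; this is not the Foundry Control Plane API, only the shape of the idea:

```python
class PolicyViolation(Exception):
    pass

class Guardrail:
    """Sits between the agent and its tools, blocking calls outside the
    declared permissions boundary before they execute."""

    def __init__(self, allowed_tools: set[str], execute):
        self.allowed_tools = allowed_tools
        self.execute = execute      # the underlying tool dispatcher
        self.violations = 0         # feeds the policy-violation-rate metric

    def call_tool(self, tool: str, **params):
        if tool not in self.allowed_tools:
            self.violations += 1
            raise PolicyViolation(f"tool '{tool}' is outside the declared boundary")
        return self.execute(tool, **params)
```

Because the violation is raised before the dispatcher runs, the unapproved action never happens, which is the prevention-versus-detection distinction the session emphasised.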

What was not addressed: How organisations handle policy conflicts between business units. When the compliance team says an agent must log all interactions and the privacy team says certain interactions must not be logged, who arbitrates? Governance frameworks that don't account for policy conflicts become bureaucratic gridlock.


Observability: You cannot govern in the dark

The session made a compelling case that agent observability requires different instrumentation than application observability. Traditional APM tools measure request latency, error rates, and throughput. Agent observability must measure reasoning quality, task completion accuracy, and behavioural drift.

The observability stack for agent governance:

Operational metrics

Standard operational telemetry remains necessary:

  • Invocation counts: How often is each agent called?
  • Latency profiles: How long do agent interactions take?
  • Error rates: What percentage of interactions fail, and how?
  • Cost per interaction: What is the token and compute cost of each agent operation?

Behavioural metrics

This is where agent observability diverges from traditional monitoring:

  • Task completion rate: What percentage of interactions result in the agent successfully completing its intended task?
  • Groundedness score: Are agent responses grounded in provided data, or is the agent hallucinating?
  • Tool call accuracy: Is the agent selecting the correct tools and passing correct parameters?
  • Drift detection: Is the agent's behaviour changing over time, even without code changes? Model updates, data changes, and prompt injection can all cause drift.
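
Drift detection can start simple. The sketch below flags drift when a recent window of scores (groundedness, say) shifts away from a baseline window by more than a set number of baseline standard deviations. It is a crude stand-in for a proper statistical test, with an illustrative threshold:

```python
from statistics import mean, stdev

def detect_drift(baseline: list[float], recent: list[float],
                 threshold: float = 2.0) -> bool:
    """Flag drift when the recent mean shifts more than `threshold`
    baseline standard deviations from the baseline mean."""
    if len(baseline) < 2 or not recent:
        return False  # not enough data to judge
    sigma = stdev(baseline)
    if sigma == 0:
        return mean(recent) != mean(baseline)
    return abs(mean(recent) - mean(baseline)) / sigma > threshold
```

The important design point is the baseline itself: because model updates and data changes cause drift without any code change, the baseline must be re-established after every approved change, not fixed at first deployment.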

Governance metrics

The metrics that matter for compliance and audit:

  • Policy violation rate: How often do runtime guardrails intervene?
  • Data access patterns: What data is each agent actually accessing, and does it match the declared scope?
  • Escalation frequency: How often do agents escalate to human review, and what triggers escalation?
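
As an illustration, these governance metrics could be aggregated from an interaction log along the following lines. The record fields (`guardrail_intervened`, `escalated`, `data_sources`) are assumed for the sketch, not a real telemetry schema:

```python
def governance_metrics(interactions: list[dict]) -> dict:
    """Aggregate compliance-relevant metrics from a log of agent interactions.
    Comparing 'data_sources' against the registry's declared scope is what
    surfaces boundary violations."""
    total = len(interactions)
    if total == 0:
        return {"policy_violation_rate": 0.0, "escalation_rate": 0.0,
                "data_sources": set()}
    return {
        "policy_violation_rate":
            sum(i["guardrail_intervened"] for i in interactions) / total,
        "escalation_rate":
            sum(i["escalated"] for i in interactions) / total,
        "data_sources":
            set().union(*(i["data_sources"] for i in interactions)),
    }
```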

The honest assessment: Building this observability stack is a significant engineering investment. Most organisations struggle to maintain basic application monitoring. Adding behavioural and governance telemetry for potentially hundreds of agents requires dedicated platform engineering capacity that many organisations do not have.

The session positioned this as a Foundry Control Plane capability, which reduces the build burden. But the configuration, tuning, and operational response to observability signals remains an organisational responsibility that cannot be outsourced to tooling.


The governance maturity model: Where most organisations actually are

THR721 sketched a maturity model for agent governance, though it was never framed that way explicitly. Reading between the lines, the progression looks like this:

Level 0 - Ungoverned: Agents are being built and deployed with no central visibility. Multiple teams use different frameworks. No shared standards for safety, security, or data access. This is where most enterprises are today.

Level 1 - Inventory: A registry exists. The organisation can enumerate its agents and identify owners. Registration may not be enforced, but visibility exists for compliant teams.

Level 2 - Policy-defined: Governance policies are documented and communicated. Pre-deployment review processes exist. Safety evaluations are required but may not be automated.

Level 3 - Policy-enforced: Platform controls prevent unregistered agents from accessing production resources. Automated safety evaluations gate deployment. Runtime guardrails enforce behavioural boundaries.

Level 4 - Continuously governed: Behavioural monitoring detects drift and policy violations in real time. Governance metrics feed into risk management. Agent fleet health is managed with the same rigour as infrastructure fleet health.

The uncomfortable truth: The session's content was aimed at Level 3 and Level 4. Most enterprises in the audience were at Level 0 or Level 1. The gap between where organisations are and where they need to be is not primarily a tooling gap. It is an organisational maturity gap that requires executive sponsorship, dedicated staffing, and cultural change.


The organisational challenge: Who owns agent governance?

This was the most interesting tension in the session, largely because it was left unresolved.

The candidates:

Platform Engineering already owns infrastructure governance, CI/CD pipelines, and developer experience. Adding agent governance to their remit is a natural extension, but platform teams are already stretched thin.

Security teams own threat detection, vulnerability management, and compliance controls. Agent-specific risks like prompt injection and data leakage fall within their domain, but most security teams lack AI expertise.

AI/ML teams understand model behaviour, evaluation techniques, and safety testing. But they typically operate as a centre of excellence, not a governance function.

The answer from the session: All three. Microsoft's framing positioned agent governance as a shared responsibility, with Foundry Control Plane as the integrating platform. Security teams configure guardrails. Platform teams enforce deployment gates. AI teams define evaluation criteria.

The problem with shared responsibility: Without clear accountability, shared responsibility becomes no responsibility. The session didn't address how to prevent governance gaps at the boundaries between teams, which is precisely where most governance failures occur.

A practical recommendation that emerged: Establish an Agent Governance Board with representatives from platform, security, and AI teams. This board owns the governance framework, arbitrates policy conflicts, and reviews agent fleet health on a regular cadence. It mirrors the pattern of Cloud Centre of Excellence teams that many enterprises established during cloud adoption.


What the session got right

Treating agents as first-class entities: The identity-first approach, extending Entra to agents, is architecturally sound. Agents need verifiable identity, auditable actions, and governed permissions. Building on existing identity infrastructure rather than creating a parallel governance system is pragmatic.

Runtime enforcement over advisory governance: The emphasis on platform-level enforcement rather than process-level compliance is the right architectural decision. Policies that can be bypassed will be bypassed.

Behavioural observability: Recognising that agent monitoring requires different metrics than application monitoring is important. Task adherence, groundedness, and drift detection address AI-specific risks that traditional monitoring misses.


What the session left unresolved

Discovery of existing agents: How do you find and register agents that were built before the governance framework existed? Automated discovery across Azure, third-party platforms, and custom deployments was mentioned but not demonstrated.

Cross-platform governance enforcement: If agents are built on non-Microsoft platforms, can Foundry Control Plane enforce governance policies, or only observe? The distinction between visibility and control matters enormously.

Governance cost and overhead: Every governance control adds friction to the development process. The session didn't address how to balance governance rigour with development velocity, which is the central tension platform engineering teams face daily.

Regulatory alignment: Different industries face different regulatory requirements for AI governance. The session presented a general-purpose framework without addressing how it maps to specific regulatory regimes like the EU AI Act, financial services AI regulations, or healthcare AI requirements.

Agent decommissioning: The lifecycle discussion focused on registration and runtime governance. What happens when agents need to be retired? Data retention, dependency management, and graceful shutdown of agents that other systems depend on were not addressed.


The verdict

THR721 correctly identified agent governance as one of the most pressing operational challenges facing enterprises deploying AI at scale. The framework presented, built on identity, policy enforcement, and behavioural observability, represents the right architectural direction.

But the session was more aspirational than operational. The tooling exists or is being built. The harder problem is organisational: establishing ownership, enforcing registration, staffing governance functions, and building the operational muscle to manage agent fleets with the same discipline applied to infrastructure fleets.

For platform engineering teams, the immediate action is clear: establish an agent registry, enforce registration at the platform level, and instrument basic observability. These are Level 1 and Level 2 activities that most organisations can begin today, regardless of tooling maturity.

The organisations that solve agent governance early will have a significant advantage. Those that wait until shadow AI agents cause a compliance breach or security incident will find themselves doing the same work under considerably more pressure.


What to watch

EU AI Act compliance tooling: As the AI Act enforcement timeline approaches, agent governance frameworks will need to map directly to regulatory requirements. Watch for compliance-specific governance templates and automated reporting.

Agent discovery automation: The gap between "register all agents" and actually finding them needs automated tooling. Network-level agent traffic detection, API gateway integration, and platform-level agent enumeration will be critical capabilities.

Cross-platform governance standards: Microsoft's approach assumes Foundry Control Plane as the governance hub. Multi-cloud enterprises need governance standards that work across Azure, AWS, and GCP agent platforms. Industry standards for agent governance metadata and interoperability are overdue.

Governance-as-code: The DevOps pattern of infrastructure-as-code should extend to agent governance. Declarative governance policies, version-controlled and applied through CI/CD pipelines, will reduce the operational burden of governing agent fleets at scale.
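
A minimal sketch of that pattern: a declarative policy document, version-controlled alongside the agent, evaluated by a CI step before deployment. The policy schema here is entirely illustrative:

```python
# A declarative governance policy, checked into source control and
# evaluated in CI. Keys and limits are illustrative, not a standard.
POLICY = {
    "require_registration": True,
    "max_data_sources": 3,
    "blocked_tools": {"external_email"},
}

def evaluate_policy(policy: dict, agent: dict) -> list[str]:
    """Return the list of policy failures; an empty list means compliant."""
    failures = []
    if policy["require_registration"] and not agent.get("registry_id"):
        failures.append("agent is not registered")
    if len(agent.get("data_sources", [])) > policy["max_data_sources"]:
        failures.append("too many data sources")
    if policy["blocked_tools"] & set(agent.get("tools", [])):
        failures.append("agent uses a blocked tool")
    return failures
```

The appeal of this approach is that policy changes go through the same review, versioning, and rollback machinery as any other code, rather than living in a wiki.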

