Cloud Operations
AKS AI Ops - Self-Healing Kubernetes with AI Agents
Hands-on lab notes from LAB517-R1: AKS AI Ops.
Session: LAB517-R1
Date: Thursday, Nov 20, 2025
Time: 4:30 PM PST - 5:45 PM PST
Location: Moscone West, Level 3, Room 3001
Coming Soon
This article will be published during or after Microsoft Ignite 2025 (Nov 18-21). A full walkthrough of the AKS AI Ops lab is coming soon.
Lab Overview
What We're Building: This hands-on lab builds confidence in managing AKS at scale with next-generation ops tools. In the lab, you will:
- Simulate a production service hit by traffic spikes
- Discover how AI-driven alerts surface hidden bottlenecks
- Deploy agents that self-heal nodes automatically
- Use open-source tools and the aks-mcp server
- Automate cluster scaling, patch management, and real-time troubleshooting
- Let AI orchestrate Kubernetes and Azure resources with natural-language commands
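The self-healing step in the list above can be pictured as a small control loop: watch node health, count consecutive failures, and trigger a remediation once a threshold is crossed. The sketch below is only an illustration of that general pattern, not the lab's actual agent. The names `Node`, `SelfHealer`, and `fake_reimage`, and the threshold of three failed checks, are all assumptions made for this example; it runs entirely in memory and calls no AKS or aks-mcp API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a cluster node."""
    name: str
    ready: bool = True
    failed_checks: int = 0  # consecutive failed health checks


@dataclass
class SelfHealer:
    """Counts consecutive failures and remediates at a threshold."""
    # Hypothetical hook; a real agent might cordon, drain, and reimage here.
    remediate: Callable[[Node], None]
    threshold: int = 3  # assumed value, chosen for illustration
    actions: List[str] = field(default_factory=list)

    def observe(self, node: Node, healthy: bool) -> None:
        if healthy:
            # A healthy check resets the failure streak.
            node.failed_checks = 0
            node.ready = True
            return
        node.failed_checks += 1
        node.ready = False
        if node.failed_checks >= self.threshold:
            self.remediate(node)
            self.actions.append(f"remediated {node.name}")
            node.failed_checks = 0


def fake_reimage(node: Node) -> None:
    """Placeholder remediation: simply marks the node ready again."""
    node.ready = True


healer = SelfHealer(remediate=fake_reimage)
node = Node("example-node-0")
for healthy in (False, False, False):
    healer.observe(node, healthy)
print(node.ready, healer.actions)
```

Requiring several consecutive failures before acting is one common way to avoid remediating a node over a single transient blip; the right threshold depends on the workload.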
Technologies:
- Azure Kubernetes Service (AKS)
- AI-driven monitoring and alerting
- Self-healing agent patterns
- aks-mcp server
- Natural-language cluster orchestration
- Pre-built MCP integrations
Key Learning Goals
- AI-Driven Ops - How do AI agents identify and resolve AKS issues?
- Self-Healing - What patterns enable automated node recovery?
- aks-mcp Server - How does MCP enable natural-language cluster management?
- Scale Management - How do agents handle traffic spikes and scaling?
- Production Readiness - What operational patterns work at enterprise scale?
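As background for the scale-management question, one well-documented scaling rule is the one used by the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count multiplied by the ratio of the observed metric to the target metric, rounded up, with no change when the ratio is within a tolerance of 1. The sketch below reproduces that arithmetic in plain Python. Whether the lab's agents use this rule is not stated in these notes, so treat it as an assumption; the function name `desired_replicas` and the default bounds are hypothetical.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,   # assumed bound for illustration
    max_replicas: int = 10,  # assumed bound for illustration
    tolerance: float = 0.1,  # matches the HPA's documented default
) -> int:
    """Apply the HPA-style rule: ceil(current * observed / target)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


# A traffic spike pushes average CPU from a 50% target to 90% on 4 replicas.
print(desired_replicas(4, current_metric=90, target_metric=50))
```

The tolerance band keeps the replica count from oscillating when the metric hovers near its target.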
Stay Tuned
The full lab walkthrough, code samples, and operational playbooks are coming soon.