How Microsoft’s new plan for self-repairing data centers will transform IT roles

tiero/iStock / Getty Images Plus
Follow ZDNET: Add us as a preferred source on Google.
ZDNET’s key takeaways
- Microsoft moves toward autonomous, self-repairing data center platforms.
- Foundry enables long-running AI agents with persistent memory.
- Control Plane provides guardrails, IDs, and threat-aware oversight.
Microsoft unveiled a new series of services to address some longstanding problems with managing data centers at its annual Ignite conference, Tuesday. The company, which well understands these problems because it runs some of the world’s largest data centers, has a new multi-tiered solution that may give all of us IT folks a little peace.
Enterprise data centers are huge, complex operations. Enterprise networks alone consist of a bouillabaisse of distributed services, third-party APIs, proprietary and open source cloud services, and local services. All of these face constant large and small updates, along with constant integrations with new capabilities (or re-integrations because someone changed their API). It never stops.
Also: Microsoft’s new AI agents won’t just help us code, now they’ll decide what to code
Exacerbating the situation is a sense of alert fatigue, maintenance debt, observability gaps, and talent shortages, not to mention the constant threat of external attack.
To solve the software management problems discussed above, we’ll need a much more dynamic AI, one that is constantly running and can act, react, modify systems, and even repair problems based on its training.
Here’s how Microsoft is using AI to take aim at some of the issues keeping IT professionals up at night — alerts and all.
Foundry Agent Service
Microsoft says Foundry Agent Service is a fully managed, enterprise-grade runtime for “hosting, scaling, coordinating, and governing AI agents.” This also includes complex, multi-agent systems.
Basically, it provides a cloud environment for agents to run in, without developers needing to manage infrastructure, containers, orchestration engines, or any of the underlying mechanics of system operation.
This allows developers to focus on agent logic, which is getting more and more complex, especially since Foundry Agent Service is built for long-running multi-step agents that can respond to network situations with alacrity.
Also: Microsoft researchers tried to manipulate AI agents – and only one resisted all attempts
The hosted agent capability is not just limited to the Microsoft Agent Framework. The capability is available for LangGraph, CrewAI, and OpenAI APIs. This is critical because agents can be domain-specific. The ability to run agents from many vendors is essential for a comprehensive runtime environment.
A particularly powerful capability of the Foundry Agent Service is a form of persistent memory that “will allow agents to retain context, preferences and conversation history across sessions with secure, persistent recall integrated into the agent runtime,” according to the company.
Although this feature isn’t here now, the company says it’s coming later this year. Given that “this year” is running out of time, you can probably expect the capability pretty soon. That’s important because native memory capability could reduce the need for external data storage capability for agent operations. Like the infrastructure, data retention may well be done for you, right out of the box.
Foundry Control Plane
Every so often, Facebook will feed me a short video from a user called Utah Yorkies. My very good boy is a Yorkie/Poodle mix and, yes, I have Facebook trained to feed me puppy videos. The Utah Yorkies videos are delightful, showing what must be five or six Yorkies all running hither and yon, at full Yorkie speed.
I kind of think of AI agents the same way. AI agents, especially those built to be fully autonomous, could actually be fully autonomous. That’s not necessarily a good thing. So, to make sure the agents in your data center are working for you, and not constantly seeking treats or pets (or the agent equivalent), Microsoft is introducing Foundry Control Plane, available now in preview.
According to the company, “Foundry Control Plane will bring observability, behavioral guardrails and lifecycle management into one environment where teams can monitor agent health, performance and cost, plus apply policies and take action in real time.”
Also: How Microsoft Sentinel is tackling the AI cybersecurity era
There is a fairly big range of capabilities here, but I want to spotlight one that I think all the others can build upon.
Back in May, Microsoft announced Entra Agent ID, an authentication, authorization, identity protection, and access governance tool for agents. More to the point, it’s a unique, secure, and verifiable ID that can be hung on every agent, allowing for much better tracking and control and even lineage history across environments.
Foundry Control Plane uses this identity element through its other capabilities.
Fleet-wide visibility: Foundry Control Plane provides a wide, comprehensive, and unified view of every agent running in the Foundry environment. Essentially, this is the key “watch itself” function that allows the AI to maintain full and complete awareness of all its components. It’s kind of like a pet GPS, but for agents.
Observability: Microsoft calls this observability, but it’s really a tool for pissing off agents by enabling active, adversarial, red-team testing and evaluation. It’s designed to let a system, under AI control, proactively find its own faults and measure quality, safety, and efficiency. In a sense, some AI agents are watching other AI agents.
Agent controls: This defines and locks down what goes in and what comes out of agents. It’s designed to make sure individual agents only handle what they’re allowed to handle, and validates their output so they don’t do stuff they’re not supposed to be doing.
Wrangling “shadow agents”: Shadow agents are those bits of code that run without oversight, often routines and components that were set loose once to solve some long-forgotten problem and are still out there running. Because Entra Agent ID identifies sanctioned agents, any agents without such an ID can be considered “shadow” agents and can be terminated, or a human can be notified to decide what to do with them.
Security: Foundry Control Plane integrates with Microsoft Defender for threat detection. It also connects to Microsoft Purview, Microsoft’s compliance and data-governance products. Together, these allow autonomous agentic AI-based systems to defend against threats as well as maintain compliance with established guidelines.
Copilot Studio enhancements
If you think of Foundry Agent Service as the engine that runs agentic systems, and Foundry Control Plane as the command center, Copilot Studio is the workshop. Copilot Studio is the environment where developers can craft agents for testing and deployment.
We’ve previously talked about Copilot Studio. In fact, I likened it to “a LEGO set for building AI agents.” In that vein, Microsoft is introducing some new LEGO pieces for its studio. There are three features I think are important to mention.
Also: Microsoft is packing more AI into Windows, ready or not – here’s what’s new
First, agents built in Copilot Studio will be assigned an Entra ID. We talked about why that’s important, but even test agents deployed presumably in controlled environments will have an ID that lets them be trackable by Foundry Control Plane.
Next, Microsoft is introducing agent evaluations, a feature that provides automated tests to measure an agent’s performance against defined scenarios and criteria. By running agents through a gauntlet of test protocols before deployment, this tool provides a powerful automatic feedback loop to refine and tune agents.
Finally, Microsoft has added the ability to integrate real-time monitoring during agent runs. This means that admins can run Microsoft Defender, third-party security platforms, and internal or external custom monitoring solutions during agent runs.
“This ability helps organizations harness the full potential of Copilot Studio while safeguarding against threats like prompt injection attacks using their preferred security infrastructure,” the company said.
Deliberately engineered autonomy
All of these features work together to provide software agents that are adaptive, persistent, recoverable, measurable, and governable. Production systems become self-monitoring, self-correcting, and even self-improving.
This will change IT job roles. Developers will become “intent architects,” site reliability engineers will become autonomy supervisors, and compliance officials will focus more on behavioral governance.
Also: Microsoft’s new AI agents create your Word, Excel, and PowerPoint projects now
This is powerful, thought-provoking stuff that could change how data centers run and are managed. It’s encouraging that Microsoft is the company offering this, because as a huge data center operator, we can expect the company to eat its own dogfood, improving the offering over time because it needs it for its own operations.
What do you think about Microsoft’s move toward autonomous, self-regulating agent platforms? Have you explored Copilot Studio, Foundry Agent Service, or Foundry Control Plane in your own work? Do you see AI agents playing a role in managing your systems, or is this still too risky or immature for production? What kinds of oversight or control would you want before trusting agents to act independently? Let us know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.




