Business

SaaS

Technology

•

Apr 28, 2026

How to Build an AI Agent Governance and Monitoring Programme from Scratch

Prevention is not a guarantee. When an AI agent is compromised — through prompt injection, a poisoned tool response, or a manipulated context — your governance and monitoring programme is the difference between a contained incident discovered in hours and a severe breach discovered three months later in a forensics report.

Gartner predicts more than 40% of agentic AI projects will be cancelled by end of 2027, with inadequate risk controls as a primary cause. Microsoft’s Cyber Pulse report from February 2026 found that 80% of Fortune 500 companies have active AI agents running, and 29% have unsanctioned agents without IT visibility. Shadow AI is already an operational reality — not a future problem.

This is a sequenced programme-building guide starting from zero — no agent inventory, no policy, no HITL architecture — structured around the Microsoft five-capability framework. For broader context, see the complete AI agent security guide.

Why does monitoring matter as much as your technical security controls?

Monitoring is the safety net that catches what prevention misses. A technically compliant agent can still be manipulated via prompt injection or tool chain abuse, and prevention controls alone cannot detect an agent that continues to behave plausibly while compromised. You need the safety net.

Governance and monitoring are two halves of the same programme. Governance defines what agents are permitted to do. Monitoring proves those policies are being respected — or violated. Without observability, organisations discover compromises far later, by which point lateral movement and data exfiltration have already occurred.

That 40% Gartner cancellation figure is a governance failure signal, not a technical one. Projects fail where there is no audit trail, no ownership clarity, and no defensible record for leadership or regulators.

Here is the core logic: you cannot govern what you cannot see. You cannot monitor what you have not registered. And you cannot respond to what you have not classified. Build the programme in sequence.

That sequence is exactly what the Microsoft five-capability framework provides.

What is the Microsoft five-capability framework and how does it structure a governance programme?

The Microsoft Cyber Pulse report (February 2026) identified five foundational capabilities every AI agent governance programme must develop. They sequence naturally from discovery to enforcement.

Registry — A centralised inventory of every agent: sanctioned, third-party, and shadow. Nothing else in the programme functions without this baseline.

Access Control — Identity and permissions governance using the same least-privilege principles applied to human users.

Visualisation — Real-time dashboards and telemetry that make agent behaviour legible to security and leadership teams. This is the monitoring layer.

Interoperability — Governance of agent-to-agent communication via protocols like MCP (Model Context Protocol) and A2A (Agent-to-Agent Protocol). Prevents unauthorised delegation and privilege escalation.

Security — Threat detection and response integrated into SOC workflows. This is where the dual-use paradox becomes operational.

For SMB teams, the recommended first phase is Registry → Access Control → Visualisation. Build Security alongside Visualisation once agents are in production. Defer Interoperability until you have multiple agent-to-agent workflows running. Only 21.9% of organisations currently treat AI agents as independent, identity-bearing entities. That visibility gap is a genuine business risk.

How do you build an agent registry when you have no existing inventory?

The agent registry is the foundational governance control. Before you can set HITL thresholds, design monitoring dashboards, or assign cross-functional ownership, you need to know what agents exist, who owns them, what access they have, and what tools they use.

A complete registry entry must contain: agent name and version; the business owner who is accountable for outputs, not just the IT owner; access scope; tool permissions; business purpose; a risk classification of LOW, MEDIUM, or HIGH based on the OWASP action classification; last active timestamp; and deployment environment.

On enforcement: block new agents at the network perimeter until they are registered. Require agent identity credentials to be provisioned through the registry workflow — unregistered agents cannot obtain credentials and cannot operate.

On shadow agent discovery: audit authentication logs for unrecognised service accounts calling LLM provider endpoints. Monitor network traffic for outbound LLM API calls. Scan CI/CD pipelines and productivity platforms. Review expense reports for personal API key subscriptions — a common shadow AI entry point.

Only 14.4% of organisations have full IT and security approval for their entire agent fleet. Start with a spreadsheet if necessary. A registry that exists and is actually used beats a sophisticated platform that nobody fills in. For more on governance as the institutional response to shadow AI insider risk, see our article on how AI agents became a new category of insider threat.

How do you set human-in-the-loop thresholds using the OWASP Action Classification framework?

HITL threshold design is a product decision as much as a security decision. Set thresholds too low and reviewers experience alert fatigue, approving everything reflexively. Set them too high and high-impact agent actions proceed without human awareness. You are calibrating a balance, not just a security control.

The OWASP AI Agent Security Cheat Sheet recommends a four-tier framework:

LOW — auto-approved. Read-only data retrieval, status queries, internal report generation.

MEDIUM — logged and audited, auto-approved with notification. Write operations on non-production systems, draft generation for human review before sending.

HIGH — human approval required. Write to production systems, external communications, access to regulated data. For FinTech: any transaction inquiry. For SaaS: any customer-facing communication. For HealthTech: HIPAA-covered records.

CRITICAL — dual-approval required. Financial transactions, credential access or rotation, infrastructure changes, irreversible deletions.

Aim for a 10–15% escalation rate across the programme. Above 15% indicates reviewer fatigue risk. Below 5% for any agent handling HIGH-tier actions suggests permissive miscalibration. Calibrate per agent, not across the entire programme.

Track the override rate — the percentage of escalated decisions where the reviewer disagrees with the agent. A declining rate signals alignment; a rising rate signals model drift or policy misalignment.

Define an operational envelope — the bounded zone of accepted agent behaviour. Any action outside this envelope triggers a governance event, regardless of OWASP tier.

How do you monitor for anomalous agent behaviour without storing sensitive data in logs?

Here is the monitoring paradox: detecting anomalous agent behaviour requires detailed logs, but AI agent logs contain system prompts, API keys, tool schemas, and user data that must not sit in observability systems. The answer is the redaction gateway pattern.

The redaction gateway is middleware between the agent runtime and the observability platform. It strips sensitive content before logs reach storage. Redact: system prompt content, credential parameter values, tool schemas, and raw conversation content.

Retain the reasoning telemetry: tool name called (not the parameters); timestamp and duration; OWASP action classification tier; outcome status (success, failure, or blocked); and the sequence of tool calls during the task. That is sufficient for forensic reconstruction without storing sensitive data.

What counts as anomalous: unexpected tool calls; unusual access patterns; out-of-hours activity; rate spikes — often prompt injection triggering a loop or exfiltration attempt; and deviation from the reasoning chain baseline for a known task type.

For SMBs without a SIEM, structured logging with SDK-level redaction shipped to a lightweight observability platform, with alerts on tool call anomalies and rate spikes, is a perfectly workable starting point. See also: sandboxing telemetry as a foundational observability signal and how real-time observability signals feed your identity authorisation layer.

What is the dual-use paradox in AI agent security and how do you govern it?

Here is the problem. The same capabilities that make SOC automation valuable — autonomous threat detection, alert triage, incident response orchestration — create a new attack surface if those security agents are compromised. A threat actor controlling a SOC monitoring agent has not just compromised one system. They have compromised the system that watches all other systems. Alerts can be suppressed, incidents misclassified, threat intelligence exfiltrated — all while appearing to function normally.

So how do you govern SOC agents without defeating their purpose?

Apply enhanced registry entries with a full tool permission audit and a named human owner. Use tighter HITL thresholds: any action that suppresses, modifies, or routes an alert requires HIGH tier minimum approval.

Segregate SOC agent audit logs from the systems those agents monitor. Use a separate, append-only logging system — a compromised agent cannot suppress the forensic record of its own compromise.

And red-team SOC agents specifically. Test whether a prompt injection into a monitored system’s output can cause the SOC agent to misclassify the incident.

AI-powered security operations is the right direction. But governing SOC agents to the highest standard in the programme is the prerequisite that makes it viable.

Who owns AI agent governance and how do you structure accountability at SMB scale?

AI agent governance is not a security engineering problem. It is a cross-functional organisational design problem. Programmes that live entirely within IT or Security fail because they lack the authority to govern agents owned by Legal, HR, Finance, and Product teams.

This is a roles-based responsibility map, not a headcount requirement — one person may hold several of these in smaller teams.

Legal — Owns the acceptable use policy, data handling compliance, and regulatory mapping (EU AI Act Article 14, HIPAA, GDPR).

HR — Owns employee AI agent use guidelines and the offboarding process for employee-owned agents.

IT / Platform Engineering — Owns the agent registry, credential provisioning, and registration enforcement. The operational hub.

Security — Owns monitoring infrastructure, anomaly detection, SOC integration, red-teaming, and incident response.

Compliance — Owns audit trail requirements, log retention policy, and evidence packaging.

Business / Product owners — Accountable for the outputs of agents deployed in their product areas. This one cannot be delegated to IT.

Governance cadence: quarterly cross-functional review of registry completeness, HITL calibration, red-team findings, and policy updates. Monthly security review of monitoring alerts.

The most common mistake is treating governance as an IT responsibility without Legal and business owner sign-off. Any agent that interacts with customers or handles regulated data requires cross-functional oversight. That is not optional.

How do you build a red-teaming programme when you do not have a dedicated red team?

Red-teaming is continuous adversarial validation — an ongoing practice, not a one-time audit. Governance programmes that red-team at launch and never again create a false sense of assurance.

Here is a minimum viable red-team programme you can actually execute:

Quarterly — prompt injection tests. Test whether injected instructions in user input, retrieved documents, or tool responses can cause the agent to act outside its operational envelope. This requires a systematic approach, not specialist skills.

Quarterly — HITL threshold validation. Replay a sample of recent agent sessions and verify that threshold configuration is routing the right actions to human review.

Annual — tool chain compromise simulation. Simulate what happens if an external tool or API returns malicious content. Does the agent reject, flag, or isolate the contaminated response? Engage a security consultancy if no in-house capability exists.

Ongoing — log review. Assign one person to review the weekly anomaly detection report. See monitoring for anomalous agent behaviour after injection for context. The goal is not investigating every alert — it is working out whether the alerting baseline needs recalibration.

Document every red-team test: what was tested, what was found, what was changed. This is the evidence base for compliance review, insurance underwriting, and board-level risk reporting.

Frequently Asked Questions

What is the difference between an AI agent governance programme and AI agent security?

Governance defines ownership, accountability, and policy. Security enforces controls and detects threats. Governance without security is a policy document with no enforcement. Security without governance is a control set with no ownership.

What is the difference between human-in-the-loop and human-on-the-loop oversight?

HITL pauses execution — the agent cannot proceed with a HIGH or CRITICAL action until a human approves. Human-on-the-loop (HOTL) is supervisory review after execution. Most programmes use both: HITL for HIGH and CRITICAL, HOTL for LOW and MEDIUM.

Do I need to implement all five Microsoft framework capabilities at once?

No. Registry → Access Control → Visualisation is the recommended first phase. Defer Interoperability until multiple agents are communicating in production. Build Security in parallel with Visualisation once agents are live.

How do I find shadow AI agents already operating in my organisation?

Audit authentication logs for unrecognised service accounts calling LLM endpoints. Monitor network traffic. Scan CI/CD pipelines and productivity platforms. Review expense reports. And survey engineering teams with a no-blame amnesty — people surface agents they built when they are not worried about the consequences.

What escalation rate should I target for my HITL programme?

Target 10–15% of all agent decisions routed to human review. Above 15% signals reviewer fatigue risk; below 5% for HIGH-tier agents suggests permissive miscalibration. Calibrate per agent, not across the programme.

How do I handle AI agent governance for contractors and third-party integrations?

Register third-party agents in the same registry as internal agents, apply the same OWASP action classification, and include them in quarterly red-team and HITL validation exercises.

What should a minimum viable agent acceptable use policy include?

Scope (which processes may be automated and which are prohibited), a registration requirement with sanctions for unregistered agents, data handling rules, HITL requirements, and audit expectations.

How does the EU AI Act affect AI agent governance requirements?

EU AI Act Article 14 mandates demonstrable human oversight for high-risk AI systems — meaning audit evidence in the form of logs and escalation records, not just assertions. High-risk categories include credit scoring, employment decisions, health monitoring, and critical infrastructure. Non-compliance: fines up to €35 million or 7% of global turnover.

What is the Microsoft Agent Governance Toolkit and do I need it?

It is an open-source seven-package toolkit under MIT licence addressing the OWASP Agentic AI Top 10. Core packages: Agent OS (policy engine), Agent Mesh (cryptographic agent identity), Agent Runtime (execution privilege levels, kill switch), Agent SRE (SLOs, circuit breakers). The policy engine and identity components are the most transferable for teams on AWS or GCP.

What is reasoning telemetry and why does it matter for governance?

Reasoning telemetry is the captured sequence of tool selections during a task — chain of thought at the tool call level, not the content level. Stored as tool names, timestamps, action classification tier, and outcome status. No parameter values, no conversation text, no credentials. Retain the structural record, redact the content.

How do I know if my confidence threshold is calibrated correctly?

FinTech payment agents: 95%+. Internal reporting agents: 80%. If reviewers disagree with more than 20% of escalated decisions, the threshold needs adjustment. Review quarterly and after any significant model or tool set change.

This article is part of a series on AI agent security. For the full picture, see securing AI agents from supply chain to SOC.