Mar 10, 2026

How to Measure Whether Your AI Governance Is Actually Working

AUTHOR

James A. Wondrasek

Here is a number that should bother you: 72% of enterprises now formally measure Gen AI ROI. Measuring productivity? Sorted. But ask those same companies whether their AI governance is actually working — whether policies are being followed, whether anyone can prove compliance — and the room goes quiet. Only 32% operate at what Acuvity calls “measured effectiveness” on governance. The other 68% either track nothing at all or track whether reviews got completed, which tells you almost nothing about whether anything is actually being enforced.

This article lays out a practical measurement framework: what to track, how to figure out where you stand, and how to build lightweight monitoring without enterprise-scale resources. It covers the AI governance gap that measurement addresses and the execution mechanisms the framework is designed to assess. Governance is not a project with a completion date. It is an ongoing operational capability, and you need to measure it like one.

Why are 72% of companies measuring AI productivity but almost none measuring governance effectiveness?

The Wharton finding is pretty stark. In their third-wave Gen AI adoption study, 72% of enterprise leaders formally measure Gen AI ROI. The productivity side has moved past experimentation into what they call “Accountable Acceleration.”

Governance is a different story. Acuvity’s 2025 State of AI Security survey found that nearly 40% do not have managed governance at all — they are running on ad hoc practices or literally nothing. IBM’s research lines up with this: nearly 74% of organisations report moderate or limited coverage in their AI risk and governance frameworks.

So companies know exactly how much value AI is generating but cannot tell you whether it is being used compliantly or within policy boundaries. That is a problem.

Part of it is governance theatre. Organisations tick checklists without actually verifying enforcement. Process metrics — policies written, reviews completed — create an illusion of control. Less than 47% have adopted formal risk management frameworks for AI. Half of business leaders say their organisation lacks the governance structures needed to manage Generative AI’s ethical challenges.

But the deeper cause is structural. Governance got treated as a one-time project: write a policy, form a committee, move on. Every undocumented deployment, every skipped review widens the gap between what you believe is governed and what actually is. As IBM’s Jeff Crume put it, “it’s pretty hard to know if you’re succeeding if you’ve never even defined the benchmarks.”

That gap between perceived governance and actual governance — that is the gap this measurement framework is designed to close.

What are the three dimensions of governance measurement — and what do you track for each?

The DX AI Measurement Framework, built from research across 184 companies and 135,000+ developers, structures measurement across three dimensions: Utilisation, Impact, and Cost. It was designed for engineering productivity, but the parallel to governance is direct.

Utilisation — is AI being used as intended within governance boundaries?

This is your compliance layer. Track policy compliance rate — the percentage of deployments that completed the required governance review steps. Track model coverage — the percentage of production AI systems with complete documentation, including model cards, audit trails, and risk assessments. Track governance participation rate — are the right people actually showing up? And track shadow AI detection rate. One in five organisations experienced breaches from unsanctioned AI usage, costing $670,000 more than traditional breaches. If you needed a business case for measuring shadow AI, there it is.
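None of these metrics needs special tooling. As a rough illustration, here is a minimal Python sketch that computes three of them from a hand-maintained inventory. The field names (review_completed, has_model_card, sanctioned, and so on) are hypothetical; substitute whatever your own register tracks.

```python
# Minimal sketch of the utilisation metrics above, computed from a
# hand-maintained AI inventory. Field names are hypothetical.

inventory = [
    {"system": "support-chatbot",    "review_completed": True,
     "has_model_card": True,  "has_risk_assessment": True,  "sanctioned": True},
    {"system": "sales-email-drafts", "review_completed": False,
     "has_model_card": False, "has_risk_assessment": False, "sanctioned": False},
    {"system": "doc-summariser",     "review_completed": True,
     "has_model_card": True,  "has_risk_assessment": False, "sanctioned": True},
]

total = len(inventory)

# Policy compliance rate: deployments that completed required review steps.
compliance_rate = sum(s["review_completed"] for s in inventory) / total

# Model coverage: production systems with complete documentation.
coverage = sum(s["has_model_card"] and s["has_risk_assessment"]
               for s in inventory) / total

# Shadow AI: systems in use that never entered the governance process.
shadow = sum(not s["sanctioned"] for s in inventory)

print(f"Policy compliance rate: {compliance_rate:.0%}")
print(f"Model coverage:         {coverage:.0%}")
print(f"Shadow AI systems:      {shadow} of {total}")
```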

Impact — is governance producing measurable risk reduction and compliance improvement?

Impact metrics tell you whether governance is producing results you can actually point to. Track incident response time — how fast you identify and resolve AI-related incidents. Track use case review cycle time — if governance review takes too long, teams bypass it, and that is what drives shadow AI. Track risk reduction metrics — measurable decreases in compliance violations, bias incidents, or data exposure. And track governance maturity score progression over time.
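Cycle time in particular is easy to compute from data you already have. A minimal sketch, assuming you can export opened and closed dates from your ticketing system (the sample records below are invented):

```python
from datetime import date
from statistics import median

# Use case review cycle time from (opened, closed) dates exported from a
# ticketing system. Sample records are invented.
reviews = [
    (date(2025, 7, 2),  date(2025, 7, 9)),   # 7 days
    (date(2025, 8, 11), date(2025, 8, 27)),  # 16 days
    (date(2025, 9, 3),  date(2025, 9, 30)),  # 27 days
]

cycle_days = [(closed - opened).days for opened, closed in reviews]
print(f"Median review cycle time: {median(cycle_days)} days")

# Rising cycle time is a leading indicator: slow reviews push teams to
# bypass governance, which is exactly what drives shadow AI.
if cycle_days[-1] > 2 * cycle_days[0]:
    print("Warning: review times are climbing. Find the bottleneck.")
```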

Cost — what is governance consuming, and is it proportionate?

Governance costs money. The question is whether that cost is proportionate to the value AI delivers. Track governance overhead per deployment and the opportunity cost of governance delays. For context: organisations that treat governance as a strategic capability see a 30% ROI advantage over those treating it as a compliance afterthought.
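Overhead per deployment can start as back-of-envelope arithmetic. A sketch with placeholder numbers; substitute your own review hours and loaded rates:

```python
# Back-of-envelope governance overhead. All inputs are placeholders.
review_hours_per_deployment = 12   # committee time plus documentation
loaded_hourly_rate = 150.0         # blended reviewer cost
deployments_per_quarter = 8

per_deployment = review_hours_per_deployment * loaded_hourly_rate
per_quarter = per_deployment * deployments_per_quarter

print(f"Governance overhead per deployment: ${per_deployment:,.0f}")
print(f"Quarterly governance spend:         ${per_quarter:,.0f}")
# Compare the quarterly figure against the value those deployments deliver.
# If overhead dwarfs value, reviews are too heavy; if it rounds to zero,
# you are probably not reviewing anything.
```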

The DX Core 4 framework complements this structure for engineering-specific contexts where developer experience metrics overlap with governance participation.

How do you move up the governance maturity ladder — and what does measurement look like at each level?

Acuvity’s research defines four maturity levels. Most organisations sit lower than they think.

Level 1 — Ad Hoc. No systematic governance. AI deployments happen without formal review. Measurement capability: zero. You cannot measure what you do not track. The tell: nobody can say how many AI systems are in production. 39% of organisations operate below managed governance levels entirely.

Level 2 — Defined. Governance policies exist on paper. Some review processes are in place. You can measure process — policies written, reviews completed — but that is about it. The risk here is governance theatre. High process scores masking zero enforcement. A perfect Regulatory Compliance Score means nothing if teams circumvent governance because they find it burdensome.

Level 3 — Managed. Governance is enforced with tooling. Outcome metrics sit alongside process metrics — policy compliance rate, model coverage, incident response time. The milestone here: a governance scorecard exists and gets reported to leadership. This is where governance becomes visible to the people who fund it.

Level 4 — Optimised. Continuous monitoring, drift detection, and accountability reporting are automated. Leading indicators — rising undocumented deployments, declining participation rates — trigger proactive intervention before things go sideways. Only 32% of organisations operate here.

You do not have to climb each rung one at a time. You can leapfrog from Ad Hoc to Managed by adopting a measurement-first approach. As David Talby, CTO of John Snow Labs, put it: the most successful organisations are those that can explain, defend, and adapt their systems under sustained scrutiny.

Think of the maturity ladder as a diagnostic tool, not a compliance requirement. Use it to figure out where you are so you can prioritise where to invest.

What happens to governance after deployment — and why is runtime the most vulnerable phase?

What happens after your AI systems go live? For most organisations, the honest answer is: not much. And that is exactly where the risk concentrates.

38% of organisations identify runtime as their most vulnerable phase. Another 27% say it is where they are least prepared. Pre-deployment concerns rank far lower.

Most governance frameworks focus on pre-deployment review and then neglect post-deployment monitoring entirely. But AI models drift over time. Training data goes stale, usage patterns shift, edge cases pile up. Without continuous monitoring, governance documentation becomes outdated within months.

IBM’s CIO Matt Lyteson described it directly: “We’ve even seen instances where you put it out there and then, a week later, it’s producing different results than you initially tested for.” When their Ask IT support agent’s resolution rate dropped from 82% to 75%, they investigated immediately. That is what runtime measurement looks like in practice.

For teams without enterprise tooling, a lightweight monitoring layer still does the job. Scheduled model performance reviews — quarterly at a minimum. Automated alerts for usage anomalies. Quarterly policy currency checks with an emergency update process for new capabilities. And periodic shadow AI sweeps — the average enterprise runs over 320 unsanctioned AI applications, so scanning SaaS subscriptions and API integrations regularly is not optional.
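What does an automated usage alert look like in practice? A minimal sketch of a drift check, using the 82% baseline from the IBM example above as the worked figure. The five-point tolerance and the print-based "alert" are placeholders for your own thresholds and paging setup:

```python
# Minimal runtime drift check: compare a weekly quality metric against the
# baseline you tested at launch and alert when it degrades past a tolerance.
# Baseline mirrors the Ask IT example; tolerance and alerting are placeholders.

BASELINE_RATE = 0.82
TOLERANCE = 0.05

def check_for_drift(current_rate: float) -> None:
    drop = BASELINE_RATE - current_rate
    if drop > TOLERANCE:
        print(f"DRIFT ALERT: {current_rate:.0%} is {drop:.0%} below the "
              f"tested baseline. Investigate before the next review cycle.")
    else:
        print(f"OK: {current_rate:.0%} is within tolerance.")

check_for_drift(0.75)  # the kind of drop that triggered IBM's investigation
```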

Agentic AI adds another layer. Measurement needs to extend to decision traceability, tool call logs, and human escalation triggers. And if you need a regulatory reason to care: regulators are signalling that documentation gaps themselves may constitute violations. That connects directly to the measurable evidence regulators now require.

How do you build governance measurement that works without enterprise tooling?

Most 50-to-500-person companies do not have a dedicated governance team, enterprise GRC platforms, or six-figure tooling budgets. That is fine. Measurement scales down.

No tooling required — spreadsheet level.

Start here. AI inventory: list every AI system in production with owner, purpose, and last review date. If you cannot answer “how many AI systems are in production?” then you are at Level 1. Policy compliance rate: a manual quarterly audit of whether deployments followed governance steps. Governance participation rate: are the right people showing up to reviews? Declining attendance is the earliest signal that teams are treating governance as optional. Policy currency check: are your policies less than six months old? Checking the dates on your documents takes an hour per quarter.
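The policy currency check is the easiest of these to script once the spreadsheet exists. A sketch, assuming you keep a simple mapping of document names to last-updated dates (the names and dates here are invented):

```python
from datetime import date, timedelta

# Quarterly policy currency check: flag any governance document older than
# six months. Document names and dates are invented for illustration.
SIX_MONTHS = timedelta(days=182)
today = date.today()

policies = {
    "acceptable-use-policy":  date(2025, 11, 20),
    "model-review-checklist": date(2025, 6, 1),
    "incident-runbook":       date(2024, 12, 15),
}

for name, last_updated in policies.items():
    stale = (today - last_updated) > SIX_MONTHS
    status = "STALE: schedule an update" if stale else "current"
    print(f"{name}: last updated {last_updated} ({status})")
```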

Basic instrumentation — existing observability stack.

Model coverage: track it via a Jira or Linear ticket that cannot close until the model card and risk assessment are attached. Use case review cycle time: tracked via your ticketing system — if average review time is climbing, teams are bypassing governance. Incident response time: measured from your existing incident management tooling. Shadow AI detection: a manual sweep of expense reports and browser extensions surfaces most unsanctioned AI tool usage.
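The ticket gate is straightforward to express as an automation rule. A sketch of the check, with hypothetical attachment names; in practice you would wire this into a Jira/Linear automation or a CI step:

```python
# Sketch of the "ticket cannot close" gate: block closure of a deployment
# ticket until governance artefacts are attached. Names are hypothetical.

REQUIRED = {"model_card.md", "risk_assessment.pdf"}

def can_close(attachments: set[str]) -> bool:
    missing = REQUIRED - attachments
    if missing:
        print(f"Blocked: missing {', '.join(sorted(missing))}")
        return False
    return True

can_close({"model_card.md"})                         # blocked
can_close({"model_card.md", "risk_assessment.pdf"})  # allowed
```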

Enterprise tooling — dedicated platforms.

Continuous drift detection and automated alerting. Real-time governance dashboards through platforms like IBM watsonx.governance and Credo AI. Automated audit trail generation and regulatory reporting.

The enterprise tier is optional. Governance measurement works at every budget level. As IBM’s Jeff Crume observed, “saying no doesn’t stop the behaviour, it just drives it underground” — measurement lets you govern with evidence, not blanket restrictions.

Here is a self-assessment you can run by Friday. Five questions: How many AI systems are in production right now? When was the last governance policy update? Who is accountable for AI risk in each business unit? How long does it take to approve a new AI use case? How many unsanctioned AI tools were discovered last quarter?

If you cannot answer those, you know where to start.

Why governance measurement is never done — and why that is a feature, not a flaw

Governance is a continuous operational capability. The moment measurement stops, governance debt starts piling up.

The “governance project” failure mode is common. Organisations that treat governance as a discrete initiative — write policy, implement, declare success — fail slowly. Governance decay is invisible until an audit, an incident, or a regulatory inquiry surfaces it. As David Talby put it, “governance is no longer judged by policy statements, but by operational evidence.”

Continuous measurement acts as an immune system. It catches policy drift, rising shadow AI, declining participation, and emerging risks before they turn into compliance failures. And it sustains investment — boards fund what they can see working. IBM’s daily cost visibility per AI use case demonstrates the kind of granular accountability that keeps governance funded.

The regulatory trajectory makes this non-negotiable. The EU AI Act, NIST AI RMF, and ISO/IEC 42001 all assume continuous governance. The EU AI Act’s high-risk obligations become fully applicable in August 2026. Compliance is not a one-time assessment. It is an ongoing obligation.

The organisations that treat governance execution as permanent infrastructure — not a project deliverable — are the ones that will scale AI safely. The 32% that operate at measured effectiveness are not done. They are simply measuring continuously. That is the difference.

FAQ

Is governance measurement the same as AI ROI measurement?

No. ROI measurement tracks business value — productivity gains, revenue, cost reduction. Governance measurement tracks whether AI is being used responsibly and within policy boundaries. The 72% (ROI) versus 32% (governance) split shows these are entirely separate disciplines.

What is the minimum set of metrics for a 100-person company’s governance programme?

Start with four. AI inventory completeness — do you know every AI system in production? Policy compliance rate — did each deployment follow your governance process? Policy currency — are your policies less than six months old? Governance participation rate — are the right people showing up to reviews? You do not need anything beyond a spreadsheet and a calendar for these.

How does the NIST AI Cybersecurity Profile relate to governance measurement?

The NIST AI RMF is structured around four functions — Map, Measure, Manage, and Govern. The Measure function defines what to track: risk metrics, performance thresholds, and stakeholder feedback. The Cybersecurity Profile layers security-specific measurement on top. Together they give you a government-endorsed measurement backbone you can adopt without building from scratch.

What is the difference between process metrics and outcome metrics in AI governance?

Process metrics measure activity: policies written, reviews completed, training sessions held. Outcome metrics measure impact: incidents prevented, compliance violations reduced, model drift detected before harm. Mature governance programmes lead with outcome metrics. Organisations stuck on process metrics alone risk governance theatre — high activity scores with no verified enforcement.

How do I translate technical governance metrics into language the board understands?

Focus on three translations. Model coverage becomes “percentage of AI systems under documented control.” Incident response time becomes “how quickly we contain AI-related problems.” Governance maturity score becomes “our progress on a four-level scale benchmarked against industry.” Boards respond to trend lines and benchmarks, not raw technical data.

How often should governance metrics be reviewed?

Quarterly for formal review, with monthly pulse checks on leading indicators — shadow AI discoveries, participation rate changes, policy expiry dates. Organisations at maturity Level 4 run continuous automated monitoring. Match your review cadence to how fast you are deploying AI.

What are leading indicators that governance is failing before a major incident occurs?

Five warning signs. Rising undocumented AI deployments. Declining attendance at governance reviews. Policies not updated in over six months. Increasing use case review cycle times — which suggests teams are bypassing governance. And nobody can name the person accountable for AI risk in their business unit.

Can governance measurement be automated, or does it require manual effort?

Both. Basic metrics — AI inventory, policy currency, participation rates — need manual tracking or simple automation. Advanced metrics — drift detection, continuous compliance monitoring, automated audit trails — need dedicated tooling. Start manual, automate incrementally, and do not wait for perfect tooling to start measuring.

How does governance measurement change for agentic AI systems?

Agentic AI introduces new measurement dimensions. Decision traceability — can you reconstruct why the agent acted? Tool call logging — what external systems did the agent access? Human escalation frequency — how often did the agent defer to a human? Most existing frameworks were designed for traditional or generative AI, not autonomous agents. They need extending.

What does a governance scorecard or dashboard actually look like?

A governance scorecard aggregates four to six KPIs into a single view: model coverage, policy compliance rate, incident response time, governance maturity score, shadow AI count, and governance participation rate. Each metric shows current value, trend direction, and a red/amber/green status. The dashboard is what makes governance visible to leadership and keeps the funding flowing.
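As a sketch of the mechanics, here is how the red/amber/green rollup might work. The metrics and thresholds are illustrative, not prescriptive; set your own:

```python
# Scorecard rollup sketch: each KPI gets a current value and a
# red/amber/green status from thresholds you choose yourself.

def rag(value: float, green_at: float, amber_at: float,
        higher_is_better: bool = True) -> str:
    if not higher_is_better:  # flip the comparison for "lower is better" KPIs
        value, green_at, amber_at = -value, -green_at, -amber_at
    if value >= green_at:
        return "GREEN"
    return "AMBER" if value >= amber_at else "RED"

scorecard = [
    # (metric, current, green threshold, amber threshold, higher is better)
    ("Model coverage",           0.78, 0.90, 0.70, True),
    ("Policy compliance rate",   0.93, 0.90, 0.75, True),
    ("Incident response (hrs)", 18.0,  8.0, 24.0,  False),
    ("Shadow AI count",          4.0,  0.0,  5.0,  False),
]

for name, value, green, amber, hib in scorecard:
    print(f"{name:<26} {value:>6}  {rag(value, green, amber, hib)}")
```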
