AI compliance documentation is no longer something you can defer until you’re bigger. Between March and August 2026, three overlapping regulatory frameworks — OMB M-26-04 (March 2026), Colorado SB 24-205 (June 2026), and the EU AI Act (August 2026) — create artefact requirements with real legal and commercial consequences for companies that can’t produce them on demand.
Most documentation guides are written for frontier model developers at large AI labs. Most engineering teams are not that. They’re deployers: companies building products on top of third-party models via API. This article is written for them — specifically for 10–500 person teams that need to know what to build, what each artefact must contain, and what the minimum viable version looks like.
The five core artefacts are: a model card or system card, an acceptable use policy, an impact assessment, a red-team results report, and an incident response plan. Each section below gives you the regulatory trigger, a minimum viable field list, and practical guidance for a small team. For the broader regulatory context, see the 2026 AI compliance landscape.
Before building any of these documents, there is one prerequisite: determine your classification tier. The exercise takes an hour and settles whether your minimum viable stack is three artefacts or all six in the reference section below.
Why are regulators, procurement teams, and enterprise clients now asking for AI documentation?
The compliance cascade is why documentation demand reaches companies not directly subject to any regulation. A regulation sets a requirement, a memo operationalises it in procurement terms, a procurement office embeds it in contract clauses, and those clauses get passed down the supply chain. What changed in 2026 is that AI system behaviour became a contractual attribute.
OMB M-26-04, effective March 2026, requires all federal agencies and their vendors to produce model cards, system cards, acceptable use policies, and evaluation artefacts. Any company with a federal agency or contractor as a customer will encounter these requirements through their procurement chain.
Colorado SB 24-205 (effective June 30, 2026) applies to any deployer of a high-risk AI system used in employment, lending, housing, education, or healthcare — regardless of company location, with no revenue threshold. Each violation carries a penalty of up to $20,000 under the Colorado Consumer Protection Act.
The EU AI Act takes effect for existing high-risk Annex III systems in August 2026, with fines up to €15 million or 3% of global annual turnover. Its conformity assessment requirements are producing artefact expectations that procurement teams are extending to all vendors, not just those selling EU-facing products.
No documentation means blocked procurement deals, failed security questionnaires, and regulatory exposure, all at once. For the regulatory triggers behind each documentation requirement, the 2026 AI compliance landscape covers each law’s scope thresholds in detail.
What should you determine before building any documentation artefact?
The classification tier determines which artefacts are mandatory versus recommended. Building the full stack without scoping first produces over-engineered documents that are harder to maintain. So do this first.
Developer versus deployer. California SB 53 targets frontier model developers whose training runs exceed a statutory compute threshold. Most 10–500 person engineering teams are deployers building on third-party model APIs — that shifts required artefacts from model provenance documentation toward system-level documentation.
Risk classification. High-risk AI under EU AI Act Annex III requires conformity assessments and technical file documentation. Colorado SB 24-205 lists high-risk categories covering employment, lending, housing, education, and healthcare.
Geography. Colorado SB 24-205 applies to any deployer with Colorado customers. The EU AI Act applies to any system with EU users. OMB M-26-04 applies if any customer is a federal agency or contractor.
Run the classification exercise first. To determine whether your AI system is high-risk before building the documentation stack, the classification framework is in the linked article.
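The outcome fits in a one-page scoping record. A minimal sketch for a hypothetical lending-adjacent deployer; every field name is illustrative, since none of these frameworks mandates a schema:

```yaml
# classification.yaml - a one-page scoping record. All field names are
# illustrative; no framework mandates this schema.
system: invoice-screening-assistant      # hypothetical example
role: deployer                           # not a developer: built on a third-party API
risk_classification:
  colorado_high_risk: true               # used in lending workflows
  eu_ai_act_annex_iii: false
geography:
  colorado_customers: true               # Colorado SB 24-205 in scope
  eu_users: false                        # EU AI Act out of scope for now
  federal_customers: false               # OMB M-26-04 out of scope
required_artefacts:
  - system_card
  - acceptable_use_policy
  - impact_assessment
  - incident_response_plan
last_reviewed: 2026-01-15
```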
What is a model card (or system card) and what does a minimum viable version contain?
A model card documents a trained model — relevant for teams that fine-tuned or trained their own model. A system card documents a deployed application built on top of a third-party model — relevant for most engineering teams using OpenAI, Anthropic, or similar APIs. OMB M-26-04 explicitly requires system cards from application builders.
Minimum viable system card for a 10–50 person team (a sketch follows the list):
- Base model identification — Provider, model version, access method
- System prompt summary — Purpose and scope, without exposing proprietary instructions
- Tool integrations — Connected APIs, databases, or retrieval sources
- Human oversight checkpoints — Where human review is required before AI output triggers an action
- Safety mitigations — Content filters, output validation, rate limits, guardrails
- Responsible disclosure contact
- Last reviewed date and review cadence
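A sketch of how those fields might sit on disk, loosely in the YAML style Hugging Face popularised for cards; the example system and key names are assumptions, not a mandated schema:

```yaml
# system-card.yaml - illustrative key names, not a mandated schema
system_name: support-triage-assistant    # hypothetical example system
base_model:
  provider: Anthropic
  model_version: pin-the-exact-version-string-you-deploy
  access_method: api
system_prompt_summary: >
  Classifies inbound support tickets and drafts replies for human
  review. The full prompt is held internally; only scope and
  constraints are summarised here.
tool_integrations:
  - ticketing-api
  - knowledge-base-retrieval
human_oversight_checkpoints:
  - outbound replies require agent approval before sending
safety_mitigations: [content-filter, output-schema-validation, rate-limits]
responsible_disclosure_contact: security@example.com   # hypothetical address
last_reviewed: 2026-01-15
review_cadence: quarterly
```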
Minimum viable model card for a team that fine-tuned or trained a model:
- Training data sources and provenance — Including third-party dataset licences
- Model capabilities and intended use cases
- Out-of-scope uses and known failure modes
- Evaluation results summary — Benchmark or internal test results
- Bias and fairness testing outcomes
- Responsible disclosure contact
OMB M-26-04 requires evaluation artefacts alongside the system card — the card establishes system scope, evaluation artefacts demonstrate safety testing was conducted. EU AI Act technical documentation under Article 11 and Annex IV overlaps significantly with model card fields, so a well-constructed card accelerates conformity assessment preparation.
Anthropic’s Claude system cards are the standard for system card format. Hugging Face’s YAML-based format is the reference for model cards. FairNow’s Model Card Template maps 17 fields across EU AI Act, ISO 42001, and Colorado SB 24-205 — a practical starting point for teams aligning across multiple frameworks.
What is an AI impact assessment and when is it required?
An AI impact assessment is a pre-deployment analysis documenting the system’s potential harms, affected populations, mitigation measures, and monitoring plan. It’s distinct from a bias audit — which tests model outputs for discriminatory patterns — and from the EU’s conformity assessment. Colorado SB 24-205 requires both the impact assessment and ongoing bias testing; they are complementary, not interchangeable.
Colorado SB 24-205: Any deployer of a high-risk AI system in employment, lending, housing, education, or healthcare must complete a documented assessment before deployment and at defined intervals. Completing assessments on statutory timelines gives deployers a rebuttable presumption of reasonable care.
EU AI Act: a Fundamental Rights Impact Assessment (FRIA) is mandatory from August 2026 for certain deployers of high-risk systems, including public bodies, private entities providing public services, and deployers using high-risk AI for credit scoring or insurance pricing.
Minimum viable impact assessment for a Colorado SB 24-205 deployer (a sketch follows the list):
- System description and intended use
- Affected population — Who interacts with the AI or whose outcomes it affects
- Data inputs and known limitations
- Potential harms analysis — Discrimination, privacy risks, safety risks, economic harms
- Bias testing methodology and results
- Mitigation measures implemented
- Human oversight mechanism
- Monitoring and escalation plan
- Review cadence — Re-assessment required on substantial modification
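In practice the assessment can be a short index that points at the underlying analyses. A sketch for a hypothetical employment-screening system, with illustrative field names and paths:

```yaml
# impact-assessment.yaml - a skeleton index; the narrative analyses live in
# the referenced documents. Field names and paths are illustrative.
system: resume-screening-assistant       # hypothetical high-risk (employment) example
intended_use: rank applications for recruiter review, never auto-reject
affected_population: job applicants, plus recruiters relying on rankings
data_inputs: [resumes, job-descriptions]
known_limitations:
  - non-English resumes underrepresented in test data
harms_analysis: docs/harms-2026-q1.md    # discrimination, privacy, safety, economic
bias_testing:
  methodology: docs/bias-methodology.md
  results: docs/bias-results-2026-q1.md
mitigations:
  - human review of every adverse outcome
  - demographic attributes excluded from model inputs
human_oversight: recruiter approves all final outcomes
monitoring: monthly selection-rate review, escalating to the compliance lead
review:
  cadence: annual
  triggers: [substantial_modification]
```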
If the AI system processes personal data — and most LLM applications do — a GDPR Data Protection Impact Assessment (DPIA) is also required. An integrated document can satisfy both, as long as it addresses fundamental rights beyond data protection alone. For the privacy by design architecture decisions that reduce your incident surface, the GDPR architecture article covers the design patterns.
Colorado SB 24-205 provides an affirmative defence to companies that have formally adopted NIST AI RMF or ISO 42001. Worth doing if you have Colorado customers.
What does an acceptable use policy need to cover for AI compliance purposes?
An AI acceptable use policy (AUP) is not your general IT AUP. It defines the operational envelope of the AI system — what it is permitted to do, who may use it, and under what conditions. OMB M-26-04 requires an AUP as a procurement artefact for federal AI vendors. Enterprise procurement teams request it as a standalone document in RFP security questionnaires.
Minimum viable AUP:
- Permitted use cases — Specific tasks the system is approved to perform
- Prohibited use cases — Out-of-scope uses, including high-risk use cases the system wasn’t evaluated for
- User eligibility — Who may access the system
- Data handling constraints — What data the system may process; personal and sensitive data restrictions
- Human oversight requirements — When human review is required before action
- Incident escalation path — What users must do if the system produces unexpected or harmful output
- Policy version and review cadence
The AUP’s permitted and prohibited use cases must directly reference the system card — inconsistencies are an audit finding. The AUP’s definition of a “policy violation” feeds into the incident classification criteria in the incident response plan. Build these three documents together, not in isolation.
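One way to keep them from drifting apart is to make the cross-references explicit and machine-checkable. A hypothetical sketch with invented file names:

```yaml
# aup.yaml - hypothetical sketch; file names and version tags are invented.
# The point is the explicit cross-references, which make drift auditable.
policy_version: 1.2
system_card_ref: system-card.yaml        # audit check: versions must match
permitted_use_cases:
  - drafting support replies for human approval   # mirrors a system card oversight checkpoint
prohibited_use_cases:
  - employment, lending, or other high-risk decisions the system was never evaluated for
user_eligibility: trained support staff only
data_handling_constraints: no payment card data; personal data handled per the DPIA
incident_escalation: incident-response-plan.yaml  # "policy violation" feeds its severity tiers
review_cadence: quarterly
```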
How do you build an incident response plan that meets the 15-day and 72-hour notification windows?
Design for the fastest window your team could face. The New York RAISE Act requires 72-hour notification from frontier developers. California SB 53 requires 15-day full notification plus 24-hour emergency notification for imminent risk. The EU AI Act requires 15-day notification for serious incidents. Design for 72 hours and you cover every standard window; SB 53’s 24-hour emergency path needs its own immediate-escalation trigger on top.
The plan has to answer four operational questions before an incident occurs:
- What constitutes a “critical safety incident” for this specific AI system?
- Who owns the initial triage decision?
- What is the internal escalation workflow?
- How is the regulatory notification prepared and to whom is it submitted?
Minimum viable incident response plan (a sketch follows the list):
- Incident definition and severity classification — Critical / High / Medium / Low, with product-specific examples for each tier
- Detection sources — Monitoring alerts, user reports, internal testing triggers
- Triage owner — Named role, not “the engineering team” — plus a backup
- Internal escalation workflow — Triage owner → legal or compliance review → executive sign-off for regulatory notifications
- Regulatory notification decision tree — Which law applies? Which regulator? What format?
- Regulatory contacts and submission channels — California Office of Emergency Services for SB 53; relevant EU national authority for EU AI Act incidents
- Post-incident review process
- Review cadence and tabletop exercise schedule — Minimum: annual tabletop; quarterly review of severity examples
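A sketch of the severity and notification sections, with hypothetical examples; the windows encode the statutes cited above:

```yaml
# incident-response.yaml - sketch with hypothetical examples; the windows
# reflect the laws cited above and are clock hours, not business hours
severity_tiers:
  critical: AI action caused or risked concrete harm (e.g. refunds issued in error at scale)
  high: safety mitigation bypassed, no confirmed harm
  medium: repeated policy-violating outputs caught by human review
  low: single filtered output, no user exposure
triage_owner: "engineering lead (backup: CTO)"   # named roles with weekend coverage
notification_windows:
  ny_raise_act: 72h               # frontier developers, when effective
  ca_sb53_emergency: 24h          # imminent-risk incidents only
  ca_sb53_full: 15d
  eu_ai_act_serious_incident: 15d
escalation: triage -> legal or compliance review -> executive sign-off -> regulator
tabletop_exercise: annual
severity_example_review: quarterly
```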
When the RAISE Act window applies, triage and legal review must happen within 72 clock hours, not business hours. Account for weekend and holiday coverage in your named escalation paths.
California SB 53 and the RAISE Act both require anonymous internal reporting channels for employees who identify safety concerns, along with documented non-retaliation policies. You can have every technical artefact in order and still fail a regulatory review because this HR documentation is absent.
For teams also building GDPR breach notification procedures, the two processes overlap when an AI safety incident involves personal data. The GDPR breach notification documentation requirements article covers the notification timeline specifically.
How do you use Promptfoo to generate red-team results reports for audit and procurement?
Red-teaming produces evaluation artefacts — the term OMB M-26-04 uses for adversarial safety testing results. Required for federal procurement. Increasingly demanded in enterprise RFPs. Promptfoo is the purpose-built open-source tool for producing these in an auditable, exportable format. Install via npm; Node.js required.
Step 1 — Create a configuration file. Define the target provider (OpenAI, Anthropic, a local endpoint), the system prompt under test, and the plugins to run. At minimum, run the harmful and pii:direct plugin categories.
Step 2 — Select red-team plugins. Promptfoo ships with 157 plugins. The security and access control plugins map to OWASP Top 10 for LLMs, covering prompt injection, data exfiltration, PII leakage, and jailbreak scenarios. It supports native framework mappings for NIST AI RMF, MITRE ATLAS, ISO 42001, EU AI Act, and GDPR.
Step 3 — Run the evaluation. Promptfoo generates adversarial test cases, executes them, and scores results automatically.
Step 4 — Export results. HTML report; JSON also available for CI/CD integration.
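Putting the four steps together, a minimal configuration might look like this. The plugin and strategy names exist in Promptfoo’s catalogue, but the target, prompt file, and purpose text are assumptions for a hypothetical support assistant, and CLI flags can vary by version:

```yaml
# promptfooconfig.yaml - minimal red-team configuration. Generate a starter
# file with: npx promptfoo@latest redteam init
# Run with:   npx promptfoo@latest redteam run
# View with:  npx promptfoo@latest redteam report
targets:
  - id: openai:gpt-4o-mini        # assumption: swap in your provider and model
    label: support-assistant
prompts:
  - file://system-prompt.txt      # the system prompt under test
redteam:
  purpose: >
    Customer support assistant. Must not reveal personal data or take
    actions outside the ticketing workflow.
  plugins:
    - harmful                     # harmful-content category
    - pii:direct                  # direct PII-disclosure probes
  strategies:
    - jailbreak
    - prompt-injection
```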
The HTML report includes an executive summary with pass rate and failure category breakdown, plus a table of attack scenarios with pass/fail status, severity, and remediation suggestions. This is the format procurement teams expect.
If your AI can take actions — issue refunds, send emails, modify records — compliance requirements apply to the action path. Promptfoo’s agentic architecture plugins cover application-level vulnerabilities specifically.
Run Promptfoo in CI before each major release and store results as a versioned artefact alongside the system card. That’s continuous testing, not a one-time compliance exercise.
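As a sketch, a GitHub Actions job that runs the scan on release tags and uploads the JSON output as a build artefact; the trigger, flags, and secret name are assumptions to adapt to your own pipeline:

```yaml
# .github/workflows/redteam.yml - illustrative sketch; adjust the trigger,
# CLI flags, and secret name to your own pipeline
name: redteam-scan
on:
  push:
    tags: ['v*']                  # before each major release
jobs:
  redteam:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npx promptfoo@latest redteam run --output redteam-results.json
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}   # assuming an OpenAI target
      - uses: actions/upload-artifact@v4
        with:
          name: redteam-results   # versioned artefact stored beside the system card
          path: redteam-results.json
```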
How do you keep the documentation stack current as new laws ship?
This is not a one-time project. Colorado SB 24-205 requires reassessment annually and after material system changes. OMB M-26-04 requires updated system cards when system scope changes materially. The EU AI Act defines “substantial modification” broadly — changes in functionality or intended purpose trigger a new conformity assessment. New state laws are shipping monthly.
Each artefact needs a named owner (not “the team”), a review trigger, and a version history; a register sketch follows the cadence list below.
Practical review cadence for a 10–50 person team:
- Quarterly — Review AUP and system card; run Promptfoo red-team scan; check for new state AI laws affecting your customer geography
- Annually — Full reassessment of the impact assessment; update model or system card if model version or architecture changed; third-party audit if required
- On material system change — Update system card and rerun impact assessment; update AUP if use cases have changed
- On incident — Update incident response plan; add incident to the log
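The owner, trigger, and version-history register can live in one small file. A sketch with illustrative role names:

```yaml
# doc-governance.yaml - a register sketch; role names are illustrative
artefacts:
  system_card:
    owner: engineering-lead       # a named role, never "the team"
    review: quarterly
    triggers: [material_system_change]
  acceptable_use_policy:
    owner: product-lead
    review: quarterly
    triggers: [use_case_change]
  impact_assessment:
    owner: product-lead
    review: annual
    triggers: [material_system_change]
  incident_response_plan:
    owner: engineering-lead
    review: quarterly
    triggers: [incident]
version_history: CHANGELOG.md     # one entry per artefact revision
```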
The RAISE Act requires annual third-party audits for frontier developers when it takes effect. Even for teams not directly in scope, regulated-industry customers are starting to require evidence of third-party evaluation. Missing documents generate audit findings that extend the engagement timeline and increase its cost.
For the legal-engineering loop that keeps your documentation current, the linked article covers the operational mechanics.
Documentation stack reference
System card — Triggers: OMB M-26-04, EU AI Act (Annex IV). Content: base model, system prompt summary, tool integrations, human oversight checkpoints, safety mitigations, disclosure contact. Owner: Engineering lead.
Model card — Triggers: EU AI Act (Annex IV), OMB M-26-04 (fine-tuned models only). Content: training data provenance, capabilities, out-of-scope uses, evaluation results, bias testing. Owner: ML lead or engineering lead.
Acceptable use policy — Triggers: OMB M-26-04, enterprise RFPs. Content: permitted uses, prohibited uses, user eligibility, data handling constraints, human oversight requirements, escalation path. Owner: Product lead with legal review.
AI impact assessment — Triggers: Colorado SB 24-205, EU AI Act (FRIA). Content: system description, affected populations, harms analysis, bias testing, mitigations, human oversight, monitoring plan. Owner: Product lead or compliance lead.
Red-team results report — Triggers: OMB M-26-04 (evaluation artefact), EU AI Act conformity assessment. Content: Promptfoo output — executive summary, attack scenarios, pass/fail by category, severity, remediation status. Owner: Engineering lead.
Incident response plan — Triggers: California SB 53 (15-day), RAISE Act (72-hour, when effective), EU AI Act. Content: incident definition, severity tiers, triage owner, escalation workflow, regulatory notification decision tree, post-incident review. Owner: Engineering lead with legal review.
Frequently asked questions
Do I need a model card if I am using the OpenAI API rather than my own model?
No. A system card is the applicable artefact for teams using third-party model APIs. OMB M-26-04 explicitly requires system cards from application builders — see the model card section above for the full field list.
What is the difference between an impact assessment and a bias audit?
An impact assessment covers the full scope of potential harms — bias, privacy, safety risks — along with mitigation measures and a monitoring plan. A bias audit specifically tests model outputs for discriminatory patterns. Colorado SB 24-205 requires both: the impact assessment at deployment, ongoing bias testing post-deployment. They are complementary, not interchangeable.
How do I create an incident response process for a 10-person engineering team?
Keep it simple. One page. Four questions: what counts as an incident (with three product-specific examples), who makes the triage call and who is the backup, what constitutes a regulatory reporting trigger, and who notifies the regulator. The goal is a plan that can be executed at 2am by someone who hasn’t read it in six months.
What is Promptfoo and how does it work?
Promptfoo is an open-source LLM evaluation and red-teaming platform. Install via npm, configure a YAML file with your target provider and plugins, run the evaluation, export the HTML report. It ships with 157 plugins covering OWASP LLM Top 10, NIST AI RMF, ISO 42001, EU AI Act, and GDPR. The output is accepted as evaluation artefact evidence under OMB M-26-04 procurement requirements.
What is OMB M-26-04 and why does it affect software vendors who don’t sell to the government?
OMB M-26-04 is a US Office of Management and Budget memo (December 2025) requiring federal agencies to update procurement policies by March 2026. The cascade is why it reaches beyond direct federal vendors: if any customer in your supply chain has a federal contract, they pass the same evidence requests down to their suppliers. The requirements reach you through commercial contracts.
Which regulations apply if my company is based outside the US but has US customers?
Geography of customers, not company location, determines which laws apply. Colorado SB 24-205 applies to any deployer affecting Colorado consumers regardless of where the company is based. The EU AI Act applies to any provider placing AI systems on the EU market. If you have customers in Colorado, the EU, and one government contractor, all three frameworks apply simultaneously.
What does whistleblower protection have to do with AI compliance documentation?
California SB 53 and the RAISE Act both require anonymous internal reporting channels for employees who identify safety concerns, plus documented non-retaliation policies. Regulators can request evidence these channels exist. Their absence is a finding even when all the technical artefacts are in order.
Where do GDPR and EU AI Act compliance obligations overlap?
Most LLM applications process personal data, so both a GDPR DPIA and an EU AI Act FRIA apply. You can produce an integrated document satisfying both — as long as it addresses fundamental rights beyond data protection and isn’t just a DPIA with different headings.
What does a minimum viable documentation stack look like for a team just getting started?
System card, AUP, and a one-page incident response plan. These three satisfy OMB M-26-04 procurement basics and cover the minimum threshold for regulatory incident reporting readiness. Add the impact assessment when you have Colorado customers or a high-risk use case. Add the red-team results report before your first major enterprise deal.
When is a NIST AI RMF adoption worth the effort for a small team?
When you have Colorado customers (formal adoption gives you an affirmative defence under SB 24-205), sell into federal procurement, or sell into regulated industries. The four functions — Govern, Map, Measure, Manage — map directly to the documentation artefacts in this article. A lightweight adoption is achievable in a sprint.
What does the EU AI Act conformity assessment involve for a small team?
A technical file documenting the system’s purpose, risk management process, data governance, testing results, cybersecurity measures, and human oversight mechanisms. The model card and system card together form the core of this file. Most small teams can prepare it internally — third-party auditors are only required for the highest-risk categories like biometric identification. Annex III deadline: August 2, 2026.