Llama, Mistral, DeepSeek, and Qwen Licence Terms Compared for Commercial Use

Llama, Mistral, DeepSeek, and Qwen are the four most deployed open-weight model families right now. All of them get called “open source” routinely. None of them are, in any consistent legal sense, and each one carries different commercial restrictions.

Here’s the thing about that badge on Hugging Face: it’s metadata, not a legal grant. Research from February 2026 found that 96.5% of datasets and 95.8% of models are missing the licence text needed to make their permissive label actually mean anything. Only 3.2% of models include both the full licence text and a copyright notice.

This article compares the actual licence terms: what’s permitted commercially, what thresholds trigger obligations, which models qualify for EU AI Act open-source exemptions, and what to check before you approve anything for production. For the foundational context, see open AI supply-chain licensing risk and permissive-washing in AI explained.

Why does “open source” mean different things for Llama, Mistral, DeepSeek, and Qwen?

“Open source” has a specific legal meaning from the Open Source Initiative (OSI): free redistribution, access to source, the right to create derivative works, no prohibited use cases. MAU thresholds, and requirements to sign a commercial agreement above a given user count, are not compatible with that definition.

What most “open” AI models actually offer is open weights — publicly downloadable parameters. That’s not the same thing.

The distinction that matters is between a permissive label (a badge on Hugging Face) and a permissive grant (legally operative text conveying rights). Stefano Maffulli, executive director of the OSI, puts it simply: companies assume openness and get caught by restrictive provisions they never read.

What does the Llama Community Licence actually restrict for commercial use?

Llama is not open source by any standard definition. It is the model most commonly mislabelled as such.

The Llama Community Licence permits commercial use below 700 million monthly active users. Above that threshold, you need a separate Meta agreement. Separately from the MAU threshold, the licence prohibits using Llama outputs to train competing models and restricts certain military and surveillance applications.

Fine-tune a Llama model and your derivative is a covered work — the MAU threshold and prohibited use cases propagate to it. There are approximately 27,000 Llama derivative models on Hugging Face already carrying these restrictions, most of whose maintainers are likely unaware of them. There is also a naming requirement: Llama 3-based models must be prefixed “llama3”, and 85.8% of Llama 3-licensed models on Hugging Face currently fail to meet it.

Llama does not qualify for the EU AI Act open-source exemption. Meta monetises through enterprise licensing, which is the disqualifying condition. EU deployments must comply with Article 53 GPAI obligations. For a complete overview of how these obligations apply across your full AI stack, see the open AI supply-chain licensing risk guide.

Bottom line: Commercially usable for most current-scale deployments, but not open source. Legal review required, especially for fine-tuned derivatives.

Is Mistral Apache 2.0 genuinely permissive for commercial use?

The brand name does not determine the licence. Two Mistral models can have fundamentally different commercial rights.

Apache 2.0 variants (Mistral 7B, Mixtral 8x7B) are genuinely permissive — commercial use without user thresholds. Apache 2.0 still requires that the NOTICE file and full licence text be preserved in all distributions and that modifications be documented. Attribution is a binding condition, not optional.

MNPL variants (Mistral Large, Mistral Medium) restrict use to development and evaluation only. Deploying in commercial production is a licence violation. The name says it: non-production.

Some Mistral models may also cross the GPAISR threshold (10^25 FLOPs), which makes them ineligible for EU AI Act open-source exemptions regardless of what their licence says.

Bottom line: Apache 2.0 variants are permissive when the compliance requirements are met. MNPL variants are not commercially deployable. Verify which licence applies to the specific model version — do not assume from the brand name.

Is DeepSeek truly open source and can it be used without licence restrictions?

DeepSeek generates the most questions and the most frequently wrong answers. There are two separate assessments to make here.

The licence question (answered): MIT label, and it’s genuine. Commercial use is permitted. Preserve the copyright notice and full licence text in distributions. That is all. But verify the repository — not just the badge — since only 3.2% of models actually provide both. The reasons why metadata labels routinely mislead are covered in detail in permissive-washing in AI explained.

The training data provenance question (unresolved): DeepSeek was trained on Chinese internet data. Robots.txt compliance, copyright clearance, and data licensing for the training corpus are not publicly documented. The model licence does not retroactively clean the training data. As Mike Lieberman, CTO of Kusari, puts it: with open models, if the training data turns out to be legally or ethically problematic, the liability shifts to you, not the vendor. The broader pattern of how licence risk compounds across the dataset, model, and application layers is covered in how AI licence risk compounds across your stack.

DeepSeek’s EU AI Act classification depends on capability thresholds (GPAI/GPAISR), not the MIT licence.

Bottom line: MIT is genuine and commercially permissive for the weights. Training data provenance is a separate unresolved risk. These are two distinct assessments — keep them separate.

What does the Qwen licence require and when does the commercial threshold trigger?

Qwen has the largest derivative ecosystem of any open-weight model family — 113,000+ derivative models on Hugging Face — and two different licence types depending on which model version you’re looking at.

Apache 2.0 variants (Qwen 3, Alibaba’s flagship) are commercially permissive with standard attribution requirements.

Tongyi Qianwen Licence variants require a commercial agreement with Alibaba above user thresholds — similar in concept to Llama’s MAU cliff. The specific thresholds are not publicly documented. If you cannot determine the threshold, you cannot tell when you have crossed it. This licence sits in Hugging Face’s “Other” category and does not appear in standard licence filter searches.

Any restriction propagates through those 113,000+ downstream projects, most of which have not verified the applicable licence. ModelScope (Alibaba) is the primary alternative platform for Qwen artefacts — check both, since metadata conventions may differ.

Bottom line: Apache 2.0 variants are commercially permissive. Tongyi Qianwen variants require a commercial agreement above undocumented thresholds. Verify the specific model version and check both platforms.

What does a genuinely open AI model look like and why does it matter?

OLMo from AI2 (Allen Institute for AI) is the reference point. Apache 2.0 with a full open training data release, training code, and evaluation framework. It qualifies for the EU AI Act open-source exemption because AI2 does not monetise it. The four criteria for genuine openness: an OSI-compliant licence, full training data disclosure, no commercial use restrictions, and no user thresholds. None of the four comparison models meets all four.

On the data side, the Common Pile from EleutherAI is the equivalent reference — 8TB with explicit licence verification for every included work. That is what legally clean training data actually requires.

OLMo is not a drop-in replacement — capability benchmarks differ. But it establishes what “genuinely open” means and makes the gaps in each commercial model’s licence visible.

Which models qualify for EU AI Act open-source exemptions and which do not?

The EU AI Act provides partial exemptions from GPAI obligations for open-source models. It is not automatic. The key condition: the model provider must not commercially monetise the model.

Models crossing the 10^25 FLOPs GPAISR threshold are never exempt. And qualifying for the exemption does not eliminate all obligations — EU copyright compliance and a training data summary are still required. For a full breakdown of the GPAI category, GPAISR thresholds, and what these mean for procurement, see EU AI Act and Cyber Resilience Act supply chain obligations explained.

How should you evaluate any AI model’s licence before approving it for production?

Four independent assessments before any production approval. Do not skip any of them.

Layer 1 — Licence text verification: Does the repository contain the full licence text and copyright notice? Not the badge — the file. Only 3.2% of models satisfy both.

Layer 2 — Commercial use restrictions: MAU thresholds (Llama), production restrictions (Mistral MNPL), undocumented user thresholds (Qwen Tongyi Qianwen). If you fine-tune, does the original licence propagate? Llama’s does. MIT and Apache 2.0 require attribution only.

Layer 3 — Training data provenance: The model licence governs the weights, not what was in the training data. Non-commercial dataset licences (CC BY-NC) can bind downstream uses even if the model carries MIT or Apache 2.0.

Layer 4 — Regulatory classification: Does the model trigger GPAI or GPAISR obligations under the EU AI Act? Does it qualify for the open-source exemption? The licence badge does not answer this.
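Layer 1 lends itself to automation. Below is a minimal sketch of that check, assuming Hugging Face-style conventions — a `license:<id>` metadata tag and conventional licence file names; both are assumptions to adapt to whatever registry you actually use:

```python
# Layer 1 sketch: given a repository's file list and its metadata tags,
# confirm the permissive *label* is backed by an actual licence *grant*
# (a licence text file plus a copyright/NOTICE file). File names and the
# "license:<id>" tag format are assumed conventions, not a standard.

LICENCE_FILES = {"LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING"}
NOTICE_FILES = {"NOTICE", "NOTICE.txt", "COPYRIGHT"}

def check_licence_grant(repo_files, metadata_tags):
    """Return the badge/grant status for one model repository."""
    names = {f.rsplit("/", 1)[-1] for f in repo_files}
    badge = next((t.split(":", 1)[1] for t in metadata_tags
                  if t.startswith("license:")), None)
    has_text = bool(names & LICENCE_FILES)
    has_notice = bool(names & NOTICE_FILES)
    return {
        "badge": badge,                  # the metadata label, if any
        "licence_text": has_text,        # the legally operative file
        "copyright_notice": has_notice,
        "grant_backed": has_text and has_notice,
    }

# Example: a repo with an MIT badge but no licence text in the tree --
# the situation the 3.2% figure describes.
result = check_licence_grant(
    ["config.json", "model.safetensors"], ["license:mit"])
print(result["badge"], result["grant_backed"])  # mit False
```

Fed with a real file listing and real tags, the same function distinguishes a badge-only repository from one carrying an operative grant.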

Before production approval, get answers in writing:

  1. Does the repository contain a LICENSE file with the full licence text?
  2. Does the repository contain a copyright notice?
  3. What are the specific commercial use restrictions?
  4. Is there an MAU, user count, or revenue threshold that triggers additional obligations?
  5. What are the prohibited use cases?
  6. What is the training data provenance and is it documented?
  7. Does any training data carry a non-commercial licence?
  8. If we fine-tune this model, what restrictions propagate to our derivative?
  9. Does this model trigger GPAI or GPAISR obligations under the EU AI Act?
  10. Does it qualify for the EU AI Act open-source exemption, and does that matter for our use case?
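“In writing” survives longer when it is machine-readable. A minimal sketch for recording the ten answers as a review record — the question keys are shorthand invented here, to be adapted to your own review template:

```python
# Shorthand keys for the ten pre-approval questions above. These names
# are illustrative, not a standard; map them to your own template.
PRE_APPROVAL_QUESTIONS = [
    "license_file_present", "copyright_notice_present",
    "commercial_restrictions", "user_or_revenue_threshold",
    "prohibited_use_cases", "training_data_provenance",
    "non_commercial_training_data", "derivative_restrictions",
    "gpai_or_gpaisr_obligations", "open_source_exemption",
]

def unanswered(review):
    """Questions still lacking a written answer for this model."""
    return [q for q in PRE_APPROVAL_QUESTIONS if review.get(q) in (None, "")]

review = {"license_file_present": "yes, LICENSE at repo root"}
print(len(unanswered(review)))  # 9
```

A model clears the gate only when `unanswered()` comes back empty — which makes the review state auditable later, not just agreed once.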

Treat AI model procurement with the same legal review process as any other critical infrastructure dependency. A benchmark score does not resolve a licence question. For a comprehensive framework covering all aspects of AI supply-chain licensing risk — from model selection through to ongoing governance — see the full AI licensing risk picture.

Frequently Asked Questions

Can I use Llama commercially without paying Meta?

Yes, if your application stays below 700 million monthly active users. Above that threshold you need a separate commercial agreement. The Llama Community Licence also prohibits training competing models and restricts certain military and surveillance applications.

Does DeepSeek’s MIT licence mean I can use it for anything?

The MIT licence permits commercial use — preserve the copyright notice and licence text. But it only covers the model weights. Training data provenance is a separate and unresolved question the MIT label does not address.

What is the difference between open weights and open source for AI models?

Open weights means the model parameters are publicly downloadable. Open source, per the OSI definition, requires full freedoms: training data access, unrestricted commercial use, the right to modify and redistribute. Most “open” AI models are open weights with custom restrictions, not open source.

Which AI model licence is safest for commercial enterprise use?

No single model is universally safest. Apache 2.0 models (some Mistral variants, some Qwen variants, OLMo) offer the most permissive terms when the full licence text is preserved. DeepSeek’s MIT is similarly permissive for the weights but carries unresolved training data provenance questions.

What does Apache 2.0 actually require me to do when I deploy an AI model?

Preserve the NOTICE file, include the full licence text in distributions, and document any modifications. Attribution is a binding licence condition, not optional.

Do I need a separate licence if I fine-tune a Llama model?

Yes. Fine-tuned Llama models are derivative works under the Llama Community Licence. The 700M MAU threshold and prohibited use cases propagate to your derivative.

What is the Tongyi Qianwen Licence and how does it differ from Apache 2.0?

It is Alibaba’s custom licence for certain Qwen models. Unlike Apache 2.0, it requires a commercial agreement above specified user thresholds. Not all Qwen models use it — some are Apache 2.0. Check the specific model version.

Does the EU AI Act open-source exemption apply to Llama or DeepSeek?

The exemption requires the model provider not to monetise commercially. Meta monetises Llama, so Llama does not qualify for commercial deployers. DeepSeek’s eligibility depends on regulatory classification (GPAI/GPAISR thresholds), not the MIT licence.

What is the Common Pile and why does it matter for AI licensing?

The Common Pile is a pre-training dataset from EleutherAI built with explicit licence verification for each included source — 8TB of texts in the public domain or under Open Definition-compliant licences. It is the reference standard for legally clean training data.

How do I check if an AI model on Hugging Face actually has a valid licence?

Check the repository — not just the badge — for a LICENSE file with full licence text and a copyright notice. The badge is metadata that may not reflect actual repository contents. Only 3.2% of models satisfy both requirements.

What happens if I deploy a model trained on non-commercial data?

If training data carries a non-commercial licence (e.g., CC BY-NC), that restriction may bind downstream uses of the model even if the model is labelled MIT or Apache 2.0. Both must be evaluated independently.

Is there a truly open-source AI model with no commercial restrictions?

OLMo from AI2: Apache 2.0 licence, full training data release, training code, and evaluation framework. Qualifies for the EU AI Act open-source exemption. Capability benchmarks differ from the four commercial-scale models in this article.

What an AI Bill of Materials Is and What to Demand From Vendors

AI models arrive wearing permissive licence labels — MIT, Apache 2.0, “open weights” — that look clean in a repository but hide downstream restrictions on training data, acceptable use, and weight access. A recent analysis of LLMware supply chains found 52% exhibit at least one licence conflict, and 35.4% of AI artefacts have no licence declaration at all. The pattern has a name: permissive-washing. And the traditional Software Bill of Materials was never built to catch it.

An AI Bill of Materials (AI-BOM) closes that gap. It is a structured, machine-readable inventory of every AI artefact in a system — models, datasets, weights, inference libraries — along with provenance, licensing terms, access restrictions, and regulatory compliance metadata. This article defines what an AI-BOM must contain, explains why permissive labels are not enough, and gives you a vendor procurement checklist you can take into your next model approval conversation. For the bigger picture, see the broader licensing risk landscape and how AI licence risk compounds across your supply chain.


What is an AI Bill of Materials and how does it differ from a standard SBOM?

Most teams know what an SBOM is: a record of code dependencies — libraries, packages, versions, licences. It answers the question: what code is in this product?

An AI-BOM answers a harder question: what went into making this AI artefact, under what terms, and what restrictions does that impose?

The difference matters because a model is trained on data from thousands of sources — each potentially under different terms — through a pipeline that may have involved a third-party base model, fine-tuning steps that create new obligations, and weight access policies that have nothing to do with the headline licence. None of those dimensions appear in a standard SBOM schema.

The naming varies — AI-BOM, AIBOM, ML-BOM, AI SBOM — and you will encounter all of them in vendor documentation. The best analogy: if an SBOM is the ingredients list on a packaged food product, an AI-BOM is the full supply chain audit trail — where each ingredient was grown, who processed it, and what the grower’s contractual terms allow the manufacturer to do with the final product.


Why do permissive AI licence labels not always mean what they say?

A model repository labelled “MIT” or “Apache 2.0” may carry acceptable use restrictions, training data licence conflicts, or weight access limitations that directly contradict that label. The licence on a model card reflects only what the publisher chose to show. This is permissive-washing: the gap between what the label signals and what your downstream rights actually are.

GitHub and Hugging Face are publisher-controlled metadata environments. Anyone who publishes a model controls what appears in the licence field — there is no independent verification. A study of 760,460 models on Hugging Face found one Google dataset appearing under six distinct licence designations in metadata, while the dataset card read “More Information Needed” in the licence field. In the LLMware analysis, one model categorised as “Other” was actually under the Llama3 licence — even the licence category itself can be wrong.

The practical risk is that you discover after deployment that training data includes copyleft-licensed material, the acceptable use policy prohibits your use case, or redistribution rights restrict how you can package your product. These things typically surface during M&A due diligence, a regulatory audit, or a customer contractual review. Not great timing.

Heather Meeker, an open-source licensing attorney and FOSSA adviser, puts it plainly: “Not all public code is open source. There’s a lot of public code on GitHub covered by other licenses that might very well specifically prohibit AI training, grant other limited licenses, or grant no rights at all.”

Checking the label and calling it done is not enough. A verified AI-BOM is the only defensible path.


What five things must an AI-BOM capture that a standard SBOM misses?

An AI-BOM has to document five categories that sit entirely outside a traditional SBOM’s schema. Each one maps to a concrete legal and operational risk.

1. Training dataset identity and licence terms — Which datasets were used, under what terms, and do those terms permit commercial use and redistribution of model outputs? If training data includes copyleft-licensed material, the permissive headline licence does not override those upstream obligations.

2. Model lineage — The documented chain from base model through fine-tuning and adaptation, with licensing terms at each stage. A restrictive licence at any point flows downstream. Only 15.4% of models on Hugging Face declare any base model relationship — without lineage, you cannot confirm the model you are deploying is legally unencumbered.

3. Weight access policy — Whether weights are openly available, gated, or proprietary — and what the access terms actually permit. A model can be “MIT licensed” in its code while the weights are gated behind terms that prohibit commercial redistribution. These need to be documented separately.

4. Acceptable use restrictions — Contractual constraints that override a permissive headline licence — prohibitions on military use, surveillance, or medical diagnosis. Often buried in terms of service documents that are entirely separate from the repository licence file.

5. EU AI Act compliance metadata — EU AI Act Article 53(1d) requires providers of general-purpose AI (GPAI) models to produce a Training Data Summary documenting datasets, copyright compliance, and data governance. Mandatory since August 2, 2025, for GPAI providers placing models on the EU market — regardless of where the provider is established.

Item 5 is a regulatory obligation, not a best practice. If you use GPAI models and deploy into EU markets, it is not optional. For what the EU AI Act requires from vendors, see EU AI Act and Cyber Resilience Act supply chain obligations explained.
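To make the five categories concrete, here is a sketch of an AI-BOM record as a data structure. The field names are illustrative — a real record would live in SPDX 3.0 or CycloneDX form, not an ad hoc class — but the shape shows what a standard SBOM schema has no slot for:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    name: str
    licence: str            # e.g. "cc-by-4.0", "cc-by-nc-4.0"
    commercial_use: bool
    redistribution: bool

@dataclass
class AIBOMRecord:
    model_id: str
    # 1. Training dataset identity and licence terms
    datasets: list = field(default_factory=list)
    # 2. Model lineage: (ancestor id, licence) pairs, base model first
    lineage: list = field(default_factory=list)
    # 3. Weight access policy: "open", "gated", or "proprietary"
    weight_access: str = "unknown"
    # 4. Acceptable use restrictions from AUP/ToS documents
    use_restrictions: list = field(default_factory=list)
    # 5. EU AI Act metadata: Article 53(1d) Training Data Summary present?
    training_data_summary: bool = False

    def gaps(self):
        """Fields still missing before this record is procurement-ready.
        (An empty use_restrictions list may be legitimate, so it is
        not treated as a gap here.)"""
        missing = []
        if not self.datasets: missing.append("datasets")
        if not self.lineage: missing.append("lineage")
        if self.weight_access == "unknown": missing.append("weight_access")
        if not self.training_data_summary: missing.append("training_data_summary")
        return missing

record = AIBOMRecord("example/model-7b")
print(record.gaps())  # ['datasets', 'lineage', 'weight_access', 'training_data_summary']
```

A vendor-provided BOM that leaves any of these gaps open is exactly the incomplete documentation the procurement checklist below is designed to catch.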


What should you demand from a vendor before approving their AI model?

Treat this like a security questionnaire. You are not being difficult — you are doing your job.

AI Model Procurement Checklist

  1. Complete AI-BOM in SPDX 3.0 or CycloneDX format — An actual file, not a summary. A proprietary format with no standard export is a red flag.
  2. Training dataset licence documentation with dataset-level detail — Specific datasets, not broad categories. “We used publicly available internet data” is not adequate.
  3. Model lineage showing base model and all fine-tuning steps — The chain with licence terms at each stage. “Proprietary process” should trigger legal review.
  4. Weight access terms and redistribution rights — Specific, written, matched to your intended use case.
  5. Acceptable use policy with explicit commercial use confirmation — Ask the vendor to confirm in writing that your specific use case is permitted.
  6. EU AI Act Article 53(1d) Training Data Summary (if applicable) — Mandatory since August 2025 for GPAI providers on the EU market. Inability to provide this is a material procurement risk.
  7. Evidence of independent licence audit or SCA scan results — A tool name, scan scope, and date. Self-attestation without tooling evidence is insufficient.

Red flags that should trigger escalation to legal review: a proprietary BOM format with no standard export, dataset descriptions no more specific than “publicly available internet data”, lineage answered with “proprietary process”, inability to produce the Article 53(1d) Training Data Summary, and self-attestation without scan evidence.

The EU AI Act makes these demands legitimate, not adversarial. A vendor of a general-purpose AI model has regulatory obligations to provide Training Data Summary documentation. Frame it that way in the conversation and it stops feeling like an unusual request.


SPDX 3.0 AI Profile vs. CycloneDX ML-BOM: which format should you require?

Two leading standards exist. And the good news is it is not either-or — they serve different contexts, and you will probably use both.

SPDX 3.0 AI Profile has ISO/IEC 5962 lineage that gives it regulatory weight in procurement contexts and aligns with the NIST AI Risk Management Framework. Use it when legal defensibility is the priority.

CycloneDX ML-BOM comes from OWASP and is designed for CI/CD automation. The OWASP AIBOM Generator outputs CycloneDX format directly from Hugging Face model metadata, making it the practical open-source generation path.

Here is the approach that makes sense: require SPDX 3.0 AI Profile from external vendors for its regulatory weight, and use CycloneDX ML-BOM for internal generation in CI/CD pipelines where tooling availability matters more. Protobom and BomCTL translate between the two formats, so mandating SPDX from vendors does not prevent you from working with CycloneDX internally. For implementation detail, see adding AI licence compliance to your existing engineering workflow.
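For a feel of what the CycloneDX side looks like, here is a skeletal ML-BOM assembled by hand. The field names follow our reading of the CycloneDX 1.5+ schema (component type `machine-learning-model`, a `modelCard` with `modelParameters.datasets`) and should be validated against the official schema; in practice you would generate this with the OWASP AIBOM Generator rather than write it yourself:

```python
import json

def minimal_ml_bom(model_name, version, licence_id, dataset_names):
    """Assemble a skeletal CycloneDX-style ML-BOM as a plain dict."""
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "components": [{
            # "machine-learning-model" is the component type CycloneDX
            # introduced for ML artefacts; a plain SBOM has no equivalent
            "type": "machine-learning-model",
            "name": model_name,
            "version": version,
            "licenses": [{"license": {"id": licence_id}}],
            "modelCard": {
                "modelParameters": {
                    # dataset provenance: the part an SBOM cannot carry
                    "datasets": [{"type": "dataset", "name": n}
                                 for n in dataset_names],
                },
            },
        }],
    }

bom = minimal_ml_bom("example/model-7b", "1.0", "Apache-2.0",
                     ["example-corpus-v1"])
print(json.dumps(bom, indent=2))
```

Even this skeleton makes the gap visible: the licence identifier and the dataset provenance sit in the same document, so a mismatch between them is detectable by tooling instead of by litigation.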


How do you generate an AI-BOM for models your team is already using?

Step zero is shadow AI discovery: finding every model in use, including the ones engineering teams adopted without procurement approval. Sonatype and Wiz both offer shadow AI tracking that surfaces unapproved model usage. Survey your teams, check cloud service logs, and establish an internal model registry for every model entering production. You cannot generate AI-BOMs for models you have not catalogued.

Once you have your registry, there are two tooling paths:

Enterprise path: FOSSA provides software composition analysis with snippet scanning for AI-generated code, integrating into CI/CD pipelines alongside existing dependency scanning. Sonatype is an AI governance platform with AI-BOM generation, shadow AI tracking, and procurement governance.

Open-source path: OWASP AIBOM Generator produces CycloneDX-format AI-BOMs from Hugging Face model metadata with field completeness scoring. GUAC (OpenSSF) aggregates SBOM and AI-BOM data across an organisation. BomCTL / Protobom provides CLI tooling for CI/CD integration.

If your team is already using Hugging Face models, start with the OWASP AIBOM Generator. It is the fastest path to a CycloneDX AI-BOM with completeness scoring that shows you exactly what fields are missing. For broader context on why shadow AI and licence ambiguity are converging into a single governance problem, see the open AI supply-chain licensing risk overview.


What is snippet scanning and why does it matter for AI-generated code compliance?

Snippet scanning detects code fragments that match known licensed material. Applied to AI-generated code, it identifies licence obligations introduced by AI coding assistants — GitHub Copilot, Cursor — that are invisible to standard dependency scanning because they show up as copied text, not declared dependencies.

AI-BOM covers pre-procurement risk. Snippet scanning covers post-deployment risk. Heather Meeker again: “There are two legally independent sources of IP risk with AI coding tools: model training (input risk) and model output (output risk).” You need to address both.

The 2026 Black Duck OSSRA report found that 68% of audited codebases contain open source licence conflicts, partly driven by AI assistants generating code derived from copyleft sources. And 76% of companies that explicitly prohibit AI coding tools acknowledge their developers are using them anyway. A policy is not a solution.

FOSSA Snippet Scanning integrates into pull request workflows and CI/CD pipelines, surfaces results alongside dependency results, and lets you apply a consistent licence policy to both.
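The mechanism underneath is easier to reason about with a toy version: hash overlapping token windows of known licensed code, then flag any window of new code that collides. Real tools use far more robust fingerprinting (winnowing, normalisation, a large corpus); this sketch shows the principle only:

```python
# Toy snippet scanner: fingerprint fixed-size token windows of known
# licensed sources, then check whether any window of candidate code
# collides. Production tools normalise tokens and use selective
# fingerprinting; this is a minimal illustration of the idea.
import hashlib

WINDOW = 8  # tokens per fingerprint window

def fingerprints(source):
    tokens = source.split()
    for i in range(max(0, len(tokens) - WINDOW + 1)):
        window = " ".join(tokens[i:i + WINDOW])
        yield hashlib.sha1(window.encode()).hexdigest()

def scan(new_code, known_corpus):
    """Return True if any window of new_code matches the known corpus."""
    known = set()
    for licensed_source in known_corpus:
        known.update(fingerprints(licensed_source))
    return any(fp in known for fp in fingerprints(new_code))

gpl_snippet = "static int helper ( struct ctx * c ) { return c -> n ; }"
assistant_output = ("int x = 1 ; "
                    "static int helper ( struct ctx * c ) { return c -> n ; }")
print(scan(assistant_output, [gpl_snippet]))  # True
```

The key property: the match is on copied text, not on a declared dependency — which is exactly why dependency scanning alone cannot see it.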


How does AI-BOM generation fit into your existing DevSecOps pipeline?

AI-BOM generation slots into existing CI/CD pipeline gates as an additional verification step — not a new workflow built from scratch. The goal is a standard gate, not a manual one-off audit, so compliance scales as model adoption grows.

Three integration points cover the AI artefact lifecycle:

1. Model intake gate — Before any model enters your internal registry, verify its AI-BOM. Vendor-provided SPDX documentation is evaluated against your procurement checklist. Models without adequate provenance are blocked here.

2. Build gate — Generate or refresh the AI-BOM during CI. The OWASP AIBOM Generator can run as a pipeline step to produce CycloneDX output automatically. FOSSA integrates as a pipeline plugin and runs snippet scanning at the pull request and CI stage.

3. Deployment gate — Before production, confirm AI-BOM compliance metadata meets your policy requirements and link with compliance reporting so outdated models are flagged automatically.
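As a sketch of what such a gate can look like in code — the required field names mirror the five AI-BOM categories and are illustrative, not a standard; align them with your own AI-BOM schema:

```python
# Deployment-gate sketch: validate one AI-BOM record (parsed JSON)
# against a minimal policy. In CI, a nonzero exit from this check
# blocks the deployment stage.
import sys

REQUIRED_FIELDS = ["datasets", "lineage", "weight_access",
                   "use_restrictions", "training_data_summary"]

def gate(aibom):
    """Return the list of policy violations for one AI-BOM record."""
    violations = [f"missing field: {k}"
                  for k in REQUIRED_FIELDS if k not in aibom]
    if aibom.get("weight_access") == "proprietary":
        violations.append("proprietary weights: escalate to legal review")
    return violations

problems = gate({"weight_access": "proprietary"})
for p in problems:
    print(f"AI-BOM gate: {p}", file=sys.stderr)
# sys.exit(1 if problems else 0)  # uncomment when wired into a pipeline
```

Because the check is a pure function over the record, the same logic serves the intake gate, the build gate, and the deployment gate with different policy thresholds.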

The EU AI Act requires this gate for GPAI model deployments into EU markets — Article 53(1d) Training Data Summary requirements need documentation that tracks with model versions, and automated CI/CD generation is the only scalable path. The EU Cyber Resilience Act adds to the pressure: its SBOM mandate for software products with digital elements extends to AI-BOM documentation for AI-enabled software.

For the full implementation path, see adding AI licence compliance to your existing engineering workflow. For the regulatory obligations behind these gates, see EU AI Act and Cyber Resilience Act supply chain obligations explained.


Frequently Asked Questions

What is the difference between an AI-BOM and a traditional SBOM?

An SBOM lists software components and dependencies. An AI-BOM extends this to cover training data provenance, model lineage, weight access policies, acceptable use restrictions, and regulatory compliance metadata. A standard SBOM was designed before AI supply chains existed and cannot capture the data and model context that governs how AI systems can legally be used.

Can a vendor’s “MIT licensed” AI model still create legal problems?

Yes. A permissive label reflects only what the publisher chose to display. It does not guarantee the training data was permissively licensed, that outputs are unencumbered, or that acceptable use terms permit your deployment. 52% of LLMware supply chains exhibit at least one licence conflict.

What tools exist for generating AI-BOMs automatically?

Enterprise: FOSSA (SCA with snippet scanning) and Sonatype (AI governance with shadow AI tracking). Open-source: the OWASP AIBOM Generator (CycloneDX output from Hugging Face metadata), GUAC (OpenSSF supply chain aggregation), and BomCTL/Protobom (CLI tooling for CI/CD integration).

Should I require SPDX or CycloneDX format from AI vendors?

Require SPDX 3.0 AI Profile from external vendors for regulatory weight. Use CycloneDX ML-BOM for internal generation in CI/CD pipelines. The two formats are interoperable via Protobom.

What is snippet scanning and do I need it?

Snippet scanning detects code fragments matching known licensed material — it identifies licence obligations introduced by AI coding tools that standard dependency scanning cannot detect. If your team uses AI coding assistants, you need it. FOSSA is the primary commercial tool.

What does the EU AI Act require regarding AI-BOM documentation?

EU AI Act Article 53(1d) requires providers of general-purpose AI models to produce a Training Data Summary documenting datasets, copyright compliance, and data governance. Mandatory since August 2, 2025, for GPAI providers placing models on the EU market. An AI-BOM operationalises this in machine-readable format.

How do I discover undocumented AI usage (shadow AI) in my organisation?

Survey teams and check cloud service logs. Sonatype and Wiz offer shadow AI tracking. Establish a model registry requirement first — you cannot generate AI-BOMs for models you have not catalogued.

What should I do if a vendor refuses to provide AI-BOM documentation?

Treat refusal as a material procurement risk. Escalate to legal review and consider alternative vendors. Under the EU AI Act, GPAI model vendors have regulatory obligations to provide Training Data Summary documentation — refusal may indicate they cannot meet their own requirements.

What is model lineage and why does it matter for procurement?

Model lineage documents the chain from base model through fine-tuning and adaptation. A restrictive licence at any point flows downstream to the final model. Only 15.4% of models on Hugging Face declare any base model relationship — without it, you cannot confirm the model you are deploying is legally unencumbered.

How does the EU Cyber Resilience Act interact with AI-BOM requirements?

The EU Cyber Resilience Act mandates SBOM documentation for software products with digital elements. For AI-enabled software, this creates convergence with AI-BOM requirements — organisations must produce both traditional SBOMs and AI-specific documentation. Both SPDX and CycloneDX are recognised by the OpenSSF as viable CRA-compliance formats.

Can I use open-source tools instead of enterprise platforms for AI-BOM generation?

Yes. The OWASP AIBOM Generator, GUAC, BomCTL, and Protobom provide a viable open-source path. The OWASP AIBOM Generator produces CycloneDX-format AI-BOMs from Hugging Face model metadata with field completeness scoring. More integration work than enterprise platforms, but a solid starting point for teams building from scratch.


Where to go next

An AI-BOM is one tool in a larger governance picture. Understanding what to demand from vendors gets you past the first gate — but the licence risk that makes that demand necessary runs through every layer of your AI supply chain. For the broader supply-chain licensing landscape — how permissive-washing, training data ambiguity, and regulatory pressure interact — see the complete overview.

How AI Licence Risk Compounds Across Your Dataset, Model, and Application Stack

If you’ve ever managed software dependencies at scale, you know the feeling. A late-night Slack message. A transitive dependency buried four levels deep — it carries a vulnerability, and it’s already in production everywhere. AI supply chains carry the same risk. Quieter, less visible, and already inside more stacks than most engineering leaders realise.

Here’s the thing. A permissive licence label on a model repository does not mean the underlying training data carries those same rights. The MIT or Apache-2.0 tag is a metadata field. It’s not a legal grant. It describes how the model weights are distributed — not what rights were secured over the data used to train them.

A 2025 audit of 124,278 AI supply chains found that only 5.75% of applications preserved compliant licence notices from their upstream models. By the time an AI component reaches your production stack, roughly 94% of the required legal documentation has already vanished.

This article maps the three-tier AI supply chain — dataset, model, application — explains how licence risk compounds at each layer, and closes with a first-pass audit checklist for production model approval. For the full scope of open AI supply-chain licensing risk, see our pillar overview. For the foundational definition of what permissive-washing actually means, start with our ART001 explainer.


How Do AI Licences Work Across the Three Supply Chain Tiers?

The AI supply chain has three distinct layers — dataset, model, and application. Licence obligations are supposed to propagate through each one. In practice, they almost never do.

At the dataset layer, training data is collected under specific terms: Creative Commons variants, public domain dedications, commercial restrictions, or the legally ambiguous category of “publicly available” web content. Whatever restrictions exist at this layer define what downstream uses are legally permitted — regardless of any label applied later.

At the model layer, a model inherits the obligations of the most restrictive licence in its training corpus. A single restrictively licensed dataset in a training run taints the entire model’s legal posture. Documented or not.

At the application layer, any product integrating that model inherits both the model’s stated licence terms and every unresolved obligation from every upstream dataset. Each integration adds another layer of legal uncertainty.

If you work with npm or pip, you already understand how this works. A licence issue in a transitive dependency propagates through every package that depends on it. You have package-lock.json, requirements.txt, and automated scanners to track that dependency tree. In AI, there’s no equivalent. Training data licences are almost never bundled with the resulting model weights. The chain of custody breaks at the first training run.
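The inheritance rule described above behaves like taking a maximum over a restrictiveness ordering. A toy sketch only — the ranking below is an illustrative assumption, not a legal analysis:

```python
# Illustrative restrictiveness ranking — an assumption for this sketch only;
# real licence analysis needs counsel, not a lookup table.
RESTRICTIVENESS = {
    "public-domain": 0,
    "mit": 1,
    "apache-2.0": 2,
    "cc-by-sa-4.0": 3,   # copyleft share-alike
    "non-commercial": 4,
    "unknown": 5,        # undocumented data is the worst case
}

def effective_posture(dataset_licences):
    """A model's legal posture follows its most restrictive training input."""
    return max(dataset_licences, key=lambda lic: RESTRICTIVENESS.get(lic, 5))

print(effective_posture(["mit", "apache-2.0", "non-commercial"]))
# prints non-commercial
```

The point of the sketch is the shape of the rule: one restrictive input dominates the whole training corpus, and anything unrecognised should rank as worst-case rather than be ignored.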


Why Do Attribution Obligations Almost Never Make It Downstream?

They fall apart at scale. A 2025 audit (arXiv:2602.08816) of 124,278 dataset-to-model-to-application supply chains found that only 27.59% of models preserved compliant attribution from their training datasets, and only 5.75% of applications preserved compliant notices from the models they incorporated.

Multiply those two retention rates and the probability of a downstream application retaining complete, legally compliant attribution from its original training data comes out at roughly 1.59%. Below 2%.
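As a sanity check, the compounding is plain probability multiplication. A minimal sketch using the two per-stage retention rates the audit reports (27.59% of models, 5.75% of applications):

```python
# Per-stage attribution retention rates from the arXiv:2602.08816 audit.
model_stage = 0.2759  # models preserving dataset attribution
app_stage = 0.0575    # applications preserving model attribution

# Treating the stages as independent, end-to-end retention is their product.
end_to_end = model_stage * app_stage
print(f"{end_to_end:.2%}")  # prints 1.59%
```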

Why does this happen? Model training pipelines simply aren’t designed to carry licence metadata. Model cards are the primary documentation vehicle, but they’re manually authored and frequently incomplete — only 7.1% of Hugging Face models declare their training datasets at all. Application developers consuming a model rarely look beyond the top-level licence tag, because every other software component they work with is covered by automated tooling. That tooling doesn’t exist for AI training provenance.


Why Is “Publicly Available” Training Data Not the Same as “Openly Licensed” Training Data?

This is the misconception behind most AI licence risk at the dataset layer. “Publicly accessible on the internet” and “freely usable for any purpose including AI training” are legally distinct categories. A significant chunk of the current AI industry has been treating them as the same thing.

Copyright exists by default. A blog post, a news article, a StackOverflow answer — all copyrighted at the moment of creation, regardless of whether they require a login to access. Public visibility does not waive the creator’s rights. Many of the largest pre-training datasets are derived from Common Crawl and similar web scrapes — vast quantities of copyrighted material collected without explicit licence grants. A dataset may be published with an open licence applied to the collection, but that licence cannot grant rights the publisher did not hold.

If your vendor tells you their model was trained on “publicly available” data, that is not an answer to the provenance question. It’s a deflection of it. This is one of several compounding factors across the open AI supply-chain licensing risk landscape that make dataset-level scrutiny non-negotiable.


What Is Data Laundering and How Does It Enter Your AI Stack Invisibly?

Data laundering — sometimes called licence laundering — is how the “publicly available” problem gets into your supply chain even when the model’s documented training datasets appear to have clean licences.

Here’s how it works. A dataset aggregator collects text, images, or code from sources with various licence restrictions. Rather than verifying that every constituent source permits relicensing, the aggregator publishes the combined dataset under a single permissive licence at the collection level. Downstream model trainers consume it trusting the aggregator’s label. The original restricted content is now embedded in a corpus that looks clean.

EleutherAI’s Common Pile project was created as a direct response to this problem — verify licensing at the level of individual constituent works, not at the collection level. The result is an 8TB corpus where every source is either public domain or Open Definition-compliant, and the Comma language model trained on it performs on par with or outperforms Llama 2 and Qwen3. Legally clean training is technically achievable. It’s currently just rare.

Data laundering is difficult to detect at the procurement layer. A model card may accurately identify its training datasets by name, and still be inheriting laundered content from those datasets with no visible signal. For the foundational definition of this problem, see our complete permissive-washing explainer.


What Is Shadow AI and Why Is It a Compliance Gap You Cannot See?

Shadow AI is the AI-era equivalent of shadow IT: AI tools and models your team is actively using that have never been through formal procurement review, compliance vetting, or IT governance. The licence exposure they create is real — and invisible to any audit.

A developer fine-tunes a Hugging Face model on company data without checking the base model’s licence. A team adopts an AI coding assistant without reviewing its training data. An engineer integrates an open-weights model labelled “MIT” without verifying that an actual licence file exists — which, given the statistics above, it probably does not. These aren’t hypothetical edge cases. They’re the norm when teams are iterating on models faster than any oversight process can track.

Without a policy requiring AI model review before deployment, you have no visibility into what licence obligations your team has already accepted on your behalf. For more on building that governance layer, see our guide to AI Bills of Materials.


What Does Clean Data Provenance Actually Look Like — and How Do You Ask for It?

Clean data provenance means a verifiable, documented record of where every component of a training dataset came from, what licence it was collected under, and that the licence actually permits the downstream use. Not a blanket collection-level declaration. A source-level verified audit trail.

The open weights versus open source distinction matters here. An open-weights model releases weights for download but may not disclose training data composition or provenance. An open-source AI model requires disclosure of training data, training code, and methodology in addition to weights. Most models on Hugging Face are open-weights. When a vendor cannot disclose training data provenance, the documentation may simply not exist.

The EU AI Act’s Article 53(1)(d) gives enterprise buyers a lever: GPAI model providers placing models on the EU market must publish a sufficiently detailed summary of training data content. A vendor who cannot produce that documentation is telling you something about their provenance practices.

Before approving any model, ask:

  1. Does your model card identify all training datasets by name?
  2. Can you provide licence documentation for each named training dataset?
  3. Has the training data been independently audited for licence compliance?
  4. Do you accept contractual liability for training data licence defects?
  5. Can you provide an AI Bill of Materials for this model?

A First-Pass Audit Checklist Before Integrating an AI Model Into Production

This is the minimum-viable entry point — not a substitute for formal tooling or legal review, but the baseline every team should clear before any model goes near production.

Step 1: Read the model card beyond the licence tag. Does the model card identify training datasets by name? Only 7.1% of Hugging Face models declare any training dataset. This step will fail for roughly 93% of models, and that failure is itself a risk signal.

Step 2: Verify the licence file exists and matches the tag. Is there an actual LICENCE file in the repository, or just a metadata field in the README? 93.4% of models lack a dedicated licence file. Absence is the norm — but it is still a compliance gap.

Step 3: Check upstream dataset licences. For each training dataset named in the model card, check its licence independently. If the model card does not name datasets, you cannot perform this check — which should prompt escalation.

Step 4: Look for attribution notices. Does the model repository include copyright notices for upstream datasets? 96.5% of datasets and 95.8% of models lack required licence text. Expect to find nothing — and treat that absence as a documented risk.

Step 5: Ask about fine-tuning data. If the model has been fine-tuned, what data was used? 24% of parent-to-child model relationships have different licensing between child and parent, and fine-tuned models frequently drop licence provenance entirely.

Step 6: Check for restrictive use clauses. Does the licence contain restrictions not typical of the label it carries? Llama 2’s user-count threshold, non-commercial clauses, output-usage restrictions: none of these are permissive terms regardless of the metadata tag. Read the full licence text.

Step 7: Document your findings. Record what you verified, what you could not verify, and what risk you’re accepting. “We checked and found gaps” is a defensible position. “We assumed the label was accurate” is not.

The AI supply chain does not propagate licence information reliably. Manual verification at the point of integration is, currently, the minimum standard.
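Steps 1 and 2, plus the Step 7 habit of recording what you could and could not verify, can be drafted as a screening function. A sketch only — the inputs are whatever you can pull from the repository by hand or script; nothing here talks to Hugging Face, and the function names are illustrative:

```python
def first_pass_audit(repo_files, card_datasets):
    """Record what passed, what failed, and what could not be checked (Step 7)."""
    findings = {}
    # Step 1: does the model card name its training datasets?
    findings["datasets_named"] = bool(card_datasets)
    # Step 2: is there an actual licence file, not just a metadata tag?
    findings["licence_file"] = any(
        f.upper().startswith(("LICENSE", "LICENCE")) for f in repo_files
    )
    # Step 3 depends on Step 1: without named datasets, upstream licences
    # cannot be checked — escalate rather than assume.
    findings["upstream_checkable"] = findings["datasets_named"]
    return findings

# A typical permissively-tagged repo with no compliance payload:
print(first_pass_audit(["README.md", "model.safetensors"], []))
```

Keeping the output as an explicit findings record, rather than a pass/fail boolean, is what makes “we checked and found gaps” a defensible position later.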

For formal tooling and structured governance, see our guide to AI Bills of Materials. For the full picture of open AI supply-chain licensing risk, see the comprehensive overview.


Frequently Asked Questions

What is permissive-washing in AI?

Labelling AI artefacts with permissive licence tags like MIT or Apache-2.0 while omitting the required legal documentation — licence text, copyright notices, upstream attribution — that makes the label enforceable.

Does an MIT licence on a Hugging Face model mean the training data is also MIT-licensed?

No. The licence tag describes the terms under which model weights are distributed. It says nothing about the licences governing training data, which may include copyrighted material or data with no explicit licence at all. Only 7.1% of models declare their training datasets.

What is the difference between open weights and open source AI?

Open weights means model weights are available for download. Open source AI requires disclosure of training data, training code, and methodology in addition to weights. Most models on Hugging Face are open-weights — which determines whether independent provenance verification is even possible.

How does licence risk compound across the AI supply chain?

Risk introduced at the dataset layer transfers to any model trained on that data, and then to any application integrating that model. With only 27.59% of models preserving compliant dataset notices and only 5.75% of applications preserving compliant model notices, the probability of complete attribution in a production application is below 2%.

What is an AI Bill of Materials (AI-BOM)?

A structured inventory of all components in an AI system — training datasets, model weights, training code, and dependencies — analogous to a Software Bill of Materials (SBOM). It enables verifiable licence and provenance tracking across the AI supply chain.

What is data laundering in AI training datasets?

When copyrighted or restrictively licensed content is collected into an intermediary dataset that applies an incorrect permissive licence to the entire collection. Downstream consumers trust the intermediary’s label and unknowingly inherit legal exposure from the original restricted sources.

What should I ask an AI model vendor about data provenance before procurement?

Ask whether the model card identifies all training datasets by name, whether licence documentation exists per dataset, whether the training data has been independently audited for licence compliance, whether the vendor accepts contractual liability for licence defects, and whether they can provide an AI Bill of Materials.

What is shadow AI and why is it a licence compliance risk?

AI tools and models in active use within an organisation that have never been through formal procurement or governance review. No one has verified the licence terms, training data provenance, or attribution obligations of the models in use — so the liability accumulates invisibly.

Does the EU AI Act require AI model providers to disclose training data?

Yes. Article 53(1)(d) requires providers of GPAI models on the EU market to publish a sufficiently detailed summary of training data content. Enterprise buyers can use this as a procurement lever when vetting vendors.

Can I rely on fair use to avoid AI training data licence issues?

Fair use as a defence for AI training on copyrighted data remains unsettled in the US, with over 50 cases pending as of early 2026. Relying on an untested legal defence as your primary risk mitigation is not a defensible governance position.

What is the Common Pile and why does it matter?

A pre-training dataset created by EleutherAI with 8TB of texts verified for legal compliance at the level of individual constituent works. The Comma language model trained on it performs on par with or outperforms Llama 2 and Qwen3 — demonstrating that legally clean training is achievable.

Permissive-Washing in AI Explained — Why Open Source Labels Cannot Be Trusted

When a package in your npm registry says MIT, you trust it. The licence file is there, the copyright notice is there — the whole compliance payload ships with the dependency. That mental model works for traditional software. AI artefacts are different. The label and the legal grant are not the same thing.

New research from Queen’s University found that 96.5% of permissively-labelled datasets and 95.8% of permissively-labelled models on Hugging Face lack the documentation required to make those labels legally enforceable. The researchers coined a term for it — permissive-washing. When that documentation is missing, the artefact reverts to all-rights-reserved status regardless of what the tag says.

This article explains what permissive-washing is, why a model card badge is not a licence, and what five checks your team can run before adopting any open AI component. It is part of a broader look at open AI supply-chain licensing risk — but this is where the explanation has to start.

What does “permissive-washing” actually mean?

Permissive-washing is what happens when an AI artefact — a model, a dataset, a fine-tuned checkpoint — gets labelled with a permissive licence (MIT, Apache-2.0, BSD-3-Clause) while leaving out the documentation that makes the label mean anything.

That documentation is the full licence text, the copyright notice, and upstream attribution notices: what Jewitt, Rajbahadur, Li, Adams, and Hassan call the “compliance payload.” Without it, the licence grant is void.

It is rarely intentional. AI platforms make it easy to pick a licence tag from a dropdown, and that selection gets treated as equivalent to granting a licence. It is not. But the effect is the same either way: downstream users believe they have rights they do not legally hold.

When you install from npm or pip, the licence file is included by convention — its absence would be flagged. AI platforms have no equivalent norm. 53.5% of Hugging Face datasets carry MIT or Apache-2.0 labels. 96.5% lack the documentation those labels require.

Why is a licence badge on a Hugging Face model not a legal licence?

A Hugging Face model card has a license: field in its YAML header. That field is metadata — a machine-readable tag declaring intent. It is not a legal instrument. It does not grant you anything.

A legal licence grant requires three things to actually exist in the repository: the full licence text in a file, a valid copyright notice identifying a rights holder, and (for Apache-2.0) a NOTICE file preserving upstream attributions. When you select “MIT” from the Hugging Face dropdown, none of those things are created or verified. You get a populated metadata field. That is all.

Licence metadata ≠ licence grant. A tag is a claim. A licence is a legal document.

SPDX identifiers — the standard format Hugging Face uses for licence tags — were designed as cataloguing tools. They express what licence applies. They are not the licence.

GitHub at least has a norm around including a LICENSE file. Hugging Face has no equivalent. Widely-used models including sentence-transformers/all-MiniLM-L6-v2 — Apache-2.0, over 3,700 likes — lack the full licence text needed to legally rely on the declared terms. That model enables 907 downstream applications. Not one can satisfy the licence’s conditions.

What did the research find — how widespread is permissive-washing?

The arXiv:2602.08816 study constructed 124,278 supply chains across three tiers — dataset to model to application. The numbers reflect the ecosystem, not a cherry-picked sample.

In the permissively-labelled subset: only 3.5% of datasets include a complete licence text file. Only 3.0% include a valid copyright notice. Full compliance — both conditions at once — is 2.3% for datasets and 3.2% for models. Fewer than one in thirty permissively-labelled artefacts is actually compliant.

Attribution propagation is worse. Only 27.59% of models preserved attribution from their training datasets, and only 5.75% of applications preserved notices from the models they incorporated.

Compare that to GitHub applications in the same study, where 91.9% include a complete licence text. The gap compounds at every tier.

What do MIT, Apache-2.0, and BSD-3-Clause actually require — and what do AI projects skip?

Developer shorthand for permissive licences is “do what you want.” That shorthand leaves out a word: “if.” Permissive licences say do what you want if you meet these conditions. The conditions are real.

MIT requires two things: include the full licence text with all copies, and include the copyright notice. Two conditions. Both routinely absent from AI artefacts.

Apache-2.0 adds more: full licence text, copyright notice, a NOTICE file preserving upstream attributions, and a statement of changes for modified files. There is also an explicit patent grant — but that grant only activates when the conditions are met. As Heather Meeker, an open source licensing attorney, puts it: “Open source licenses allow you to do anything you want with the licensed code, with conditions that mostly trigger on distribution.” Fail to meet those conditions and you lose not just the copyright permission but the patent protection too.

BSD-3-Clause requires the full licence text in both source and binary redistributions, the copyright notice in both, and a non-endorsement clause. All routinely omitted.

When the conditions are not met, the licence grant does not activate.

What does “default copyright” mean when there is no valid licence file?

Default copyright is the legal state that applies to any creative work when no valid licence has been granted. The creator holds all rights. Nobody else may copy, modify, or distribute the work without explicit permission. Under the Berne Convention, this applies in all major jurisdictions — US, EU, UK, Australia — automatically.

When an AI artefact carries a permissive licence tag but lacks the compliance payload, the grant is void. A team using a “MIT-labelled” model without a LICENSE file has no more legal right to use it than if the model had no label at all.

Here is the scenario. Your team finds a Hugging Face model with an MIT badge, solid benchmarks, and 2,000 likes. You integrate it and ship. The repository has no LICENSE file and no copyright notice. The original creator retains full copyright. If they assert it, you have no defence. The exposure does not expire. It persists.

Default copyright is the current reality for over 95% of permissively-labelled AI artefacts right now.

Why does attribution almost never make it downstream — the 5.75% problem?

Attribution propagation is the requirement that upstream copyright notices and licence texts be preserved at each stage of the supply chain — dataset to model to application. It is how the conditions of permissive licences are supposed to travel with the work.

In traditional software, package managers track dependency trees. A licence scanner like FOSSA or Sonatype surfaces the obligations you have inherited. AI artefact platforms have no equivalent. The three-tier chain is essentially untracked.

27.59% of models preserved compliant attribution from their training datasets. 5.75% of applications preserved compliant attribution from the models they used. The platforms leave attribution tracking to the uploader. Uploaders do not track it.

For your team, this means: even if you verify the licence on the model you are adopting, you inherit unverified obligations from every dataset and component upstream. AI Bills of Materials (AIBOMs) are emerging to address this. They are not yet standard in most teams. For the structural reasons, see how licence risk compounds across your AI stack, which explains why attribution failure is a feature of the system, not a series of individual mistakes.

What should you check before trusting an open model licence label?

These are five-minute checks, not a legal review. They catch the overwhelming majority of permissive-washing cases and give you a clear stop signal before you commit to an artefact.

Check 1: Look for a LICENSE or LICENCE file in the repository root. Not the model card metadata field. The actual file. If it is absent, the licence grant is not established.

Check 2: Verify there is a copyright notice. “Copyright (c) [Year] [Name]” identifying a real rights holder. MIT, Apache-2.0, and BSD-3-Clause all require this. It is commonly absent even when a licence file exists.

Check 3: For Apache-2.0 models, check for a NOTICE file. This is the mechanism by which upstream attributions are preserved. If it is missing, you cannot satisfy the attribution requirement.

Check 4: Review the model card’s training data section. If it lists datasets, spot-check those datasets using Checks 1 and 2. The 5.75% propagation figure means your upstream almost certainly has problems you cannot see from the model level alone.

Check 5: Treat any failure as a stop signal. Failing any one of these checks means the artefact has no valid licence — legally, regardless of what the badge says.
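Checks 1 to 3, with the Check 5 stop-signal rule, reduce to a small function. A sketch assuming you already have the repository’s file listing and README text in hand; the copyright pattern is a loose heuristic, not a legal test:

```python
import re

# Loose heuristic for "Copyright (c) [Year] [Name]" — illustrative only.
COPYRIGHT_RE = re.compile(r"copyright\s*(\(c\)|©)?\s*\d{4}\s+\S+", re.IGNORECASE)

def stop_signals(repo_files, readme_text, declared_licence):
    signals = []
    if not any(f.upper().startswith(("LICENSE", "LICENCE")) for f in repo_files):
        signals.append("Check 1: no LICENSE/LICENCE file in the repository root")
    if not COPYRIGHT_RE.search(readme_text):
        signals.append("Check 2: no copyright notice naming a rights holder")
    if declared_licence == "apache-2.0" and "NOTICE" not in repo_files:
        signals.append("Check 3: Apache-2.0 declared but no NOTICE file")
    return signals  # Check 5: any entry at all is a stop signal

print(stop_signals(["README.md"], "# A model card", "apache-2.0"))
```

Run against a repo carrying only a metadata tag, every check fires; run against one with a LICENSE file, a dated copyright line, and a NOTICE file, the list comes back empty.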

The Jewitt et al. audit found only 3.2% of permissively-labelled models satisfy Checks 1 and 2 simultaneously. Apply the checklist to every model you evaluate — it is faster than assuming and being wrong.

The EU Cyber Resilience Act already requires SBOMs for digital products, and SCA tooling can automate parts of this. Until you have that in place, manual checks are your first line of defence. This five-check process is one entry point into the full landscape of AI supply-chain licensing risk — covering everything from foundational definitions to regulatory obligations.

For the next step in understanding how this risk propagates, how AI licence risk compounds across your dataset-model-application stack maps what happens when a non-compliant artefact enters a production pipeline. If you are already evaluating specific models for commercial use, Llama, Mistral, DeepSeek and Qwen licence terms compared for commercial use applies these checks to the four dominant open model families. The full series starts at the open AI supply-chain licensing risk overview.

Frequently asked questions

What is the arXiv permissive-washing paper and where can I read it?

arXiv:2602.08816 by Jewitt, Rajbahadur, Li, Adams, and Hassan (Queen’s University, February 2026). The first large-scale empirical study of licence compliance across the AI supply chain — 6,664 models, 3,338 datasets, 28,516 applications.

Does an MIT licence on a Hugging Face model mean I can use it freely in my product?

Not unless the repository includes the full MIT licence text and a valid copyright notice. Without both, the MIT label is metadata only — not a legal grant. The artefact is under default copyright, and you have no legally defensible right to use, modify, or distribute it.

Can my company get sued for using an AI model without a proper licence file?

Yes. If the model lacks valid licence documentation, it is under default copyright. Using it commercially without the rights holder’s explicit permission is copyright infringement. The original creator retains all rights and can enforce them at any time.

What is the difference between a permissive licence and a custom AI licence like Llama’s?

Permissive licences (MIT, Apache-2.0, BSD-3-Clause) are standardised instruments with specific, well-understood conditions. Custom AI licences — Meta’s Llama licence, BigScience RAIL — are bespoke and vary by model. Meta’s Llama licence, for example, requires companies with over 700 million monthly active users to obtain a separate commercial agreement. Custom licences require case-by-case review; permissive licences require compliance payload verification.

How does licence compliance for AI models compare to compliance for traditional open-source software?

Traditional software has mature tooling: package managers track dependencies, licence scanners (FOSSA, Sonatype) automate compliance, community convention demands a LICENSE file. AI artefacts have none of that. No enforcement, opaque dependency chains, no standard tooling tracing obligations from dataset to application.

What does “open source AI” actually mean from a legal standpoint?

There is no settled definition. The OSI’s evolving definition requires disclosure of information about training data alongside model weights and code, but release of the actual training datasets remains optional in current drafts. In practice, “open source AI” often means “publicly downloadable”, not “OSI-approved with full compliance.”

What is an AI Bill of Materials (AIBOM) and do I need one?

An AIBOM is a structured inventory of all AI models, datasets, and dependencies in your system, including their licence status and provenance. It extends the SBOM concept to AI artefacts. The EU Cyber Resilience Act already requires SBOMs for digital products in EU markets, and the EU AI Act is moving toward requiring transparency documentation that an AIBOM would directly support.

What happens if I discover a permissive-washing problem in a model I have already deployed?

Three options: contact the original creator and request proper licence documentation, replace the model with a compliant alternative, or negotiate a separate commercial licence. The longer the model stays in production without a valid licence, the greater the exposure.

Are there AI models with genuinely clean licensing that I can use without legal review?

Some exist, but they are the minority — only 3.2% of permissively-labelled models in the Jewitt et al. audit were fully compliant. Apply the verification checklist in this article to any model you evaluate. The checks take minutes; the risk of skipping them is real.

Why do AI model creators skip licence documentation if the requirements are straightforward?

Most creators are researchers or developers, not lawyers. Hugging Face makes uploading straightforward and licence tag selection optional — nothing enforces a LICENSE file. The community norm became “tag it and ship it,” mirroring early GitHub culture before licence scanners became standard.

How does the EU AI Act affect AI licence compliance requirements?

It requires GPAI model providers to publish a summary of training data (Article 53(1)(d)) and implement a policy to comply with EU copyright law (Article 53(1)(c)). It does not directly mandate licence compliance, but its provenance documentation requirements make unverified AI licensing a compliance gap for EU market deployments.

Sovereign Cloud Explained — Data Residency, Legal Jurisdiction, and What Protection You Actually Get

The statement “your data stays in Europe” appears in nearly every hyperscaler sovereign cloud pitch. Stored in Europe, yes. Protected by European law from US government access? Not necessarily — and the difference matters more than most cloud buying decisions acknowledge. The gap between where your data sits and who can legally access it is the core question this guide addresses. Below, you will find a framework for evaluating sovereignty claims across three layers — data residency, data sovereignty, and jurisdictional control — with links to the right guide for your current decision stage.

In this hub:

What Is Sovereign Cloud and Why Does the Definition Matter?

Sovereign cloud refers to cloud infrastructure where data storage, processing, and governance fall exclusively under a defined jurisdiction’s legal system. It goes beyond a standard public cloud region — covering not just where servers sit, but which laws govern access and who can compel disclosure. Two clouds can share the same physical building and carry entirely different legal exposure.

Start here: Data Residency, Data Sovereignty, and Jurisdictional Control Are Not the Same Thing — defines the three-layer sovereignty framework and introduces sovereign washing as a named concept.

What Is the Difference Between Data Residency, Data Sovereignty, and Jurisdictional Control?

Data residency is the physical fact of where servers are located. Data sovereignty is the legal question of which jurisdiction’s laws govern the data. Jurisdictional control is the layer that determines your actual protection: which courts or government agencies can compel access, regardless of data location. You can have residency without sovereignty, and sovereignty without full jurisdictional control. Most hyperscaler “sovereign cloud” offerings deliver residency and partial sovereignty but leave jurisdictional control unresolved. The framework gives you a consistent test: for each layer, ask whether the provider’s claim is verified, partial, or unresolved.
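That per-layer test can be captured in a few lines. A minimal sketch of the framework with illustrative statuses; the example assessment values are assumptions for demonstration, not a verdict on any provider:

```python
from enum import Enum

class Status(Enum):
    VERIFIED = "verified"
    PARTIAL = "partial"
    UNRESOLVED = "unresolved"

# The three layers of the framework, in order of increasing legal weight.
LAYERS = ("data residency", "data sovereignty", "jurisdictional control")

def assess(claims):
    """Any layer a provider cannot evidence defaults to unresolved."""
    return {layer: claims.get(layer, Status.UNRESOLVED) for layer in LAYERS}

# Illustrative reading of a typical hyperscaler 'sovereign' offering:
offering = assess({
    "data residency": Status.VERIFIED,   # servers demonstrably in the EU
    "data sovereignty": Status.PARTIAL,  # EU entity, US parent company
})
print(offering["jurisdictional control"])  # prints Status.UNRESOLVED
```

Defaulting the unclaimed layer to unresolved, rather than omitting it, mirrors the framework’s point: silence on jurisdictional control is itself the answer.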

Deep dive: Data Residency, Data Sovereignty, and Jurisdictional Control Are Not the Same Thing

How Does the US CLOUD Act Create Legal Exposure for EU Cloud Data?

The US CLOUD Act (2018) allows US authorities to compel data disclosure from US-based cloud providers regardless of where data is physically stored. If a provider’s parent company is incorporated in the United States — as AWS, Microsoft, and Google all are — US courts can order data access even when the data resides on European servers. The EU-US Data Privacy Framework does not modify this provision.

FISA Section 702 operates in parallel, authorising warrantless collection of foreign nationals’ communications by US intelligence agencies. Microsoft’s own legal officer admitted before the French Senate that the company cannot guarantee EU customer data is protected from US law enforcement access. The CLOUD Act versus GDPR Article 48 conflict remains unresolved.

Full legal analysis: How the US CLOUD Act and FISA 702 Create Legal Exposure for EU Cloud Data

What Do Hyperscaler Sovereign Cloud Offerings Actually Deliver?

Hyperscaler sovereign offerings — AWS European Sovereign Cloud, Microsoft Azure EU Data Boundary, Google Cloud’s partner-led model (S3NS) — deliver capability gains: EU-domiciled legal entities, EU-only operational staff, EU-based key management, and BSI C5 attestation in some cases. What they do not deliver is resolution of the CLOUD Act exposure. As long as the parent company is US-incorporated, US authorities can compel access. The term “guardrail sovereign model” describes this category: improved operational separation with residual legal risk. It is a better posture than a standard hyperscaler region, but it is not the same as full jurisdictional isolation.

Full assessment: AWS European Sovereign Cloud and Azure Sovereign Options Assessed Against the Three-Layer Framework

What Are the EU-Native Cloud Alternatives to Hyperscalers?

EU-native cloud providers — Hetzner, OVHcloud, Scaleway, T-Systems/Open Telekom Cloud, Aruba Cloud, Exoscale — are incorporated and operated entirely under EU law with no US parent company exposure. They represent the full EU isolation model: 100% EU-owned, EU-operated infrastructure with no foreign jurisdictional risk. The tradeoff is service breadth — EU-native providers cannot yet match hyperscaler managed service catalogues, though the gap is narrowing.

EU location alone does not guarantee absolute protection — provider jurisdiction, not just server location, is what determines your exposure. The cluster article below covers provider selection by company profile and the certification landscape.

Provider comparison: EU-Native Cloud Providers Compared — Hetzner, OVHcloud, Scaleway, and T-Systems

Which EU Regulations Are Making Sovereign Cloud Mandatory?

Three EU regulations are transforming sovereign cloud from optional to mandatory for specific workload categories. DORA (Digital Operational Resilience Act) requires financial entities — potentially including your company — to maintain documented exit strategies from cloud providers. NIS2 imposes sovereignty requirements on essential services operators. The EU AI Act introduces data governance and localisation obligations for AI systems. GDPR remains the baseline, requiring Transfer Impact Assessments for any US-controlled provider.

Regulatory mapping: DORA, NIS2, and the EU AI Act Are Making Sovereign Cloud Mandatory for Some Workloads

How Do You Assess Your Own Cloud Sovereignty Posture?

Start with workload classification: map every cloud workload into sovereignty risk tiers — regulated or sensitive data requiring sovereign infrastructure versus non-critical workloads where standard hyperscaler regions are acceptable. For each regulated tier, apply the three-layer test: confirm data residency, verify which legal system governs, and assess who can compel access. Where full jurisdictional isolation is required, EU-native providers are the correct solution. Where the guardrail model is sufficient, hyperscaler sovereign offerings may be viable.

The multi-cloud approach most companies land on: EU-native providers for regulated or sensitive workloads, hyperscaler regions for everything else.

Full playbook: A Sovereign Cloud Due Diligence Playbook — Workload Classification, Exit Architecture, and BYOK

Why Are European Organisations Rethinking Their Cloud Dependency?

Sixty-one percent of European CIOs say they want to reduce dependency on US cloud providers, according to Gartner. Sixty-two percent of EU organisations are actively seeking sovereign cloud solutions, according to Accenture. US hyperscalers currently hold approximately 70% of the EU cloud market. The shift is driven by regulatory pressure, geopolitical risk awareness, and sovereign cloud posture becoming a competitive differentiator when selling to regulated enterprise buyers.

If you sell to regulated enterprises, your buyers’ sovereignty posture affects their vendor selection. A customer subject to DORA or NIS2 may prefer a supplier whose data processing arrangements do not introduce additional cross-border transfer risk.

Market context: Why 61 Percent of European CIOs Want to Reduce US Cloud Dependency and What It Means

Resource Hub: Sovereign Cloud Library

Foundations — Vocabulary and Legal Framework

Provider Evaluation — Hyperscalers and EU-Native Alternatives

Regulatory Drivers and Decision Frameworks

Market Context and Strategic Framing

Frequently Asked Questions

What is sovereign washing and how do I identify it?

Sovereign washing is the marketing practice of presenting data residency features — physical EU server locations — as equivalent to data sovereignty and full jurisdictional protection. You can identify it by applying the three-layer test: does the provider clearly confirm which legal system governs the data, and who can compel access? If a marketing claim stops at “your data stays in Europe” without addressing the parent company’s legal jurisdiction, that is sovereign washing. For a full taxonomy of sovereign washing signals in hyperscaler marketing, see Data Residency, Data Sovereignty, and Jurisdictional Control Are Not the Same Thing.

Can the US government access my data if it is stored in Europe?

Yes, if your cloud provider’s parent company is US-incorporated. Under the CLOUD Act, US authorities can compel data disclosure from US-based service providers regardless of where the data physically sits. The EU location of the servers does not override the US legal obligation on the provider. BYOK/HYOK encryption reduces the practical impact of a successful access order but does not remove the legal exposure. Full analysis: How the US CLOUD Act and FISA 702 Create Legal Exposure for EU Cloud Data.

Is the EU-US Data Privacy Framework sufficient protection?

No. The EU-US Data Privacy Framework (the successor to Privacy Shield, itself the successor to Safe Harbor) governs commercial data transfer arrangements between EU and US entities. It does not modify the CLOUD Act or FISA Section 702. US law enforcement and intelligence agencies can still compel data access under those statutes regardless of the Privacy Framework’s status. The European Court of Justice has already invalidated two previous EU-US transfer frameworks on precisely this basis.

Should my company use AWS or Azure for sensitive customer data?

It depends on which layer of sovereignty your workload requires. For workloads requiring full jurisdictional isolation — financial data under DORA, health data under the European Health Data Space regulations, government-classified information — hyperscaler sovereign offerings do not resolve the CLOUD Act exposure. For workloads where the guardrail model is sufficient — improved operational separation, EU-domiciled legal entity, EU key management — AWS ESC and Azure sovereign options represent a materially better posture than a standard hyperscaler region. The due diligence playbook provides a workload classification framework.

What sovereign cloud certifications should I look for?

BSI C5 (Germany’s Federal Office for Information Security Cloud Computing Compliance Criteria Catalogue) and SecNumCloud (France’s ANSSI certification framework) are the primary national-level standards. BSI C5 is an attestation confirming specific security and operational controls — it does not certify full jurisdictional isolation. SecNumCloud is stricter, requiring that providers and their supply chains have no non-EU jurisdictional exposure. ISO 27001 is a necessary baseline but does not address sovereignty specifically. Certification scope matters: always verify what a certification confirms and what it does not.

How do I know if my company is affected by DORA, NIS2, or the EU AI Act?

DORA applies to EU financial entities and their ICT third-party service providers from January 2025. If your company is a bank, insurer, investment firm, payment institution, or provides ICT services to those entities in the EU, DORA applies. NIS2 applies to medium and large organisations in 18 essential and important sectors including energy, transport, finance, healthcare, digital infrastructure, and ICT services. The EU AI Act applies to any organisation placing AI systems on the EU market or using AI systems in the EU. For a full regulatory mapping by company type, see DORA, NIS2, and the EU AI Act Are Making Sovereign Cloud Mandatory for Some Workloads.

What is the practical first step for a company reviewing its sovereignty posture?

Workload classification. Map every current cloud workload into sovereignty risk tiers before evaluating any provider or making any migration decision. Identify which workloads involve regulated data (personal data, financial records, health records, AI training data for EU-market systems), which involve sensitive but unregulated data, and which are non-critical. Once you know which workloads carry sovereignty risk, you can apply the three-layer test to your current provider and determine the gap. The full classification framework is in A Sovereign Cloud Due Diligence Playbook.

Why 61 Percent of European CIOs Want to Reduce US Cloud Dependency and What It Means

A Gartner survey of 241 Western European CIOs published November 2025 found that 61 percent plan to increase their reliance on local cloud providers — and they’re doing it for geopolitical reasons. A separate Accenture study found 62 percent of European organisations are actively seeking sovereign cloud solutions. And Airbus has issued a ~€50 million, decade-long tender to migrate mission-critical applications away from US hyperscalers to an EU-native provider.

These aren’t isolated data points. They’re converging signals of a genuine market realignment that is already changing procurement decisions at major European enterprises.

Here’s the context you need: AWS, Microsoft Azure, and Google Cloud still control around 70 percent of the European cloud market. There’s a very long road between stated intent and actual market share change.

But the direction is clear. This article explains what’s driving the shift, where the alternatives actually stand, and what it means if you’re an SMB tech company selling into European enterprises. For the foundational framework on this topic, see our guide to sovereign cloud explained.

What Does the 61 Percent CIO Statistic Actually Tell Us?

The 61 percent figure comes directly from Gartner’s survey of 241 Western European CIOs, published November 2025. The term Gartner uses for what these CIOs are planning is “geopatriation” — deliberately migrating workloads to local providers for geopolitical or jurisdictional reasons. That’s different from migration driven by cost or technical capability. It’s strategic.

The Accenture finding — 62 percent of European organisations seeking sovereign solutions — uses a different methodology and lands within one percentage point of Gartner’s number. Two independent research firms arriving at the same conclusion isn’t a statistical artefact. It’s a credible signal.

And it’s not just intent. Gartner also found that 44 percent of responding CIOs have already restricted their use of global providers. A substantial chunk of that stated intent has already converted into action.

Why Has European Cloud Dependency Become a Strategic Problem?

The core legal driver is the US CLOUD Act (2018). It lets US law enforcement compel American companies to hand over data stored anywhere in the world — including on servers physically inside the EU — regardless of what local data protection laws say. Jurisdiction follows the company’s nationality, not the location of the data.

This is a structural problem that data residency alone can’t solve. Storing data in Frankfurt on AWS satisfies data residency requirements. That same data is still potentially accessible under a US legal order directed at Amazon. Microsoft has publicly acknowledged it can’t guarantee data independence from US law enforcement. FISA Section 702 adds another layer: US intelligence agencies can access data held by US companies under separate surveillance authorities.

EU regulatory pressure makes things worse. DORA requires regulated sectors to demonstrate multi-vendor survivability, which directly penalises concentration on a single non-EU provider. NIS2 requires full visibility and control over infrastructure supply chains. The EU AI Act adds data governance and localisation obligations for high-risk AI systems. For financial services and healthcare in particular, these regulations directly intersect with sovereignty requirements.

The upshot: for any organisation with genuine sovereignty requirements, using a US-headquartered cloud provider cannot fully address legal exposure — regardless of which sovereign tier that provider makes available.

What Is GAIA-X and Can It Replace US Hyperscalers?

GAIA-X is a European Commission-backed initiative launched in 2019, designed to create a federated cloud ecosystem with shared sovereignty standards. Short answer on whether it can replace US hyperscalers: not yet.

GAIA-X is a standards and governance framework. Rules, interoperability standards, trust mechanisms. It is not a cloud provider. Its objective is to create the conditions for sovereignty-compliant cloud services to emerge — not to deliver them itself.

The near-term limitations are real and documented. US hyperscalers are members of GAIA-X working groups. CISPE, the EU cloud trade association representing EU-native providers, has accused the initiative of being structured to favour American hyperscalers. The project has experienced significant internal friction and delays. Even GAIA-X chairwoman Catherine Jestin was candid about it: “The idea was more to foster the creation of it and to put in place the conditions for this to emerge. This has not been completely successful.”

GAIA-X still matters as a long-term foundation for EU-native alternatives to scale. But it is not a near-term answer to the operational sovereignty problem CIOs are dealing with today.

What Are EuroStack and CAIDA Adding Beyond GAIA-X?

Two newer initiatives are trying to fill that gap. EuroStack, commissioned by Bertelsmann Stiftung, takes a more stringent approach than GAIA-X — addressing all seven layers of Europe’s digital stack with EU-native governance. Where GAIA-X welcomed US hyperscalers into working groups, EuroStack focuses explicitly on European providers.

CAIDA (Cloud and AI Development Act) is proposed EU legislation that would mandate tripling EU data centre capacity within 5-7 years, aiming to meet EU business and public administration needs by 2035.

Neither is operational yet. But together they signal that the EU has learned from GAIA-X. The next generation combines stricter governance (EuroStack) with actual infrastructure investment (CAIDA). That’s a more complete strategy than the standards-only approach that limited GAIA-X.

What Does the Airbus Migration Tell Us About Where This Is Heading?

Airbus is the clearest signal that the sovereignty shift has moved from survey responses to actual procurement action.

A ~€50 million tender to migrate mission-critical applications from AWS, Azure, and Google Cloud to a sovereign European provider. Decade-long programme. The scope covers data at rest, data in transit, logging, identity and access management, and security monitoring — all of it.

Catherine Jestin, Airbus EVP Digital, put it simply: “We want to ensure this information remains under European control.” That is not compliance language. That is strategic sovereignty reasoning from one of Europe’s largest industrial organisations.

She was also candid about the supply-side challenge: “The question that I have is: is there any existing European infrastructure capable to deliver that service? That’s why we are launching this request for proposal.” That uncertainty tells you exactly where the market gap sits right now.

When a company of Airbus’s scale commits to EU-native cloud for sovereignty reasons, it normalises the decision for other European enterprises and sets a real-world reference point for what procurement sovereignty requirements look like in practice. For readers wanting to understand which providers are positioned to capture this demand, see our analysis of EU-native cloud providers driving this market shift.

What Does the European Cloud Sovereignty Shift Mean for SMB Tech Companies?

If you sell to regulated European enterprises, your cloud sovereignty posture is increasingly a factor in their procurement decisions. Here’s what that means in practice.

The procurement blocker: Regulated enterprises are including sovereignty requirements in RFPs and vendor criteria. Some are specifically requesting natively European cloud providers — not just EU data residency. If your infrastructure is entirely on a US hyperscaler with no sovereignty measures in place, you risk getting screened out before a sales conversation even begins.

The sales enabler: Having a clear sovereignty posture — EU-hosted infrastructure for customer data, documented data residency controls, a CLOUD Act exposure assessment — can differentiate you in competitive enterprise sales. Most SMB tech companies haven’t developed a sovereignty narrative yet. Being ahead of the curve is a differentiator now. In a few years it’ll be table stakes.

For FinTech companies specifically, DORA adds a compliance dimension beyond procurement preference. If your platform depends entirely on a single US hyperscaler and you can’t demonstrate a credible continuity plan, you may be creating a compliance problem for your banking customers.

And then there’s the cascade effect. When large enterprises like Airbus operationalise sovereignty policies, those requirements flow down through their supplier and vendor networks. That reaches further than you might expect.

The practical starting point is a cloud sovereignty assessment. Map where customer data is stored. Identify which workloads are on US-headquartered infrastructure. Evaluate your CLOUD Act exposure. Work out whether your provider’s sovereign tier actually addresses the jurisdictional exposure your customers face — for a detailed breakdown of what hyperscaler sovereign cloud offerings actually deliver, see our assessment of AWS ESC and Azure sovereign options. Then build the narrative to explain your position to procurement teams.

The question isn’t whether this shift is happening — the convergence of data makes that clear. The question is how quickly you get ahead of it.

For a full strategic framework, see our guide to understanding sovereign cloud.

Frequently Asked Questions

Why do 61 percent of European CIOs want to reduce reliance on US cloud providers?

Legal risk from the US CLOUD Act, regulatory pressure from DORA, NIS2, and the EU AI Act, and accelerating geopolitical tensions have combined to make US cloud dependency a strategic liability.

What is the US CLOUD Act and how does it affect European companies?

The CLOUD Act (2018) allows US law enforcement to compel American companies to produce data stored anywhere in the world. For European organisations using AWS, Azure, or Google Cloud, their data may be accessible to US authorities regardless of where it is physically stored. Jurisdiction follows the nationality of the company, not the location of the server.

What is GAIA-X and what are its current limitations?

GAIA-X is a European Commission-backed standards and governance framework for cloud sovereignty — not a cloud provider. Near-term impact is limited because US hyperscalers are members of working groups, which CISPE has criticised as undermining the sovereignty objectives.

What is the EuroStack initiative?

EuroStack is a newer EU initiative taking a more stringent approach than GAIA-X, addressing all seven layers of Europe’s digital stack with EU-native governance. It is currently in development.

What is CAIDA and how does it support European cloud sovereignty?

CAIDA (Cloud and AI Development Act) is proposed EU legislation that aims to triple EU data centre capacity within 5-7 years — the legislative and investment backing for building sovereign cloud infrastructure at scale.

How much of the European cloud market do US hyperscalers control?

AWS, Microsoft Azure, and Google Cloud together control approximately 70 percent of the European cloud market. That structural dominance means the gap between CIO intent and actual market share change will take years to close.

What is the Airbus sovereign cloud migration?

Airbus is preparing a ~€50 million, decade-long tender to migrate mission-critical applications from US hyperscalers to a sovereign European provider. Catherine Jestin, Airbus EVP Digital, stated the goal is to ensure Airbus’s information “remains under European control.”

Does data residency in the EU protect against US government data access?

No. The US CLOUD Act can compel a US-headquartered provider to produce data regardless of where it is stored. Full sovereignty requires addressing both physical location and legal jurisdiction. Microsoft has acknowledged it cannot guarantee data independence from US law enforcement.

What does cloud sovereignty mean for SMB SaaS companies selling to European enterprises?

Your cloud sovereignty posture can be a sales enabler — differentiating you in competitive processes — or a procurement blocker — failing you at screening stage. The dynamic is already operational in regulated sectors.

Is European cloud sovereignty just a compliance trend or a lasting market shift?

The convergence of legal risk, regulatory pressure, enterprise procurement action, and legislative investment indicates a structural market realignment, not a temporary compliance trend.

How can SMB tech companies assess their cloud sovereignty exposure?

Map where customer data is stored, identify workloads on US-headquartered infrastructure, evaluate CLOUD Act exposure for each, and determine whether your provider’s sovereign tier actually addresses the jurisdictional exposure. Then develop a sovereignty narrative for enterprise procurement teams.

What is geopatriation in the cloud context?

Geopatriation is Gartner’s term for deliberately migrating workloads to local or regional providers for geopolitical or jurisdictional reasons — the strategic behaviour that 61 percent of European CIOs are planning.

A Sovereign Cloud Due Diligence Playbook — Workload Classification, Exit Architecture, and BYOK

Most sovereign cloud guidance is written for enterprises with dedicated legal teams and seven-figure cloud budgets. Here’s the problem: a 100-person SaaS company faces exactly the same CLOUD Act exposure and GDPR obligations as a Fortune 500. You just can’t afford to over-invest in controls you don’t need — or get the evaluation wrong.

This playbook gives you five concrete steps: classify workloads by sovereignty risk tier, evaluate providers against three layers (residency, jurisdiction, cryptographic control), implement BYOK or HYOK encryption, design for exit, and build a government-request SOP. The goal is a differentiated approach — sovereign-grade protection where it matters, standard hyperscaler where it doesn’t.

For foundational context, start with our sovereign cloud hub.


What Are the Five Steps in a Sovereign Cloud Due Diligence Framework?

Five steps, each producing a concrete deliverable:

  1. Classify workloads by sovereignty risk tier → tiered workload register
  2. Evaluate providers against the three-layer sovereignty model → provider scorecard
  3. Implement cryptographic controls (BYOK or HYOK) → encryption configuration per tier
  4. Design for exit → exit runbook with quantified egress costs
  5. Build a government-request SOP → documented triage procedure for CLOUD Act warrants

The whole framework is anchored to three sovereignty layers: data residency, legal jurisdiction, and cryptographic control. A provider can pass one layer while failing the others.

The CLOUD Act follows a simple principle: jurisdiction follows the provider. Frankfurt servers, Seattle headquarters — that satisfies residency. It does not satisfy jurisdiction or cryptographic control without specific configurations.


How Do You Classify Workloads by Sovereignty Risk Tier?

Workload classification assigns every cloud-hosted workload to one of three tiers based on data sensitivity, regulatory exposure, and the consequences of jurisdictional access. Get this right and every downstream decision follows naturally.

Tier 1 — High Sovereignty Risk: production databases with PII, financial records, healthcare data, AI training datasets with personal data. Requires HYOK with EU-based HSM; EU-native or sovereign cloud only.

Tier 2 — Medium Sovereignty Risk: business analytics, internal tool logs, configuration state, non-personal operational data. Requires BYOK with EU-located HSM; standard EU regions with guardrails; metadata sovereignty must be assessed.

Tier 3 — Low Sovereignty Risk: public-facing CDN content, open-source assets, CI/CD pipelines, dev/staging. Provider-managed keys acceptable; standard hyperscaler regions are fine.
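The tier rules above can be sketched as a small classification helper. The workload attributes (`has_personal_data`, `regulated`, `public_only`) are illustrative assumptions, not a standard schema:

```python
# Hypothetical sketch of the three-tier sovereignty classification rule.
# Attribute names are illustrative, not part of any standard.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    has_personal_data: bool  # PII, financial records, health data
    regulated: bool          # in scope for DORA / NIS2 / EU AI Act
    public_only: bool        # public CDN content, dev/staging, OSS assets

def sovereignty_tier(w: Workload) -> int:
    """Return 1 (high), 2 (medium), or 3 (low) sovereignty risk."""
    if w.has_personal_data or w.regulated:
        return 1  # HYOK with EU HSM; EU-native or sovereign cloud only
    if not w.public_only:
        return 2  # BYOK with EU-located HSM; EU regions with guardrails
    return 3      # provider-managed keys; standard hyperscaler regions

# A miniature workload register for a small SaaS company
register = [
    Workload("prod-db", True, True, False),
    Workload("analytics", False, False, False),
    Workload("cdn-assets", False, False, True),
]
tiers = {w.name: sovereignty_tier(w) for w in register}
print(tiers)  # {'prod-db': 1, 'analytics': 2, 'cdn-assets': 3}
```

The output of this pass is the tiered workload register named as the first deliverable in the framework.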

The 100-Person SaaS Company

Here’s what that looks like in practice:

The Metadata Sovereignty Gap

Here’s something that catches people out. A Tier 1 workload’s content may be correctly classified, but its telemetry and logs can leak to non-sovereign infrastructure — a compliance gap even when data residency is satisfied. Classification is also not permanent; regulatory changes or entering a new market trigger reclassification. Build that review into your annual compliance cycle.

For which regulations apply to each tier, see which regulations apply to your workload tiers.


How Do You Evaluate a Cloud Provider Against the Three-Layer Sovereignty Framework?

Remember: a provider can satisfy one layer while failing the others. Data residency alone is not sovereignty. Tier 1 workloads require all three layers to pass. Tier 3 may accept a residency-only pass.

Layer 1 — Data Residency

Layer 2 — Legal Jurisdiction

Layer 3 — Cryptographic Control

Exit and Portability

AWS European Sovereign Cloud (Brandenburg): Operated exclusively by EU-resident personnel through legal entities incorporated under German law. Passes Layer 1. Layer 2 is qualified — AWS remains a US-headquartered company subject to the CLOUD Act. Layer 3 depends on your encryption configuration.

EU-native providers (Hetzner, OVHcloud, Scaleway, T-Systems): EU-headquartered, no CLOUD Act exposure. Passes all three layers. Trade-off: smaller managed service portfolios.
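One way to record these assessments is a simple scorecard keyed on the three layers. The entries below just encode the qualitative judgments from this section; the "qualified" label is an assumed way to represent a conditional pass:

```python
# Three-layer provider scorecard sketch. Scores mirror the qualitative
# assessments in the text: "qualified" means the layer passes only under
# specific conditions (e.g. a particular encryption configuration).
scorecard = {
    "AWS European Sovereign Cloud": {
        "residency": "pass",
        "jurisdiction": "qualified",    # US parent remains subject to CLOUD Act
        "crypto_control": "qualified",  # depends on BYOK/HYOK configuration
    },
    "EU-native (e.g. Hetzner, OVHcloud)": {
        "residency": "pass",
        "jurisdiction": "pass",         # no US parent company
        "crypto_control": "pass",
    },
}

def fit_for_tier1(scores: dict) -> bool:
    # Tier 1 requires an unqualified pass on all three layers.
    return all(v == "pass" for v in scores.values())

for provider, scores in scorecard.items():
    print(provider, "Tier-1 ready:", fit_for_tier1(scores))
```

Tier 3 workloads could relax the test to residency alone, consistent with the residency-only pass described above.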

For a detailed scoring, see how hyperscaler sovereign offerings score against these criteria and EU-native providers to consider for your regulated workloads.


How Do You Implement BYOK or HYOK to Reduce CLOUD Act Exposure?

Once you know a provider’s jurisdictional status, the right encryption tier follows directly.

BYOK (Bring Your Own Key): You generate keys and import them into the provider’s Key Management Service (e.g., AWS KMS). The provider stores and manages them — you have more control than with provider-managed keys, but the provider still has operational access after import. Under a CLOUD Act warrant, the provider controls both data and keys. BYOK reduces practical risk but does not remove the provider’s technical ability to comply.

HYOK (Hold Your Own Key): Keys never leave your environment — stored in your own EU-based Hardware Security Module (HSM). The provider requests temporary access for encryption operations; keys are then purged and never persisted. The provider cannot technically comply with a decryption order because it does not possess the keys. As Brian Robertson of Thales Group puts it: “Encrypting data without managing the keys is like locking the door and leaving the key under the mat.”
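A toy model can make the key-possession difference concrete. The "cipher" below is a keystream XOR for illustration only, not a real encryption scheme, and the KMS/store dictionaries are hypothetical:

```python
# Toy model of the BYOK vs HYOK possession difference.
# The keystream XOR stands in for real encryption -- illustration only.
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    stream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))

customer_key = secrets.token_bytes(32)
record = b"tier-1 customer record"
ciphertext = keystream_xor(customer_key, record)

# BYOK: the key was imported into the provider's KMS, so the provider
# holds both ciphertext and key and can decrypt if legally compelled.
provider_kms = {"imported-key": customer_key}
assert keystream_xor(provider_kms["imported-key"], ciphertext) == record

# HYOK: the provider stores ciphertext only. Without the key material,
# it cannot technically comply with a decryption order.
provider_store = {"blob": ciphertext}  # no key present
assert "imported-key" not in provider_store
```

The structural point is in the last two lines: under HYOK the compelled party simply does not possess what the order asks for.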

Which Tier Gets Which Approach

Tier 1 — High Risk: HYOK. EU-based HSM; external key store never hosted on provider infrastructure.

Tier 2 — Medium Risk: BYOK with EU HSM. Customer-managed key in AWS KMS / Azure Key Vault; HSM must be EU-located.

Tier 3 — Low Risk: Provider-managed keys acceptable.

For full detail on what BYOK mitigates and what it does not, see the specific CLOUD Act exposure that BYOK mitigates.


How Do You Design an Exit Architecture to Avoid Sovereign Cloud Vendor Lock-in?

Exit architecture is the operational layer of sovereignty — the deliberate design that ensures workloads can migrate without prohibitive cost. The core principle is simple: adopt open standards from the start so switching providers is a configuration change, not a rewrite.

Every proprietary dependency is an exit cost multiplier. Map these during workload classification: Kubernetes (portable) versus AWS ECS; Terraform versus CloudFormation; S3-compatible storage versus AWS S3 with proprietary features; PostgreSQL-compatible databases versus Aurora Serverless or DynamoDB.

Egress fees run $0.05–$0.09 per GB. For 50TB, that’s $2,500–$4,500 — before labour, refactoring, and downtime. DORA Article 28 requires financial entities to maintain documented, tested exit plans for critical ICT service providers. Even outside financial services, this is sound practice.
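The egress arithmetic above, as a quick estimator (decimal TB-to-GB conversion assumed):

```python
# Egress cost estimate from the text: 50 TB at $0.05-$0.09 per GB.
def egress_cost_usd(terabytes: float, per_gb: float) -> float:
    return terabytes * 1000 * per_gb  # decimal TB -> GB

low = egress_cost_usd(50, 0.05)
high = egress_cost_usd(50, 0.09)
print(f"${low:,.0f} - ${high:,.0f}")  # $2,500 - $4,500
```

Remember this is egress alone; labour, refactoring, and downtime sit on top.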

Exit Architecture Checklist

For DORA exit planning context, see which regulations apply to your workload tiers.


How Do You Build a Government-Request SOP Without a Legal Team?

The legal layer of sovereignty is a government-request SOP — a pre-built procedure for when your cloud provider receives a CLOUD Act warrant. Have it documented before it happens. The CLOUD Act has no company-size threshold, so there’s no getting out of this one.

Step 1 — Validate the legal basis: Which jurisdiction? Which instrument? Specific in scope? If not, the provider can challenge it.

Step 2 — Assess GDPR transfer basis: GDPR Article 48 prevents transfer to non-EU authorities based solely on a foreign court order. Does a lawful transfer basis exist? Document it.

Step 3 — Determine MLAT routing: Mutual Legal Assistance Treaties provide a bilateral process satisfying both US and EU requirements. If appropriate, advocate for it.

Step 4 — Consider a comity challenge: If US and EU obligations conflict, providers can contest orders using comity principles.

Step 5 — Escalate to external legal counsel: Engage counsel with steps 1–4 complete. It saves fees and response time.
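The five steps could be wired into a lightweight triage sketch. The request field names below are assumptions for illustration, not a legal workflow:

```python
# Sketch of the government-request triage SOP. Field names on `request`
# (scope_specific, eu_transfer_basis, ...) are hypothetical.
def triage(request: dict) -> list[str]:
    actions = []
    # Step 1: validate the legal basis
    if not request.get("scope_specific", False):
        actions.append("ask provider to challenge over-broad scope")
    # Step 2: GDPR Article 48 transfer basis
    if not request.get("eu_transfer_basis", False):
        actions.append("document absence of lawful GDPR transfer basis")
    # Step 3: MLAT routing
    if request.get("mlat_available", False):
        actions.append("advocate for MLAT routing")
    # Step 4: comity challenge when US and EU obligations conflict
    if request.get("conflicting_obligations", False):
        actions.append("support provider comity challenge")
    # Step 5: escalate with the record from steps 1-4 attached
    actions.append("escalate to external counsel with documentation")
    return actions

print(triage({"scope_specific": True, "mlat_available": True}))
```

Running steps 1 to 4 before step 5 is what builds the file that keeps counsel fees and response time down.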

For HYOK-protected workloads, the provider cannot technically comply — the request is redirected to you as data controller. Designate a named role for receiving requests, not a general inbox, and review the SOP annually.

For detailed coverage of the legal exposure behind this SOP, see the specific CLOUD Act exposure that BYOK mitigates.


What Does a Multi-Cloud Hybrid Sovereignty Strategy Look Like in Practice?

Here’s the pragmatic answer for a 100-person SaaS company: EU-native or sovereign cloud for Tier 1 and Tier 2 regulated workloads, standard hyperscaler for Tier 3. This avoids both extremes — full hyperscaler exit (expensive and unnecessary for non-sensitive workloads) and full hyperscaler reliance (Tier 1 data exposed to CLOUD Act risk).

Concrete pattern: Hetzner Cloud or OVHcloud for Tier 1 (German/French jurisdiction, no CLOUD Act exposure, HYOK). EU-native provider with BYOK for Tier 2, monitoring and logging in the EU. AWS Frankfurt for Tier 3.

AWS European Sovereign Cloud carries a 10–15% premium over standard Frankfurt regions. Hetzner offers significantly lower compute and storage costs — one case study saw a platform cut infrastructure costs by 60% after migrating from AWS to Hetzner using Kubernetes. The trade-off: fewer managed services, more operational responsibility.

The hybrid strategy needs a unified management layer — Kubernetes and Terraform spanning both environments, EU-based monitoring for Tier 1 and Tier 2 workloads. A Tier 1 workload on Hetzner can be undermined by a monitoring agent that phones home to a US-headquartered analytics service.

For EU-native provider options, see EU-native providers to consider for your regulated workloads. For the sovereign cloud overview, see our sovereign cloud hub.


Sovereign Cloud Due Diligence Checklist

Copy this into your internal documentation or use it as a quarterly review checklist.

Step 1 — Workload Classification: sort every workload into Tier 1 (regulated or highly sensitive), Tier 2 (sensitive), or Tier 3 (non-critical) before evaluating providers.

Step 2 — Provider Evaluation: check the full corporate structure and legal jurisdiction, not just data centre locations, along with the certifications your regulators expect (BSI C5, SecNumCloud, ISO 27001).

Step 3 — Encryption Configuration: confirm who holds the keys: BYOK for Tier 2, HYOK for Tier 1.

Step 4 — Exit Architecture: maintain an exit runbook, estimate egress costs, and standardise on open tooling such as Kubernetes and Terraform to limit refactoring.

Step 5 — Government-Request SOP: document the five-step triage, assign a named role to receive requests, and review the SOP annually.

Red Flags — Act Now: Tier 1 data on a provider with CLOUD Act exposure, monitoring agents that phone home to US-headquartered services, government requests landing in a general inbox, and no tested exit runbook.


FAQ

Does BYOK actually protect my data from a CLOUD Act warrant?

BYOK reduces practical risk but does not eliminate it. With BYOK, the provider stores your key material in their KMS — they control both data and keys, and can produce decrypted data if compelled. HYOK removes the provider’s technical ability to comply entirely.

What is the difference between data residency and data sovereignty?

Data residency means data is physically stored within a specific geographic boundary. Data sovereignty means it is also subject only to the laws of that jurisdiction. A US-headquartered provider with EU data centres satisfies residency but not sovereignty — the CLOUD Act gives US authorities legal access regardless of server location.

How much does it cost to migrate off a hyperscaler?

Egress fees run $0.05–$0.09 per GB. For 50TB, that’s $2,500–$4,500 in egress alone — before labour, refactoring, and downtime. Open standards adoption (Kubernetes, Terraform) reduces refactoring costs significantly.
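The arithmetic generalises to any data volume. A small helper, using the per-GB band quoted above and decimal terabytes (1 TB = 1000 GB) to match the article's round numbers:

```python
def egress_cost_usd(terabytes: float,
                    per_gb_low: float = 0.05,
                    per_gb_high: float = 0.09) -> tuple[float, float]:
    """Egress fee range for a migration, using the $/GB band quoted above."""
    gb = terabytes * 1000  # decimal TB, matching the article's round numbers
    return gb * per_gb_low, gb * per_gb_high
```

`egress_cost_usd(50)` gives roughly ($2,500, $4,500), matching the figure in the text, and remember that's before labour, refactoring, and downtime.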

Do I need a sovereign cloud for all my workloads?

No. Tier 3 workloads (dev/staging, CDN, open-source assets) are fine on standard hyperscaler regions. The hybrid approach — sovereign-grade for sensitive workloads, hyperscaler for non-critical — is the cost-effective strategy.

What metadata leaves the EU in a standard AWS deployment?

CloudWatch logs, CloudTrail audit records, AWS Config snapshots, and API call metadata may be processed outside the EU even when content data stays in an EU region. This telemetry reveals access patterns, user behaviour, and system architecture — a compliance gap even when data residency is satisfied.

Is AWS European Sovereign Cloud sufficient for GDPR compliance?

It addresses data residency and operational control — EU-resident staff, German law legal entities. But AWS remains a US-headquartered company subject to the CLOUD Act. For Tier 1 workloads, assess whether HYOK is needed to close the jurisdictional gap. For Tier 2 and Tier 3, it may be sufficient depending on your risk assessment.

What should I do if my cloud provider receives a government data request?

Follow the five-step SOP: validate the legal basis, assess GDPR transfer basis, determine if MLAT routing is appropriate, consider a comity challenge, then escalate to external legal counsel with your triage complete. For HYOK-protected data, the request is redirected to you as data controller.

How does DORA Article 28 affect my cloud exit planning?

It requires financial entities to maintain documented, tested exit plans for critical ICT service providers — regularly updated, tested at intervals, approved by senior management. Even outside financial services, this is sound practice.

Is Hetzner a viable alternative to AWS for production workloads?

Hetzner offers lower compute and storage costs and operates under German legal jurisdiction with no CLOUD Act exposure. For Tier 1 and Tier 2 workloads that don’t need AWS’s managed service breadth, it’s a solid sovereign option. Trade-off: smaller managed service portfolio, more operational responsibility.

How do I handle sovereignty for AI training data?

AI training datasets with personal data are Tier 1. The EU AI Act requires full traceability, secure storage, risk classification, and auditability of AI pipelines. Apply HYOK encryption and ensure the training infrastructure itself — not just data storage — operates within EU jurisdiction.

What is the minimum viable sovereignty posture for an SMB?

Classify workloads into three tiers, implement BYOK for Tier 2, use EU regions for all production data, document a government-request SOP, and maintain an exit runbook. For companies handling regulated PII, add HYOK for Tier 1 and evaluate EU-native providers. Achievable without a dedicated compliance team.

DORA, NIS2, and the EU AI Act Are Making Sovereign Cloud Mandatory for Some Workloads

Three EU regulations have turned sovereign cloud from a preference into a hard compliance requirement for specific industries and workload types. DORA entered full enforcement in January 2025 and targets financial entities. NIS2 applies to essential services such as energy, health, transport, and digital infrastructure. The EU AI Act reaches full application in August 2026 and creates data governance obligations for high-risk AI systems that flow directly into infrastructure decisions.

If you’re running FinTech, HealthTech, or regulated SaaS, your cloud infrastructure is no longer purely a technical call. For certain workloads, it’s a regulatory one.

If you need the basics on what sovereign cloud actually means and how providers differ, start with understanding sovereign cloud.


Why Are EU Regulations Now Directly Shaping Cloud Strategy?

Until recently, GDPR was the primary data protection baseline for EU organisations — but it left cloud infrastructure choices largely to each organisation’s own risk assessment. DORA, NIS2, and the EU AI Act each add specific obligations around exit strategies, supply chain sovereignty, and AI data governance that directly constrain the infrastructure decisions you can make.

Things tightened further in October 2025 when the European Commission published its Cloud Sovereignty Framework, defining sovereignty objectives for EU institutions procuring cloud services. The context matters here: US hyperscalers control more than 70% of the EU cloud market and they’re subject to US extraterritorial laws — the CLOUD Act and FISA — that allow US authorities to compel data access regardless of where the servers physically sit. Regulators have taken note and are building that reality into compliance frameworks.

Before we get into each regulation, let's be precise about the terminology, because it matters. Data residency means data is physically stored within a specific geographic boundary. Data sovereignty means the data is also subject only to the laws and governance of that jurisdiction.

Most regulated workloads now require sovereignty, not just residency. A US hyperscaler running EU-based servers gives you data residency. It does not give you data sovereignty when US law can compel disclosure regardless of server location.


What Does DORA Require for Financial Entity Cloud Strategy?

DORA applies to all financial entities regardless of company size — FinTech companies, banks, payment processors, insurance firms, and investment firms. If you’re a 50-person FinTech processing payments, you’re in scope. Same as the major banks.

The mandatory exit strategy under Article 28 requires financial entities to maintain documented, tested, and auditable plans to migrate away from any critical ICT provider without service disruption. Audits are already happening. If your institution can’t demonstrate operational resilience, you’re looking at remediation requirements.

DORA’s ICT concentration risk rules also require that critical functions not be too heavily concentrated with a single provider. Running your entire platform on one hyperscaler’s managed services creates a compliance problem if those services are critical functions.

Here’s where the practical challenge bites: if your platform is tightly coupled to proprietary APIs from a single vendor, your documented exit plan may require a multi-year migration. That’s not a contingency plan — that’s lock-in with a compliance label on it.

DORA Article 28 also mandates specific contractual provisions with all critical ICT providers: audit rights, data access guarantees, and incident support obligations. The EU Data Act’s cloud switching regime, effective September 2025, complements DORA — customers can initiate a provider switch on two months’ notice, with transitions completing within 30 days and switching charges eliminated from January 2027.
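The switching clock can be sketched in code. Assumptions: the two-month notice and 30-day transition described above, simplified month arithmetic (day-of-month clamped to 28), and the January 2027 charge-elimination date as the cut-off.

```python
from datetime import date, timedelta

def switching_timeline(notice_date: date) -> dict:
    """Sketch of the EU Data Act switching clock described above.

    Assumes the two-month notice period and 30-day transition; month
    arithmetic is deliberately simplified.
    """
    m = notice_date.month + 2
    y = notice_date.year + (m - 1) // 12
    m = (m - 1) % 12 + 1
    notice_ends = date(y, m, min(notice_date.day, 28))  # clamp for short months
    transition_ends = notice_ends + timedelta(days=30)
    charges = "eliminated" if notice_date >= date(2027, 1, 1) else "may apply"
    return {"notice_ends": notice_ends,
            "transition_ends": transition_ends,
            "switching_charges": charges}
```

A notice served on 1 March 2026 yields a notice period ending 1 May and a transition ending 31 May; serve the same notice in 2027 and switching charges no longer apply.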

Here’s what you should be doing right now: audit existing contracts for the Article 28 provisions (audit rights, data access guarantees, incident support obligations), classify workloads by DORA criticality, assess concentration risk across providers, and document and test your exit strategy.


Does NIS2 Require Sovereign Cloud for Essential Services?

NIS2 expanded cybersecurity obligations across two entity categories: essential entities and important entities.

NIS2 doesn’t use the words “sovereign cloud.” But its supply chain security obligations create de facto sovereignty requirements for many organisations in scope. Under Article 21, risk assessments may require choosing EU-based providers, particularly where a provider’s exposure to foreign government data access poses supply chain security concerns.

The practical effect: NIS2 makes you accountable for your entire supply chain’s sovereignty posture. The US CLOUD Act exposure of US-headquartered providers becomes an explicit input into that assessment. And because classification thresholds vary by member state transposition, compliance gets more complicated if you’re operating across multiple EU jurisdictions.

For HealthTech companies, NIS2 is compounded by the European Health Data Space (EHDS), Regulation (EU) 2025/327. The EHDS allows EU member states to require that health data be stored and processed exclusively within the EU, unless a GDPR adequacy decision exists for the destination country. HealthTech companies within the EHDS framework face additional constraints on where health data can be hosted, layered on top of NIS2’s baseline obligations.


How Does the EU AI Act Create Cloud Sovereignty Requirements for SaaS Companies?

The EU AI Act (Regulation 2024/1689) reaches full application on August 2, 2026, with penalties reaching 7% of global annual turnover for high-risk AI system violations. It classifies AI systems by risk: prohibited systems (unacceptable risk), high-risk systems in Annex III categories (biometric identification, critical infrastructure, education, employment, access to essential services), limited-risk systems with transparency requirements, and minimal-risk systems.

Here’s the thing: many SaaS products in FinTech (creditworthiness assessment), HR tech (CV screening), and HealthTech (medical-device-adjacent AI) deploy features that may qualify as high-risk under Annex III. If you haven’t checked, now’s the time.

If a system is high-risk, the data governance obligations kick in: complete audit trails for training data provenance, data quality protocols across training, validation, and testing datasets, detailed technical documentation, ongoing risk management, and human oversight mechanisms.
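As a first-pass screen of a feature inventory, the Annex III trigger can be sketched as a keyword check. This is illustrative only, not a legal determination; the category keywords are paraphrased from the list above.

```python
# Paraphrased Annex III trigger areas from the categories discussed above;
# a real assessment needs legal review, not string matching.
ANNEX_III_KEYWORDS = {
    "biometric identification",
    "critical infrastructure",
    "education",
    "employment",
    "access to essential services",
    "creditworthiness",
    "cv screening",
}

def screen_ai_feature(description: str) -> str:
    """Flag features that warrant a high-risk data-governance review."""
    d = description.lower()
    if any(keyword in d for keyword in ANNEX_III_KEYWORDS):
        return "potentially high-risk: start the data-governance review"
    return "check limited/minimal-risk transparency requirements"
```

A feature described as "CV screening for hiring" gets flagged; a generic content tool matches no trigger area and falls through to the transparency check.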

The infrastructure implication is direct. SaaS platforms using shared hyperscaler managed AI services often can’t provide the infrastructure-level provenance required for compliance. Self-hosted or sovereign environments enable the complete data lineage documentation the EU AI Act requires. Most SaaS companies haven’t yet connected their EU AI Act obligations to their cloud infrastructure decisions. That link exists. The August 2026 deadline doesn’t wait.

If you’re in FinTech deploying AI, you’re likely facing both DORA and the EU AI Act simultaneously. HealthTech adds NIS2 and potentially EHDS on top. These regulations stack. Plan accordingly.


What Is the Difference Between BSI C5 and SecNumCloud Certification?

BSI C5 (Cloud Computing Compliance Criteria Catalogue) is the German Federal Office for Information Security certification, with Type I (design attestation) and Type II (operational effectiveness) audit levels. It focuses on operational security practices but doesn’t exclude non-European providers — US hyperscalers can and do hold it. AWS European Sovereign Cloud holds BSI C5. You can verify certifications at https://www.bsi.bund.de.

SecNumCloud is the French ANSSI certification with stricter sovereignty requirements. It requires the cloud provider to be immune to requests from public authorities of third countries, store and process client data exclusively within the EU, and have its registered office within the EU. Those requirements effectively exclude US-headquartered providers in their native form. SecNumCloud is required for French government and critical infrastructure workloads.

The practical difference is straightforward. BSI C5 certifies operational security practices. SecNumCloud certifies both operational security and legal sovereignty. A provider holding SecNumCloud guarantees no foreign government data access. A provider holding BSI C5 does not — US hyperscalers with BSI C5 remain subject to the CLOUD Act.

The EUCS (European Cybersecurity Certification Scheme for Cloud Services) from ENISA was supposed to harmonise these national schemes into a single EU-level framework. Political controversy over whether the highest assurance level should require EU-headquartered providers has blocked its finalisation. As of late 2025, the EUCS had not been adopted.


What Does Each Regulation Mean for Your Cloud Strategy?

If you’re in FinTech, DORA compliance is the immediate priority and audits are already happening. Audit existing cloud contracts for Article 28 provisions — switch notice rights, data portability guarantees, and zero switching charges from January 2027. Classify workloads by DORA criticality and document exit strategy architecture now.

If you’re in HealthTech, you need to determine whether you classify as an essential or important entity under your member state’s NIS2 transposition. Assess whether your current cloud providers satisfy Article 21 supply chain security obligations. Then layer EHDS on top: health data may need to be stored and processed exclusively within the EU. Where NIS2 or EHDS push you toward full legal isolation, EU-native providers that meet the relevant certification requirements are the practical answer.

If you’re in SaaS and deploying AI, assess whether any AI system falls into the high-risk Annex III categories. If it does, your current infrastructure must support complete data lineage documentation, audit trails, and human oversight mechanisms by August 2026.

The Transfer Impact Assessment (TIA) applies across all three scenarios. Under GDPR Chapter V, a TIA is required whenever data is accessible by entities in jurisdictions without adequate data protection. The US CLOUD Act means any US-headquartered cloud provider triggers this requirement — even when data is stored in EU data centres.
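The trigger logic reduces to a small predicate. A deliberately simplified sketch that follows the framing above (provider headquarters, not server location, drives the requirement); it is a screen, not a substitute for the assessment itself.

```python
def tia_required(provider_hq_in_eu: bool, adequacy_decision: bool) -> bool:
    """Screen for whether a Transfer Impact Assessment is needed.

    Follows the rule above: a provider headquartered outside the EU, in a
    jurisdiction without an adequacy decision, triggers a TIA regardless
    of where the servers physically sit.
    """
    if provider_hq_in_eu:
        return False
    return not adequacy_decision
```

`tia_required(provider_hq_in_eu=False, adequacy_decision=False)` returns True even for data stored in an EU region, which is exactly the CLOUD Act point.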

For most organisations, a hybrid sovereign model is a practical response: regulated and mission-critical workloads in a sovereign environment, less sensitive internal applications on standard public cloud. See how to integrate these requirements into your workload classification for a systematic approach to sorting it all out.


FAQ

Which EU regulations specifically require sovereign cloud?

No single EU regulation uses the term “sovereign cloud” as a mandate. But DORA’s exit strategy and concentration risk requirements, NIS2’s supply chain security obligations, and the EU AI Act’s data governance framework collectively make sovereign cloud effectively mandatory for regulated workloads in financial services, essential services, and high-risk AI deployments.

Does DORA apply to FinTech startups or only large banks?

DORA applies to all financial entities regardless of size. Scope is determined by function (financial services), not company size. A 50-person FinTech processing payments is in scope alongside major banks.

Can AWS or Azure satisfy DORA sovereign cloud requirements?

US hyperscalers can satisfy many DORA technical requirements. AWS European Sovereign Cloud holds BSI C5 certification. However, US-headquartered companies remain subject to the US CLOUD Act, which allows US authorities to compel data access regardless of server location — a risk that must be assessed in any DORA exit strategy and Transfer Impact Assessment.

What is the difference between data residency and data sovereignty?

Data residency means data is physically stored within a geographic boundary. Data sovereignty means data is subject to the laws and governance of a specific jurisdiction with no foreign government access. A US hyperscaler with EU-based servers provides data residency — but the CLOUD Act can compel disclosure regardless of server location.

What is a DORA exit strategy and what must it include?

Under DORA Article 28, financial entities must maintain documented, testable, regulator-auditable plans to migrate away from any critical ICT provider without service disruption. Requirements include switch notice rights (two months), 30-day transitional periods, data portability provisions, and zero switching charges from January 2027.

Is SecNumCloud required for all EU sovereign cloud deployments?

No. SecNumCloud is required for French government and critical infrastructure workloads. For German regulatory contexts, BSI C5 is the relevant standard. Which certification matters depends on your operational jurisdiction and the specific regulation creating the sovereignty obligation.

How does the EU AI Act affect cloud infrastructure decisions?

The EU AI Act requires operators of high-risk AI systems to implement data governance frameworks with documented data sources, quality controls, and complete audit trails. SaaS platforms running on shared managed AI services typically can’t provide the infrastructure-level provenance required for compliance. August 2026 is the enforcement deadline.

What is the EU Cloud Sovereignty Framework from October 2025?

The EU Cloud Sovereignty Framework was published by the European Commission in October 2025 to define sovereignty objectives for EU institutions procuring cloud services. Key criteria include whether the cloud service is headquartered outside the EU, processes data outside the EU, or is exposed to foreign government influence.

Do I need a Transfer Impact Assessment if I use a US cloud provider in Europe?

Yes. Under GDPR Chapter V, a TIA is required whenever data is accessible by entities in jurisdictions without adequate data protection. The US CLOUD Act means a TIA is required even when data is stored in EU data centres.

What is BSI C5 and where can German companies verify it?

BSI C5 (Cloud Computing Compliance Criteria Catalogue) is the German Federal Office for Information Security certification scheme for cloud services. It has Type I (design attestation) and Type II (operational effectiveness) audit levels. Verify provider certifications at https://www.bsi.bund.de.

What workloads actually require sovereign cloud under these regulations?

DORA covers financial transaction processing, payment systems, and critical ICT functions in financial entities. NIS2 covers workloads in essential services — energy, health, transport, digital infrastructure. The EU AI Act covers training data and inference infrastructure for high-risk AI systems. Non-regulated internal tooling typically falls outside these requirements.

What is the EHDS and how does it affect HealthTech cloud decisions?

The European Health Data Space (EHDS) is Regulation (EU) 2025/327, regulating primary and secondary use of electronic health data. It allows EU member states to require that health data be stored and processed exclusively within the EU. HealthTech companies within the EHDS framework face additional constraints on where health data can be hosted, layered on top of NIS2’s baseline obligations.


For a complete overview of data residency, sovereignty, and jurisdictional control across all these dimensions, see our sovereign cloud explained guide.

EU-Native Cloud Providers Compared — Hetzner, OVHcloud, Scaleway, and T-Systems

Here’s the thing most people miss about cloud sovereignty: it’s not about where your data physically lives — it’s about who controls it legally.

US hyperscalers own around 70% of Europe’s cloud infrastructure. They’ll tell you their EU sovereign offerings come with data residency and technical controls. That’s true. But the US CLOUD Act means American authorities can compel a US-headquartered provider to hand over data stored in the EU regardless of where the servers are. Jurisdiction follows the company, not the data centre.

EU-native cloud providers — 100% EU-owned, EU-operated, and governed exclusively by EU law — cut that exposure off at the source. No US parent company means no entity that can receive a CLOUD Act warrant. This is what’s called the Full EU Isolation Model, and it’s the strongest legal protection tier available to European organisations.

This article compares the four leading EU-native providers — Hetzner, OVHcloud, Scaleway, and T-Systems — on legal protection, service capability, and workload fit. We’ll also cover two important nuances: the OVHcloud Canadian court order that shows EU-native protection isn’t absolute, and why GAIA-X certification isn’t the sovereignty guarantee it sounds like. For the complete framework behind all of this, see our sovereign cloud explained guide.


What Does Full EU Isolation Actually Mean — and How Is It Different from Hyperscaler Sovereign Cloud?

Three things determine genuine sovereignty — and it’s worth being clear about what each one means.

Data residency is where data physically sits. Data sovereignty is which legal system governs it. Jurisdictional control is who can legally compel access. EU-native providers satisfy all three. Hyperscaler sovereign cloud offerings typically satisfy only the first.

The gap in AWS’s European Sovereign Cloud and Azure’s EU Data Boundary isn’t technical — it’s structural. When a US parent company exists, that entity can receive a CLOUD Act warrant. That directly conflicts with GDPR Article 48, which requires an international agreement before EU data reaches non-EU authorities. Microsoft’s chief legal officer admitted before the French Senate that Microsoft cannot guarantee EU data is safe from US access requests.

Here’s the terminology you need to cut through the marketing. “EU-native” means EU-headquartered, EU-owned, and EU-law governed. “EU-based” means servers are in Europe — necessary but not sufficient. “EU-sovereign” is a marketing term with no consistent legal definition.

Hetzner, OVHcloud, Scaleway, and T-Systems all qualify as EU-native. None of them have a US parent company or CLOUD Act exposure. The differences between them come down to service capability, compliance certifications, and the type of workloads they suit best — not their fundamental legal structure.


How Does Hetzner Compare on Sovereignty, Pricing, and Service Capability?

Hetzner is German-headquartered with data centres in Germany and Finland. It’s 100% EU-owned, no foreign parent, zero CLOUD Act exposure.

Pricing is where it gets really interesting. An independent benchmark by Callista (February 2026) found Hetzner delivers approximately 14.3 times the value-per-compute-unit of AWS. A Hetzner CPX32 instance runs €16.36 per month versus €162.88 for an equivalent AWS instance — 10% of the price with 71% better multi-core performance. As the benchmark puts it: “The undisputed value winner. For 10% of the AWS price, you get 70% better multi-core performance.”
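The quoted prices are easy to sanity-check. A quick reproduction of the comparison, using the benchmark figures cited above; note the 14.3x value metric is the benchmark's own composite, not derived here.

```python
# Benchmark prices quoted above (Callista, February 2026 figures).
hetzner_cpx32_eur = 16.36    # per month
aws_equivalent_eur = 162.88  # per month

# Hetzner's share of the AWS price: roughly 10%.
price_share_pct = hetzner_cpx32_eur / aws_equivalent_eur * 100

# Annual saving per instance, before accounting for performance differences.
annual_saving_eur = (aws_equivalent_eur - hetzner_cpx32_eur) * 12
```

The price share rounds to 10%, consistent with the "10% of the AWS price" claim; the per-instance saving compounds quickly across a fleet.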

That laser focus on compute and storage is also its main limitation. The service catalogue is solid on compute, storage, and networking, but there are no native AI/ML services, limited serverless options, and no enterprise software integrations. Hetzner also doesn’t participate in GAIA-X — worth knowing if your procurement frameworks require it.

Best fit: SMB SaaS companies, developer teams, and cost-sensitive startups that need genuine EU legal protection without enterprise overhead. The value advantage over hyperscalers is unmatched in the EU-native tier.


Is OVHcloud Actually Safe for Sensitive European Data — and What Does the Canadian Court Order Mean?

OVHcloud is France’s largest EU-native cloud provider and the most full-stack alternative in the tier. It operates 30+ data centres across Europe, holds ISO 27001 and HDS certifications, is SecNumCloud-qualified, participates in GAIA-X, and offers a broad managed service catalogue including Kubernetes, databases, and AI.

But here’s something you need to know before you commit.

In September 2024, the Ontario Court of Justice ordered OVHcloud to hand over user data stored in France, Great Britain, and Australia to Canadian police. The ruling held that OVHcloud’s “virtual presence” in Canada subjected it to Canadian jurisdiction regardless of where the data was stored. This was widely reported in November 2025.

OVHcloud invoked France’s blocking statute (loi de blocage), which prohibits French companies from disclosing sensitive data to foreign authorities outside official channels. The court rejected it. Complying would expose OVHcloud’s French executives to six months in prison and €90,000 fines per violation. OVHcloud appealed; France confirmed direct disclosure would be illegal and offered expedited processing through MLAT channels.

What this means: A non-EU court can direct orders at a foreign subsidiary the provider controls in its jurisdiction — targeting the local entity, not the EU parent, and thereby bypassing GDPR Article 48. This wasn’t a systemic failure, and it involved OVHcloud’s Canadian entity specifically. But it proves that EU-native providers with non-EU entities carry cross-jurisdictional legal risk. Check the full corporate structure — not just data centre locations — before assuming complete jurisdictional immunity.

Best fit: FinTech and HealthTech companies requiring compliance certifications (ISO 27001, HDS, SecNumCloud) and a broader managed service catalogue. Also the right call for EU-wide enterprise deployments needing the broadest European data centre footprint.


How Does Scaleway Compare for Developer-Centric Sovereignty?

Scaleway is Paris-headquartered, a subsidiary of Iliad Group (French telecom), with data centres exclusively in Paris, Amsterdam, and Warsaw. 100% EU-owned, no CLOUD Act exposure.

It sits between Hetzner’s pure cost efficiency and OVHcloud’s enterprise breadth — and it has the strongest developer experience in the tier. The catalogue covers managed Kubernetes, GPU-powered instances, serverless functions, managed databases, and AI compute. An independent benchmark (Callista, February 2026) found Scaleway delivers approximately 4.8x the value per euro of AWS — double the single-core performance for a quarter of the price, with free egress where AWS charges €15.20 per 200GB.
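The egress difference is straightforward to quantify. A sketch using the rate quoted above; the monthly volume is a hypothetical example, not a benchmark figure.

```python
# Implied AWS egress rate from the figure quoted above: EUR 15.20 per 200 GB.
aws_egress_eur_per_gb = 15.20 / 200   # ~EUR 0.076 per GB

# Hypothetical workload: 500 GB of egress per month.
monthly_gb = 500
monthly_delta_eur = monthly_gb * aws_egress_eur_per_gb  # vs EUR 0 on Scaleway
```

At this hypothetical volume the free-egress policy is worth roughly EUR 38 per month, before any compute savings.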

GPU infrastructure is Scaleway’s key differentiator. If you’re running inference or training workloads that would otherwise require hyperscaler GPU instances, you can do that here with full EU legal protection. Hetzner doesn’t offer equivalent options.

One thing worth knowing: a 2025 Xomnia analysis noted Scaleway uses some US-based services for management console infrastructure. This doesn’t affect data centre operations or storage jurisdiction, but if you have extremely strict operational sovereignty requirements, evaluate the full stack before committing.

Scaleway participates in GAIA-X and is working toward SecNumCloud alignment.

Best fit: SMB SaaS startups and development teams prioritising developer experience alongside sovereignty, particularly those with GPU or AI compute needs.


What Makes T-Systems and Open Telekom Cloud Different from Other EU-Native Providers?

Where Hetzner and Scaleway target cost-sensitive developer teams, T-Systems is the EU-native option for the other end of the market — regulated industries, government agencies, and large enterprise.

T-Systems is the parent entity; Open Telekom Cloud (OTC) is the actual cloud platform, backed by Deutsche Telekom — one of Europe’s largest telecoms. Open Telekom Cloud runs on OpenStack with high-availability zones in Germany and the Netherlands. It holds BSI C5 certification — the mandatory German federal attestation for government agency cloud use — alongside ISO 27001 and GAIA-X participation.

One distinction you need to get right: Delos Cloud is a sovereign GCP stack for the German market where T-Systems controls the operational layer. It uses Google Cloud technology under T-Systems management — that’s Guardrail Sovereign positioning, not Full EU Isolation. Open Telekom Cloud is fully EU-native. They’re distinct products; evaluate them separately.

For German federal workloads, BSI C5 is a mandatory procurement requirement. T-Systems and StackIT (Schwarz Group / Lidl/Kaufland) are the primary EU-native providers at this level. AWS’s European Sovereign Cloud also holds BSI C5 — but retains CLOUD Act exposure through its US parent.

T-Systems is enterprise-oriented in pricing and support. Not the cost-efficient choice for SMBs, but absolutely the right choice where compliance depth and institutional backing are non-negotiable.

Best fit: Government, healthcare, automotive, financial services, and large enterprise workloads, particularly in the German market.


When Does GAIA-X Certification Actually Matter — and What Are Its Limits?

GAIA-X is the European Commission-backed federated cloud ecosystem promoting data portability, transparency, and interoperability. Europe contributes nearly 25% of global cloud revenues but owns less than 2% of cloud infrastructure — GAIA-X was designed to address exactly that imbalance.

Here’s the catch though: AWS, Azure, and Google Cloud are all GAIA-X members, right alongside OVHcloud, Scaleway, and T-Systems. GAIA-X membership does not mean Full EU Isolation. CISPE, the EU cloud trade association, has called this a “Trojan horse” — US hyperscaler membership dilutes the sovereignty signal the framework was supposed to provide. GAIA-X labels certify interoperability, transparency, and data portability. They say nothing about ownership structure or CLOUD Act exposure.

EuroStack is the newer EU initiative responding to GAIA-X’s structural compromise — more focused on AI regulation, blockchain identity, and genuinely sovereign alternatives built without US hyperscaler co-authorship.

GAIA-X certification does have practical value when procurement policy or regulatory guidance requires it — a compliance checkbox in certain public-sector frameworks. But it’s not a substitute for verifying whether a provider is genuinely EU-native.


How Do You Choose the Right EU-Native Provider for Your Workloads?

The right choice depends on your company profile, regulatory requirements, service capability, and budget. There’s no single best EU-native provider — it depends on what you’re actually running.

SMB SaaS: Hetzner or Scaleway. Hetzner for maximum cost savings on compute and storage — 14.3x value over AWS is the strongest advantage in the market. Scaleway for GPU infrastructure, managed Kubernetes, and developer tooling.

FinTech and HealthTech: OVHcloud or T-Systems. OVHcloud’s ISO 27001, HDS, and SecNumCloud profile covers HealthTech requirements with the broadest EU footprint. T-Systems suits organisations operating primarily in Germany or requiring BSI C5.

German federal workloads: T-Systems or StackIT. BSI C5 is mandatory. AWS ESC holds it too, but retains CLOUD Act exposure — for full EU isolation plus BSI C5, T-Systems and StackIT are your options.

EU-wide enterprise scale: OVHcloud, with 30+ European data centres and the most comprehensive managed service catalogue in the tier.

Service tradeoffs to plan for: EU-native providers offer genuine legal protection, but their managed service catalogues are narrower than AWS or Azure. The gaps are real: no native AI/ML platform equivalent to SageMaker or Vertex AI; narrower serverless orchestration; smaller global edge networks. Expect to self-manage Redis, use GitHub or GitLab for CI/CD, and potentially run Kafka clusters independently.

The broader EU-native landscape also includes Aruba Cloud (Italian-origin, SME-focused, data centres in Italy, France, Germany, and Czech Republic), StackIT (Schwarz Group, Germany-based with BSI C5 and full EU isolation), and Exoscale (Swiss-based with no CLOUD Act exposure but governed by Swiss law — for GDPR, NIS2, or DORA compliance, an EU-headquartered provider is the cleaner path).

To give you a sense of scale: Airbus issued a €50 million, decade-long EU-native cloud migration tender. Catherine Jestin, EVP Digital: “We want to ensure this information remains under European control.” EU-native adoption isn’t a niche choice — the most sovereignty-sensitive enterprises at scale are making the same move.

For regulatory requirements driving provider selection, see when DORA, NIS2, or SecNumCloud mandate full EU isolation. For the full due diligence framework, see how to evaluate and select an EU-native provider. Return to our sovereign cloud hub for the complete sovereignty framework.


Frequently Asked Questions

What does “EU-native cloud provider” mean?

An EU-native cloud provider is 100% EU-owned, EU-operated, and governed exclusively by EU law, with no foreign parent company subject to the US CLOUD Act. This is the Full EU Isolation Model — the strongest legal protection tier for European data. EU data residency (servers in the EU) is necessary but not sufficient for genuine sovereignty.

Is Hetzner cheaper than AWS for cloud computing in Europe?

Yes, significantly. Independent benchmarks (Callista, February 2026) show Hetzner delivers approximately 14.3 times the value-per-compute-unit of AWS. The pricing difference is structural, not marginal.
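A quick back-of-envelope reading of that figure (the 14.3x multiple is the Callista benchmark number cited above; the €10,000 monthly spend is purely hypothetical):

```python
# Hypothetical worked example: what a 14.3x value-per-compute-unit
# multiple implies for equivalent compute spend.
aws_monthly_eur = 10_000       # hypothetical monthly AWS compute bill
value_multiple = 14.3          # Callista benchmark figure cited above

equivalent_spend = aws_monthly_eur / value_multiple
print(f"{equivalent_spend:.0f}")  # roughly 699 EUR for the same compute volume
```

The actual delta depends on instance mix, egress, and discounts, but the benchmark's claim is that the gap is structural, not a few percentage points.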

What happened with the OVHcloud Canadian court order, and does it mean EU-native providers are not safe?

The Ontario Court of Justice ordered OVHcloud to hand over data stored in France because OVHcloud has a Canadian entity. The court rejected France’s blocking statute defence. This is a procedural case involving a non-EU entity, not a systemic failure — but verify any provider’s full corporate structure before assuming complete jurisdictional immunity. See the OVHcloud section above for full detail.

What is GAIA-X and does it guarantee cloud sovereignty?

GAIA-X promotes interoperability and transparency standards — it does not guarantee sovereignty. AWS, Azure, and Google Cloud are all GAIA-X members alongside EU-native providers. A GAIA-X label indicates data portability compliance, not Full EU Isolation. CISPE has described US hyperscaler membership as a “Trojan horse” diluting the framework’s intended sovereignty signal.

What is BSI C5 and why does it matter for German cloud compliance?

BSI C5 (Cloud Computing Compliance Criteria Catalogue) is the mandatory German federal attestation for government-use cloud services. T-Systems (Open Telekom Cloud) and StackIT are the primary EU-native providers certified at this level. AWS ESC has also achieved BSI C5 but retains CLOUD Act exposure through its US parent company.

What managed services will I lose switching from AWS to an EU-native provider?

The main gaps: no native AI/ML platform (no SageMaker or Vertex AI equivalent), narrower serverless orchestration, fewer managed database variants, and smaller global edge networks. Expect to self-manage Redis, use GitHub or GitLab for CI/CD, and potentially run Kafka clusters independently.

What is the Delos Cloud and how does it relate to T-Systems?

Delos Cloud is a sovereign GCP stack for the German market in which T-Systems controls the operational layer: Google Cloud technology under T-Systems management. That places it in the Guardrail Sovereign tier, not Full EU Isolation. T-Systems' own Open Telekom Cloud, by contrast, is fully EU-native. The two are distinct products and must be evaluated separately against sovereignty requirements.