Mar 2, 2026

What You Need to Know About Open AI Supply-Chain Licensing Risk

AUTHOR

James A. Wondrasek

You’ve picked a model from Hugging Face. The metadata says MIT. Your team integrates it, ships the feature, and moves on. Except that MIT tag is a metadata field, not a legal instrument. A February 2026 audit of 124,278 AI supply chains found that 95.8% of models on Hugging Face are missing the licence text, copyright notice, and attribution records required to make that tag enforceable. Without those three components, the model defaults to all rights reserved.

This page maps the full landscape and links to the articles that cover each topic in depth.

What is permissive-washing in AI?

Permissive-washing is the practice of labelling an AI artefact with a permissive licence tag — MIT, Apache 2.0, BSD-3-Clause — while omitting the documentation that makes that label enforceable. The tag is metadata. The actual grant requires the full licence text, a copyright notice, and upstream attribution records. Without all three, the artefact reverts to default copyright.

Research by Jewitt et al. coined the term after auditing 124,278 supply chains. They found 96.5% of datasets and 95.8% of models lack the required licence text. Only 3.2% of models satisfy both text and copyright requirements — the bare minimum for a valid grant.

For the full explanation, read Permissive-Washing in AI Explained — Why Open Source Labels Cannot Be Trusted.

Why does a permissive label not guarantee the right to use a model?

A licence tag on a Hugging Face model card is a metadata field, not a signed legal instrument. For a permissive licence like MIT to be enforceable, the repository must also include the full licence text, a copyright notice identifying the rights holder, and attribution records for any upstream components incorporated into the model or its training data. If any of these are absent, the licence grant is legally incomplete and the underlying copyright applies.

Hugging Face does not currently enforce that a matching licence file exists before a tag is applied. Developers moving quickly on model selection routinely rely on the repository metadata tag without inspecting the actual licence file — if one exists at all. The consequence is not a grey area: without the required documentation, reuse of the artefact is legally equivalent to reproducing a fully copyrighted work without permission.
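The three-component check described above can be automated against a locally cloned repository. The sketch below is a hypothetical helper, not a legal test: the file names it looks for and the text heuristics it applies are illustrative assumptions, and a real verification workflow would still need legal review of the licence contents.

```python
from pathlib import Path

# Common file names a licence payload might live under (illustrative list)
LICENCE_NAMES = {"LICENSE", "LICENSE.TXT", "LICENSE.MD", "LICENCE", "LICENCE.TXT"}


def licence_payload_status(repo_dir: str) -> dict:
    """Check a locally cloned model repo for the three components that
    make a permissive tag enforceable: licence text, copyright notice,
    and attribution records. Heuristic sketch only."""
    root = Path(repo_dir)
    licence_files = [
        p for p in root.iterdir()
        if p.is_file() and p.name.upper() in LICENCE_NAMES
    ]
    texts = [p.read_text(errors="ignore") for p in licence_files]

    # Crude heuristics: non-trivial licence text, and an explicit
    # "Copyright" line identifying a rights holder.
    has_text = any(len(t) > 200 for t in texts)
    has_copyright = any("copyright" in t.lower() for t in texts)

    # Attribution records for upstream components, e.g. a NOTICE file
    # (file names are an assumption, conventions vary by project).
    has_attribution = any(
        (root / n).exists() for n in ("NOTICE", "NOTICE.txt", "ATTRIBUTIONS.md")
    )

    return {
        "licence_text": has_text,
        "copyright_notice": has_copyright,
        "attribution_records": has_attribution,
        "tag_enforceable": has_text and has_copyright and has_attribution,
    }
```

A model whose repository fails this check should be routed to legal review regardless of what its metadata tag claims.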

For the full treatment, read Permissive-Washing in AI Explained — Why Open Source Labels Cannot Be Trusted.

How does licence risk compound across the AI supply chain?

The AI supply chain has three layers: training datasets, models, and applications. Licence obligations flow upward — if a dataset’s compliance documentation is missing, every model trained on it inherits that gap, and every application deploying that model inherits it again.

Attribution is where this breaks down. Only 27.59% of models preserve compliant dataset notices. Only 5.75% of applications preserve compliant model notices. And 76% of companies that prohibit AI coding tools acknowledge their developers use them anyway — ungoverned adoption bypasses whatever verification your workflow includes. For the full structural analysis, read How AI Licence Risk Compounds Across Your Dataset Model Application Stack.
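The inheritance rule above can be made concrete: a layer is only effectively compliant if its own documentation is complete and every upstream layer's is too. The sketch below models that propagation with placeholder data; it is an illustration of the structural point, not a legal determination.

```python
def effective_compliance(layers: list) -> list:
    """Propagate licence-documentation gaps down a
    dataset -> model -> application chain. A gap at any layer
    taints every layer downstream of it."""
    upstream_ok = True
    out = []
    for layer in layers:
        upstream_ok = upstream_ok and layer["payload_complete"]
        out.append({**layer, "effectively_compliant": upstream_ok})
    return out


# Hypothetical chain: the dataset is missing its licence text, so the
# model and application inherit the gap even though their own
# documentation is in order.
chain = [
    {"layer": "dataset", "payload_complete": False},
    {"layer": "model", "payload_complete": True},
    {"layer": "application", "payload_complete": True},
]
```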

What is an AI Bill of Materials and why do you need one?

An AI Bill of Materials (AI-BOM) extends the familiar Software Bill of Materials (SBOM) to cover artefacts that standard formats miss: model weights, training data sources, fine-tuning history, and the compliance documentation at each layer. SPDX 3.0 and CycloneDX have both added AI-specific profiles, and the EU Cyber Resilience Act’s SBOM mandate is accelerating enterprise adoption.

Only 54% of organisations currently evaluate AI-generated code for IP and licensing risks — an AI-BOM process directly closes that gap. For the governance framework and procurement checklist, read What an AI Bill of Materials Is and What to Demand From Vendors.
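To make the AI-BOM idea concrete, here is a minimal fragment in the style of CycloneDX, which added a `machine-learning-model` component type in version 1.5. The model name, dataset name, and property keys below are placeholders, not a real inventory, and a production AI-BOM would carry considerably more provenance detail.

```python
import json

# Minimal CycloneDX-style AI-BOM fragment for one model component.
# All values are illustrative placeholders.
ai_bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {
            "type": "machine-learning-model",
            "name": "example-7b-instruct",   # placeholder model name
            "version": "1.0",
            "licenses": [{"license": {"id": "Apache-2.0"}}],
            "properties": [
                # The layer-level provenance that plain SBOMs miss
                {"name": "training-data-source", "value": "example-corpus-v2"},
                {"name": "fine-tuned-from", "value": "example-7b-base"},
            ],
        }
    ],
}

print(json.dumps(ai_bom, indent=2))
```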

Which open-weight models are actually safe for commercial use?

None of the four dominant open-weight families — LLaMA, Mistral, DeepSeek, and Qwen — carry a straightforwardly permissive licence. LLaMA 2's licence requires companies with more than 700 million monthly active users to obtain a separate licence from Meta. Qwen, with over 113,000 derivatives on Hugging Face, carries Alibaba Cloud commercial restrictions in some versions. DeepSeek’s MIT tag does not substitute for reviewing the actual licence file.

Open-weight means downloadable weights. It does not mean disclosed training data or unencumbered rights. Treat licence selection as a procurement variable with the same weight as capability benchmarks. For the side-by-side comparison, read Llama Mistral DeepSeek and Qwen Licence Terms Compared for Commercial Use.

What are the main AI licensing misconceptions that create risk?

Three misconceptions cause the most damage. First, that a metadata licence tag constitutes a valid licence grant — Hugging Face does not validate tags against actual licence files. Second, that open-weight models are legally equivalent to open-source software. Third, that the EU AI Act open-source exemption covers any model labelled as open — it requires a genuinely open licence, public parameters, and no monetisation.

For the foundational explainer, read Permissive-Washing in AI Explained — Why Open Source Labels Cannot Be Trusted.

How do you add AI licence compliance to your engineering workflow?

AI licence compliance slots into your existing DevSecOps workflow as an extension of software composition analysis (SCA). The additions: model card review before adoption, AI-BOM generation, snippet scanning for AI-generated code, and licence validation gates in CI/CD.

The 2026 Black Duck OSSRA report found 68% of audited codebases contain licence conflicts — partly driven by AI coding assistants generating snippets from copyleft sources without the original licence information. For the implementation playbook, read Adding AI Licence Compliance to Your Existing Engineering Workflow.
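A licence validation gate of the kind described above can be a short script that fails the CI job. The sketch below assumes a hypothetical scan output (a list of component records) and an example disallow-list; the policy itself is illustrative, not legal advice.

```python
# Example policy: licences the build should reject. Purely illustrative.
DISALLOWED = {"GPL-3.0-only", "AGPL-3.0-only"}


def gate(components: list) -> int:
    """Return a CI exit code: non-zero when any component lacks an
    enforceable licence payload or carries a disallowed licence.
    `components` is assumed to come from an upstream SCA/AI-BOM scan."""
    failures = []
    for c in components:
        if not c.get("payload_complete"):
            failures.append(f"{c['name']}: licence payload incomplete")
        if c.get("licence") in DISALLOWED:
            failures.append(f"{c['name']}: {c['licence']} disallowed by policy")
    for f in failures:
        print("LICENCE-GATE FAIL:", f)
    return 1 if failures else 0  # non-zero exit fails the CI job
```

Wired into CI/CD, the job exits non-zero and blocks the merge whenever the scan surfaces an incomplete payload, which is exactly the gap the metadata tag hides.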

What do the EU AI Act and Cyber Resilience Act require for AI supply chains?

Both regulations apply if you use AI components from open repositories and sell into the EU market. The AI Act’s GPAI obligations — training data summaries and copyright compliance policies — have applied since August 2025. The CRA’s SBOM mandate becomes binding in 2027.

The AI Act’s open-source exemption is narrower than most people assume — it requires a genuinely open licence, public parameters, and no monetisation. LLaMA 3’s commercial terms likely disqualify it in many contexts. For the full regulatory translation, read EU AI Act and Cyber Resilience Act Supply Chain Obligations Explained.

Resource hub: AI supply-chain licensing library

Understanding the problem

Making procurement decisions

Implementing governance

FAQ

What is the difference between an open-weight AI model and an open-source AI model?

An open-weight model makes its trained parameters publicly downloadable but may not disclose training data, full methodology, or carry a licence that grants unrestricted use. An open-source AI model — in the strictest interpretation — discloses weights, architecture, training data, and grants clear rights to use, modify, and distribute. The practical consequence: open-weight does not mean legally unencumbered. Before commercial deployment, the actual licence text must be reviewed, not just the repository metadata tag. The model-by-model licence comparison shows exactly where each major family falls on this spectrum.

Can my legal team rely on the Hugging Face metadata tag to confirm a model’s licence status?

No. Hugging Face metadata tags are self-reported by the model publisher and are not validated against an actual licence file. Research from arXiv:2602.08816 found that 95.8% of models on Hugging Face are missing the required licence text. A licence verification workflow must check for the presence of a complete licence file, a copyright notice, and upstream attribution records — not just the metadata tag. The open source label trust guide explains why the tag gap exists and what a compliant licence payload requires.

What happens legally if we deploy a model whose licence documentation is incomplete?

If the compliance payload (licence text, copyright notice, and attribution records) is absent, the licence grant is legally incomplete. The artefact’s effective status reverts to the underlying copyright, which means reuse is legally equivalent to reproducing a fully copyrighted work without permission. This creates exposure to claims for licence violation or copyright infringement, depending on jurisdiction and the rights holder’s willingness to enforce. Understanding how that risk compounds through your dataset, model, and application stack is essential before evaluating your real exposure.

Is fine-tuning a model on our own data enough to clear the upstream licence obligations?

No. Fine-tuning creates a derivative work that inherits the licence obligations of the upstream model. If the upstream model carries incomplete or restricted licensing, those obligations carry through to your fine-tuned version. The EU AI Act provides a partial boundary condition: if your fine-tuning compute exceeds one-third of the original training compute, you may be treated as a GPAI provider with the associated documentation obligations. The EU AI Act and CRA supply chain obligations guide translates these thresholds into practical procurement terms.
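The one-third compute boundary mentioned above reduces to a single comparison. This is a simplified illustration of the threshold, not a complete eligibility test — the Act's actual provider criteria involve more than compute.

```python
def may_be_gpai_provider(finetune_flops: float, original_flops: float) -> bool:
    """Simplified sketch of the EU AI Act boundary condition: fine-tuning
    compute exceeding one-third of the original training compute may make
    you a GPAI provider with documentation obligations."""
    return finetune_flops > original_flops / 3
```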

Does the EU AI Act open-source exemption cover LLaMA 3 or DeepSeek?

Not in most commercial deployment contexts. The EU AI Act open-source exemption applies to models that are genuinely open, non-monetised, and carry complete licence documentation. LLaMA 3 carries Meta’s commercial use terms — not a standard permissive licence — which may disqualify it from the exemption when used in monetised applications. DeepSeek’s MIT tag does not substitute for a review of the actual licence file and training data provenance. Exemption eligibility is model-specific and use-case-specific. See the complete open-weight model licence comparison for a per-family exemption eligibility breakdown.

What is the quickest check I can perform before integrating a model from Hugging Face?

Check whether a licence file exists in the repository root — not just the metadata tag — and that it contains the full licence text, a copyright notice, and any attribution requirements. Review the model card for training data disclosure. If either the licence file or training data disclosure is missing, treat the model as requiring legal review before integration into any production system. For a repeatable process, the AI licence compliance engineering workflow guide covers how to embed these checks into your existing CI/CD pipeline. For teams adopting models at scale, the AI-BOM procurement framework provides a vendor checklist that turns this verification into a structured requirement.
