Every C2PA signing pipeline follows the same four stages regardless of what stack you’re running: hash the content, build the Manifest, sign the Claim, embed the Manifest Store. Two published reference implementations now exist for media organisations — ARD’s fully serverless AWS pipeline (Lambda + KMS + S3) and CBC/Radio-Canada’s dual-compute architecture (Lambda + Fargate + KMS + MediaConvert). Both are open source.
This article walks through the architecture patterns, key management decisions, SDK selection, certificate procurement, credential preservation through transcoding, live streaming provenance (C2PA 2.3), and cost economics at SMB scale. We’re assuming you already know what C2PA is and how the infrastructure works. We’ll open with pipeline stages, not definitions.
Every C2PA implementation runs the same four stages in sequence. The stack changes; the stages don’t.
Stage 1 — Prepare and hash. Read the asset bytes and compute a cryptographic hash. This hash is what binds the manifest to the specific file. Change a single bit and the hash changes, breaking the binding.
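The hard-binding property in stage 1 can be sketched with Node’s built-in crypto. This is illustrative only: the C2PA SDKs hash specific byte ranges of the container rather than the naive whole-buffer digest shown here.

```typescript
// Illustrative only: stage 1 computes a cryptographic hash over the exact
// asset bytes. Flipping a single bit yields a completely different digest,
// which is what breaks the hard binding after any byte-level change.
import { createHash } from "node:crypto";

function sha256Hex(bytes: Buffer): string {
  return createHash("sha256").update(bytes).digest("hex");
}

const asset = Buffer.from("example asset bytes");
const original = sha256Hex(asset);

// Flip one bit in the first byte and re-hash.
const tampered = Buffer.from(asset);
tampered[0] ^= 0x01;
const afterFlip = sha256Hex(tampered);

console.log(original === afterFlip); // false: the binding is broken
```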
Stage 2 — Build the C2PA Manifest. C2PA information is a series of assertions — statements about the asset. These are wrapped into a digitally signed entity called a claim. The manifest definition is a JSON template specifying which assertions to include: creation timestamp, creator identity, AI usage flags, action history.
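A hedged illustration of what such a template can look like. The assertion labels (`c2pa.actions`, `stds.schema-org.CreativeWork`) follow C2PA naming conventions, but the exact schema your SDK version accepts may differ; the values are placeholders.

```typescript
// Illustrative manifest definition: which assertions the Claim Generator
// should include. Field names follow C2PA conventions but are not copied
// from any one SDK's schema; values are placeholders.
const manifestDefinition = {
  claim_generator: "example-newsroom-pipeline/1.0",
  title: "field-report-0142.jpg",
  assertions: [
    {
      label: "c2pa.actions",
      data: {
        actions: [{ action: "c2pa.created", when: "2026-02-01T09:30:00Z" }],
      },
    },
    {
      label: "stds.schema-org.CreativeWork",
      data: { author: [{ "@type": "Person", name: "Jane Reporter" }] },
    },
  ],
};

console.log(JSON.stringify(manifestDefinition, null, 2));
```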
Stage 3 — Sign the Claim. The claim is signed using COSE (CBOR Object Signing and Encryption) and the signer’s private key, from a Certificate Authority enrolled in the C2PA Trust List. In a well-designed cloud pipeline the private key never touches application code or Lambda memory — it gets delegated via the CallbackSigner pattern to AWS KMS. More on that in the Key Management section below.
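The delegation can be sketched as follows. A locally generated Ed25519 key stands in for KMS so the example is self-contained, and the callback shape is a simplified stand-in for the SDK’s actual CallbackSigner interface rather than its real signature.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Stand-in for AWS KMS: in production the private key is generated inside
// KMS and never exported; the callback would issue a KMS Sign API call.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// The Claim Generator hands the to-be-signed claim bytes to a callback and
// only ever receives the signature back.
type SignCallback = (data: Buffer) => Buffer;

const kmsLikeCallback: SignCallback = (data) =>
  sign(null, data, privateKey); // Ed25519 takes a null algorithm parameter

function signClaim(claimBytes: Buffer, cb: SignCallback): Buffer {
  return cb(claimBytes); // no key material in application code
}

const claim = Buffer.from("CBOR-encoded claim bytes (placeholder)");
const signature = signClaim(claim, kmsLikeCallback);
console.log(verify(null, claim, publicKey, signature)); // true
```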
Stage 4 — Embed the Manifest Store. The C2PA Manifest Store — a collection of one or more manifests — is embedded in the output asset or produced as a sidecar file.
The Claim Generator is the software component that runs stages 2 and 3. In cloud architectures that is typically a Lambda function or containerised microservice running c2pa-rs (the open-source Rust SDK from the Content Authenticity Initiative), c2pa-node-v2 (its Node.js bindings for TypeScript Lambda functions), or c2pa-python for ML and AI pipeline environments.
Two binding strategies affect how your pipeline handles transcoding. Hard binding embeds a cryptographic hash of the exact asset bytes in the manifest — the default. Soft binding uses watermarking or fingerprinting for assets that will be transcoded, embedding the credential in the content signal rather than the file metadata. For a complete overview of C2PA’s trust model and ecosystem components, see what C2PA is and how the infrastructure works.
ARD (Arbeitsgemeinschaft der öffentlich-rechtlichen Rundfunkanstalten) built the first fully serverless C2PA signing pipeline on AWS. WDR’s Streaming Architect Martin Grohme published the pattern via the AWS Media & Entertainment blog. It demonstrates signing at the source with minimal operational overhead.
The ARD pipeline is a straightforward serverless chain: an asset lands in S3, the upload event triggers the signing Lambda, the Lambda builds the manifest and delegates the signature to KMS, and the signed output is written back to S3.
One key ARD design decision worth noting: they use AWS KMS for private key storage rather than Secrets Manager. The certificate chain (PEM format) lives in Secrets Manager, referenced by ARN — keeping certificate storage separate from key storage. This is what makes zero-downtime certificate rotation possible. The ARD code (AWS SAM, MIT licence) is at github.com/ARD-C2PA-SAMPLES/c2pa_signfrag_awslambdakms.
CBC/Radio-Canada extended the ARD pattern with a dual-compute architecture that routes signing jobs through an Application Load Balancer to either Lambda or Fargate, matching compute to the size of the job.
The CBC solution also includes AWS Elemental MediaConvert native C2PA support — the simplest path for teams already using MediaConvert for transcoding. No signing code required in Lambda or Fargate on this path. Certificates sit in Secrets Manager and the signing key is referenced by KMS ARN in the job definition.
The CBC reference repository — github.com/aws-solutions-library-samples/guidance-for-media-provenance-with-c2pa-on-aws — includes AWS CDK deployment scripts and both signing paths. It is a prototype for experimentation, not a production-ready system.
Key management is the architectural decision in a C2PA signing pipeline. The private key must never reside in application code, Lambda memory, or container storage. That is not a recommendation — it is a C2PA conformance requirement.
AWS KMS is the cloud-native choice. It uses FIPS 140-2 Level 2 validated hardware security modules, logs all key usage in AWS CloudTrail, and performs asymmetric signing without exposing private key material. Both the ARD and CBC reference implementations use it.
The CallbackSigner in c2pa-node-v2 is how you connect your Claim Generator to KMS. The callback receives data bytes, signs via KMS, and returns the signature. The private key never reaches application code. See the FAQ below for the full pattern.
The certificate chain is stored separately in AWS Secrets Manager, referenced by ARN. The MediaConvert equivalent references both stores by ARN in the job definition — neither key nor certificate touches application code.
Hardware HSM (AWS CloudHSM) provides FIPS 140-2 Level 3 key storage for organisations with regulatory requirements mandating physical key control. For most cloud-native pipelines, AWS KMS is sufficient.
The Nikon incident is worth knowing about. In September 2025, a researcher demonstrated that Nikon’s C2PA-enabled cameras could fraudulently sign content via a firmware vulnerability. A proof of concept signed an AI-generated image with a valid C2PA certificate despite having zero photographic provenance. Nikon revoked every C2PA certificate it had ever issued, invalidating every previously signed asset.
Key compromise means full certificate revocation. Design your incident response before deployment, not after.
Once you have your key management architecture sorted, the next step is getting the certificates that let you sign content in the first place.
For identity assertion privacy design, see privacy design considerations when implementing identity assertions.
Production C2PA signing requires a certificate from a CA enrolled in the official C2PA Trust List. This is the step most developers underestimate when planning a timeline.
The C2PA Conformance Programme is the gateway — a risk-based governance process holding Claim Generator products, validators, and Certificate Authorities accountable to the Content Credentials specification. It runs in three phases, and the current status of conformant products and enrolled CAs is published at spec.c2pa.org/conformance-explorer/.
The distinction between the Interim Trust List (ITL) and the official C2PA Trust List (TL) matters in practice. As of January 1, 2026, the ITL is frozen — no new entries, no updates, no new certificates. Existing ITL certificates remain valid for legacy support until they expire. New implementations must target the official Trust List.
If you implement against ITL certificates, verifiers using the official Trust List model will not recognise your signed content. That is a hard operational constraint.
Plan for weeks to months from implementation decision to first signed production asset, depending on CA processing time and organisational readiness. Build that lead time into your project plan before announcing implementation dates.
After obtaining a certificate, see verifying your implementation’s trust posture against the C2PA Trust List for ongoing trust validation guidance.
Standard transcoding strips C2PA manifests. The manifest contains a cryptographic hash of the exact asset bytes — transcode changes those bytes, the hash no longer matches, the credential is invalid. This is a frequent operational failure mode in media pipelines.
Three ways to handle it:
Path 1: Use AWS Elemental MediaConvert’s native C2PA support. For MP4, DASH, and CMAF HLS outputs, MediaConvert handles credential embedding as part of the transcode job. This is the simplest path for AWS-centric workflows — no signing code required in Lambda or Fargate.
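As a sketch, the relevant fragment of a MediaConvert job definition looks roughly like this. Only the `C2paManifest: "INCLUDE"` flag is taken from the text above; the `CertificateSecret` and `SigningKmsKey` field names and the ARN values are assumptions that should be checked against current MediaConvert documentation.

```typescript
// Abbreviated MediaConvert job-settings fragment. C2paManifest is the
// documented flag; CertificateSecret and SigningKmsKey are assumed field
// names, the ARNs are placeholders, and required job fields are omitted.
const mp4OutputSettings = {
  ContainerSettings: {
    Container: "MP4",
    Mp4Settings: {
      C2paManifest: "INCLUDE",
      // Certificate chain and signing key are referenced by ARN, never inlined:
      CertificateSecret: "arn:aws:secretsmanager:eu-west-1:111122223333:secret:c2pa-cert-chain",
      SigningKmsKey: "arn:aws:kms:eu-west-1:111122223333:key/EXAMPLE-KEY-ID",
    },
  },
};

console.log(mp4OutputSettings.ContainerSettings.Mp4Settings.C2paManifest);
```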
Path 2: Re-sign after transcoding with an ingredient relationship. For custom pipelines, position the signing Lambda or container after the transcode stage. Treat the transcoded output as a new asset that inherits provenance via a parent manifest reference. In c2pa-node-v2, use the 'edit' Builder intent with addIngredientFromReader(sourceReader) and relationship: 'parentOf'. This creates a linked provenance chain — a verifier can trace back through the ingredient relationship to the original signed asset.
Path 3: Soft binding for distribution channels that strip metadata. Social platforms and many CDN pipelines strip embedded file metadata on upload. Soft binding embeds the credential in the content signal via watermarking or fingerprinting, surviving format conversion and metadata stripping. Use it as a complementary strategy for distribution channels outside your direct control.
For more on making credentials durable through distribution, see adding watermarking and fingerprinting to make credentials durable.
C2PA 2.3 records AI-edited content through edit assertions — each C2PA-aware tool adds an assertion recording what action was performed, by which tool, and when. The Manifest Store accumulates a linked chain of manifests representing the full provenance history.
When the chain survives. Several major tools already maintain the chain in practice. Adobe Lightroom records c2pa.color_adjustments, c2pa.cropped, and other actions. ChatGPT-generated images carry DigitalSourceType: trainedAlgorithmicMedia. Google Pixel 10 Magic Eraser outputs carry compositeWithTrainedAlgorithmicMedia. The chain holds as long as every tool passes the manifest forward.
The EU AI Act, effective August 2026, requires machine-readable disclosure labels for AI-generated content. C2PA’s DigitalSourceType assertions are the technical mechanism for that compliance.
When the chain breaks. Non-C2PA-aware tools strip or ignore the manifest. Social platforms strip all photo metadata on upload. Even Lightroom’s current implementation records AI-assisted edits only as “Color or Exposure” or “Cropping” — the AI involvement is not disclosed. Implementation-dependent disclosure is a real constraint.
For pipeline architects: treat every AI editing stage as a potential chain-break point. For stages lacking native C2PA support, design a re-signing step that creates a new manifest referencing the prior one as an ingredient. The chain is technically broken at the non-aware tool, but the downstream provenance record is preserved.
C2PA 2.3 (December 2025) introduces segment-level signing for live streaming. As Irdeto noted in their January 2026 analysis, “the bump from C2PA 2.2 to 2.3 may suggest only a small adjustment. For the video ecosystem, however, the latest version introduces a major capability: support for live streaming.”
The architectural difference from VOD is fundamental. VOD uses a Merkle tree — the originator computes the tree once the complete asset is known. For live streaming, the complete asset is never known in advance. Signing occurs per segment.
How CMAF segment-level signing works:
C2PA 2.3 uses the Verifiable Segment Info method. Each live segment includes a small Event Message Box (emsg) carrying the segment’s signature and position within the track. Session keys — a new class of asymmetric intermediate keys — generate per-segment signatures and can be rotated frequently across tracks. Compromise of a session key affects only a bounded window of segments.
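The per-segment flow can be sketched with a throwaway session key. This is illustrative only: real C2PA 2.3 signatures are COSE structures riding in `emsg` boxes, not the raw Ed25519 blobs shown here.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";

// A short-lived "session key" signs a bounded window of segments; rotating
// it limits the blast radius of a compromise.
const session = generateKeyPairSync("ed25519");

interface SignedSegment {
  index: number;     // position within the track
  hash: string;      // hash of the segment bytes
  signature: Buffer; // per-segment signature (would ride in an emsg box)
}

function signSegment(index: number, bytes: Buffer): SignedSegment {
  const hash = createHash("sha256").update(bytes).digest("hex");
  const payload = Buffer.from(`${index}:${hash}`);
  return { index, hash, signature: sign(null, payload, session.privateKey) };
}

function verifySegment(seg: SignedSegment, bytes: Buffer): boolean {
  const hash = createHash("sha256").update(bytes).digest("hex");
  if (hash !== seg.hash) return false; // segment bytes changed
  const payload = Buffer.from(`${seg.index}:${seg.hash}`);
  return verify(null, payload, session.publicKey, seg.signature);
}

const segments = [Buffer.from("segment-0"), Buffer.from("segment-1")];
const signed = segments.map((b, i) => signSegment(i, b));
console.log(signed.every((s, i) => verifySegment(s, segments[i]))); // true
console.log(verifySegment(signed[0], Buffer.from("tampered"))); // false
```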
Compatibility is straightforward: both HLS and DASH use CMAF segments as the underlying transport format. No changes to M3U8 or MPD manifests, no codec changes, compatible with DRM and CDN infrastructure, and safely ignored by non-C2PA-aware players.
The real complexity is integration with existing DRM key management systems. The Irdeto analysis details the interaction between DRM key management and C2PA session key management as the primary non-trivial integration challenge.
Delayed/asynchronous signing is available for workflows where real-time signing would introduce unacceptable latency — capture first, sign post-capture before distribution.
Current maturity: there are limited production deployments as of early 2026. Sony demonstrated C2PA-enabled professional video cameras at IBC 2025. For broadcast and streaming platforms the emphasis right now is preparedness, not immediate production deployment.
At 10,000 signing operations per day, your total cloud infrastructure cost runs approximately US$35–40/month.
Here is what makes up that number:
SDK costs: zero. c2pa-rs and c2pa-node-v2 are MIT-licensed.
AWS KMS: approximately US$1.00 per 10,000 asymmetric signing requests. At 10,000 operations/day that comes to roughly US$30/month.
AWS Lambda: approximately US$3–5/month for 10,000 invocations/day at around 500ms per invocation.
AWS Elemental MediaConvert native signing: no incremental cost beyond standard transcode pricing.
Amazon S3 and Secrets Manager: approximately US$2–4/month for storage and retrieval at this scale.
Certificate Authority costs: not publicly documented by either DigiCert or SSL.com. Engage directly with your Trust List CA for a quote.
Total infrastructure at 10,000 images/day: approximately US$35–40/month, plus one-time certificate procurement costs.
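The arithmetic behind the estimate, using the figures above and midpoints for the ranges:

```typescript
// Back-of-envelope check of the ~US$35-40/month figure at 10,000 signs/day,
// using the per-service numbers quoted above.
const signsPerDay = 10_000;
const daysPerMonth = 30;
const kmsPricePer10k = 1.0; // US$ per 10,000 asymmetric signing requests

const kmsMonthly = (signsPerDay * daysPerMonth / 10_000) * kmsPricePer10k; // 30
const lambdaMonthly = 4;    // midpoint of the US$3-5 estimate
const s3SecretsMonthly = 3; // midpoint of the US$2-4 estimate

console.log(kmsMonthly + lambdaMonthly + s3SecretsMonthly); // 37
```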
The ARD team summed it up well: “confirming content provenance doesn’t require massive infrastructure investments… the solution is both scalable and economically viable, making it accessible to broadcasters and content providers of all sizes.”
The dominant cost is engineering time. Infrastructure at this volume is a rounding error. Budget for the build, not the running.
c2pa-rs (Rust) is the reference SDK for performance-critical pipelines and direct Lambda deployment. c2pa-node-v2 provides Node.js bindings via Neon for TypeScript Lambda functions — precompiled binaries available for Linux x86_64, Linux aarch64, macOS, and Windows. c2pa-python is available for ML and AI pipeline environments. c2patool is for testing and validation only, not production signing. AWS Elemental MediaConvert offers a zero-code path for teams already using it for transcoding.
A CallbackSigner delegates the signing operation to an external function — typically a call to the AWS KMS Sign API. The callback receives data bytes, signs via KMS, and returns the signature. The private key never leaves KMS. This is the standard cloud key management integration pattern in C2PA signing libraries.
The ITL was frozen on January 1, 2026 — no new entries, no updates, no new certificates. Existing ITL certificates remain valid for legacy support until expiry. New implementations must target the official Trust List via the C2PA Conformance Programme. Using an ITL certificate means verifiers using the official Trust List model will not recognise your signed content.
Standard MP4 signing hashes the entire file as a single unit. fMP4 signing requires computing a hash per fragment and storing all segment hashes in the manifest. During playback, the validator recomputes hashes per segment — a mismatch on any segment fails validation. All validation occurs during playback and must complete without introducing latency.
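The difference can be sketched as whole-file versus per-fragment digests. Illustrative only: real fMP4 hashing covers specific BMFF box ranges, not whole buffers.

```typescript
import { createHash } from "node:crypto";

const sha256 = (b: Buffer) => createHash("sha256").update(b).digest("hex");

const fragments = [Buffer.from("moof+mdat 0"), Buffer.from("moof+mdat 1")];

// Standard MP4: one hash over the whole file.
const wholeFileHash = sha256(Buffer.concat(fragments));

// fMP4: one hash per fragment, all stored in the manifest; the validator
// recomputes each during playback and fails on any mismatch.
const fragmentHashes = fragments.map(sha256);

function validate(received: Buffer[], expected: string[]): boolean {
  return received.every((frag, i) => sha256(frag) === expected[i]);
}

console.log(validate(fragments, fragmentHashes)); // true
console.log(validate([fragments[0], Buffer.from("swapped")], fragmentHashes)); // false
```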
Store your certificate chain in AWS Secrets Manager referenced by ARN. Upload the new certificate as a new Secrets Manager version when the current certificate nears expiry. Lambda functions and MediaConvert jobs reference the ARN — not the certificate content — so no redeployment is required.
At minimum: creation timestamp, generating tool identity, and a DigitalSourceType assertion — trainedAlgorithmicMedia for wholly AI-generated content, compositeWithTrainedAlgorithmicMedia for composites. The EU AI Act (August 2026) requires machine-readable disclosure labels for AI-generated content. Include action history if the content passed through multiple processing stages.
The chain breaks. Non-C2PA-aware tools strip or ignore the manifest. The content must be re-signed as a new origin asset downstream. Design your pipeline to re-sign after any stage lacking native C2PA support, creating a new manifest that references the prior manifest as an ingredient.
Yes. MediaConvert supports C2PA manifest embedding for progressive MP4, DASH, and CMAF HLS outputs. Enable it with C2paManifest: "INCLUDE" in Mp4Settings. Certificates are stored in Secrets Manager; the signing key is referenced by KMS ARN in the job definition.
For a full overview of what C2PA is and how content provenance infrastructure works — including the trust model, ecosystem governance, and the other implementation considerations in this series — see the complete C2PA content provenance guide.
Durable Content Credentials: How Provenance Survives Metadata Stripping
C2PA content credentials are cryptographically tamper-evident. But tamper-evidence is not the same as persistence. Almost every major social platform strips embedded file metadata on upload — including C2PA manifests. This is not a bug; it is a structural consequence of how upload pipelines work: recompression, format conversion, thumbnail generation. It is unlikely to change.
The architectural response to this is Durable Content Credentials — a three-pillar approach combining hard binding, invisible watermarking, and perceptual fingerprinting so provenance can survive stripping and be recovered. Described by Collomosse et al. in IEEE Computer Graphics and Applications (2024), it is the canonical technical reference for resilient provenance design. For foundational context, the article on content provenance infrastructure and C2PA covers the full C2PA architecture.
It is not targeted at provenance data. Social platforms strip all embedded metadata — EXIF, XMP, IPTC, and C2PA manifests — as a standard, automatic step in their upload pipeline. The C2PA manifest lives in the file container alongside EXIF and XMP, travels through the same stripping pipeline, and is discarded before content hits storage.
Which platforms strip, and which preserve?
Know this before you design anything.
Stripping: Facebook and Instagram, Twitter/X, WhatsApp. Preserving: LinkedIn displays a CR icon and lets users click through to a provenance summary; Cloudflare Images preserves credentials through CDN transformations; TikTok has a partial preservation pathway via its CAI partnership.
Behaviour can vary by upload type and file format, so test empirically. For a fuller breakdown, see the article on which platforms preserve or strip metadata in practice.
A Durable Content Credential is a credential for which at least one soft binding exists that enables its discovery in a manifest repository — even after a stripping pipeline has processed the file.
The architecture is built on three mutually reinforcing mechanisms.
Pillar 1 — Hard Binding (C2PA Manifest): The standard C2PA approach. A signed manifest embedded in the file container. It is the authoritative, tamper-evident record of provenance assertions and signer identity. It is also the pillar that metadata stripping destroys.
Pillar 2 — Soft Binding (Invisible Watermarking): An imperceptible identifier embedded into the image’s pixel data, not its file header. The watermark points to the full manifest stored in a cloud-based C2PA Manifest Store. Because it lives in the pixels rather than the container, it survives the recompression and format conversion that strips the manifest.
Pillar 3 — Perceptual Fingerprinting: A content-based hash derived from the image’s visual features, stored in the manifest at signing time. Stable across compression and resizing, it provides a second lookup mechanism and an anti-spoofing function — it prevents a valid watermark from being copied from one image to another.
The three pillars function as a system. Without the manifest (Pillar 1), there is no authoritative signed record to retrieve. Without the watermark (Pillar 2), a stripped image has no recovery path. Without the fingerprint (Pillar 3), a watermark can be copied from image A to image B, passing off a different image as having verified provenance.
The C2PA Manifest Store is the backend that makes Pillar 2 functional. When content is signed, the manifest is registered in the store. When a verifier encounters a stripped image, it extracts the watermark identifier and queries the store to retrieve the original manifest. Adobe’s Content Credentials Cloud is the reference implementation; self-hosted alternatives are also valid.
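The register/recover round trip can be stubbed with an in-memory store. Real deployments query a hosted manifest store such as Content Credentials Cloud over HTTP; the identifiers and manifest content here are placeholders.

```typescript
// In-memory stand-in for a cloud C2PA Manifest Store.
const manifestStore = new Map<string, { manifestJson: string }>();

// At signing time: register the manifest under the watermark identifier.
function registerManifest(watermarkId: string, manifestJson: string): void {
  manifestStore.set(watermarkId, { manifestJson });
}

// At verification time: the verifier decodes a watermark ID from the pixels
// of a stripped image and looks the original manifest back up.
function recoverManifest(watermarkId: string): string | undefined {
  return manifestStore.get(watermarkId)?.manifestJson;
}

registerManifest("wm-7f3a", '{"claim_generator":"example/1.0"}');
console.log(recoverManifest("wm-7f3a"));    // the original manifest JSON
console.log(recoverManifest("wm-missing")); // undefined: no recovery path
```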
Invisible watermarking embeds a recoverable identifier into an image’s pixel values in a way that is imperceptible to the human eye and survives format conversion, compression, and resizing.
Worth stating clearly: a visible watermark is a translucent logo or text overlay. It is protective against casual copying but trivially defeated by cropping, and it carries no machine-readable data for credential recovery. For provenance recovery, invisible watermarks are the only viable approach.
TrustMark is Adobe’s open-source implementation, designed specifically for the C2PA Soft Binding use case. It embeds a compact identifier that, when decoded, queries the C2PA Manifest Store for the full provenance record. TrustMark is available on GitHub under the MIT licence — commercial use is permitted without royalties or attribution requirements. The watermarking algorithm itself costs nothing; a production deployment also requires a manifest store.
TrustMark survives platform processing because it distributes the encoded identifier across many pixels, making it robust to JPEG compression, format conversion, and moderate cropping. It does have a removal mode, so a determined adversary can strip it deliberately — and that is the attack vector Pillar 3 closes.
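To make the embed/decode round trip concrete, here is a toy least-significant-bit sketch. This is emphatically not how TrustMark works: a naive LSB scheme would not survive JPEG recompression, whereas TrustMark spreads a learned, redundant encoding across many pixels.

```typescript
// Toy watermark: hide one bit per pixel in the least-significant bit of an
// 8-bit grayscale buffer. Real robust watermarks (TrustMark, Digimarc) use
// redundancy-spread encodings that survive recompression; this does not.
function embedBits(pixels: Uint8Array, bits: number[]): Uint8Array {
  const out = Uint8Array.from(pixels);
  bits.forEach((bit, i) => {
    out[i] = (out[i] & 0xfe) | (bit & 1); // overwrite only the lowest bit
  });
  return out;
}

function extractBits(pixels: Uint8Array, count: number): number[] {
  return Array.from(pixels.slice(0, count), (p) => p & 1);
}

const image = Uint8Array.from([200, 73, 18, 255, 91, 140, 7, 66]);
const idBits = [1, 0, 1, 1, 0, 1, 0, 0]; // watermark identifier payload
const marked = embedBits(image, idBits);

console.log(extractBits(marked, 8)); // [ 1, 0, 1, 1, 0, 1, 0, 0 ]
```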
Two related systems worth distinguishing: Google SynthID marks AI-generated content for detectability, not provenance chain recovery. Digimarc is the enterprise-scale alternative, with demonstrated interoperability with TrustMark — a verifier can retrieve a manifest from either watermark type.
Perceptual fingerprinting generates a content-based hash from an image’s visual features. It is intentionally tolerant of minor transformations — resizing, recompression, format conversion — so the same image produces the same fingerprint even after platform processing. Unlike a cryptographic hash, which changes completely with any pixel-level change, a perceptual hash is stable across visually insignificant transformations.
The anti-spoofing function is Pillar 3’s primary value. Without it, Pillar 2 has a known attack vector: watermark copying. An adversary extracts a valid TrustMark from image A and embeds it in image B. The watermark resolves to the manifest store and returns valid credentials — for the wrong image. The fingerprint closes this. If the watermark was copied from a different image, the fingerprint will not match.
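A toy one-dimensional "perceptual hash" shows both properties: stability under mild processing, and mismatch when a watermark is copied onto different content. Real perceptual hashes (aHash, pHash and relatives) work on downsampled image blocks; the thresholding idea is the same.

```typescript
import { createHash } from "node:crypto";

// Toy 1-D perceptual hash: threshold each sample against the mean. Small
// perturbations leave the bit pattern intact; any perturbation completely
// changes a cryptographic hash.
function perceptualBits(signal: number[]): string {
  const mean = signal.reduce((a, b) => a + b, 0) / signal.length;
  return signal.map((v) => (v >= mean ? "1" : "0")).join("");
}

const original = [200, 40, 180, 30, 220, 10, 190, 60];
const recompressed = original.map((v) => v + 2); // mild platform processing

// Perceptual hash is stable; a cryptographic hash is not.
console.log(perceptualBits(original) === perceptualBits(recompressed)); // true

const shaA = createHash("sha256").update(Buffer.from(original)).digest("hex");
const shaB = createHash("sha256").update(Buffer.from(recompressed)).digest("hex");
console.log(shaA === shaB); // false

// Anti-spoofing: a watermark copied onto a different image fails the
// fingerprint match recorded in the manifest.
const differentImage = [10, 240, 20, 230, 15, 225, 30, 210];
console.log(perceptualBits(original) === perceptualBits(differentImage)); // false
```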
One real limitation: fingerprint lookup depends on the Manifest Store being online. Design for degraded-mode behaviour. Log that credentials were expected but unavailable rather than flagging content as fraudulent.
The absence of C2PA credentials tells a verifier exactly one thing: provenance was either never attached or has been stripped. It does not establish that content is inauthentic or AI-generated.
The reasons credentials may be absent are numerous — and mostly benign. Content pre-dates C2PA adoption. The capture device does not support C2PA signing. The image travelled through a stripping platform. Credentials were never added. In 2026, the majority of authentic images in circulation have no C2PA credentials. Treating absence as suspicion would generate an implausible false positive rate.
C2PA’s evidentiary value is asymmetric: the presence of valid credentials is meaningful; the absence is not. And even the presence only tells you who signed and what they stated — not whether what they stated is true. That is first-mile trust, and it is a structural limitation of any attestation-based system, not a flaw unique to C2PA.
AI detection classifiers and C2PA take fundamentally different approaches to authenticity. Neither can substitute for the other.
AI detection classifiers identify generated or manipulated content by analysing statistical patterns in the image itself. C2PA records and cryptographically attests to where content came from and how it was handled — a chain of custody, not a content analysis.
The “arms race” problem is the compelling reason provenance is architecturally more robust long-term. Classifiers are trained on known manipulation techniques; as generation models improve, classifiers become less accurate. Detection is perpetually reactive. C2PA is not — a manifest signed by a verified camera device cannot be retroactively forged by improving a generative model.
That said, C2PA requires a signing event to have occurred. The vast majority of AI-generated content in circulation was created without C2PA signing. In those cases, detection classifiers are the only available forensic tool. The two approaches are complementary: provenance for signed content; detection as a fallback for the rest.
Glass-to-glass provenance is the aspirational standard: a continuous, unbroken chain of signed C2PA actions from initial capture — the camera lens, “first glass” — through all edits, format conversions, and distribution steps, to final display on screen.
In practice, gaps are the norm. Upload to a stripping platform, passage through a non-C2PA-aware editing tool, re-encoding by a CDN, screenshot and repost — any of these break the chain. The three-pillar architecture exists because gaps are inevitable. It is designed for recovery, not prevention.
Glass-to-glass is achievable today in controlled professional contexts: broadcast journalism pipelines, legal evidence workflows, high-value commercial production. For social distribution, the realistic goal is glass-to-verified-origin — establish where content originated and who first signed it, and accept that the chain may have gaps downstream.
The building blocks are expanding. Google Pixel 10 provides consumer-level hardware C2PA signing; Cloudflare Images preserves credentials through CDN transformations; LinkedIn surfaces the CR icon; Photo Mechanic has confirmed C2PA support is in development.
For implementation detail, see the article on implementing the three-pillar durable credentials approach in a pipeline.
All file-container metadata: EXIF (camera model, GPS, timestamps), XMP (editing history, copyright), IPTC (caption, rights), and C2PA manifests. Pixel data is preserved but recompressed. The C2PA manifest travels through the same stripping pipeline as everything else.
Yes — but only if soft binding was applied before the strip event and the manifest was registered in a cloud manifest store. The watermark identifier survives in the pixels; a verifier extracts it, queries the manifest store, and retrieves the original manifest. Without soft binding, a stripped manifest is unrecoverable.
Yes. TrustMark is released under the MIT licence and is freely available on GitHub — commercial use permitted, no royalties required. A production deployment also requires a manifest store: either Adobe’s Content Credentials Cloud or a self-hosted equivalent.
A visible watermark is a translucent overlay — trivially defeated by cropping, and it carries no machine-readable data for credential recovery. An invisible watermark embeds data into pixel values imperceptibly and survives format conversion and compression. For provenance recovery, only invisible watermarks are useful.
TrustMark embeds a recoverable identifier for provenance chain recovery and links to a manifest store. SynthID marks AI-generated content for detectability — “this was made by a Google AI model” rather than “this was signed by a named party.” Complementary, not substitutes.
Preserving: LinkedIn, Cloudflare Images, TikTok (via CAI partnership). Stripping: Facebook/Meta (including Instagram), Twitter/X, WhatsApp. Test empirically — behaviour varies by upload type and file format.
Stripping can be intentional (obscuring origin) or incidental (platform processing for storage efficiency). The perceptual fingerprint in Pillar 3 detects whether a valid watermark has been moved from one image to another — spoofing, which is distinct from simple removal.
Blockchain stores the manifest in a distributed ledger, making it immune to metadata stripping. The tradeoff is query latency, ledger availability dependency, and the same “how do I link this image to that ledger entry?” problem that watermarking and fingerprinting solve. The C2PA approach and blockchain provenance are architecturally compatible.
No. C2PA signing proves who signed the content and what provenance assertions they made — not whether those assertions are true. C2PA provides transparency, not verification of accuracy.
Soft binding recovery fails. Perceptual fingerprint lookup also fails. Hard binding is unaffected for manifests that were never stripped. Log that credentials were expected but unavailable rather than flagging content as fraudulent.
A manifest can be fabricated, but it cannot be cryptographically signed with a certificate the fabricator does not hold. Forging requires compromising a legitimate signing key or obtaining fraudulent certificates from a CA on the C2PA Trust List. Certificate revocation — as demonstrated by Nikon’s 2025 revocation — is the mechanism for invalidating compromised certificates.
First-mile trust is the gap between what a signer asserts and what actually happened at capture time. C2PA can verify the signer’s identity, not the truthfulness of their assertions. If a camera operator signs a manifest asserting “photographed in Kyiv on 3 March 2026” but photographed a different location, the signature is valid — it attests a false claim. Combine C2PA with editorial verification and trusted-source accreditation.
For foundational context on C2PA fundamentals and content provenance infrastructure and for implementation guidance on building C2PA signing into a cloud media pipeline, the related articles in this series cover each in depth.
EU AI Act and Content Provenance: Regulations Making C2PA Urgent in 2026
August 2, 2026 is the date the EU AI Act’s Article 50 transparency obligations become enforceable. If your business generates or distributes AI content to EU markets, you are already behind. Penalties for transparency violations start at €7.5 million or 1.5% of global turnover.
C2PA is not explicitly named in the EU AI Act. But it is the most technically mature pathway to satisfying the regulation’s machine-readable content labelling requirement. This article maps which regulations create real urgency in 2026, what C2PA implementation actually delivers, and where separate legal and compliance work still needs to happen. For a foundational overview of the C2PA ecosystem and how content provenance infrastructure works, see our complete C2PA and content provenance guide.
This article provides strategic context, not legal advice. Assess your specific compliance obligations with qualified legal counsel.
Article 50 of the EU AI Act requires providers and deployers of AI systems generating synthetic content to implement machine-readable marking before placing systems on the EU market. The obligation applies regardless of where you’re headquartered — if EU users can access your AI-generated content, you are likely in scope.
The EU Code of Practice on AI-Generated Content specifies a multilayer approach: visible disclosures, machine-readable metadata manifests, invisible watermarking, and content fingerprinting. C2PA addresses the metadata manifest layer. The remaining layers require separate implementation.
There are three obligation tiers depending on where you sit in the value chain. AI model providers carry the highest burden. GPAI system providers — companies integrating third-party models via API into user-facing products — carry compliance obligations that many organisations have not yet assessed. AI deployers face lighter technical requirements but must still make transparency disclosures.
Here’s where classification gets consequential. Many companies building on OpenAI, Anthropic, or Google DeepMind APIs fall into the GPAI system provider category. If you’ve assumed you’re a deployer, that assumption is worth checking carefully. Misclassification carries fines of up to €15 million or 3% of global revenue.
EU Member State market surveillance authorities will inspect compliance documentation, marking evidence, and robustness testing records. Signing the EU Code of Practice demonstrates good faith but does not substitute for your own compliance programme.
C2PA satisfies the metadata manifest layer of the multilayer compliance requirement. It produces cryptographically signed, tamper-evident provenance records and machine-readable AI-generated content flags via Content Credentials. Specifically, C2PA delivers three things Article 50 requires: auditable provenance records with a cryptographic chain of custody; tamper-evident origin assertions that survive inspection; and machine-readable AI-generated content flags that meet the marking specification.
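To make the machine-readable flag concrete, here is a minimal sketch of a manifest-definition fragment declaring AI-generated content. The IPTC digitalSourceType URI is the standard C2PA marker for fully AI-generated media; the claim_generator name and the surrounding structure are illustrative, not taken from any specific SDK.

```python
import json

# Minimal manifest-definition fragment flagging content as AI-generated.
# The IPTC digitalSourceType URI below is the standard C2PA marker for
# fully AI-generated media; claim_generator and field layout are illustrative.
manifest_definition = {
    "claim_generator": "example-pipeline/1.0",  # hypothetical generator name
    "assertions": [
        {
            "label": "c2pa.actions",
            "data": {
                "actions": [
                    {
                        "action": "c2pa.created",
                        "digitalSourceType": (
                            "http://cv.iptc.org/newscodes/"
                            "digitalsourcetype/trainedAlgorithmicMedia"
                        ),
                    }
                ]
            },
        }
    ],
}

print(json.dumps(manifest_definition, indent=2))
```

A conforming validator reads this assertion back out of the signed manifest, which is what makes the AI flag auditable rather than a free-text label.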
The C2PA Conformance Programme is the compliance artefact. A conformance-certified implementation gives you documented evidence of standard adherence that an auditor can verify. The programme is in early enrolment as of 2026 — companies entering now get certified before demand surges as August approaches.
C2PA is not sufficient on its own, though. The Code of Practice’s multilayer requirement means invisible watermarking and fingerprinting layers are also needed alongside the C2PA manifest. For a deeper look at how C2PA’s trust model holds up under real-world conditions, see the C2PA trust layer in 2026 — where it works and where it breaks.
C2PA handles the technical marking layer. It does not constitute compliance with the EU AI Act by itself.
Value chain classification has to come first. The distinction between GPAI system provider and deployer carries real compliance and cost consequences. Many organisations integrating third-party AI APIs have not completed this assessment — and since GPAI provider obligations became applicable August 2, 2025, some are already operating without that clarity.
Audit readiness is a separate requirement. The Code of Practice requires internal testing frameworks, robustness documentation, monitoring of marking pipelines, and contractual prohibitions on label removal. The gap between “we implemented C2PA” and “we can demonstrate compliance to an auditor” is exactly where companies get caught.
And C2PA addresses content outputs, not model input data provenance. Consent management for training data is a separate obligation under GDPR. Build monitoring into your pipeline rather than assuming signing handles everything. For the privacy side of this, see C2PA identity assertions and the privacy risks of content credentials.
The US Digital Authenticity and Provenance Act, enacted 2025, requires organisations to be transparent about their digital content verification and provenance practices. It creates a federal-level framework focused on disclosure rather than mandating specific technical implementations — less prescriptive than the EU AI Act.
California fills that prescriptive gap. California SB 942 (AI Transparency Act), effective January 1, 2026, applies to any company whose AI systems are used by California residents. It requires visible labelling at generation, imperceptible machine-detectable watermarking, a free publicly accessible detection tool, and provenance data including AI system name, version, and date. That watermarking specification maps directly to C2PA capabilities. California AB 853 aligns explicitly with C2PA as a compliance mechanism.
If you’re operating in both the US and EU: meeting the EU AI Act’s more prescriptive requirements generally positions you well for US obligations, but the reverse does not hold. Engineer to the most demanding requirement.
ITSP.10.005 is jointly authored by the Canadian Centre for Cyber Security and the UK’s National Cyber Security Centre. It is the only government-authored content provenance guidance framework co-produced by two Five Eyes cyber security agencies. It describes C2PA as “a relatively new but major standard in the provenance space.”
ITSP.10.005 is not a regulation. But when two national cyber security agencies frame content provenance as enterprise security practice — alongside cybersecurity, transparency, and auditability — C2PA investment shifts from a media-industry question to a security architecture decision. It also gives you a government-authored reference you can use to justify C2PA investment with boards, legal teams, or procurement reviewers.
The distinction between mandates and guidance matters when you’re making the budget case. The EU AI Act and California SB 942 create legal obligation. ITSP.10.005 creates best-practice defensibility. Both support the same decision. For foundational context on the C2PA ecosystem, see C2PA and content provenance infrastructure.
Blockchain provenance offers decentralised, tamper-proof content registration with strong immutability guarantees. For archival and supply chain logging, it is a credible tool. For EU AI Act compliance in 2026, it is not the right choice.
What matters for regulatory compliance is whether your approach has a documented pathway an auditor can verify. C2PA has the Conformance Programme, a published Certificate Policy, a publicly accessible trust list, and direct reference in the EU Code of Practice, California AB 853, and ITSP.10.005. Blockchain provenance has none of these.
C2PA also produces Content Credentials in the machine-readable format Article 50 specifies. Blockchain provenance does not produce this format without additional integration — and it adds transaction cost and latency for every content asset.
Where additional immutability is required beyond what cryptographically signed manifests provide, blockchain can complement a C2PA implementation. It cannot replace it for regulatory purposes. For the technical implementation side, see the architecture required to satisfy EU AI Act C2PA obligations.
Work backwards from August 2, 2026. Value chain classification: one month minimum, and it has to happen before any technical scoping. Pipeline integration: two to four months. Robustness testing: one to two months. Conformance certification: one to two months. Run fully sequentially, that is five to nine months; with the later phases overlapping, three to six months is the realistic minimum.
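The back-of-envelope arithmetic can be sketched directly. Phase durations come from the timeline above; the 30-days-per-month approximation and the assumption of fully sequential phases are simplifications for illustration.

```python
from datetime import date, timedelta

DEADLINE = date(2026, 8, 2)  # EU AI Act Article 50 enforcement date

# Minimum phase durations in months, taken from the timeline above.
# Phases can overlap in practice, which shortens the total.
phases = {
    "value chain classification": 1,
    "pipeline integration": 2,
    "robustness testing": 1,
    "conformance certification": 1,
}

total_months = sum(phases.values())
# Approximate a month as 30 days for a back-of-envelope latest-start date.
latest_start = DEADLINE - timedelta(days=30 * total_months)
print(f"Sequential minimum: {total_months} months; latest start ~ {latest_start}")
```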
For most companies, the window to certify before August 2026 is now extremely narrow.
California SB 942 enforcement began January 1, 2026. The EU AI Act deadline compounds that exposure in August. If you are generating AI content for US or EU audiences without compliance in place, you are accumulating regulatory exposure across both jurisdictions simultaneously. Both deadlines are real. Neither replaces the other.
On the penalty stakes: the minimum EU AI Act tier — €7.5 million or 1.5% of turnover — is a serious financial exposure for a mid-size company. Getting implementation right is considerably cheaper than a single enforcement action. Start with value chain classification. Everything else follows from that. For the full content provenance framework and how all the pieces connect, see the C2PA content provenance overview.
What penalties does a company face for EU AI Act Article 50 non-compliance? Three tiers: up to €35 million or 7% of global revenue for prohibited practices; up to €15 million or 3% of turnover for high-risk non-compliance; up to €7.5 million or 1.5% of turnover for transparency violations including Article 50 labelling failures.
Do I need to comply with the EU AI Act if my company is based outside the EU? Yes. Article 50 applies to any AI system placed on the EU market or put into service in the EU, regardless of where the provider is headquartered.
Is C2PA explicitly mandated by the EU AI Act? No. The EU AI Act does not name C2PA. However, the EU Code of Practice specifies multilayer marking requirements that align directly with C2PA’s technical capabilities, making it the most technically mature compliance pathway.
What is the difference between an AI model provider, GPAI system provider, and AI deployer? AI model providers carry the highest obligation. GPAI system providers integrate third-party models via API into user-facing products — many companies building on third-party AI APIs fall here and carry obligations they have not yet assessed. AI deployers face a lighter technical burden but must still make transparency disclosures.
Does California SB 942 require C2PA specifically? SB 942 does not mandate C2PA by name, but its watermarking specification maps directly to C2PA capabilities. SB 942 also requires a free public detection tool — a specific engineering obligation beyond C2PA implementation alone.
What is ITSP.10.005 and why does it matter? A content provenance guidance framework co-authored by the Canadian Centre for Cyber Security and NCSC UK. Not a regulation, but a government-endorsed reference architecture that frames C2PA adoption as enterprise security practice and supports internal justification with boards, legal teams, and procurement reviewers.
Does signing the EU Code of Practice mean compliance? No. Signing demonstrates good faith but does not substitute for implementing technical measures, maintaining documentation, and building audit readiness.
Can blockchain provenance satisfy the EU AI Act labelling requirement? Not directly. Blockchain does not produce the machine-readable format Article 50 specifies, does not satisfy C2PA Trust List requirements, and has no conformance pathway equivalent to C2PA’s programme.
Is there a grace period for EU AI Act Article 50 enforcement? A possible grace period may apply to systems already on market before August 2026, but this is unconfirmed and reportedly will not cover new systems. Treat it as a risk assessment decision requiring legal counsel, not a compliance strategy.
How long does C2PA implementation take? Value chain classification (one month), pipeline integration (two to four months), robustness testing (one to two months), conformance certification (one to two months): three to six months minimum with some phases overlapping, depending on pipeline complexity.
C2PA Adoption in 2026: Hardware Platforms and Verification Reality

The C2PA Conformance Programme launched in mid-2025, and it brought something the ecosystem sorely needed: a public registry of products that have actually passed conformance testing. That distinction — between marketing claims and verified conformance — is where any honest look at this space has to begin. Hardware adoption is real. Consumer smartphones, professional broadcast cameras, and major social platforms now support Content Credentials. C2PA 2.3, released December 2025, extended provenance to live streaming via CMAF segment signing. But signing content is only the first step, and the chain breaks the moment platforms strip metadata during upload and transcoding.
This article maps where C2PA adoption is genuine in 2026, where the verification gaps remain, and what the current state of the ecosystem means if your organisation is evaluating content provenance right now. For foundational context on how the standard actually works, start with how C2PA content provenance works.
The consumer milestone is the Google Pixel 10. It uses hardware-backed keys via the Titan M2 chip and on-device timestamping via the Tensor G5, signing every photo by default — not just AI-edited images. First mainstream smartphone to do this at scale.
The professional broadcast milestone is the Sony PXW-Z300, announced at IBC 2025 as the world’s first camcorder with native C2PA signing. Sony’s Camera Authenticity Solution lets news organisations issue sharing URLs for provenance verification, making the chain of custody verifiable all the way from camera to audience.
On the photography side, Leica, Nikon, Canon, Fujifilm, and Panasonic all have C2PA-capable products. Leica was the first mover with the SL3-S in January 2025. Panasonic was the latest major manufacturer to join the CAI in April 2025.
Samsung is the notable exception. The Galaxy S25 has C2PA credentials, but only for photos created or edited with generative AI — not for standard captures. As the world’s largest smartphone manufacturer by volume, that limited scope tells you the ecosystem is narrower than the press coverage implies.
Camera Bits confirmed C2PA support in development for Photo Mechanic in February 2026 with no public release timeline. They also flag that timestamps are missing from most cameras that sign photos with C2PA — a real limitation if you need a cryptographically verified “when” for legal or investigative use.
Nikon is also a cautionary tale. The Z6 III gained C2PA signing via firmware in August 2025; the feature was suspended after a signing vulnerability was discovered. All certificates were revoked, and as of early 2026 the service hasn’t been restored.
For implementation patterns, how to build C2PA into your own pipeline covers the architecture in detail.
LinkedIn displays a CR icon on images carrying Content Credentials that users can click to see the provenance summary. TikTok adopted Content Credentials in partnership with CAI for AI-generated content labelling at consumer scale. Both are genuine adoptions — not just announcements.
The structural problem is what happens next. Social media pipelines strip embedded metadata — including C2PA manifests — during upload, transcoding, and re-encoding. A platform can support Content Credentials and still strip them in practice. The CAI’s documentation on this describes Durable Content Credentials — combining watermarking and fingerprinting with metadata — as the answer to the stripping problem.
Currently fewer than 1% of news images or videos published globally include C2PA metadata, according to the Reuters Institute — and that’s among the most motivated publishers. The adoption infrastructure exists. Routine use of it doesn’t yet.
Cloudflare became the first major CDN to implement Content Credentials in February 2025, bringing C2PA to around 20% of the web’s infrastructure. When a site owner opts to preserve Content Credentials, Cloudflare Images can maintain them through transformations.
C2PA credentials attach to the content itself and can be verified by any conforming validator, regardless of distribution platform. YouTube’s AI-generated content labels and Meta’s AI disclosures work within those platforms but aren’t portable. For the trust gaps behind the adoption headline, the metadata stripping problem is the operational failure mode worth understanding in detail.
That gap between what platforms claim to support and what they’ve actually implemented is exactly what the Conformance Programme was designed to measure. The C2PA Conformance Programme launched in mid-2025 as a risk-based certification process testing generator products, validator products, and Certificate Authorities against the C2PA specification. The Conforming Products List is publicly accessible at spec.c2pa.org/conformance-explorer/ — a live JSON registry of products that have passed formal conformance testing.
Here’s the distinction that matters. A product claiming C2PA support is making a marketing claim. A product on the Conforming Products List has been independently verified against the spec. CAI membership (more than 6,000 organisations by late 2025) is a commitment to the mission — not a conformance credential. When you’re evaluating whether a product does what it claims, the Conforming Products List is the evidence base. The gap between claimed support and certified conformance is wide in early 2026.
The governance mechanism is the C2PA Trust List, which replaced the Interim Trust List in mid-2025. The ITL was frozen January 1, 2026. Content signed with valid ITL certificates during the ITL’s validity period remains valid under the legacy trust model — existing signed assets aren’t invalidated. Going forward, products must use certificates from CAs certified under the new Trust List to be considered conformant.
The programme is still in early enrolment. As the CAI put it: “This is how an ecosystem earns confidence. Not through claims of utility, but through verifiable behaviour.”
Until C2PA 2.3, provenance could only be added to static assets — pre-recorded content and video-on-demand. The Merkle tree approach used for VOD doesn’t work for live content because the full asset isn’t known in advance.
C2PA version 2.3 (December 2025) closed that gap by defining a protocol for signing live and broadcast media at the CMAF segment level. Each segment — typically 1–10 seconds of video — is independently signed. The practical payoff for broadcasters is that this works with existing HLS, DASH, CDN, and DRM infrastructure. No changes to manifests or codecs are required, and non-C2PA-aware players simply ignore the provenance data. As Irdeto noted in their January 2026 technical analysis: “The bump to the minor version of the C2PA specification, from 2.2 to 2.3, may suggest only a small adjustment. For the video ecosystem, however, the latest version introduces a major capability: support for live streaming.”
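The segment-level approach can be sketched in miniature. This is a conceptual illustration only: HMAC stands in for the real COSE signature over a C2PA claim, the shared key stands in for a CA-issued signing key held in a KMS or HSM, and the record shape is invented, not the CMAF wire format defined in C2PA 2.3.

```python
import hashlib
import hmac

# Conceptual sketch of C2PA 2.3-style live signing: each CMAF segment is
# hashed and signed independently as it is produced, so provenance never
# waits for the full asset. HMAC stands in for the real COSE signature.
SIGNING_KEY = b"demo-key"  # stand-in; real pipelines sign via KMS/HSM keys

def sign_segment(segment_bytes: bytes, segment_index: int) -> dict:
    """Hash one segment and sign a minimal per-segment claim."""
    digest = hashlib.sha256(segment_bytes).hexdigest()
    claim = f"{segment_index}:{digest}".encode()
    signature = hmac.new(SIGNING_KEY, claim, hashlib.sha256).hexdigest()
    return {"segment": segment_index, "hash": digest, "signature": signature}

# Segments arrive continuously from the encoder (1-10 s of video each):
manifests = [sign_segment(seg, i) for i, seg in enumerate([b"seg0", b"seg1"])]
print(manifests[0])
```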
The Project Origin Alliance shaped C2PA’s requirements for news and broadcast workflows — live streaming support directly addresses their use case. BBC and Microsoft lead the initiative.
ARD (Germany’s leading public broadcaster) implemented C2PA in a serverless cloud pipeline announced in November 2025. The MIT-licensed code is on GitHub, which shows that this doesn’t require large infrastructure investment. For implementation patterns, see how to build C2PA into your own pipeline.
It’s not really a choice. Proprietary labelling and C2PA serve different needs, and most platforms will implement both.
Proprietary labelling — YouTube’s AI-generated content labels, Meta’s AI disclosures — is visible and immediate within those platforms. But those labels don’t travel with content. They can’t be independently verified. If content ends up somewhere other than where it was labelled, the provenance record is gone.
C2PA Content Credentials attach to the content itself, verifiable by any conforming validator without a network call to the original signer — all required certificates travel inside the manifest. That offline-verifiable design is what makes it useful for newsrooms, courts, and any scenario where the original platform may be unavailable or untrusted.
In practice, platforms can implement both. LinkedIn displays proprietary labels and supports C2PA. The regulatory environment is also pushing organisations toward open standards. The EU AI Act’s Article 50 obligations enter into force in August 2026, requiring AI content labelling in machine-readable formats that are “effective, interoperable, robust, and reliable” — explicitly favouring a multilayered approach. Proprietary platform-only labelling doesn’t satisfy that across jurisdictions. C2PA does.
For foundational context, see the C2PA infrastructure overview.
Here’s the gap adoption metrics don’t address: even where Content Credentials are displayed, user engagement with them is very low. The infrastructure can work correctly, the badge can appear, and users still won’t interact with it.
Fstoppers’ January 2026 industry analysis frames it directly: “public apathy and learned scepticism may be the largest hurdles to C2PA adoption, bigger than any technical challenge.” The volume of synthetic imagery has conditioned people to assume everything might be fake regardless of what metadata says. That’s a human behaviour problem, and C2PA does not address it directly.
Microsoft’s February 2026 Media Integrity and Authentication report found that no single method — C2PA provenance, watermarking, or fingerprinting — can prevent digital deception on its own. “Preventing every attack or stopping certain platforms from stripping provenance signals isn’t possible,” said Jessica Young, Microsoft’s director of science and technology policy.
There’s also an expectation gap worth spelling out. C2PA does not detect AI-generated content — it records what was declared at the time of signing. It’s a transparency layer, not an authentication verdict. That distinction gets lost regularly.
Meeting EU AI Act compliance requirements and changing how audiences evaluate content are separate problems. C2PA addresses the first clearly; the second depends on user behaviour that the technology can’t control. For B2B contexts — photojournalism, regulated industries, high-liability advertising — provenance documentation has real value in contract negotiations and legal review. For more on metadata stripping, see why metadata stripping still undermines platform-distributed content.
JPEG Trust is a provenance standard developed by the JPEG Committee that partially overlaps with C2PA’s scope. C2PA 2.3 explicitly added compatibility support for JPEG Trust, and CAI treats it as part of the broader content provenance extensions ecosystem alongside CAWG — not as a competitor.
For implementers, here’s the practical read: JPEG Trust addresses provenance within the JPEG ecosystem specifically, while C2PA defines the full signing, manifest embedding, and verification pipeline across all media types. You don’t need to choose. They serve different parts of the ecosystem and are designed to coexist.
Use the CAI verification tool at verify.contentauthenticity.org — upload or link any media file and the tool checks the C2PA manifest, validates the certificate chain against the Trust List, and displays provenance information if valid. No account needed. All required certificates travel inside the manifest, so verification requires no network call to the original signer.
The Conforming Products List is publicly accessible at spec.c2pa.org/conformance-explorer/. It’s a live JSON registry of products that have passed formal conformance testing under the C2PA Conformance Programme launched mid-2025.
Sony, Nikon, Canon, Leica, Fujifilm, and Panasonic all have C2PA-capable models. Google Pixel 10 added default signing for all photos. Samsung Galaxy S25 supports C2PA only for AI-edited images. Check the Conforming Products List for verified conformance — not all “C2PA-capable” products have passed formal testing. Note that Nikon’s certificate programme was suspended after a signing vulnerability and as of early 2026 hasn’t been restored.
Social media platforms strip embedded metadata — including C2PA manifests — during upload, transcoding, and re-encoding. Durable Content Credentials (combining manifests, watermarking, and fingerprinting) are designed to survive this process using TrustMark invisible watermarking to embed a recoverable identifier.
No. C2PA records what was declared at the time of signing — it provides provenance, not detection. It cannot determine whether content is AI-generated after the fact. AI detection tools use probabilistic classification — a different approach with different limitations.
CAI membership (6,000+ organisations) means joining the Content Authenticity Initiative coalition. C2PA conformance means a product has been independently tested against the C2PA specification and appears on the Conforming Products List. Membership is a commitment; conformance is a verified technical achievement.
Yes, as of C2PA version 2.3 (December 2025). The specification defines CMAF segment-level signing for live media, compatible with existing HLS, DASH, CDN, and DRM infrastructure. No changes to manifests or codecs required.
The EU AI Act requires AI content labelling effective August 2026. The Article 50 obligations require machine-readable labelling that is effective, interoperable, and robust — favouring open standards over proprietary platform labelling. C2PA is the leading open standard that satisfies this requirement across jurisdictions and platforms.
Hardware signing at capture records what the device observed at the moment of creation — the strongest provenance signal. Software-applied credentials are added after capture and can’t provide the same chain-of-custody guarantee. Content that carries valid C2PA credentials doesn’t need to be detected as real; it cryptographically proves its origin.
Camera Bits confirmed that timestamps are missing from most cameras that sign photos with C2PA, calling it “crucial” and the final piece being refined before Photo Mechanic reaches public beta. Without a trusted timestamp, the provenance record lacks a cryptographically verified “when” — which limits its usefulness in forensic contexts.
The C2PA Trust Layer in 2026: Where It Works and Where It Breaks

C2PA adoption is picking up speed. LinkedIn, TikTok, and Sony are signing content. Google Pixel 10 ships with Content Credentials built in. Adobe’s Content Authenticity tools are in production. But here’s the thing — signing content and having a verified trust infrastructure are two very different things. In early 2026, they haven’t come together yet.
The Interim Trust List that held C2PA together during its early years was frozen on January 1, 2026. The official C2PA Trust List exists, but the Conformance Programme that populates it only launched in mid-2025 and is still in early enrolment. Then in September 2025, the Nikon Z6 III incident showed everyone what happens when hardware key management fails inside a signing pipeline.
This article looks at the C2PA trust layer honestly — where the governance infrastructure holds, where the gaps are, and what you need to check before committing. For foundational context on how content provenance infrastructure works, see content provenance infrastructure explained. For technical detail on how manifests work, see what C2PA manifests contain and how signing works.
The C2PA Trust List is a publicly maintained list of Certificate Authorities (CAs) whose certificates have been certified under the C2PA Conformance Programme for signing Content Credentials. Here’s what that actually means in practice.
A C2PA manifest is cryptographically signed. Anyone with a valid-looking certificate can produce a structurally correct manifest with a mathematically valid signature. The Trust List is what separates “this was signed by an entity certified against the C2PA specification” from “this was signed by someone with a self-signed certificate they made themselves.”
Without Trust List validation, a validator only confirms a mathematically valid signature. The cryptography checks out. The structure is valid. But there’s no independent accountability — none at all.
The Trust List is published in the conformance-public GitHub repository and browseable via the C2PA Conformance Explorer. You can also use C2PA Verify — the reference verification site — to check whether a specific piece of content has a Trust List-backed signature or an ITL-based one. As of early 2026, confirmed conformant CAs include DigiCert and SSL.com, which joined in September 2025. The list is not yet fully populated. Its completeness depends entirely on how many CAs and product vendors have completed Conformance Programme enrolment.
The Interim Trust List (ITL) was a transitional list of CAs used during C2PA’s early adoption phase from 2021 through 2025. It allowed C2PA Verify and other validators to distinguish valid, known certificates from unknown ones before the formal Conformance Programme existed. It was always meant to be temporary.
On January 1, 2026, the ITL was frozen. No new entries. No updates. Done.
Content signed during an ITL certificate’s validity period stays valid against the legacy trust model — those signatures aren’t invalidated. But no new ITL-based certificates will be issued, and existing ITL certificates are now a legacy trust signal, not a conformance signal. The official Trust List aligns with C2PA 2.x. If your implementation went live between 2021 and 2025, you’re running on ITL-era certificates. That’s not a crisis — but your validator needs to distinguish ITL-based credentials from Trust List credentials, and you need a migration path worked out.
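A validator now has to make a three-way distinction, which can be sketched as follows. DigiCert and SSL.com are the confirmed conformant CAs mentioned elsewhere in this article; the legacy issuer name, the set contents, and the function itself are hypothetical illustrations, not any SDK's API.

```python
# Sketch of the trust-tier distinction a validator must make after the ITL
# freeze (January 1, 2026). Set contents and issuer names are illustrative.
TRUST_LIST_CAS = {"DigiCert", "SSL.com"}  # conformant under the new Trust List
ITL_CAS_FROZEN = {"LegacyCA-Example"}     # hypothetical frozen ITL entry

def classify_credential(issuer: str, signed_before_freeze: bool) -> str:
    if issuer in TRUST_LIST_CAS:
        return "conformant"      # certified under the Conformance Programme
    if issuer in ITL_CAS_FROZEN and signed_before_freeze:
        return "legacy-itl"      # still valid, but not a conformance signal
    return "unknown"             # mathematically valid signature at best

print(classify_credential("DigiCert", signed_before_freeze=False))  # conformant
print(classify_credential("LegacyCA-Example", signed_before_freeze=True))
```

The point of the sketch is the middle branch: ITL-era credentials remain valid but must surface differently to users than Trust List credentials do.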
Most mainstream C2PA coverage hasn’t touched the ITL freeze. Worth knowing when you’re reading adoption-positive takes.
The C2PA Conformance Programme is the accountability layer that gives the Trust List any meaning at all. It launched in mid-2025 and certifies three categories of entity: generator products, validator products, and Certificate Authorities.
The idea is sound. The programme checks whether implementations actually behave consistently with the C2PA specification and its security requirements. A conformance programme for certification authorities opened in June 2025 — SSL.com satisfied those requirements and became an authorised issuer of C2PA-conformant signing certificates in September 2025.
But early enrolment is not a populated ecosystem. The Trust List is only as trustworthy as the certification process behind it, and that process hasn’t covered most of the CAs and vendors in the market yet. C2paviewer puts it plainly: the Certificate Trust List “is still maturing; revoked or expired certificates require timely infrastructure updates.” You can see who’s been certified via the C2PA Conformance Explorer, but the list will keep growing as enrolment proceeds.
The governance framework exists. Most of the ecosystem hasn’t been through it yet. See privacy risks embedded in the conformance model for additional gaps that affect identity assertions.
In September 2025, a photographer known as Horshack discovered a vulnerability in the Nikon Z6 III’s C2PA signing implementation. He demonstrated it by encoding an AI-generated image of a pug flying an airplane into a NEF file, then using the camera’s multiple exposure mode to get that image signed with valid C2PA credentials. The resulting JPEG had a conformant signature from a Trust List-backed camera. The content was fabricated.
Nikon revoked all certificates issued for the Z6 III and suspended the Nikon Authenticity Service. Nikon’s notice to users stated: “the authenticity credentials attached to these images are no longer valid and cannot be used as proof of provenance.”
The incident exposes three things. First, hardware key management failure — the signing key wasn’t adequately isolated from untrusted input, and the multiple exposure mode processed NEF files from a non-C2PA body without validating the source. Second, retroactive invalidation — every piece of content signed with those Nikon certificates, legitimate or otherwise, lost its trust signal the moment Nikon revoked them. Third, the revocation checking gap — the default behaviour of available validation tools, including c2patool, is not to check revocation status. Horshack filed a GitHub issue to change that default. When revocation checking is forced on, images from the compromised Nikon certificates fail validation correctly. Out of the box, they don’t.
Horshack noted that Nikon responded as quickly and comprehensively as they could. The remaining problem sits with the open-source validation tools, not Nikon’s firmware. Resolution depends on the broader C2PA tooling ecosystem catching up — which is not something Nikon can fix on their own.
For how to avoid similar key management failures in a signing pipeline you operate, see key management and signing pipeline security.
Yes. And understanding how is useful for working out what your validator actually needs to do.
The common forgery vector is a self-signed certificate. Security researcher Neal Krawetz demonstrated this: using c2patool, he produced a forged manifest with Leica’s correct German address and valid C2PA signatures. Every validation tool reported the signatures as correct. The only flag came from the Content Credentials reference site, which noted the certificate was from an unknown source — but only because that site checks the Trust List. And Krawetz noted that for anyone wanting to bypass even that flag, a trusted signing certificate costs $289. The barrier is low.
The second vector is hardware signer exploitation — using a legitimate Trust List-backed certificate to sign illegitimate content, as demonstrated in the Nikon Z6 III incident.
There’s also a misconception worth clearing up. C2PA does not detect deepfakes. The World Privacy Forum’s report is explicit: AI detection “does not fall within C2PA’s scope of content provenance.” C2PA records provenance — who signed what, using what tools. It says nothing about whether content was manipulated after signing.
The absence of a C2PA signal is also ambiguous. Content without Content Credentials isn’t necessarily untrustworthy. Content with Content Credentials from a certificate not on the Trust List doesn’t mean anything was verified.
Validators need to check three things: Trust List membership, revocation status via CRL or OCSP, and the full certificate chain. Not just the signature. For the structural basis of these attack vectors, see what C2PA manifests contain and how signing works.
The World Privacy Forum (WPF) published a technical analysis of C2PA titled “Privacy, Identity and Trust in C2PA,” authored by Kate Kaye and Pam Dixon. It’s one of the few independent technical reviews of the framework that doesn’t come from an organisation with a stake in C2PA adoption.
The WPF’s assessment of the conformance model is measured. The report confirms that the Conformance Programme opened in June 2025 and that it “has not been reviewed for this report” — the programme was too new at the time of analysis to assess independently. That sentence alone tells you something about the gap between the governance framework’s existence and its maturity.
On the absence-of-signal problem, the WPF puts it bluntly: “C2PA metadata is meant to be used as signals for measuring content trustworthiness; indeed, the very absence of C2PA metadata can negatively affect C2PA-based interpretations of trust.” Once content provenance becomes expected infrastructure, the lack of a C2PA manifest becomes a negative signal — an implicit requirement to participate.
The WPF report also covers metadata stripping as a durability problem. Content Credentials that get stripped during platform ingestion or processing never reach the end user. The provenance signal is real at the point of signing and invisible at the point of consumption. The gap between what the standard promises and what actually happens in real-world media pipelines is wider than the adoption-positive narratives suggest.
The report’s framing of goal expansion is also worth reading. It notes that “some who have implemented C2PA or evaluated it from a policy perspective suggest the goals of the framework have morphed over time in an effort to solve an expanding and expansive set of problems.” That’s a useful check against any single source describing C2PA as a solution to problems it wasn’t designed to address.
See privacy risks embedded in the conformance model for detailed coverage of the identity assertion risks the WPF identifies separately.
The C2PA Conformance Explorer is the canonical public tool. Browse and filter the live C2PA Conforming Products List, Trust List, and TSA Trust List. The Trust List JSON itself is in the conformance-public GitHub repository — programmatic access is available if you want to query it in your tooling.
C2PA Verify validates Content Credentials against the Trust List and distinguishes between ITL-based and Trust List-based signatures. Use it as a reference check.
The confirmed conformant CAs as of early 2026 are DigiCert and SSL.com. If your signing certificates were issued by a CA not on the Trust List, the first step is checking whether your CA is in the Conformance Programme enrolment pipeline. If they’re not, you need a migration plan. The migration guidance is published in the conformance-public repository — it covers timeline expectations and backward compatibility for previously signed content.
For your validator configuration, you need to be checking three things: Trust List membership, revocation status via CRL or OCSP, and the full certificate chain. Not just the signature. If your validator isn’t doing all three, the trust chain has gaps.
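The three-check rule above can be sketched as a trust decision. The names here are illustrative, not taken from any SDK:

```python
from dataclasses import dataclass

# Hypothetical sketch of a validator policy: a signature is only trusted
# when every check passes, not just the cryptographic signature itself.
@dataclass
class ValidationResult:
    signature_valid: bool   # COSE signature verifies against the certificate
    chain_valid: bool       # full X.509 chain verifies to a root
    on_trust_list: bool     # issuing CA appears on the C2PA Trust List
    not_revoked: bool       # CRL/OCSP check was run and passed

def is_trusted(r: ValidationResult) -> bool:
    # A mathematically correct signature alone proves nothing about trust:
    # self-signed and revoked certificates both produce valid signatures.
    return all([r.signature_valid, r.chain_valid,
                r.on_trust_list, r.not_revoked])

# A forged manifest signed with a self-signed certificate: the signature
# verifies, but Trust List membership fails, so trust fails.
forged = ValidationResult(signature_valid=True, chain_valid=True,
                          on_trust_list=False, not_revoked=True)
assert not is_trusted(forged)
```

The point of the structure is that trust is a conjunction: dropping any one check (as default-configured tools do with revocation) silently widens what counts as "valid".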
For the regulatory dimension and why migration timelines matter in that context, see regulatory deadlines creating trust posture urgency. For foundational infrastructure context, see content provenance infrastructure explained.
No. C2PA records provenance — it documents who signed content and what tools were used to create or edit it. It doesn’t analyse content to detect whether it was AI-generated or manipulated. C2PA tells you where content came from, not whether it is authentic.
Content signed with ITL-era certificates remains valid — the signatures are not invalidated by the freeze. Those certificates are now a legacy trust signal. Validators that distinguish ITL-based from Trust List-based signatures will flag ITL certificates as non-conformant going forward.
Yes, if the validator does not check the Trust List. A self-signed certificate can produce a syntactically valid C2PA manifest with a mathematically correct signature. Trust List validation is what distinguishes credible provenance from self-assertion.
Revocation checking adds network latency (querying CRL or OCSP endpoints) and potential failure modes if the endpoint is unavailable. The current default prioritises availability over security. Horshack filed a GitHub issue on c2patool to change this behaviour after demonstrating that the Nikon revocation would be invisible to default-configured validators.
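A minimal sketch of that trade-off, with `fetch_crl` standing in for a real CRL or OCSP query (all names here are assumptions for illustration):

```python
# Fail-open vs fail-closed revocation checking. `fetch_crl` is a stand-in
# for a network query against a CRL/OCSP endpoint; it may raise on timeout.
def check_revocation(serial, fetch_crl, fail_closed=True):
    try:
        revoked_serials = fetch_crl(timeout=3)  # network call, may raise
    except Exception:
        # Endpoint unreachable: fail-closed treats the certificate as
        # untrusted; fail-open (the common default today) lets it through.
        return not fail_closed
    return serial not in revoked_serials

# A revoked serial is caught only when the endpoint answers.
assert check_revocation("REVOKED-001", lambda timeout: {"REVOKED-001"}) is False

# Endpoint down: fail-open says "fine", fail-closed says "no".
def unreachable(timeout):
    raise TimeoutError
assert check_revocation("x", unreachable, fail_closed=False) is True
assert check_revocation("x", unreachable, fail_closed=True) is False
```

The availability-versus-security choice lives entirely in that `except` branch, which is why a one-line default in a validation tool decided whether the Nikon revocation was visible at all.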
The C2PA Trust List covers Certificate Authorities that issue signing certificates. The TSA Trust List covers Time Stamping Authorities that provide timestamps for C2PA manifests. Both are governed by the Conformance Programme and published via the Conformance Explorer.
The Trust List is in early population. DigiCert and SSL.com are confirmed conformant CAs as of early 2026. The number will grow as more CAs complete Conformance Programme enrolment.
Not necessarily. Platform adoption of signing and display is genuine progress. But most user-generated content remains unsigned and Trust List maturity remains an open challenge. Signing adoption is not the same as end-to-end verified trust.
Check whether your CA is in the Conformance Programme enrolment pipeline. If not, plan migration to a conformant CA — DigiCert or SSL.com as of early 2026. Review the migration guidance in the conformance-public GitHub repository for timeline and backward compatibility.
All content signed with the revoked Nikon certificates lost its trust signal. Photographers had to synchronise their cameras with Nikon Imaging Cloud to remove the invalidated certificates. A single signer’s key management failure retroactively invalidated an entire body of signed work.
The governance infrastructure is real and the direction is clear. But the trust layer is not yet fully operational. Organisations facing regulatory deadlines (EU AI Act) may need to implement now. Others may choose to monitor Conformance Programme enrolment and implement when Trust List coverage reaches a threshold that matches their risk tolerance.
The Conformance Explorer is a public web tool at spec.c2pa.org/conformance-explorer/ that allows browsing and filtering the live C2PA Conforming Products List, Trust List, and TSA Trust List. It shows which CAs, generator products, and validator products have been certified under the Conformance Programme.
How C2PA Content Credentials Work and What Their Limits Are

Deepfake incidents went from roughly 500,000 cases in 2023 to over 8 million in 2025, a sixteen-fold increase in two years. Detection-only approaches can’t keep pace because generative models improve faster than detectors can catch up. C2PA — the Coalition for Content Provenance and Authenticity — takes a different angle: attach a cryptographically signed record of origin, tools, and edit history directly to a media file at the point of creation, so any post-signing alteration is immediately detectable. That record is called a C2PA Manifest or Content Credential. Think of it as a digital passport for content. This article covers how C2PA works at an architecture level, what a manifest actually contains, how it differs from traditional metadata, and — critically — what it cannot prove. For the broader ecosystem context, see the content provenance infrastructure overview.
Here’s the problem C2PA is solving: there’s no standardised way to confirm who created a photo, video, or document, or whether it’s been altered since creation. Synthetic content is projected to account for up to 90% of online media by 2026. Already, 74% of consumers doubt photos or videos even from trusted news outlets. That’s a trust crisis.
Traditional metadata — EXIF, IPTC — doesn’t help. Anyone can edit those fields with free tools. There’s no cryptographic binding between the metadata and the content itself. It was never built to be trusted.
C2PA is an open, royalty-free technical specification published under the Joint Development Foundation, currently at v2.3 (December 2025). The coalition includes Adobe, Microsoft, Google, Intel, Arm, BBC, Sony, and Truepic.
Two things C2PA is not: it does not detect fakes or classify content as real. It asserts positive provenance — “this content was signed by this entity at this time with these claims.” The absence of a C2PA credential says nothing definitive about authenticity.
A C2PA Manifest is the core data structure. It’s a digitally signed record embedded inside a media file that documents the content’s origin, creation tools, and complete edit history.
Every manifest has a three-layer hierarchy: assertions (individual statements about the asset), a claim (which gathers the assertion hashes into a single signable record), and a claim signature (the COSE signature over the claim).
Here’s something worth noting: all assertions are optional by specification. No single assertion is mandatory. A valid manifest can make very few actual claims. That matters a lot when you’re evaluating what a “verified” credential actually tells you — more on that below.
The manifest also includes the full X.509 certificate chain, so verification can happen offline without contacting the original signer. Manifests are embedded using the JUMBF container format (JPEG Universal Metadata Box Format), which supports JPEG, PNG, MP4, PDF, WebP, AVIF, HEIC, and other file types.
When content is edited, the original manifest becomes an ingredient reference in the new manifest, creating a traceable provenance chain.
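As a rough illustration, the manifest definition that a Claim Generator consumes might look like the following. Field names follow common c2pa-rs and c2patool examples but are illustrative; check the current SDK documentation before relying on them:

```python
import json

# Sketch of a manifest-definition template in the style accepted by
# c2patool / c2pa-rs. The product name and file title are hypothetical.
manifest_def = {
    "claim_generator": "example-app/1.0",   # hypothetical product identifier
    "title": "sunset.jpg",
    "assertions": [
        {
            "label": "c2pa.actions",
            "data": {"actions": [{"action": "c2pa.created"}]},
        }
    ],
    # When signed content is edited, the prior manifest is referenced here
    # as an ingredient, extending the provenance chain rather than
    # replacing it.
    "ingredients": [],
}

print(json.dumps(manifest_def, indent=2))
```

The SDK turns this template into assertions, builds the claim, and signs it; the template itself never contains a key or a signature.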
The key difference is tamper-evidence. EXIF stores descriptive information about a file — camera model, shutter speed, GPS — without any cryptographic binding. What’s written can be changed without a trace. EXIF was never built to be trusted.
C2PA binds provenance to content via cryptographic hard binding. The content is hashed using SHA-256, and that hash is included in the signed manifest. Any pixel-level change to the asset invalidates the hash. The signing format uses COSE (CBOR Object Signing and Encryption) — a well-established standard also used in passports, IoT devices, and web authentication.
“Tamper-evident” is the correct term — not “tamper-proof.” C2PA reveals if tampering occurred but cannot prevent it. As the NCSC puts it: “The smallest modification to a file creates a completely different hash value that makes changes instantly detectable.”
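The hard binding is easy to demonstrate with nothing but the standard library: hash the asset bytes, flip one bit, hash again.

```python
import hashlib

# The hard binding in miniature. The byte string below stands in for real
# image bytes; the digests are what a signed manifest would carry.
asset = b"\xff\xd8\xff\xe0 example image bytes \xff\xd9"
original_digest = hashlib.sha256(asset).hexdigest()

tampered = bytearray(asset)
tampered[10] ^= 0x01   # flip a single bit
tampered_digest = hashlib.sha256(bytes(tampered)).hexdigest()

# The two digests share no usable structure, which is what makes any
# post-signing change detectable against the hash stored in the manifest.
assert original_digest != tampered_digest
print(original_digest[:16], "vs", tampered_digest[:16])
```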
The C2PA workflow has three stages: signing, embedding, and verification.
Stage 1 — Signing. The Claim Generator assembles assertions into a claim, signs it using the signer’s private key via COSE format, and includes the signer’s X.509 certificate from a trusted Certificate Authority. The content is hashed to create the hard binding.
Stage 2 — Embedding. The signed manifest is packaged into a JUMBF container and embedded in the media file. For unsupported formats, a sidecar file carries the manifest alongside the asset.
Stage 3 — Verification. A validator reads the manifest, checks the cryptographic signature against the certificate chain, validates the certificate against the C2PA Trust List, and verifies the hard binding hash against the current file. If the file has been altered, the hash won’t match. All required certificates travel inside the manifest, so verification requires no network call — which makes it well suited to newsrooms and low-connectivity environments.
Hardware vs cloud signing. Hardware signing at capture is the strongest trust scenario. The private key is protected inside a hardware security module and extraction is extremely difficult. The Leica M11-P (October 2023) was the first consumer camera with C2PA built in. The Google Pixel 10 signs every photo by default via the Titan M2 chip. Cloud signing enables post-capture credential attachment but introduces chain-of-custody questions about what happened between capture and the signing event.
The Trust List governs which certificates are accepted — the full story is covered in where the trust layer currently falls short. For architecture patterns on integrating C2PA signing into a cloud media pipeline, see architecture patterns for pipeline integration.
These three terms get used interchangeably. They refer to different things.
C2PA (Coalition for Content Provenance and Authenticity) is the standards body and technical specification. It defines how manifests work, what signing means, and what verification requires.
Content Credentials is the user-facing name for C2PA Manifests — what LinkedIn and TikTok display when showing a provenance badge. When a platform says it “supports Content Credentials,” it means it reads and displays C2PA Manifests.
The Content Authenticity Initiative (CAI) is an Adobe-led coalition focused on adoption, open-source tooling, and developer education. CAI maintains the c2pa-rs Rust library (MIT licence) and the c2patool CLI.
In practice: C2PA writes the spec, CAI builds the tools, Content Credentials is the name consumers see. The most common confusion is misattributing governance — expecting CAI tooling to define the standard. The spec lives at c2pa.org. For the full ecosystem picture, see the C2PA ecosystem guide.
C2PA has structural limitations that vendor communications frequently underplay. You need to understand these before evaluating adoption.
The first-mile trust gap is the most significant structural limitation. C2PA can confirm that a specific device or software made a specific claim at a specific time. It cannot confirm the underlying content is authentic. The canonical example: a camera can sign a photo of a screen displaying a deepfake. The manifest will be cryptographically valid, the hard binding will pass, the certificate chain will verify — and the content is still fabricated. The trust model ultimately rests on signer honesty.
Metadata stripping is a practical everyday limitation. Standard C2PA manifests embedded via JUMBF are lost when a non-C2PA-aware tool resaves the file. WhatsApp, iMessage, and Facebook all re-encode images on upload, silently removing any embedded credentials. The stripped file gives no signal that credentials ever existed.
This stripping limitation is what leads to Durable Content Credentials, which combine metadata, invisible watermarking (soft binding via TrustMark), and image fingerprinting to recover provenance even after stripping. That approach is covered in how durable credentials address metadata stripping.
There are additional structural limitations beyond these two, including the cost barrier of paid signing certificates, the early-stage Trust List, and the ambiguity of absent credentials.
If you’re evaluating C2PA — whether for a camera product, a media pipeline, or a publishing platform — here’s what you’d reach for.
The C2PA specification (v2.3, December 2025) is the normative reference, published royalty-free at c2pa.org.
The CAI developer learning hub at learn.contentauthenticity.org has tutorials, guides, and implementation walkthroughs for developers integrating C2PA.
c2pa-rs is the open-source Rust library (MIT licence) for reading, creating, and validating manifests — the primary reference implementation, available via the CAI GitHub organisation. c2patool is the CLI tool built on top of it for signing and inspecting manifests without writing code.
Content Credentials Verify at contentcredentials.org/verify inspects whether a file carries a valid manifest — processed entirely in-browser using WebAssembly, so files never leave the user’s device. C2PA Viewer at c2paviewer.com exposes the raw JSON manifest for technical debugging.
For architecture patterns on integrating C2PA signing into a cloud media pipeline, see architecture patterns for pipeline integration. For a complete overview of the C2PA ecosystem and how content provenance infrastructure works, see the C2PA content provenance infrastructure overview.
No. C2PA records a signer’s assertions about origin, tools, and edit history. If those assertions are false, the manifest is still cryptographically valid. C2PA proves who signed a claim, not whether the claim is true.
Standard C2PA manifests are lost when a non-C2PA-aware tool resaves the file — WhatsApp, iMessage, and Facebook all re-encode images on upload. Durable Content Credentials address this by adding invisible watermarks (TrustMark) and image fingerprinting as backup recovery mechanisms.
C2PA is fundamentally different from watermarking. A watermark alters pixel data and can be cropped away. A C2PA manifest is structured metadata that doesn’t change the visual content and provides cryptographic tamper detection that watermarks can’t offer. They’re complementary — Durable Content Credentials use watermarks as a soft binding mechanism — but they solve different problems.
A trusted signing certificate from DigiCert or SSL.com costs approximately $289 per year. Self-signed certificates are flagged as untrusted by verifiers. There’s currently no free CA equivalent to Let’s Encrypt for C2PA, which creates a cost barrier for individual creators and smaller organisations.
As of early 2026: Leica M11-P (hardware signing since October 2023); Google Pixel 10 (hardware-backed via Titan M2 chip, every photo signed by default); Sony α9 III and α1 II (cloud signing via Sony Imaging Edge, opt-in per shoot); Samsung Galaxy S25 (AI-edited photos only). The Nikon Z6 III had C2PA support but suspended it after a signing vulnerability led to full certificate revocation in September 2025.
OpenAI (DALL-E 3, Sora), Adobe Firefly, and Google Imagen embed C2PA credentials identifying AI-generated content. Midjourney does not embed C2PA credentials as of early 2026 — a notable gap in AI-generation coverage.
Hard binding uses a SHA-256 hash computed over the file’s bytes — any change breaks it. Soft binding uses a perceptual hash or invisible watermark designed to survive minor edits and transcoding. Hard binding provides stronger tamper-evidence; soft binding provides durability at the cost of reduced security guarantees.
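A toy contrast makes the difference concrete. Here a crude average-threshold hash stands in for real perceptual hashes or watermarks; the "image" is just a list of brightness values:

```python
import hashlib

def hard_binding(pixels):
    # Exact hash over the raw bytes: any change at all breaks it.
    return hashlib.sha256(bytes(pixels)).hexdigest()

def soft_binding(pixels):
    # Crude perceptual stand-in: 1 if a pixel is above the mean, else 0.
    # Small value shifts from transcoding rarely flip the pattern.
    avg = sum(pixels) / len(pixels)
    return "".join("1" if p > avg else "0" for p in pixels)

original     = [200, 30, 220, 25, 210, 40, 230, 20]
recompressed = [199, 31, 219, 26, 211, 39, 229, 21]  # minor transcoding noise

assert hard_binding(original) != hard_binding(recompressed)  # hard binding breaks
assert soft_binding(original) == soft_binding(recompressed)  # soft binding survives
```

Real soft bindings (TrustMark watermarks, production perceptual hashes) are far more sophisticated, but the trade-off is the same: tolerance to benign change is bought with tolerance to some malicious change.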
The EU AI Act (effective August 2026) requires transparency labelling for AI-generated content; C2PA’s AI assertion type directly satisfies this. The U.S. Digital Authenticity and Provenance Act (2025) mandates content provenance disclosure in federally regulated media contexts.
Yes — and this has been demonstrated empirically. The Hacker Factor created a valid forged manifest attributed to a named individual using publicly available c2patool. Separately, they demonstrated that an AI-generated image could be signed by a Nikon C2PA-enabled camera, producing a valid manifest with no photographic provenance. The C2PA Trust List and certificate governance are designed to limit retroactive signing risk, but both vectors are documented and active threats.
It means nothing definitive. The vast majority of content today carries no C2PA manifest. Absence of credentials does not indicate the content is inauthentic or manipulated — it simply means no signer has attached provenance metadata. Treating absence as a negative signal is a known misinterpretation.
Yes. All required certificates travel inside the manifest, so no network call is needed. Content Credentials Verify and C2PA Viewer process files entirely in-browser using WebAssembly — files never leave the user’s device. This makes offline and air-gapped verification straightforward for newsrooms and legal workflows.
What You Need to Know About Open AI Supply-Chain Licensing Risk

You’ve picked a model from Hugging Face. The metadata says MIT. Your team integrates it, ships the feature, and moves on. Except that MIT tag is a metadata field, not a legal instrument. A February 2026 audit of 124,278 AI supply chains found that 95.8% of models on Hugging Face are missing the licence text, copyright notice, and attribution records required to make that tag enforceable. Without those three components, the model defaults to all rights reserved.
This page maps the full landscape and links to the articles that cover each topic in depth.
Permissive-washing is the practice of labelling an AI artefact with a permissive licence tag — MIT, Apache 2.0, BSD-3-Clause — while omitting the documentation that makes that label enforceable. The tag is metadata. The actual grant requires the full licence text, a copyright notice, and upstream attribution records. Without all three, the artefact reverts to default copyright.
Research by Jewitt et al. coined the term after auditing 124,278 supply chains. They found 96.5% of datasets and 95.8% of models lack the required licence text. Only 3.2% of models satisfy both text and copyright requirements — the bare minimum for a valid grant.
For the full explanation, read Permissive-Washing in AI Explained — Why Open Source Labels Cannot Be Trusted.
A licence tag on a Hugging Face model card is a metadata field, not a signed legal instrument. For a permissive licence like MIT to be enforceable, the repository must also include the full licence text, a copyright notice identifying the rights holder, and attribution records for any upstream components incorporated into the model or its training data. If any of these are absent, the licence grant is legally incomplete and the underlying copyright applies.
Hugging Face does not currently enforce that a matching licence file exists before a tag is applied. Developers moving quickly on model selection routinely rely on the repository metadata tag without inspecting the actual licence file — if one exists at all. The consequence is not a grey area: without the required documentation, reuse of the artefact is legally equivalent to reproducing a fully copyrighted work without permission.
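A minimal sketch of that verification, assuming common file-name conventions (`LICENSE`, `NOTICE`) rather than any standard:

```python
from pathlib import Path

# The minimum check the metadata tag alone cannot give you: does the
# repository actually carry the three components that make a permissive
# tag enforceable? File names here are conventions, not requirements.
def licence_payload(repo: Path) -> dict:
    licence_files = [p for p in repo.iterdir()
                     if p.name.upper().startswith(("LICENSE", "LICENCE"))]
    text = licence_files[0].read_text() if licence_files else ""
    return {
        "licence_text": bool(text.strip()),
        "copyright_notice": "copyright" in text.lower(),
        "attribution_records": (repo / "NOTICE").exists(),  # assumed convention
    }

def grant_is_valid(repo: Path) -> bool:
    # Missing any component: the grant is incomplete and default
    # copyright (all rights reserved) applies.
    return all(licence_payload(repo).values())
```

Run against a freshly cloned model repository, this catches the tag-without-text case that the audit found in the overwhelming majority of models.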
For the full treatment, read Permissive-Washing in AI Explained — Why Open Source Labels Cannot Be Trusted.
The AI supply chain has three layers: training datasets, models, and applications. Licence obligations flow upward — if a dataset’s compliance documentation is missing, every model trained on it inherits that gap, and every application deploying that model inherits it again.
Attribution is where this breaks down. Only 27.59% of models preserve compliant dataset notices. Only 5.75% of applications preserve compliant model notices. And 76% of companies that prohibit AI coding tools acknowledge their developers use them anyway — ungoverned adoption bypasses whatever verification your workflow includes. For the full structural analysis, read How AI Licence Risk Compounds Across Your Dataset Model Application Stack.
An AI Bill of Materials (AI-BOM) extends the familiar Software Bill of Materials (SBOM) to cover artefacts that standard formats miss: model weights, training data sources, fine-tuning history, and the compliance documentation at each layer. SPDX 3.0 and CycloneDX have both added AI-specific profiles, and the EU Cyber Resilience Act’s SBOM mandate is accelerating enterprise adoption.
Only 54% of organisations currently evaluate AI-generated code for IP and licensing risks — an AI-BOM process directly closes that gap. For the governance framework and procurement checklist, read What an AI Bill of Materials Is and What to Demand From Vendors.
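As a sketch, a single AI-BOM entry and a CI gate over it might look like this. The field names are illustrative, loosely shaped after the SPDX 3.0 and CycloneDX AI profiles rather than taken from either:

```python
# Hypothetical minimal AI-BOM entry for one model; the model and dataset
# identifiers are invented for illustration.
aibom_entry = {
    "component": "example-org/sentiment-model",
    "type": "machine-learning-model",
    "declared_licence": "apache-2.0",
    "licence_file_present": True,
    "copyright_notice_present": True,
    "training_datasets": [
        {"name": "example-corpus",
         "declared_licence": "cc-by-4.0",
         "attribution_preserved": True},
    ],
}

def bom_gate(entry: dict) -> bool:
    # CI gate: block the build when any layer's compliance payload is
    # missing, since the gap propagates upward to the application.
    if not (entry["licence_file_present"] and entry["copyright_notice_present"]):
        return False
    return all(d["attribution_preserved"] for d in entry["training_datasets"])

assert bom_gate(aibom_entry) is True
```

Wiring a check like this into CI turns "only 54% evaluate AI-generated code for licensing risk" from a survey statistic into a build-time requirement.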
None of the four dominant open-weight families — LLaMA, Mistral, DeepSeek, and Qwen — carry a straightforwardly permissive licence. LLaMA 2 requires a separate licence grant from Meta for services exceeding 700 million monthly active users. Qwen, with over 113,000 derivatives on Hugging Face, carries Alibaba Cloud commercial restrictions in some versions. DeepSeek’s MIT tag does not substitute for reviewing the actual licence file.
Open-weight means downloadable weights. It does not mean disclosed training data or unencumbered rights. Treat licence selection as a procurement variable with the same weight as capability benchmarks. For the side-by-side comparison, read Llama Mistral DeepSeek and Qwen Licence Terms Compared for Commercial Use.
Three misconceptions cause the most damage. First, that a metadata licence tag constitutes a valid licence grant — Hugging Face does not validate tags against actual licence files. Second, that open-weight models are legally equivalent to open-source software. Third, that the EU AI Act open-source exemption covers any model labelled as open — it requires a genuinely open licence, public parameters, and no monetisation.
For the foundational explainer, read Permissive-Washing in AI Explained — Why Open Source Labels Cannot Be Trusted.
AI licence compliance slots into your existing DevSecOps workflow as an extension of software composition analysis (SCA). The additions: model card review before adoption, AI-BOM generation, snippet scanning for AI-generated code, and licence validation gates in CI/CD.
The 2026 Black Duck OSSRA report found 68% of audited codebases contain licence conflicts — partly driven by AI coding assistants generating snippets from copyleft sources without the original licence information. For the implementation playbook, read Adding AI Licence Compliance to Your Existing Engineering Workflow.
Both regulations apply if you use AI components from open repositories and sell into the EU market. The AI Act’s GPAI obligations — training data summaries and copyright compliance policies — have applied since August 2025. The CRA’s SBOM mandate becomes binding in 2027.
The AI Act’s open-source exemption is narrower than most people assume — it requires a genuinely open licence, public parameters, and no monetisation. LLaMA 3’s commercial terms likely disqualify it in many contexts. For the full regulatory translation, read EU AI Act and Cyber Resilience Act Supply Chain Obligations Explained.
Permissive-Washing in AI Explained — Why Open Source Labels Cannot Be Trusted: Defines permissive-washing, explains the licence metadata vs. licence grant distinction, and presents the quantitative evidence from arXiv:2602.08816. Start here if you are new to the topic.
How AI Licence Risk Compounds Across Your Dataset Model Application Stack: Maps the three-tier AI supply chain and explains how attribution failures at the dataset layer propagate through to your application. Covers shadow AI as the invisible risk multiplier.
What an AI Bill of Materials Is and What to Demand From Vendors: Defines the AI-BOM, explains what it must capture that an SBOM misses, and provides a practical procurement checklist for vendor conversations.
Llama Mistral DeepSeek and Qwen Licence Terms Compared for Commercial Use: Side-by-side licence comparison across the four dominant open-weight model families, including EU AI Act open-source exemption eligibility for each.
Adding AI Licence Compliance to Your Existing Engineering Workflow: Practical implementation guide for extending your existing SCA and DevSecOps practices to cover AI artefacts, coding assistant output, and AI-BOM generation.
EU AI Act and Cyber Resilience Act Supply Chain Obligations Explained: Translates EU AI Act GPAI obligations and CRA SBOM requirements into procurement and engineering terms. Explains how to use regulatory requirements as leverage in vendor negotiations.
An open-weight model makes its trained parameters publicly downloadable but may not disclose training data, full methodology, or carry a licence that grants unrestricted use. An open-source AI model — in the strictest interpretation — discloses weights, architecture, training data, and grants clear rights to use, modify, and distribute. The practical consequence: open-weight does not mean legally unencumbered. Before commercial deployment, the actual licence text must be reviewed, not just the repository metadata tag. The model-by-model licence comparison shows exactly where each major family falls on this spectrum.
No. Hugging Face metadata tags are self-reported by the model publisher and are not validated against an actual licence file. Research from arXiv:2602.08816 found that 95.8% of models on Hugging Face are missing the required licence text. A licence verification workflow must check for the presence of a complete licence file, a copyright notice, and upstream attribution records — not just the metadata tag. The open source label trust guide explains why the tag gap exists and what a compliant licence payload requires.
If the compliance payload (licence text, copyright notice, and attribution records) is absent, the licence grant is legally incomplete. The artefact’s effective status reverts to the underlying copyright, which means reuse is legally equivalent to reproducing a fully copyrighted work without permission. This creates exposure to claims for licence violation or copyright infringement, depending on jurisdiction and the rights holder’s willingness to enforce. Understanding how that risk compounds through your dataset, model, and application stack is essential before evaluating your real exposure.
No. Fine-tuning creates a derivative work that inherits the licence obligations of the upstream model. If the upstream model carries incomplete or restricted licensing, those obligations carry through to your fine-tuned version. The EU AI Act provides a partial boundary condition: if your fine-tuning compute exceeds one-third of the original training compute, you may be treated as a GPAI provider with the associated documentation obligations. The EU AI Act and CRA supply chain obligations guide translates these thresholds into practical procurement terms.
Not in most commercial deployment contexts. The EU AI Act open-source exemption applies to models that are genuinely open, non-monetised, and carry complete licence documentation. LLaMA 3 carries Meta’s commercial use terms — not a standard permissive licence — which may disqualify it from the exemption when used in monetised applications. DeepSeek’s MIT tag does not substitute for a review of the actual licence file and training data provenance. Exemption eligibility is model-specific and use-case-specific. See the complete open-weight model licence comparison for a per-family exemption eligibility breakdown.
Check whether a licence file exists in the repository root — not just the metadata tag — and that it contains the full licence text, a copyright notice, and any attribution requirements. Review the model card for training data disclosure. If either the licence file or training data disclosure is missing, treat the model as requiring legal review before integration into any production system. For a repeatable process, the AI licence compliance engineering workflow guide covers how to embed these checks into your existing CI/CD pipeline. For teams adopting models at scale, the AI-BOM procurement framework provides a vendor checklist that turns this verification into a structured requirement.
EU AI Act and Cyber Resilience Act Supply Chain Obligations Explained

Two EU regulations now create binding supply chain transparency obligations for any company shipping AI-augmented products into the EU market. The EU AI Act (Regulation (EU) 2024/1689) covers AI systems and general-purpose AI models. The Cyber Resilience Act (Regulation (EU) 2024/2847) covers all products with digital elements — which in practice means virtually every commercial software product with a network connection.
Here’s the useful bit: these regulations give you concrete, externally validated grounds to demand specific documentation from your AI vendors. The training data summary, the copyright policy, the SBOM — these are legal requirements your vendor is either meeting or not. That’s a much stronger position than asking nicely.
And even if your company isn’t based in the EU, you’re probably still in scope. Both instruments have extraterritorial reach. If your product touches the EU market, this applies to you.
For the broader context on AI supply chain licensing risk, see the overview article. This one focuses on what both regulations actually require — and how to use those requirements in your next vendor conversation.
Both regulations apply based on where products reach the market, not where the company is headquartered. Non-EU companies placing GPAI models on the EU market must comply and must appoint an authorised representative in the EU. The CRA carries fines of up to 2.5% of worldwide annual turnover. The EU AI Act adds fines of up to 3% of global annual revenue for GPAI violations. These are not small numbers.
Even if you serve no EU customers today, your vendors may. Their compliance obligations create documentation requirements that flow into your supply chain regardless of where you are. You’ll feel it eventually.
The US is heading the same direction anyway — Executive Order 14028 mandated SBOMs for federal vendors, and CISA updated minimum SBOM elements in 2025. The destination is the same. The EU is just moving faster.
Enforcement timeline: GPAI obligations under the EU AI Act became effective 2 August 2025. CRA reporting obligations begin September 2026. Full CRA compliance is required by December 2027.
GPAI stands for General-Purpose AI — the EU AI Act’s regulatory category for what most people call a foundation model or large language model.
In practice, models trained with more than 10²³ FLOPs that can generate language, images, or video are treated as GPAI-scoped. That covers GPT-4, Llama, Mistral, DeepSeek, Qwen, Claude, and effectively every commercially available foundation model. If you are using any off-the-shelf LLM in your product, you are working with GPAI-scoped technology.
There’s a higher tier — GPAI with Systemic Risk (GPAISR) — for frontier models above 10²⁵ FLOPs with enhanced obligations. For most downstream product teams, those obligations fall primarily on the model provider, not you.
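As a worked illustration of the two presumption thresholds, assuming training compute is the only classification input (in reality capability assessments and AI Office designation also matter):

```python
# Compute-based presumption tiers from the text: above 10^23 FLOPs a model is
# treated as GPAI-scoped; above 10^25 FLOPs it falls into the systemic-risk
# tier. A sketch only; real classification is not purely mechanical.
GPAI_THRESHOLD_FLOPS = 1e23
GPAISR_THRESHOLD_FLOPS = 1e25

def gpai_tier(training_flops: float) -> str:
    if training_flops > GPAISR_THRESHOLD_FLOPS:
        return "GPAISR"            # enhanced obligations, mostly on the provider
    if training_flops > GPAI_THRESHOLD_FLOPS:
        return "GPAI"              # Article 53 transparency obligations
    return "below presumption"     # may still be in scope on other grounds
```

For example, a model trained with 2×10²⁴ FLOPs falls in GPAI scope but below the systemic-risk tier.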
The category that matters most for your business is “downstream provider” — companies integrating GPAI models into their own products. What your upstream GPAI providers owe you as a downstream operator is where the practical leverage lies.
The GPAI Code of Practice, published in July 2025, is the compliance reference. OpenAI, Anthropic, Google, and Mistral have all committed to it. That commitment creates the basis for requesting the specific documentation the Code requires.
For a breakdown of individual model licence terms and GPAI classification implications, see the model licence comparison.
The EU AI Act provides a partial exemption from Article 53 for open-source GPAI models. But it’s not automatic — you need to satisfy all three conditions simultaneously: public model weights, a recognised free and open-source licence, and no monetisation of the model.
That third condition is where most commercial deployments fall apart. Monetisation includes charging for access, bundling with paid services, and collecting user data as a condition of access. “Research-only” or “no-commercial-use” licences do not qualify as FOSS for EU AI Act purposes.
OLMo (AI2) is the canonical qualifying example — fully open weights, genuinely permissive licence, non-commercial entity. Llama (Meta) does not qualify for commercial deployers. Meta’s custom licence imposes commercial use restrictions that take it outside the open-source definition. If you are building a commercial product on Llama, you are a downstream provider with full GPAI obligations.
One more thing the exemption doesn’t give you: even models that do qualify remain subject to Article 53(1c) (copyright policy) and Article 53(1d) (training data summary). The exemption is partial, not total.
This is where permissive-washing becomes a regulatory risk. A model carrying an Apache 2.0 or MIT label in repository metadata may have entirely separate commercial use restrictions in the actual licence text. Research analysing 760,460 ML models on Hugging Face found only 37.8% declared licensing information in a standardised, machine-readable way. Repository labels are not a reliable basis for assessing open-source exemption eligibility.
For per-model exemption eligibility, see the model licence comparison. For the broader AI supply-chain licensing picture this regulation sits within, see the open AI supply-chain licensing risk overview.
Article 53 applies regardless of whether a model qualifies for the open-source exemption. Full stop.
Article 53(1c) requires GPAI providers to implement and document a copyright policy — specifically compliance with the EU Copyright Directive’s text and data mining (TDM) Article 4, including respecting robots.txt opt-out mechanisms.
Article 53(1d) requires GPAI providers to publish a training data summary: a mandatory public document describing the data sources used at each training stage and what copyright compliance measures were taken. The European Commission published a template for this in July 2025.
So here’s what you ask your upstream GPAI vendor for: (1) the copyright policy referencing TDM Article 4 compliance, and (2) the training data summary following the Commission template. If a vendor cannot produce these, they are either non-compliant or operating outside GPAI scope. Ask them which one it is.
The AI Bill of Materials operationalises Article 53(1d) — extending the training data summary into machine-readable form with full data lineage and licence chain documentation. For a full explanation, see the AI BOM article.
Fine-tuning is the activity most likely to convert your company from a downstream provider into a GPAI provider with full Article 53 obligations.
The trigger is the one-third compute rule. If the compute used for your modification exceeds one-third of the compute originally used to train the base model, you are presumed to have become a GPAI provider.
The good news: standard LoRA and PEFT fine-tuning typically does not approach this threshold. The AI Office “currently expects only few modifiers to become GPAI model providers.” The modifications more likely to trigger provider status are further pre-training on large datasets, large-scale continued training, and model distillation. RAG, custom system prompts, and hyperparameter adjustment do not trigger reclassification.
If fine-tuning does trigger provider status, your Article 53 obligations are limited to the modifications you made — not the entire upstream model’s compliance history.
Track your fine-tuning compute relative to base model training compute and treat this as a compliance input. For integrating this into your engineering workflow, see the engineering workflow article.
The CRA requires all products with digital elements placed on the EU market to include an SBOM in SPDX or CycloneDX format, retained for at least ten years (Article 13). That obligation affects vendor contracts and internal archiving infrastructure.
In practice, virtually all commercial software with a network connection qualifies. An AI model embedded in a commercial product is a software component subject to CRA SBOM requirements — foundation models, fine-tuned variants, and their dependencies all belong in the product SBOM.
Beyond the SBOM, the CRA requires secure-by-design engineering standards, conformity assessments, CE marking, vulnerability management, and reporting of severe security incidents to ENISA within 24 hours.
Enforcement timeline: Chapter IV conformity notification obligations apply from June 2026. Vulnerability reporting begins 11 September 2026. Full compliance is required by 11 December 2027. Building SBOM processes requires pipeline changes across your engineering organisation. Starting now means having working systems before the September 2026 reporting window.
On format: SPDX (ISO/IEC 5962) has broad tooling support and dedicated AI profiles in SPDX 3.0. CycloneDX (OWASP) has native ML-BOM support and stronger CI/CD integration. Both are accepted under CRA guidance.
For what AI-specific component documentation looks like, see the AI BOM article.
These regulations convert vague best-practice expectations into documented, enforceable vendor requirements. Here’s your checklist.
1. Training data summary (Article 53(1d), EU AI Act)
Ask for the public training data summary covering datasets used at each training stage and copyright compliance measures. Reference the European Commission template. If the vendor cannot produce it, ask whether they are non-compliant or outside GPAI scope.
2. Copyright policy (Article 53(1c), EU AI Act)
Ask for the written copyright policy referencing TDM Article 4 compliance, including the approach to robots.txt. This applies to open-source providers too — the exemption does not remove this obligation.
3. SBOM for AI components (CRA Article 13)
Require an SBOM in SPDX or CycloneDX format with the CRA's ten-year retention requirement in your contract. For AI components, the SBOM should extend to an AI BOM covering training data sources, fine-tuning history, and licence chains.
4. Full licence text (permissive-washing risk mitigation)
Request the complete licence text — not just the repository label. This applies to the model licence, any training data licences, and licences for any fine-tuned variants.
5. Open-source exemption status confirmation
Ask the vendor to confirm in writing whether their model qualifies for the open-source exemption — specifically the three-part test: public weights, FOSS licence, no monetisation.
6. Incident notification commitment (CRA Chapter IV)
Require a contractual commitment to notify downstream users of severe security incidents. The CRA requires initial notification to CSIRT and ENISA within 24 hours — your contract should flow that obligation into your vendor relationship.
7. Vulnerability management and patching cadence
Require a defined support period and patching commitment for AI model components. AI models are not always versioned or supported with defined lifecycles. A contractual commitment protects you from silent model deprecations.
The AI Bill of Materials consolidates items 1 through 4 into a single machine-readable document. Requesting an AI BOM is the most efficient way to ask for all four at once.
For non-EU companies: apply these requirements as your procurement baseline regardless. Your EU-based customers may require this documentation from you downstream.
Does the EU AI Act apply to companies outside the European Union? Yes. It applies to any company whose AI systems or GPAI models reach the EU market, regardless of headquarters. If your product uses AI and has EU customers, you are within scope. Non-EU GPAI model providers must also appoint an authorised representative in the EU, unless the open-source exemption applies.
What is the difference between the EU AI Act and the Cyber Resilience Act? The EU AI Act regulates AI systems and general-purpose AI models specifically — transparency, documentation, and risk classification. The CRA regulates all products with digital elements — SBOMs, secure-by-design engineering, vulnerability management, and CE marking. Both apply simultaneously to AI-augmented commercial software.
What happens if a vendor cannot provide a training data summary? They are either non-compliant with the EU AI Act or their model does not fall within GPAI scope. Treat the inability to produce this document as a red flag in your procurement assessment.
What is the GPAI Code of Practice and is it mandatory? It is a voluntary compliance framework published by the European Commission in July 2025, translating Article 53 into actionable chapters covering transparency, copyright, and safety. Providers who do not sign it must still meet the underlying obligations through other means.
Which SBOM format should we use for CRA compliance — SPDX or CycloneDX? Both are accepted under CRA guidance. SPDX has strong licensing and provenance metadata and dedicated AI profiles in SPDX 3.0. CycloneDX has native ML-BOM support and stronger CI/CD integration. For AI-heavy products, CycloneDX may offer more granular component representation.
Does using an open-source AI model exempt my company from EU AI Act obligations? Not automatically. The open-source exemption requires public weights, a recognised FOSS licence, and no commercial monetisation. Most commercial deployments do not satisfy the monetisation condition. Even qualifying models remain subject to copyright policy and training data summary obligations.
What is the one-third compute rule for fine-tuning under the EU AI Act? If your fine-tuning uses compute exceeding one-third of the base model’s original training compute, you are presumed reclassified as a GPAI provider with full Article 53 obligations. Standard LoRA and PEFT fine-tuning typically does not reach this threshold.
What is the CRA’s 10-year SBOM retention requirement? The CRA requires SBOM documentation to be retained for at least 10 years after a product is placed on the market (Article 13). This affects vendor contracts, product delivery requirements, and internal archiving infrastructure.
When do the EU AI Act and CRA enforcement deadlines overlap? GPAI provider obligations became effective 2 August 2025. CRA Chapter IV conformity notification obligations apply June 2026. CRA incident reporting begins 11 September 2026. Full CRA compliance is required by 11 December 2027. September 2026 is your first hard deadline.
What is an AI Bill of Materials and how does it relate to the CRA SBOM mandate? An AI BOM extends traditional SBOM concepts to model provenance, training data lineage, fine-tuning history, and licence chains. SPDX 3.0’s AI profile and CycloneDX’s ML-BOM type both support AI BOM documentation. The AI BOM is the natural mechanism for meeting CRA SBOM requirements for AI components while operationalising Article 53(1d) obligations.
Can I use EU AI Act requirements as procurement leverage even if my company is not in the EU? Yes. The regulations provide a concrete, externally validated framework for demanding vendor transparency regardless of your domicile. Your EU-based customers may require this documentation from you downstream — using these requirements as your procurement baseline means you are ready when they ask.
This article is part of our complete open AI supply-chain licensing risk series. The series covers the full landscape from foundational misconceptions through procurement tools and operational implementation — if you are still mapping your exposure, that is the place to start.
Adding AI Licence Compliance to Your Existing Engineering Workflow

Your CI/CD pipeline already handles Software Composition Analysis (SCA) for open-source dependencies. It scans packages, evaluates licences, flags violations, and blocks bad merges before they reach production. That capability took years to build, and it works.
The problem is that AI artefacts — models, datasets, AI-generated code — now sit entirely outside those existing gates. When your team pulls a model from Hugging Face, adds a fine-tuned weights file to a container, or uses GitHub Copilot to write a function, none of that activity passes through the licence compliance checks you already have in place.
This matters because open AI supply-chain licensing risk is not theoretical. Permissive-washing — where a model carries an Apache 2.0 or MIT label in its repository metadata but the actual permissions granted fall short — means you cannot trust repository metadata alone. A 2026 empirical study of 760,460 Hugging Face models found that 52% of LLM supply chains exhibit at least one licence conflict.
This article walks you through extending what you already have. A four-step pre-integration model review. Snippet scanning for AI-generated code. AI-BOM generation in your CI/CD pipeline. And SBOM lifecycle management. We cover both the enterprise tooling path (FOSSA, Sonatype) and the open-source path (GUAC, BomCTL, Protobom).
SCA already does what you need — for traditional code. It identifies open-source components, evaluates licences, flags violations, and integrates into your CI/CD pipeline. The principles are identical for AI artefacts. The gap is not philosophical; it is a coverage gap.
Standard SCA tools scan declared dependencies in package manifests. They do not scan model weights in a container, dataset licence files in a model card, or inline code that an AI coding assistant generated from its training data. There are three specific gaps to understand:
Model licences: RAIL, OpenRAIL, Llama Community Licence, and custom addenda to Apache 2.0 are not in the licence databases traditional SCA tools use.
Training data provenance: An AI model is trained on datasets that may carry copyrighted material or conflicting licences. The model inherits those obligations, but manifest-based scanning cannot surface them.
AI-generated code snippets: When a developer uses GitHub Copilot or Cursor, the resulting code has no package manifest entry. If Copilot reproduced a GPL-licensed snippet, your pipeline will never know — until someone else’s counsel does.
The 2026 OSSRA report tells you how widespread this is: 97% of organisations use open-source AI models in development, yet only 54% evaluate AI-generated code for IP and licensing risks. That 46% gap accumulates legal exposure that surfaces at the worst moments — M&A due diligence, a regulatory audit, a copyright claim. For a complete overview of the full landscape of AI licensing risk — including the commercial, legal, and regulatory dimensions beyond what SCA tools address — the pillar article covers each domain in full.
One prerequisite before you start: address shadow AI — undeclared model usage already deployed in production. You cannot govern what you have not discovered.
Before automated tooling runs, a manual four-step review catches licence risks that machines cannot assess. This gate runs before engineering time is invested — 15 to 30 minutes per model, preventing weeks of remediation downstream.
Step 1 — Review the model card. Model cards document intended uses, limitations, training data, and licence declaration. In practice, model card quality remains low — ambiguity and incompleteness are the norm. Look for: the intended use statement (does it cover your use case?), out-of-scope uses (is your application listed?), training data description (are sources named and documented?), and the licence declaration.
Step 2 — Read the Acceptable Use Policy. This is where permissive-washing surfaces most commonly. A model's repository metadata may show MIT or Apache 2.0, but the Acceptable Use Policy (AUP) — a separate document — may prohibit commercial distribution, restrict specific industries, or cap maximum active users. A model can carry an Apache 2.0 label while the AUP makes it unusable for commercial production. The label tells you nothing about the AUP. Read it separately. Every time.
Step 3 — Verify the actual licence file. A model labelled Apache 2.0 must have an Apache 2.0 licence file with no modifications — company-released models frequently add custom addenda that negate the permissive grant. Do not rely on the metadata tag.
Step 4 — Assess training data provenance. Check whether the training datasets have documented origins and licences that support commercial use downstream. Look for named datasets, licence terms for each, and whether those terms permit commercial derivative works. A model trained on datasets that prohibit commercial use passes that restriction downstream to you. Where training data is entirely undisclosed, that is a risk signal that belongs in your integration decision.
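One way to make the four-step gate produce a documented artefact is a small review record. The field names and approve/escalate rule below are our own convention, not a standard schema:

```python
from dataclasses import dataclass

@dataclass
class ModelReview:
    """Record of the four-step pre-integration review; illustrative schema."""
    model_id: str
    card_covers_use_case: bool        # step 1: intended use / out-of-scope
    aup_permits_commercial_use: bool  # step 2: AUP read separately from label
    licence_file_matches_tag: bool    # step 3: full text, no custom addenda
    training_data_documented: bool    # step 4: named, commercially usable data

    def decision(self) -> str:
        checks = (self.card_covers_use_case, self.aup_permits_commercial_use,
                  self.licence_file_matches_tag, self.training_data_documented)
        return "approve" if all(checks) else "escalate to legal review"
```

A single failed check escalates: for a hypothetical `ModelReview("example/model-7b", True, True, False, True)`, the decision is "escalate to legal review".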
Both implementation paths produce the same outcome: visibility into AI artefact licences and automated policy enforcement in your existing CI/CD pipeline.
FOSSA supports AI model licence scanning as an extension of its existing SCA platform. Declare AI model artefacts alongside traditional package dependencies, set licence policies to include AI-specific licence types (RAIL, OpenRAIL, Llama Community Licence), and configure policy-driven model approval gates using FOSSA’s three-tier classification (approve / flag / deny). A January 2026 partnership with SCANOSS means snippet detection now operates within the FOSSA policy gate — AI-generated code goes through the same policy enforcement as any other dependency.
Sonatype provides shadow AI detection — scanning container registries and dependency manifests to identify undeclared model usage deployed without approval. Worth running before you do anything else.
GUAC (Graph for Understanding Artifact Composition) ingests SBOM, SLSA, and vulnerability data into a queryable graph database, providing supply chain tracing and dependency visibility for AI artefacts at no licence cost. BomCTL and Protobom are OpenSSF command-line utilities built for CI/CD integration — BomCTL handles generation, validation, and transformation; Protobom handles format-agnostic conversion between SPDX, CycloneDX, and emerging schemas.
If your team uses GitHub Copilot or Cursor, snippet scanning is not optional. It is required to maintain the validity of your existing licence compliance programme.
Here is the legal risk: if Copilot generates code that reproduces a GPL-licensed function from an open-source project in its training data, your codebase contains copyleft-licensed code with no licence notice and no compliance activity. The 2026 OSSRA report identifies “licence laundering” — AI assistants generating snippets derived from copyleft sources without retaining original licence information — as a key driver of year-over-year increases in licensing conflicts.
Heather Meeker, open-source licence expert, puts it plainly: “Snippet scanning is becoming essential because of the need to identify small, matching fragments of code that might originate from open source projects. The defensible path is to choose paid tools, enable guardrails, use snippet scanning, and apply your existing licence policies to AI outputs.”
The integration point is the pull-request workflow: snippet scanning triggers on every PR and blocks merge if copyleft-licensed snippets are detected without appropriate licence handling.
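A minimal sketch of that merge gate, assuming the snippet scanner emits JSON findings; the finding fields used here ("matched_license", "notice_present", "file") are hypothetical and would need mapping to your scanner's actual output format:

```python
# Copyleft identifiers that block a merge when matched without a licence
# notice. The identifier list is illustrative, not a complete policy.
COPYLEFT = {"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only", "LGPL-3.0-only"}

def pr_gate(findings: list[dict]) -> int:
    """Return a CI exit code: non-zero blocks the merge."""
    violations = [f for f in findings
                  if f.get("matched_license") in COPYLEFT
                  and not f.get("notice_present", False)]
    for v in violations:
        print(f"BLOCK: {v['file']} matches a {v['matched_license']} snippet "
              "without licence handling")
    return 1 if violations else 0
```

Wire the return value into `sys.exit()` in the PR workflow so a non-zero exit fails the required status check and blocks the merge.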
When scanning detects a violation, the usual remediation paths are: remove or rewrite the flagged code, retain it and comply fully with the copyleft terms (including the required licence notice), replace it with a permissively licensed alternative, or escalate to legal review.
An AI Bill of Materials (AI-BOM) is the governance artefact this compliance workflow produces — a structured, machine-readable inventory of all AI artefacts with their provenance, licence terms, and version metadata. Unlike a traditional SBOM, it captures model identity and version, training data sources and their licences, fine-tuning parameters, framework dependencies (PyTorch, TensorFlow, LangChain), and the relationships between models, data, services, and infrastructure. For a deeper treatment, see What an AI Bill of Materials Is and What to Demand from Vendors.
Two machine-readable standards are available. SPDX carries ISO backing (ISO/IEC 5962), and SPDX 3.0 adds dedicated AI and Data profiles — that combination carries significant weight in procurement and regulatory audits. CycloneDX ML-BOM (version 1.7) has broader OWASP ecosystem support, and the CycloneDX specification is standardised as ECMA-424. OpenSSF's Protobom provides lossless conversion between both.
For open-source AI-BOM generation in CI/CD: BomCTL handles generation and validation; OWASP AIBOM Generator supports Hugging Face-hosted models and produces CycloneDX output. The CI/CD integration pattern is straightforward — generate the AI-BOM for any AI artefact in the build, validate it against the appropriate schema, and store the signed artefact. Schema validation is the gate. If the AI-BOM fails validation, the build fails. GUAC then ingests the generated AI-BOMs alongside traditional SBOMs into a unified dependency graph.
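The generate-validate-fail pattern can be sketched as below. The hand-rolled checks stand in for real schema validation — a production pipeline would validate against the published CycloneDX JSON Schema instead — and the field subset shown is a minimal assumption, far smaller than what BomCTL or the OWASP AIBOM Generator actually emit:

```python
def make_ai_bom(model_name: str, model_version: str, licence_id: str) -> dict:
    # Minimal CycloneDX-shaped document; real generator output carries far
    # more fields (provenance, hashes, training data references).
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.6",
        "components": [{
            "type": "machine-learning-model",
            "name": model_name,
            "version": model_version,
            "licenses": [{"license": {"id": licence_id}}],
        }],
    }

def validate_ai_bom(bom: dict) -> bool:
    # Schema validation is the gate: a failing BOM fails the build.
    if bom.get("bomFormat") != "CycloneDX" or "specVersion" not in bom:
        return False
    components = bom.get("components", [])
    return bool(components) and all(
        c.get("name") and c.get("version") and c.get("licenses")
        for c in components)
```

In CI, the build step calls the generator, runs validation, and exits non-zero on failure; only validated, signed artefacts are stored and handed to GUAC for ingestion.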
An AI-BOM generated once at integration time has a short useful life. Models update, datasets change, CVEs are disclosed. This is where many teams stop short of a defensible compliance posture.
Five events require AI-BOM regeneration outside the normal build cycle: upstream model version bump, dataset update announcement, CVE disclosure affecting a model dependency, regulatory requirement change, and a fine-tuning or retraining event.
Tamper-evident AI-BOMs require cryptographic signing — ECDSA or Ed25519 produces a verifiable artefact that cannot be altered without invalidating the signature. SBOMit (OpenSSF) manages the end-to-end lifecycle. Dependency-Track (OWASP) provides continuous SBOM analysis with real-time findings.
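To make the tamper-evidence property concrete: the sketch below uses an HMAC over a canonically serialised BOM so it stays dependency-free. A real pipeline would use an asymmetric Ed25519 or ECDSA signature (for example via the `cryptography` package or Sigstore), which verifiers can check without holding the signing key — but the altered-artefact-fails-verification behaviour is the same.

```python
import hashlib
import hmac
import json

def sign_bom(bom: dict, key: bytes) -> str:
    # Canonical serialisation so logically equal BOMs produce the same tag.
    payload = json.dumps(bom, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_bom(bom: dict, key: bytes, signature: str) -> bool:
    # Constant-time comparison; any change to the BOM invalidates the tag.
    return hmac.compare_digest(sign_bom(bom, key), signature)
```

Any change to the BOM — a bumped model version, an edited licence field — changes the serialised payload and invalidates the stored signature.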
For any product with digital elements sold in the EU, the Cyber Resilience Act mandates that AI-BOMs be kept up to date and retained for at least ten years. For a full treatment of EU AI Act and Cyber Resilience Act supply chain obligations, there is a dedicated article that covers the detail.
This is where an engineering decision directly determines regulatory status. Worth understanding before you start.
The EU AI Act’s GPAI provisions include a one-third compute threshold: fine-tune a General-Purpose AI model using more than one-third of the original training compute, and your organisation is reclassified as a GPAI provider under Article 53, triggering transparency, documentation, and training data copyright summary obligations. Compare your planned fine-tuning FLOPs to the original model’s published training FLOPs. Where training compute is undisclosed, use the fallback threshold: one-third of 10²³ FLOPs for standard GPAI models.
Domain adaptation fine-tuning — continued training on large domain-specific datasets — is the scenario most likely to cross the threshold. Calculate your fine-tuning FLOPs before assuming you are below the line.
LoRA (Low-Rank Adaptation) uses far less compute — orders of magnitude less in typical deployments. It is the safer approach for teams wanting model customisation without triggering provider obligations. Document the FLOPs calculation regardless — it belongs in the AI-BOM and serves as your defence against future regulatory inquiry.
A compliance workflow that exists only in the heads of the people who built it does not survive a team reorganisation. Document it as an internal engineering standard — a version-controlled policy that survives team turnover, vendor changes, and audit scrutiny.
The standard should cover: the four-step pre-integration review procedure (who performs it, decision criteria, documentation produced); approved and denied licence classifications including AI-specific types (RAIL, OpenRAIL, Llama Community Licence); the snippet scanning policy (PR-level blocking, remediation workflow); AI-BOM generation and lifecycle requirements (pipeline step, output format, signing process, five update triggers); the fine-tuning threshold assessment procedure; and a quarterly review cadence.
Also specify what to request from upstream AI model providers contractually: training data source disclosure, update notification obligations, indemnification terms, and data retention commitments aligned with your CRA obligations. The Cyber Resilience Act already requires this kind of contractual arrangement with software suppliers — extend it to AI model providers.
What is permissive-washing? Permissive-washing is when an AI model carries a permissive open-source licence label (Apache 2.0, MIT) in its repository metadata, but the underlying legal rights grant is insufficient for commercial downstream use. ML-specific licences often impose additional restrictions — limiting commercial use, prohibiting use of model output to train competing models — that fall well outside what the label implies.
Can standard SCA tools detect AI-generated code snippets? No. Standard SCA tools scan declared dependencies in package manifests — they cannot detect code fragments that AI coding assistants like GitHub Copilot or Cursor generate by reproducing open-source training data. Snippet scanning is the separate capability you need.
What is the difference between an SBOM and an AI-BOM? An SBOM inventories traditional software components. An AI-BOM extends this to cover AI-specific artefacts: models, datasets, training parameters, fine-tuning configurations, and their provenance and licence terms. Both use SPDX or CycloneDX formats, but AI-BOMs require additional fields for model and data lineage.
Which SBOM format should we choose — SPDX or CycloneDX? Both are valid. SPDX carries ISO backing (ISO/IEC 5962), and SPDX 3.0 adds dedicated AI and Data profiles. CycloneDX ML-BOM (v1.7) has broader OWASP ecosystem support, and the CycloneDX specification is standardised as ECMA-424. Choose based on your existing tooling. OpenSSF’s Protobom enables lossless conversion between both, so you are not locked in.
How much time does snippet scanning add to a pull request? FOSSA Snippet Scanning completes most scans in under five minutes — comparable to existing linting or static analysis steps.
What happens when an upstream model provider releases a new version? It triggers an SBOM lifecycle update event. Regenerate your AI-BOM, re-evaluate your fine-tuning compute ratio, and verify that the licence terms have not changed.
Does LoRA fine-tuning trigger the one-third compute threshold? LoRA uses significantly less compute — typically orders of magnitude less — and in most practical scenarios will not cross the one-third threshold. Still, document your FLOPs calculation and retain it for audit purposes. The threshold applies to cumulative compute across all fine-tuning rounds.
What is GUAC? GUAC (Graph for Understanding Artifact Composition) is an OpenSSF open-source tool that ingests SBOM, SLSA, and vulnerability data into a queryable graph database, providing supply chain tracing and dependency visibility for AI artefacts at no cost.
How do we detect shadow AI usage in our organisation? Scan container registries for model weight files, review dependency manifests for AI framework imports (PyTorch, TensorFlow, LangChain), and check network egress logs for calls to external model APIs. Sonatype offers automated shadow AI detection as a platform feature. Run this audit before implementing the compliance workflow — you cannot govern what you have not discovered.
What should we request contractually from upstream AI model providers? Request: training data source disclosure, update notification obligations, indemnification terms for downstream licence claims, and data retention commitments aligned with your CRA obligations.
Is a manual pre-integration review still necessary alongside automated SCA tooling? Yes. Automated SCA tools operate after integration. The pre-integration review runs before engineering time is invested and catches risks automated tools cannot assess: AUP restrictions that exist outside machine-readable licence files, and training data provenance gaps that no structured automated tooling can process.
How often should we regenerate the AI-BOM? Regenerate on every build that changes an AI component. Five event-based triggers also require out-of-cycle regeneration: upstream model version bump, dataset update announcement, CVE disclosure affecting a model dependency, regulatory requirement change, and fine-tuning or retraining event. Retain signed copies for a minimum of ten years under CRA Article 13 requirements.
Legal ambiguity in AI licences is now an engineering dependency risk. When your team selects a model, the licence terms it carries propagate into your product, your supply chain obligations, and potentially your regulatory status. Repository metadata alone cannot be trusted to surface that risk — permissive-washing ensures it will not.
The workflow in this article — pre-integration review, snippet scanning, AI-BOM generation, and lifecycle management — extends what your team already knows. The principles are the same as SCA. The tooling integrates with your existing pipeline. The engineering standard codifies the process so the knowledge does not walk out the door when people do.
For the broader context on the open AI supply-chain licensing risk this workflow addresses — including the regulatory and commercial landscape driving these obligations — start with the pillar article. The workflow described here is the operational implementation of the governance framework it establishes.