In October 2025, Australia’s Attorney-General Michelle Rowland drew a line in the sand – Australia won’t be introducing a text and data mining (TDM) exception that lets AI companies train on copyrighted material without paying for it. This puts Australia in a different camp from the UK, EU, Japan, and Singapore, all of which have adopted some form of TDM exception.
Here’s your problem. Who’s on the hook for copyright liability when you deploy AI tools that might have been trained on content the AI company didn’t have permission to use? With the Bartz v. Anthropic settlement hitting $1.5 billion and statutory damages potentially going up to $150,000 per work, the risk is real money.
Then add in the fact that different countries are taking completely different approaches – Australia rejecting TDM while the US is relying on the uncertain fair use doctrine – and you’ve got a compliance puzzle that’s going to affect which vendors you pick, how you negotiate contracts, and how you manage risk. This article is part of our broader AI compliance picture that covers the full regulatory landscape.
So this article is going to walk you through what Australia and the US have decided, what the Anthropic settlement means when you’re making procurement decisions, the questions you need to ask vendors, the contract terms that actually matter, and the practical steps you can take to protect your own content. These copyright issues also sit within the broader AI governance context that shapes your organisation’s AI adoption strategy.
What Did Australia Decide About AI Training Data Copyright in October 2025?
Australia said no to the text and data mining exception. Full stop. The Attorney-General stated “we are making it very clear that we will not be entertaining a text and data mining exception” to give creators certainty and make sure they get compensated.
Now, the UK, EU, Japan, and Singapore have all gone the other way. They’ve adopted TDM exceptions that let you copy copyrighted works for computational analysis without asking permission – though the scope varies, from the UK’s narrow exception for non-commercial research to Japan’s broad allowance, with the EU permitting rights holders to opt out of commercial TDM. Australia’s Productivity Commission even recommended a TDM exception in August 2025, but the government knocked it back. Instead, they’re signalling that you’ll need a licensing regime – permissions and compensation.
If you’re operating in Australia, this means higher compliance requirements compared to other places. AI vendors can’t just claim a blanket exception for their training activities. Which makes vendor due diligence and contract terms that specifically address Australian law much more important. To understand the full picture of regional copyright positions, see our detailed jurisdiction comparison.
The Copyright and AI Reference Group is going to look at collective or voluntary licensing frameworks, improving certainty about copyright for AI-generated material, and establishing a small claims forum for lower value copyright matters. But the core principle is settled – no TDM exception means training on copyrighted content is going to require licensing.
What Is the US Copyright Office Position on Fair Use for AI Training?
The US is going down a different path, using existing fair use doctrine. The US Copyright Office put out its Part 3 report in May 2025 saying that fair use requires case-by-case analysis of four factors: the purpose and character of use, the nature of the copyrighted work, how much was used, and what effect it has on the market.
Fair use is a legal defence. It’s not blanket permission. The Copyright Office got over 10,000 comments on this – which tells you how contentious the whole thing is.
AI companies are arguing that training is transformative use – it creates new functionality instead of just substituting for the originals. But the Copyright Office pushed back on this. The report made the point that transformative arguments aren’t inherently valid, noting that “AI training involves creation of perfect copies with ability to analyse works nearly instantaneously” – unlike human learning, which retains only imperfect impressions of the works it encounters.
What this means for you is that US-based AI vendors are operating under legal uncertainty that’s going to get resolved through settlements and court cases. Fair use is a defence you use in litigation – it doesn’t stop you from getting sued in the first place.
What Does the Bartz v. Anthropic Settlement Mean for AI Adoption?
Three authors sued Anthropic, claiming the company downloaded over 7 million books from the shadow libraries LibGen and Pirate Library Mirror to train Claude, all without authorisation.
Judge William Alsup ruled that using legally acquired books for AI training was “quintessentially transformative” fair use. But downloading pirated copies? That wasn’t. The class covered about 482,460 books. If Anthropic had lost, potential statutory damages could have exceeded $70 billion – 482,460 works multiplied by the $150,000 statutory maximum comes to roughly $72 billion.
Anthropic settled for $1.5 billion – the biggest copyright settlement in US history. That works out to roughly $3,100 per work ($1.5 billion divided by 482,460 works) before legal fees come out. And they have to destroy the pirated libraries within 30 days.
Here’s what this tells you. Even well-funded AI companies with strong legal arguments would rather settle than face litigation costs and risks. And note – the settlement only lets Anthropic off the hook for past conduct before 25 August 2025. It doesn’t create an ongoing licensing scheme.
For your procurement decisions, what the settlement shows is that training data provenance is a material business risk that vendors take seriously. When you’re evaluating AI vendors, ask them about their training data sources, whether they’ve been in copyright litigation, and what indemnification they’ll provide.
How Do Copyright Risks Differ Between Australia and the US for AI Tools?
Australia rejecting the TDM exception creates strict liability risk. Using copyrighted content for training without a licence is infringement, and there’s no AI-specific defence. The US fair use doctrine gives you a potential defence, but it needs case-by-case analysis and it doesn’t stop you from being sued in the first place.
If you’re operating in both jurisdictions, the stricter Australian standard should be what guides your risk assessment and vendor selection. Australian companies can’t lean on vendors’ US fair use arguments. You need explicit licensing or indemnification that covers Australian law.
The practical approach? Apply the strictest standard – Australia’s licensing requirement – as your baseline for global operations. That way you’re covered no matter where your customers or operations are.
What Should AI Vendor Contracts Include for Copyright Protection?
You need a copyright indemnification clause. This is where the vendor agrees to defend you and cover costs if you get sued for the vendor’s training practices. It’s the foundation of your contractual protection.
Explicit warranties about training data sources matter. The vendor needs to represent that the data was lawfully obtained and used. Get this in writing.
Liability allocation provisions should spell out who bears the risk for input infringement – that’s training data issues – versus output infringement, which is generated content. Generally vendors should accept input infringement risk, while you’re responsible for how you use the outputs.
Enterprise-grade licences offer clearer terms regarding IP ownership, enhanced security, and specific provisions for warranties, indemnification, and confidentiality. Don’t settle for consumer terms of service.
Jurisdictional coverage is particularly important now that Australia’s rejected TDM. Make sure indemnification applies in all the regions where you operate. US-focused indemnification won’t protect you in Australia where the licensing requirement applies.
Notification requirements should make the vendor tell you about copyright litigation, settlements, or regulatory changes. You need to know when the vendor’s risk profile changes so you can reassess your exposure.
Evidence of insurance or financial backing makes sure the vendor can actually pay if indemnification gets triggered. A strong indemnification clause from a vendor that goes bankrupt isn’t going to help you.
How Can Companies Protect Their Content from AI Training?
Put a robots.txt file in place to block AI crawler bots from accessing your website content. The catch? Not all AI companies actually respect robots.txt. Only 37% of the top 10,000 domains on Cloudflare have robots.txt files, and even fewer include directives for the top AI bots.
GPTBot is only disallowed in 7.8% of robots.txt files, Google-Extended in 5.6%, and other AI bots are each under 5%. Robots.txt compliance is voluntary – it’s like putting up a “No Trespassing” sign, not a physical barrier.
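If you do want to put the sign up, the directives are simple. Here’s a minimal robots.txt sketch blocking some of the widely documented AI training crawlers – treat the list as a starting point, since bot names change and you should verify current user-agent strings against each company’s documentation:

```
# Block OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Block Google's AI training control (doesn't affect Search indexing)
User-agent: Google-Extended
Disallow: /

# Block Anthropic's crawler
User-agent: ClaudeBot
Disallow: /

# Block Common Crawl, whose datasets are widely used for AI training
User-agent: CCBot
Disallow: /
```

As noted above, these directives only bind crawlers that choose to honour them.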
Update your terms of service to explicitly prohibit scraping and AI training use of your website content without permission. This creates legal grounds for enforcement even if the technical controls get bypassed.
Use API restrictions and rate limiting to stop bulk data extraction. If you’re providing APIs, implement throttling to prevent dataset-scale extraction.
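To illustrate the kind of throttling involved, here’s a minimal token-bucket rate limiter in Python – a sketch, not a production design, and the rates and bucket sizes are placeholder values. In practice you’d more likely use your API gateway’s built-in rate limiting or a shared store like Redis rather than in-process state:

```python
import time
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Per-client token bucket: allows a steady request rate plus small bursts."""
    rate: float = 5.0       # tokens refilled per second (placeholder value)
    capacity: float = 20.0  # maximum burst size (placeholder value)
    tokens: float = 20.0
    updated: float = field(default_factory=time.monotonic)

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller should respond with HTTP 429 Too Many Requests


# One bucket per API key, so no single client can extract data at dataset scale
buckets: dict[str, TokenBucket] = {}


def check_rate_limit(api_key: str) -> bool:
    return buckets.setdefault(api_key, TokenBucket()).allow()
```

The point of the per-key bucket is that extracting a dataset’s worth of content through your API becomes slow and conspicuous rather than a weekend job.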
Consider DMCA takedown notices if your content shows up in AI outputs. Monitor for unauthorised use – check whether your proprietary documentation or code is appearing in AI-generated responses.
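One low-tech way to operationalise that monitoring is to pick distinctive “canary” phrases from your proprietary content and periodically check AI responses for near-verbatim matches. A minimal sketch using only the Python standard library – the phrases, threshold, and windowing approach are illustrative assumptions, not a formal detection method:

```python
import difflib


def find_content_matches(ai_response: str, canary_phrases: list[str],
                         threshold: float = 0.9) -> list[str]:
    """Flag canary phrases that appear (near-)verbatim in an AI response."""
    hits = []
    text = ai_response.lower()
    for phrase in canary_phrases:
        p = phrase.lower()
        if p in text:  # exact match: cheapest check first
            hits.append(phrase)
            continue
        # Slide a phrase-length window across the response so lightly
        # paraphrased copies still score above the similarity threshold
        window, step = len(p), max(1, len(p) // 4)
        for i in range(0, max(1, len(text) - window + 1), step):
            if difflib.SequenceMatcher(None, p, text[i:i + window]).ratio() >= threshold:
                hits.append(phrase)
                break
    return hits
```

A hit is a prompt to investigate, and potentially to send a DMCA notice – not proof of training on your content by itself.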
For high-value IP, explore proactive licensing arrangements with AI vendors rather than playing enforcement whack-a-mole. If the major AI companies are going to use your content regardless, getting compensated through licensing beats fighting endless enforcement battles.
How Do You Conduct IP Due Diligence on AI Vendors?
Request disclosure of training data sources. Are they using public domain content, licensed content, fair use claims, or sources they won’t disclose?
Ask about current and past copyright litigation, including settlements like Bartz v. Anthropic. You need to understand what happened and how it turned out.
Review the indemnification terms for how comprehensive they are, what jurisdictions they cover, and whether there’s financial backing. Does it cover all your operating jurisdictions? Can the vendor actually pay if it gets triggered?
Evaluate the vendor’s copyright compliance practices. Do they respect robots.txt? Do they have licensing agreements in place? Do they publish transparency reports? For a comprehensive approach to vendor IP due diligence, see our detailed vendor evaluation guide.
Check vendor financial stability. A startup’s indemnification promise carries different risk than Microsoft’s Copilot Copyright Commitment. Enterprise vendors like Microsoft and Google often have stronger indemnification than AI-native companies like OpenAI and Anthropic.
Request evidence of copyright insurance or legal reserves. This shows the vendor has actually planned for potential copyright exposure instead of just hoping the issue goes away.
What Is the Risk If Your Company Uses AI Trained on Pirated Content?
Legal liability typically lands on the AI vendor for input infringement – that’s the training data issues. Customer liability comes into play for output infringement – using AI-generated content that violates copyright.
But without strong indemnification, you could still face discovery costs, litigation participation, and reputational risk even if you’re not ultimately liable. Courts don’t require intent to establish copyright infringement. You can’t defend yourself by saying the AI created the content.
Statutory damages of up to $150,000 per work create huge vendor exposure. When datasets have hundreds of thousands of copyrighted works, liability can threaten the vendor’s viability.
For regulated industries like FinTech and HealthTech, using AI with questionable training provenance creates compliance and audit risk. What happens if your AI vendor goes bankrupt from copyright damages? You need contingency plans for switching providers.
Practical risk mitigation follows a hierarchy. Vendor selection with transparent data sourcing. Contractual protections through indemnification. And usage governance that makes sure you’re not creating output infringement exposure through how you deploy the tools.
FAQ Section
Can AI companies legally use copyrighted content to train their models?
It depends where you are. In the US, AI companies are arguing that fair use doctrine lets them train without permission if it’s transformative. Australia rejected the TDM exception, which means training on copyrighted content is probably going to require licensing. Courts are still working through these questions in litigation.
Who is liable if an AI tool I use was trained on pirated content?
Generally the AI vendor carries liability for input infringement – the training data issues – not the customer. But without strong indemnification clauses, you might still face litigation costs and reputational risk. You’re on the hook for output infringement if you use AI-generated content that violates copyright.
What is the difference between fair use and the TDM exception?
Fair use – the US approach – is a legal defence that requires four-factor case-by-case analysis. It doesn’t prevent lawsuits, but you might prevail in court. A TDM exception – adopted in varying forms by the EU, UK, Singapore, and Japan – is a statutory permission that allows computational analysis without authorisation. Australia rejected TDM, creating stricter requirements than comparable jurisdictions.
How do I know if an AI vendor’s training data sources are legitimate?
Ask vendors directly about their data sources and review any transparency reports they publish. Check for copyright litigation history. Look at whether they respect robots.txt and whether they have licensing agreements. Vendors with strong indemnification typically have more confidence in their data sourcing.
What should I ask AI vendors about copyright protection before purchasing?
Request comprehensive indemnification that covers your operating jurisdictions. Ask about training data sources and licensing. Review their litigation history. Verify they have the financial ability to honour indemnification. Confirm notification requirements for legal developments. Document all their responses for your compliance records.
Can I block AI companies from scraping my website content?
Yes, through robots.txt files, API restrictions, and terms of service updates. However, not all AI companies respect these technical controls and enforcement can be difficult. Legal mechanisms like DMCA takedowns give you additional remedies if unauthorised use happens.
What happened in the Bartz v. Anthropic settlement?
Authors sued Anthropic for allegedly training Claude on copyrighted books from shadow libraries without authorisation. Anthropic settled for $1.5 billion rather than litigating the fair use question. The settlement shows that copyright risk is something vendors take seriously, but it doesn’t establish legal precedent.
How much are statutory damages for copyright infringement in AI cases?
US copyright law sets statutory damages between $750 and $30,000 per infringed work, rising to $150,000 per work for wilful infringement, and you don’t need to prove actual financial harm. Given that training datasets might have hundreds of thousands of copyrighted works, the potential exposure is massive, and that’s what drives settlement behaviour.
Does using enterprise AI tools like Microsoft Copilot reduce copyright risk?
Enterprise vendors like Microsoft often provide stronger indemnification compared to smaller AI-native companies. But review the specific contract terms because coverage varies. Larger vendors also have more financial capacity to honour indemnification if it gets triggered.
What is the difference between input infringement and output infringement?
Input infringement happens during training when copyrighted works get copied into datasets without authorisation – that’s primarily a vendor liability issue. Output infringement happens when AI-generated content substantially replicates copyrighted material – that’s typically a customer liability issue based on how you use the tool.
Should I wait to adopt AI until copyright issues are resolved?
You don’t need to wait indefinitely, but choose vendors with transparent data sourcing, strong indemnification, and litigation management experience. Put AI governance policies in place and do ongoing compliance monitoring. Use AI for lower-risk internal applications before customer-facing deployments if you’re concerned about exposure.
How do I protect my company if our AI vendor gets sued for copyright infringement?
Make sure you have comprehensive copyright indemnification in your vendor contracts that covers defence costs and damages. Verify the vendor’s financial strength to honour their commitments. Maintain documentation of your due diligence and what the vendor represented. Consider copyright insurance as additional protection. Monitor vendor litigation and have contingency plans if the vendor’s viability gets threatened.