Q: Is training AI on copyrighted books considered fair use?

Two California federal judges have ruled that training AI models on copyrighted works can qualify as fair use; one of them, Judge William Alsup, described it as "exceedingly transformative." However, courts have not protected the acquisition of training data from pirate sources or the redistribution of copyrighted files during that process.

Q: What was the outcome of Thomson Reuters v. Ross Intelligence?

Judge Stephanos Bibas ruled in early 2025 that Ross Intelligence's use of Westlaw's editorial headnotes to train a competing legal research tool was not fair use. It was the first significant decision to reject the AI training fair-use defense, and the appeal is now before the Third Circuit Court of Appeals.

Q: What is the market substitution argument in AI copyright cases?

The market substitution argument holds that if an AI model trained on a genre of creative work can generate endless competent imitations, the real harm to authors is not the initial copying but the flood of substitute content that competes with their work. Judge Chhabria flagged this issue in Kadrey v. Meta, and it is the central argument being built into the next wave of AI copyright cases.

Dispatch · AI & the Law

The Fair-Use Frontier

Q: How much did Anthropic pay to settle the AI copyright lawsuit?

Anthropic paid $1.5 billion to settle Bartz v. Anthropic, the largest publicly reported copyright recovery in American history. The settlement covered roughly 482,000 works, implying about $3,113 per book.

Q: Can AI-generated content be copyrighted?

No. The Supreme Court declined to hear Thaler v. Perlmutter in March 2026, leaving in place a ruling that material generated entirely by a machine with no meaningful human author cannot be copyrighted. Copyright law protects human authorship, so purely AI-generated works fall into the public domain at creation.

Every large model was trained on a library no one was asked to lend. After a $1.5 billion settlement and the first appellate argument of its kind, the courts are finally deciding whether that was allowed.

June 26, 2026 • Lisa Pedrosa • 10 min read • AI & Law

On the afternoon of June 11, 2026, three judges of the U.S. Court of Appeals for the Third Circuit sat down to consider a question that quietly underpins the entire artificial-intelligence economy: when a company copies millions of books, articles, and lines of code to teach a machine to write, is that theft or is it learning? It was the first time an American appeals court had ever taken up the question directly. The room was small. The stakes were not.

For three years, generative AI advanced on an unspoken assumption: that the open internet, and a good deal of the closed one, was fair game for training. Models read the world's writing the way a student reads a library, the argument went, and producing something new from what you have read is the oldest creative act there is. That assumption was never tested in court because the technology moved faster than the docket. Now the docket has caught up. More than seventy copyright suits are active against AI companies in the United States and abroad, and the combined damages being claimed have passed $50 billion. The legal foundation of the most valuable industry of the decade is being poured while the building already stands on it.

$1.5B: Anthropic settlement, largest copyright recovery in U.S. history

~482K: Books covered by the settlement

70+: Active AI copyright suits, U.S. and abroad

$50B+: Damages claimed across active cases

The settlement that put a price on a library

The number that focused everyone's attention arrived in the autumn of 2025 and was finalized through 2026: $1.5 billion, paid by Anthropic to settle Bartz v. Anthropic, a class action brought by authors whose books had been swept into the company's training corpus. It is the largest publicly reported copyright recovery in American history. Spread across roughly 482,000 works, it implies a figure of about $3,113 per book — a number that turned an abstract principle into a line item.
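The per-book figure is plain division, and it is easy to check. The lines below are a minimal sketch of that arithmetic, assuming the rounded count of 482,000 works; the exact number of covered works is not given here, so the result can differ from the quoted figure by a dollar or so. The names are illustrative only.

```python
# Rounded inputs taken from the figures quoted in this article.
SETTLEMENT_USD = 1_500_000_000   # $1.5 billion
WORKS_COVERED = 482_000          # "roughly 482,000"; exact count is an assumption


def average_per_work(total_usd: int, works: int) -> float:
    """Gross average per covered work; ignores any deductions."""
    return total_usd / works


per_work = average_per_work(SETTLEMENT_USD, WORKS_COVERED)
print(f"${per_work:,.0f} per work")
```

This is a gross average across the class, not a statement of what any individual author will actually receive.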

What made Bartz consequential was not only the price but the logic behind it. In a June 2025 opinion, Judge William Alsup of the Northern District of California drew a distinction that has shaped every case since. Training a model on copyrighted books, he reasoned, was "exceedingly transformative" — closer to a person learning to write by reading than to piracy — and therefore protected as fair use. But the way Anthropic had obtained some of those books, downloading them from pirate libraries and keeping a permanent copy, was a different act entirely, and that was not protected.

The court did not punish the learning. It punished the warehouse. The model could read the books; the company could not keep a stolen copy of the shelf.

That split — lawful purpose, unlawful provenance — is the hinge on which the whole field now turns. It tells AI companies that the act of training may be defensible, but that how the data was acquired and stored can sink them anyway. A great deal of the early training-data era was conducted with a shrug toward provenance. The bill for that shrug is now arriving.

Two California judges, two different instincts

The same month the Bartz reasoning landed, a parallel case offered a slightly different temperature. In Kadrey v. Meta Platforms, a separate group of authors sued over the training of Meta's Llama models. Judge Vince Chhabria granted Meta a partial victory, finding that the training itself could qualify as fair use, but he was pointedly unenthusiastic about it, suggesting the authors had simply failed to build the right evidentiary case about market harm. As in Bartz, the claims that survived concerned acquisition rather than training: here, the "seeding" of pirated files during torrent downloads, where Meta may have redistributed copyrighted works rather than merely consuming them.

Read together, the two California rulings sketch an emerging consensus and its limits. Courts are increasingly willing to treat model training as transformative. They are far less forgiving about piracy, redistribution, and the question that Kadrey flagged most sharply: whether AI output competes directly with the very works it was trained on. A model that helps you write is one thing. A model that floods the market with substitutes for the authors it learned from is another, and that argument has not yet had its day.

"The act of training may be transformative. The act of competing with the people you trained on is the question no court has finished answering."

The unresolved core of the 2026 AI copyright fight

Why the Third Circuit changes the math

District-court rulings, however quotable, bind almost no one beyond their own courtrooms. That is why the June 11 argument matters more than the headlines it generated. The Third Circuit's case traces back to Thomson Reuters v. Ross Intelligence, in which a legal-research startup trained a competing tool on Westlaw's editorial headnotes. In early 2025, Judge Stephanos Bibas ruled that Ross's use was not fair use — the first significant decision to reject the AI-training defense outright, in part because Ross's product competed head-to-head with the source it had copied.

The appeal of that decision is the first time a federal appellate court has been asked to decide the fair-use question for AI training. Whatever the three judges conclude will carry the force of binding precedent across an entire circuit, and it will be read closely by every other court in the country still weighing the same issue. A clean affirmance for Thomson Reuters would harden the rule that training a market substitute on a rival's data is infringement. A reversal would hand the AI industry its first appellate shield. Either way, the era of district-court improvisation is ending.

From the first filings to the first appeal: how three years compressed the law into one argument.

The other question: can a machine own anything?

While the courts decide what AI may consume, a separate line of cases asks what AI may create — and whether anyone owns it. On March 2, 2026, the Supreme Court quietly declined to hear Thaler v. Perlmutter, leaving in place a ruling that material generated entirely by a machine, with no meaningful human author, cannot be copyrighted at all. The principle is old, dressed in new clothes: copyright protects human authorship. A picture made by an autonomous system, absent a human's creative hand, falls into a kind of public domain at birth.

The two questions are mirror images. One asks whether the inputs to AI are protected; the other asks whether the outputs are. Together they define the legal shape of the technology — a system that may be allowed to learn from everything yet may struggle to own what it makes. For working writers, artists, and musicians, that combination is unsettling in both directions.

"A machine may be permitted to learn from everything humans have written, and yet may not own a single thing it writes back."

On Thaler v. Perlmutter and the limits of machine authorship

What this means if you make things for a living

It is tempting to read this as a fight between large companies — corporate plaintiffs against corporate defendants, billion-dollar settlements that never reach an individual author's mailbox. But the principles being set will reach much further than the parties named in the captions. They will determine whether licensing markets emerge, in which AI companies pay to train on creative work the way radio stations pay to play music. Several major settlements and licensing deals signed during 2026 suggest that market is already forming. A clear rule — even a rule that favors the AI companies — tends to create one, because certainty is what lets money change hands.

The harder possibility is the one the Kadrey court gestured toward and no one has yet litigated to conclusion: market substitution. If a model trained on a genre of fiction can generate endless competent imitations of that genre, the harm to its original authors may not be the copying at all. It may be the flood. That is the argument that could, eventually, narrow the fair-use defense even where the training itself is deemed transformative — and it is the argument the next wave of cases is being built around.

For now, the law is doing what law does when a technology outruns it: catching up unevenly, one opinion at a time, drawing lines that will look obvious in hindsight and arbitrary today. The June 11 argument will not settle everything. But for the first time, the question of what AI was built on is being answered not by engineers or executives but by judges — and their answer will be binding. The library was borrowed without asking. The bill is finally being read aloud, in open court.

Sources

Norton Rose Fulbright — AI in litigation series: An update on AI copyright cases in 2026
The Authors Guild — Bartz v. Anthropic Settlement: What Authors Need to Know
aibusiness.com — AI Lawsuits in 2026: Settlements, Licensing Deals, Litigation
Mishcon de Reya — Generative AI Intellectual Property Cases and Policy Tracker
Sustainable Tech Partner — Generative AI Lawsuits Timeline
U.S. District Court, N.D. Cal. — Bartz v. Anthropic, opinion of Judge William Alsup (June 2025)
U.S. District Court, N.D. Cal. — Kadrey v. Meta Platforms, opinion of Judge Vince Chhabria (June 2025)
U.S. District Court, D. Del. — Thomson Reuters v. Ross Intelligence, opinion of Judge Stephanos Bibas (2025)
U.S. Supreme Court — denial of certiorari, Thaler v. Perlmutter (March 2, 2026)
U.S. Court of Appeals, Third Circuit — oral argument, AI training fair-use appeal (June 11, 2026)
