Anthropic judge rejects $1.5B AI copyright settlement

(news.bloomberglaw.com)

267 points | by nobody9999 a day ago

262 comments

  • jawns 17 hours ago

    I'm an author, and I've confirmed that 3 of my books are in the 500K dataset.

    Thus, I stand to receive about $9,000 as a result of this settlement.

    I think that's fair, considering that two of those books received advances under $20K and never earned out. Also, while I'm sure that Anthropic has benefited from training its models on this dataset, that doesn't necessarily mean that those models are a lasting asset.

    • shermozle 14 hours ago

      It's far from fair given that if _I_ breach copyright and get caught, I go to jail, not just pay a fine.

      • dragonwriter 13 hours ago

        > It's far from fair given that if _I_ breach copyright and get caught, I go to jail, not just pay a fine.

        This settlement has nothing to do with any criminal liability Anthropic might have, only tort liability (and it involves damages, not fines).

        • stingraycharles 13 hours ago

          Also, you can’t put a business in jail.

          • echoangle 12 hours ago

            But you can put the people that made the decision or are responsible for it in jail (or prison).

            • Draiken 12 hours ago

              Isn't this wishful thinking? This basically never happens. Theory vs reality is very real.

              • mihaic 2 hours ago

                Whether the population thinks that's wishful thinking or not, it's generally right.

              • zzzeek 12 hours ago

                Huh? Ask Sam Bankman-Fried, ask Enron; people go to jail for corporate crime all the time. Do you mean just for copyright infringement?

                • nerdsniper 7 hours ago

                  The examples you mention are all of someone stealing from the rich. But otherwise, even the most blatant obstruction of justice goes unanswered.

                  “Greyball”: https://www.nytimes.com/2017/03/03/technology/uber-greyball-...

                  My uncle went to jail for picking up someone in an airport in his taxi. He didn't have the airport permit (could only drop off, not pick up). Travis Kalanick industrialized that crime on a grand scale and got billions of dollars instead of jail.

                • tomrod 12 hours ago

                  Relative to the amount of corporate malfeasance that occurs? Hardly anyone.

                • digdugdirk 11 hours ago

                  Please, name 5 more "big name" examples.

                  • OJFord 11 hours ago

                    Can you name 7 people that went to prison for non-corporate crimes that easily too?

                    • immibis 4 hours ago

                      John Doe McDrugUser 1

                      John Doe McDrugUser 2

                      John Doe McDrugUser 3

                      John Doe McDrugUser 4

                      John Doe McDrugUser 5

                      John Doe McDrugUser 6

                      John Doe McDrugUser 7

                  • YetAnotherNick 9 hours ago

                    Asked AI:

                    - Sam Bankman-Fried (FTX): Sentenced to 25 years in prison in 2024 for orchestrating a massive fraud involving the misappropriation of billions in customer funds.

                    - Elizabeth Holmes (Theranos): Began an 11-year prison sentence in 2023 after being convicted of defrauding investors with false claims about her blood-testing technology.

                    - Ramesh "Sunny" Balwani (Theranos): The former president of Theranos was sentenced to nearly 13 years in prison for his role in the same fraud as Elizabeth Holmes.

                    - Trevor Milton (Nikola Corporation): Convicted of securities and wire fraud, he was sentenced to four years in prison in 2023.

                    - Ippei Mizuhara: The former translator for MLB star Shohei Ohtani was charged in April 2024 with bank fraud for illegally transferring millions from the athlete's account.

                    - Sergei Potapenko and Ivan Turogin: Convicted in February 2025 for a $577 million cryptocurrency fraud scheme.

                    - Bernard Madoff: Sentenced to 150 years in prison in 2009 for running the largest Ponzi scheme in history. He died in prison in 2021.

                    - Jeffrey Skilling (Enron): The former CEO of Enron was sentenced to 24 years in prison in 2006 for fraud and conspiracy. His sentence was later reduced, and he was released in 2019.

                    - Dennis Kozlowski (Tyco International): The former CEO served over six years in prison after being convicted in 2005 for looting millions from the company.

                    - Bernard "Bernie" Ebbers (WorldCom): Sentenced to 25 years in prison for orchestrating an $11 billion accounting fraud. He was granted early release in 2019 and died shortly after.

                    Apart from this list, I know Nissan's ex-CEO was put into solitary confinement for months.

                    • nobody9999 8 hours ago

                      Who went to prison from Union Carbide for the Bhopal disaster[0]?

                      Who went to prison from Exxon for the Valdez oil spill[1], or from BP for the Deepwater Horizon[2] debacle?

                      Who went to prison from Norfolk-Southern for the East Palestine train derailment[3]?

                      Who went to prison from Boeing for the 737Max debacle[4]?

                      [0] https://en.wikipedia.org/wiki/Bhopal_disaster

                      [1] https://en.wikipedia.org/wiki/Exxon_Valdez

                      [2] https://en.wikipedia.org/wiki/Deepwater_Horizon_oil_spill

                      [3] https://en.wikipedia.org/wiki/East_Palestine%2C_Ohio%2C_trai...

                      [4] https://en.wikipedia.org/wiki/Boeing_737_MAX_groundings

                      • myrmidon 3 hours ago

                        I do agree with you that corporate accountability is often quite poor, but I think the notion that every serious incident should conclude with someone going to prison is simply wrong.

                        Overly punitive handling of accidents does not lead to better safety-- it primarily leads to people playing the blame game, obfuscating and stonewalling investigations.

                        This is extremely likely to make the overall situation worse instead of better.

                        I also think punishment based on outcome is ethically extremely iffy. If you do sloppy work handling dangerous chemicals, your punishment should be for that, and completely independent of factors outside your control that lead to (or prevent) an actual accident.

                        • nobody9999 2 hours ago

                          >I do agree with you that corporate accountability is often quite poor, but I think the notion that every serious incident should conclude with someone going to prison is simply wrong.

                          If someone puts a shopping cart filled with lead-acid batteries on the train tracks, causing a derailment and a toxic chemical spill all over the area, poisoning and endangering the people nearby, should the person responsible not go to prison?

                          Or if someone takes an action knowing that it could crash an airliner with hundreds of people aboard, they should not be imprisoned?

                          By that logic, if I beat you over the head with a tire iron, I should just walk away. Possibly paying an inconsequential fine?

                          What's that? The individuals involved in poisoning hundreds/thousands or killing hundreds of airline passengers or beating you to death should be prosecuted and made accountable for their actions?

                          If that's the case, why should folks who knowingly take steps that create the same results not be treated exactly the same way? Because they were "just following orders" from management? Because their only responsibility is to maximize shareholder value?

                          Having a limited liability corporation is a privilege, not a right. As such, whether it be knowingly risking the lives and/or environment of others, or making the cost/benefit analysis that paying fines/settlements costs less than operating safely, such behavior should not be acceptable in a civilized society.

                          As I mentioned in another comment, businesses are strongly motivated by the incentives in their marketplace. If we make knowingly and/or negligently putting others at risk of harm both a death sentence for the corporation and criminal liability for those responsible (which includes management, the board and shareholders), we create the appropriate incentives for corporations to do the right thing.

                          As it stands now, willful, knowing negligence will usually only result in fines/lawsuits that are a pittance and not much of a drag on earnings. Those are not the right incentives.

                          • myrmidon 23 minutes ago

                            > Or if someone takes an action knowing that it could crash an airliner with hundreds of people aboard, they should not be imprisoned?

                            If the risk is not excessive, my answer would be no. If the behavior is only realistically punishable when it actually results in an accident, then the answer would also be no.

                            I think that neither air travel nor chemical plants pose an excessively elevated risk to human lives right now, thus increasing punishments for infractions would be disproportionate, not very helpful and potentially even detrimental for safety long-term.

                            Your analogy (beating someone with a tire iron) also clearly features intent; this is not typical for accidents and makes punishments less justifiable and much less useful.

                            If you actually want to make a strong case for increasing (shareholder) liability, it needs to be clear that those additional punishments and enforcement overhead would actually save lives, and that very critical point is absolutely not obvious to me right now.

                      • maxbond 6 hours ago

                        Notice that everything on GP's list is fraud (except Ghosn of Nissan, who was accused of embezzlement and failure to report income). It's very difficult for an executive to go to prison any other way.

                        • nobody9999 6 hours ago

                          >Notice that everything on GP's list is fraud (except Ghosn of Nissan, who was accused of embezzlement and failure to report income). It's very difficult for an executive to go to prison any other way.

                          I did notice. Which is why my list included mass deaths and massive pollution/ecological destruction, some of which we still don't know what the eventual damage/death toll will be.

                          And that's the bigger issue: property crimes are considered more serious than mass murder and poisoning our world. Just as with the fraudsters, the corporate veil should have been pierced for the murderers and despoilers of our environment, with harsh prison sentences for those whose avarice and sociopathy allowed them to murder and despoil.

                          Civil liability is fine, and the "corporate death penalty" (revoking charters, barring directors/managers from future employment, etc.) should be invoked with extreme prejudice in those circumstances as well.

                          But we don't do that. Because corporations are, in the above circumstance, not "people", but a legal fiction protecting its owners from liability. But when it benefits the corporation and its owners/managers, a corporation is a "person."

                          I'd say we should work it the other way -- if a corporation is responsible for deaths and despoilation, all the owners should have a share in the punishment.

                          That way, a few thousand wealthy individual investors and the owners of a few dozen hedge funds/investment houses would be put in SuperMax for a decade or two for the misdeeds of the companies in which they've invested. And let's not make the boards of directors, C-Suite and any others directly involved feel left out either. They can commiserate with their fellow scumbags in the prison yard.

                          That does sound pretty harsh doesn't it? Perhaps too harsh? I don't think so. Because as we're constantly reminded, business responds strongly to incentives.

                          And if businesses are strongly incentivized to not poison our citizens, kill airplane passengers and destroy our environment with the threat of long prison sentences and a stripping of their assets, I'd expect they'd respond to such incentives.

                          But, as it is now, when the incentives are to privatize profit and hold harmless those who kill us, make us sick and destroy our environment, those are the incentives to which corporations will respond.

                          • maxbond 6 hours ago

                            To be clear "notice" wasn't really directed at you specifically, more commenters in general. I'm sorry for wording that confusingly, originally I'd replied to GP with a similar comment to yours but your comment was more comprehensive than mine so I deleted it and replied as a sort of footnote.

                            I'm not really big on incarceration but I broadly agree.

                            • nobody9999 4 hours ago

                              >To be clear "notice" wasn't really directed at you specifically, more commenters in general. I'm sorry for wording that confusingly, originally I'd replied to GP with a similar comment to yours but your comment was more comprehensive than mine so I deleted it and replied as a sort of footnote.

                              I wasn't confused. I was on exactly the same page as you.

                              Your comment just prompted me to respond with my own thoughts.

                              It's all good.

                              >I'm not really big on incarceration but I broadly agree.

                              I'm not generally huge on it either (I think we over-incarcerate in the US), but as I mentioned, having strong incentives is important to guide corporate behavior. Besides, if an individual (and especially a poor one) caused a train derailment or dumped battery acid in the drinking water causing sickness or death, or sabotaged a plane so that it crashed, you bet your ass they'd be incarcerated.

                              Why shouldn't we have the same standards for corporations and the wealthy?

                      • YetAnotherNick 5 hours ago

                        Not sure why you're rebutting my post or why it is getting downvoted. I just answered the question that asked for a list of 5 people who went to jail for corporate crime. I never claimed they go to jail every time they deserve to (or even most of the time, for that matter).

                        • maxbond 4 hours ago

                          People generally downvote large blocks of generated text. Perhaps it wasn't your intention to argue for a particular position, but given the context, it's the natural inference. So some downvotes may be because they disagree with the position you appear to be arguing for.

                          If you want to neutrally answer a rhetorical question in the context of a debate, you're going to have to disclaim that somehow. Otherwise, there's no way for us to know, and the comment walks and talks like an argument.

                        • nobody9999 4 hours ago

                          >Not sure why you rebutting my post or why it is getting downvoted. I just answered the question that asked of list of 5 people who went to jail for corporate crime. I never commented they go to jail every time they deserve(or even most of the time for that matter).

                          It wasn't a rebuttal of your comment so much as I saw it as an opportunity to show the double standard in play WRT the consequences of ripping off wealthy folks vs. destroying the environment and/or outright maiming and killing people.

                          I didn't downvote your post either. Although if I'd noted that you "Asked AI", I might well have done so. To be clear, that's not a jab at you personally. Rather, I come to HN to discuss stuff with the other users, not read LLM generated text. If that's what I wanted, I don't need to come here, do I?

                          Sadly there's more and more of that here, with many folks not even saying they used an LLM (I use that term because "AI" doesn't actually exist) to generate their comment. I appreciate that you did so. Thanks!

            • mapt 11 hours ago

              But we almost never do. Have you seen the legal code? Every large corporation commits criminal acts many times a day. Even crimes so serious or offensive that they become politically relevant are almost always dealt with in a totally hands-off manner.

              To actually get convicted of anything as a corporate officer, you have to have substantially defrauded your own shareholders, who are senior to the public's interest in justice. Most such crimes involve financial malfeasance.

          • bfdm 8 hours ago

            Yea we should change that. Corporate life without parole: sorry, you don't get to be a business anymore, bye.

            • victorbjorklund 5 hours ago

              And all those employees and customers are punished for the crimes of a few.

              • jjani an hour ago

                For egregious cases, yes. Absolutely. That very short term pain is almost instantly offset by the societal gain brought about by companies' better adherence to the law. It's incredible just how much good it would do, and how quickly this would happen.

                And please don't assume a "you wouldn't if it was your own employer" - no, I very much would, despite the struggles it would cause.

          • nickpsecurity 9 hours ago

            There's multiple options:

            1. Hit them with fines or punitive damages high enough to wipe out all their operating profit and executive pay for as many years as a person would be in prison.

            2. Seize the company (receivership?), replace its executives, and make the new leaders sign off to not do that thing again. That's in addition to a huge fine.

            3. Dissolve it. Liquidate its assets.

            They usually just let the big companies off while throwing everything they have at many individuals who aren't corporations.

            For settlement-type deals, maybe see if they'll give all authors they ripped off free access to Claude models, too, so they reap some of the benefits of what was produced: at cost, with a certain amount of free credits.

          • boredatoms 10 hours ago

            That's fixable

          • ra 13 hours ago

            No, but you can jail directors if a company has committed a crime.

      • mcv 13 hours ago

        Yeah, but this is a corporation. They don't go to jail. They're only people when it's beneficial to them.

        • _carbyau_ 7 hours ago

          They can be caught out. Such as being at a Coldplay concert...

      • weird-eye-issue 12 hours ago

        No you wouldn't

      • singpolyma3 12 hours ago

        You don't though

      • YetAnotherNick 9 hours ago

        You won't be put in jail for breaching copyright in almost any country, at least not just for downloading content from libgen or a torrent. If you are talking about Swartz, he was facing jail for wire fraud and hacking, not breaching copyright.

      • stevage 14 hours ago

        What? Who goes to jail over copyright infringement?

        • hmmokidk 13 hours ago

          …Aaron Swartz?!

          • Lerc 13 hours ago

            He was charged with wire fraud, computer fraud, unlawfully obtaining information from a protected computer, and recklessly damaging a protected computer.

            Granted, the motivation was the copyright infringement, but to do what they did they needed to dress it up.

            • lcnPylGDnU4H9OF 3 hours ago

              > Granted, the motivation was the copyright infringement

              And this is why it is correct to say that he was persecuted for copyright infringement. Noting that he wasn't charged with anything related to copyright doesn't change the story, it only makes it less agreeable.

          • kylecazar 13 hours ago

            The case against Aaron was more farcical than copyright infringement, which they couldn't/didn't bring against him.

          • s3graham 13 hours ago

            :(

        • kg 14 hours ago

          > Penalties to be applied in cases of criminal copyright infringement (i.e., violations of 17 U.S.C. § 506(a)), are set forth at 18 U.S.C. § 2319. Congress has increased these penalties substantially in recent years, and has broadened the scope of behaviors to which they can apply. See this Manual at 1847.

          > Statutory penalties are found at 18 U.S.C. § 2319. A defendant, convicted for the first time of violating 17 U.S.C. § 506(a) by the unauthorized reproduction or distribution, during any 180-day period, of at least 10 copies or phonorecords, or 1 or more copyrighted works, with a retail value of more than $2,500 can be imprisoned for up to 5 years and fined up to $250,000, or both. 18 U.S.C. §§ 2319(b), 3571(b)(3).

          If you broaden it to include DMCA violations you could spend a lot of time in jail. It's even worse in some other countries.
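
          A rough sketch of the quoted first-offense threshold, just to make the numbers concrete (this only encodes the quoted manual text, not the full statute, and is not legal advice):

            # Simplified encoding of the quoted 17 U.S.C. § 506(a) / 18 U.S.C. § 2319
            # first-offense threshold; the statute has conditions not modeled here.
            def meets_quoted_threshold(copies_in_180_days: int, retail_value: float) -> bool:
                # At least 10 copies with retail value over $2,500 within any 180-day period.
                return copies_in_180_days >= 10 and retail_value > 2_500

            # Quoted first-offense maximums: up to 5 years in prison and up to a $250,000 fine.
            print(meets_quoted_threshold(copies_in_180_days=10, retail_value=2_600))  # True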

          • stevage 6 hours ago

            Are there any examples of small-time infringers actually going to jail?

            • sokoloff 3 hours ago

              Given the topic at hand, would you consider Anthropic’s actions “small-time infring[ing]”?

          • Lerc 13 hours ago

            Does the $2500 count if it is 25 $100 instances? Similarly does the 10 copies cover 10 items copied once or does it need to be one item copied at least 10 times?

            • lazide 13 hours ago

              If you piss off someone enough to care, they’ll do the maximum and see if you plea down - or the judge agrees.

              With a typical torrenter, it would be straightforward to make some truly monumental penalties.

              The reality is, they rarely care.

      • DyslexicAtheist 13 hours ago

        Hasn't the US been trying to extradite Kim Dotcom for years now? (Or at least it was in the past.)

        • Terr_ 3 hours ago

          Gosh, now you've got me feeling old, as I remember inveterate fraudster "Kim Kimble" circa 2001, trying to convince everyone that he was a glorious visionary general of an anti-al-Qaeda army of hackers...

      • decremental 14 hours ago

        You don't go to jail for copyright infringement lol

        • dragonwriter 13 hours ago

          You can, but criminal copyright infringement has narrower scope as well as more stringent standard of proof compared to civil copyright infringement.

        • koolala 13 hours ago

          Even if you don't pay the exorbitant fines?

          • weird-eye-issue 7 hours ago

            Technically at that point it would be for something like contempt of court if you had a judgment against you and you just ignored it

    • jonplackett 14 hours ago

      Will you actually get the money or will your publisher finally earn out the advances?

    • gpm 9 hours ago

      Just a FYI that it's closer to $6750 (Anthropic pays $9000, but 25% is likely to go to the attorneys - the exact number here is up to the court).

      Can't help but feel the reporting about $3000/work is going to leave a lot of authors disappointed when they receive ~$2250 even if they'd have been perfectly happy if that was the number they initially saw.
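
      A minimal sketch of that arithmetic, using the ~25% attorney-fee figure above (the exact percentage is up to the court):

        # Per-work payout after an assumed 25% attorney fee (the court sets the real cut).
        gross_per_work = 3_000
        attorney_fee_share = 0.25
        net_per_work = gross_per_work * (1 - attorney_fee_share)
        print(net_per_work)        # 2250.0 per work
        print(3 * net_per_work)    # 6750.0 for the three books mentioned upthread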

    • tartoran 16 hours ago

      > I think that's fair, considering that two of those books received advances under $20K and never earned out.

      It may be fair to you but how about other authors? Maybe it's not fair at all to them.

      • terminalshort 14 hours ago

        Do they sell their books for more than $3000 per copy? In that case it isn't fair. Otherwise they are getting a windfall because of Anthropic's stupidity in not buying the books.

        • seanhunter 2 minutes ago

          If you read the copyright text on the back of the title page of a book, buying it doesn’t give you the right to “mechanically reproduce” the book. I would be very surprised if there was a court ruling that didn’t either A) completely strike that notice and say it’s fair game to photocopy or scan books you have bought (which is not what courts have held in the past, so it would be a big shift) or B) uphold it and say it also applies to scraping the content of a book for training.

          …especially given the US “fair use” doctrine takes into account the effect that a particular use might have on the market for similar works, so the authors are bound to argue that the existence of AI that can reproduce fanfiction-like facsimiles of works at scale is going to poison the well and reduce the market for people spending actual money on future works (whether or not that’s true is another question).

          So in my view the court is going to say that buying a book doesn’t give them the right to train on the contents, because that is mechanical reproduction which is explicitly disallowed by the copyright notice, and they don’t fall under the “fair use” carveout because they affect the future market. There isn’t anywhere else where they were granted the right to use the authors’ works, so the use is disallowed. Obviously no court finding is ever 100% guaranteed, but that really seems the only logically consistent conclusion they could come to.

        • paulryanrogers 14 hours ago

          Some judgements are punitive, to deter future abuse. Otherwise why pay for anything when you can just always steal and pay only what's owed whenever you're caught?

          • terminalshort 14 hours ago

            Yes, in this particular case the damages are statutory, which means they are specifically punitive and not in compensation to the author. This is why it is definitely not unfair to the author. It is a lucky win for them.

            • godelski 14 hours ago

              I think you are using a naïve model. You're making the comparison based on "price of book" vs "compensation". Do you think that's all the costs here? Who knows about OP, but I'm willing to bet many of those authors sought legal counsel, which costs money. Opportunity costs are also difficult to measure. Same with lost future income.

              I don't think $3k is likely a bad deal, but I still think you're over simplifying things.

              • terminalshort 13 hours ago

                This is a class action suit, so the legal fees are almost certainly being paid on contingency and not out of pocket. And there is no opportunity cost or lost future income here because this is piracy not theft. The authors were never deprived of any ability to continue to sell their work through normal channels. They only lost the revenue from the sale of a single copy.

                • godelski 7 hours ago

                    > the legal fees are almost certainly being paid on contingency and not out of pocket.
                  
                  The legal fees for this lawsuit. Not the legal fees for anyone who went and talked to a lawyer suspecting their material was illegitimately used.

                  You're treating the system as isolated when it is not.

                    > no opportunity cost or lost future income here because this is piracy not theft.
                  
                  I think you are confused. Yes, it is piracy but not like the typical piracy most of us do. There's no loss in pirating a movie if you would never have paid to see the movie in the first place.

                  But there's future costs here as people will use LLMs to generate books, which is competition. The cost of generating such a book is much cheaper, allowing for a much cheaper product.

                    > They only lost the revenue from the sale of a single copy.
                  
                  In your effort to simplify things you have only complicated them.

                  • terminalshort 7 hours ago

                    You are not entitled to protection from future competition, only from loss of sales of your current work. You are not ever entitled to legal fees you pay if you don't file a suit.

                    • godelski 6 hours ago

                        > You are not entitled to protection from future competition
                      
                      What do you think patents, copyright, trademarks, and all this other stuff is even about?

                      There's "Statutory Damages" which account for a wide range of things[0].

                      Not to mention you just completely ignored what I argued!

                      Seriously, you've been making a lot of very confident claims in this thread and they are easy to verify as false. Just google some of your assumptions before you respond. Hell, ask an LLM and they'll tell you! Just don't make assumptions and do zero amount of vetting. It's okay to be wrong, but you're way off base buddy.

                      [0] https://en.wikipedia.org/wiki/Statutory_damages

                      • vidarh 3 hours ago

                        Copyright doesn't protect you from "future competition" in the sense meant here of competition from other works.

                        • jimmydorry an hour ago

                          Copyright protects you from market substitutions (e.g. someone taking your IP and offering an alternative to your work). If a model is trained on your IP, it could certainly be argued that users would no longer need to purchase your book.

                          "Future competition" is a loosely worded way of saying this.

                          • vidarh 32 minutes ago

                            "It could be argued" but the judge in this case has already ruled that the training does not violate copyright. Market substitution only comes into play to determine fair use if copyright has already been infringed.

                • iamsaitam 4 hours ago

                  "The authors were never deprived of any ability to continue to sell their work through normal channels" this isn't exactly true is it? If the "AI" used their books for training, then it's able to provide information/value/content from them, lowering the incentive for people to buy these books.

                  • vidarh 3 hours ago

                    However, the judge does not appear to believe they have any legal right to protection from that in this case. The settlement is over their use of pirated copies instead of buying one copy of each of the works in question.

                    • jimmydorry an hour ago

                      I haven't read this particular case, but typically judges will keep the judgement as narrow as possible... so it may entirely be the case that these IP owners or in similar future cases may also have legal right to protection from it.

                      • vidarh 42 minutes ago

                        The judge has already ruled that using books to train AI does not in itself violate US copyright law, and so the surviving claims from plaintiffs were relating to Anthropic pirating books.

          • f33d5173 6 hours ago

            Supposing a book is usually $30, then this would be a factor of 100 above that. That seems fairly punitive to me.

        • godelski 14 hours ago

            | Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.[0]
          
          Please don't be disingenuous. You know that none of the authors were selling their books for $3k apiece, so obviously this is about something more.

            > because of Anthropic's stupidity in not buying the books.
          
          And what about OpenAI, who did the same thing?

          What about Meta, who did the same thing?

          What about Google, who did the same thing?

          What about Nvidia, who did the same thing?

          Clearly something should be done because it's not like these companies can't afford the cost of the books. I mean Meta recently hired people giving out >$100m packages and bought a data company for $15bn. Do you think they can't afford to buy the books, videos, or even the porn? We're talking about trillion dollar companies.

          It's been what, a year since Eric Schmidt said to steal everything and let the lawyers figure it out if you become successful?[1] Personally, I'm not a big fan of "the ends justify the means" arguments. It's led to a lot of unrest, theft, wars, and death.

          Do you really not think it's possible to make useful products ethically?

          [0] https://news.ycombinator.com/newsguidelines.html

          [1] https://www.theverge.com/2024/8/14/24220658/google-eric-schm...

          • janalsncm 13 hours ago

            This isn’t a deal to sell their books. The authors are getting $3k per book while maintaining the rights to their IP. The settlement is to avoid statutory damages which are between $750 and $30k or more per infringement.

            One of the consequences of retaining their rights is that they can also sue Meta and Google and OpenAI etc for the same thing.
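
            A back-of-envelope comparison, assuming the ~500K-works figure cited at the top of the thread and the statutory range above:

              # Statutory exposure range vs. the reported settlement (rough numbers).
              works = 500_000
              low, high = 750, 30_000             # statutory damages per infringed work
              settlement_per_work = 3_000
              print(works * low)                  # 375_000_000 (low end of exposure)
              print(works * high)                 # 15_000_000_000 (high end of exposure)
              print(works * settlement_per_work)  # 1_500_000_000 (the reported settlement)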

          • kelnos 12 hours ago

            > And what about $OTHER_AI_COMPANY, who did the same thing?

            If there's evidence of this that will stand up in court, they should be sued as well, and they'll presumably lose. If this hasn't happened, or isn't in the works, then I guess they covered their tracks well enough. That's unfortunate, but that's life.

            • godelski 6 hours ago

              I mean they are being sued? I provided a long list of HN links in the sibling comment. But you know... you can also check Google[0]

              [0] https://gprivate.com/6ib6y

          • terminalshort 13 hours ago

            Where is your evidence that Meta, Google, and OpenAI did the same thing? (As for NVIDIA, do they even train models?) Because if they did, why haven't they been sued? This is a garden variety copyright infringement case and would be a slam dunk win for the plaintiffs. The only novel part of the case is the claim that the plaintiffs lost on, which establishes president that training an LLM is fair use.

            > Clearly something should be done because it's not like these companies can't afford the cost of the books

            Yes indeed it should, and it has. They have been forced to pay $3000 per book they pirated, which is more than 100x what they would have gained if they had gotten away with it.

            IMO a fine of 100x the value of a copy of the pirated work is more than sufficient as a punishment for piracy. If you want to argue that the penalty should be more, you can do that, but it is completely missing my point. You are talking about what is fair punishment to the companies, and my comment was talking about what is fair compensation to the authors. Those are two completely different things.

            • jimmydorry an hour ago

              > IMO a fine of 100x the value of a copy of the pirated work is more than sufficient as a punishment for piracy.

              Anti-piracy groups use scare letters on pirates where they threaten to sue for tens of thousands of dollars per instance of piracy. Why should it be lower for a company?

            • vidarh 3 hours ago

              > As for NVIDIA, do they even train models?

              Yes. Nemotron:

              https://www.nvidia.com/en-gb/ai-data-science/foundation-mode...

            • godelski 6 hours ago

              I mean you can Google these... They also have been popping up on HN for the last year, it is even referenced in the article, and there's even another post in the sidebar titled "Anthropic Record AI Copyright Pact Sets Bar for OpenAI, Meta"[0], so I really didn't feel it was necessary to provide links. But sure, if you're feeling lazy, I got your back. I'll even limit it to HN posts so you don't have to even leave the site.

                Torrenting:
                Meta Pirating Books[1,2,3]
                  - [1] Fun fact, [1] is the most popular post of all time on HN for the search word "torrent" and the 5th ranking for "Meta". [2] is the 16th for "illegal"
                Nvidia [4,5]
                Apple, Nvidia, Anthropic[6]
                GitHub [7,8]
                OpenAI [9,10]
                Google [11]
                  - I mean this one was even mentioned in the article from the Anthropic post from a few days ago[12]
              
              I hope that's sufficient. You can find plenty more if you do a good old-fashioned search instead of just using the HN search. But most of these were pretty high profile stories, so it was pretty quick to look them up.

                > which establishes president that training an LLM is fair use.
                                    ~~~~~~~~~
                                    precedent
              
              I think you misunderstand. The precedent is over the issue of piracy. This has not set precedent on the issue of fair use. There is ongoing litigation, but there was precedent set in another lawsuit with Meta[13], which is currently going through appeals. I'll give you a head start on that one [14,15]. But the issue of fair use is still being debated. These things take years and I don't think anyone will be surprised when this stuff lands in some of the highest courts and gets revisited in a different administration.

                > IMO a fine of 100x the value of a copy of the pirated work is more than sufficient as a punishment for piracy.
              
              Sure. You can have whatever opinion you want. I wasn't arguing about your opinion. I even agreed with it[16]!

              But that is a different topic altogether. I still think you've vastly oversimplified the conversation and are thus unintentionally making some naive assumptions. It's the whole reason I said "probably" in [16]. The big difference being just that you're smart enough to figure out how law works and I'm smart enough to know that neither of us are lawyers.

              And please don't ask me for more citations unless they are difficult to Google... I think I already set some kinda record here...

                [0] https://archive.is/3oCg8
                [1] https://news.ycombinator.com/item?id=42971446
                [2] https://news.ycombinator.com/item?id=43125840
                [3] https://news.ycombinator.com/item?id=42772771
                [4] https://news.ycombinator.com/item?id=40505480
                [5] https://news.ycombinator.com/item?id=41163032
                [6] https://news.ycombinator.com/item?id=40987971
                [7] https://news.ycombinator.com/item?id=33457063
                [8] https://news.ycombinator.com/item?id=27724042
                [9] https://news.ycombinator.com/item?id=42273817
                [10] https://news.ycombinator.com/item?id=38781941
                [11] https://news.ycombinator.com/item?id=11520633
                [12] https://news.ycombinator.com/item?id=45142885
                [13] https://perkinscoie.com/insights/update/court-sides-meta-fair-use-and-dmca-questions-leaves-door-open-future-challenges
                [14] https://arstechnica.com/tech-policy/2025/07/meta-pirated-and-seeded-porn-for-years-to-train-ai-lawsuit-says/
                [15] https://torrentfreak.com/copyright-lawsuit-accuses-meta-of-pirating-adult-films-for-ai-training/
                [16] https://news.ycombinator.com/item?id=45190232
        • giveita 13 hours ago

          If I copy your book and sell a million bootleg copies that compete directly with your book is that worth the $30 cover price?

          This is what generative AI essentially is.

          Maybe the payment should be $500/h (say $5k a page) to cover the cost of preparing a human-verified dataset for Anthropic.

          • aeon_ai 13 hours ago

            The same judge has determined that training is fair use - Anthropic did in fact buy copies of books and train on those as well.

            Thus the $3k per violation is still punitive at (conservatively) 100x the cost of the book.

            Given that it is fair use, Authors do not have rights to restrict training on their works under copyright law alone.

          • II2II 12 hours ago

            The thing is: you aren't distributing copies with generative AI, in any sensible meaning of the word.

            Don't get me wrong: I think this is an incredibly bad deal for authors. That said, I would be horrified if it wasn't treated as fair use. It would be incredibly destructive to society since people would try to use such rulings to chisel away at fair use. Imagine schools who had to pay yearly fees to use books. We know they would do that, they already try to do so (single use workbooks, online value added services). Or look at software. It is already going to be problematic for people who use LLMs. It is already problematic due to patents. Now imagine what would happen if reformulating algorithms that you read in a book was not considered as fair use. Or look at books themselves. A huge chunk of non-fiction consists of doing research and re-expressing ideas in non-original terms. Is that fair use? The main difference between that and a generative AI is we can say a machine did it in the case of generative AI, but is that enough to protect fair use in the conventional sense?

            • giveita 9 hours ago

              This is parallel to mass surveillance. Surveillance is OK (private eye), so dragnetting is also OK as it is just scaled-up private detectives. If 1 is OK then 1+1 is OK. And so, by Peano, is a googolplex of the OK thing.

            • rkagerer 10 hours ago

              > Imagine schools who had to pay yearly fees to use books

              I feel like we aren't far from that. Wouldn't be surprised if new books get published (in whatever medium) that are licensed out instead of sold.

          • terminalshort 13 hours ago

            In that case the damages would be $3000 per copy you sold. Distributing copyrighted work is an entirely different category of offense than just simply downloading and consuming. Anthropic didn't distribute any copies, so the damages are limited to the one copy they pirated. That is not remotely what generative AI is, and it's why the judge ruled that it was perfectly legal to feed the books to the model.

          • megaman821 12 hours ago

            I am not sure what types of books you read, but AI has replaced absolutely no books for me.

      • jawns 16 hours ago

        Then they can opt out of the class.

        • gowld 15 hours ago

          Or the judge can reject the settlement as insufficient, which is what TFA is about.

          • NoahZuniga 14 hours ago

            That doesn't seem to be why the judge rejected the settlement. To me it seems like the judge thought that the details weren't worked out enough to tell if it's reasonable.

    • eschaton 13 hours ago

      In my opinion, as a class member you should push for three things:

      1. Getting the maximum statutory damages for copyright infringement, which would be something like $250,000 per instance of infringement; you can be generous and call their training and reproduction of your works a single instance, though it’s probably many more than that.

      2. An admission of wrongdoing plus withdrawal from the market and permanent deletion of all models trained on infringed works.

      3. A perpetual agreement to only train new models on content licensed for such training going forward, with safeguards to prevent wholesale reproduction of works.

      It’s no less than what they would do if they thought you were infringing their copyrights. It’s only fair that they be subject to the same kind of serious penalties, instead of something they can write off as a slap on the wrist.

    • sh1mmer 10 hours ago

      I’m curious about how you confirmed some things you wrote were in the dataset.

    • Unai 11 hours ago

      As I understand, this case is not about training but about illegitimately sourcing the books, so unless you sell your books at $3k per copy, I don't see how it is fair.

    • thayne 15 hours ago

      How much of that $9000 will go to your publisher?

      • jawns 15 hours ago

        Remains to be seen, but generally the holder of copyright is the author not the publisher.

        • jonathanstrange 15 hours ago

          That depends on the publishers and your standing with them. Many publishers want a copyright transfer agreement whereas others are fine with exclusive licensing rights. You can't transfer copyright in some countries (e.g. Germany) but you can in the US.

          • favorited 13 hours ago

            Even though the US allows copyright assignment, none of the Big Five publishing houses in the US require it as part of a standard book deal, even with first-time authors. If you open any book or ebook to the copyright page, unless it's something like a reference book (which are frequently work-for-hire), it will say some variant of "© Author's Name."

            Publishers get exclusive print publishing rights for a given market, typically get digital and audio publication rights for the same, and frequently get a handful of other rights like the ability to license it for publication in other markets. But ownership of the work is almost always retained by the author.

          • pclmulqdq 9 hours ago

            I don't think you should work with a publisher who wants a copyright transfer. It is not part of standard book deals.

            • vidarh 3 hours ago

              Even exclusive licensing rights very often have limitations to them, such as a duration or requirements to keep the license, and people should be wary of working with a publisher who wants exclusive licensing without termination clauses that protect them as well.

    • motbus3 2 hours ago

      Who am I to say anything.

      It is just another opinion.

      It is not about $9k for your knowledge in that book. It's $9k for taking you out. The faster they are able to grab and process data, the less chance you have to make money from your work.

      The money is irrelevant if we allow them to break the law. They might even pay you $9k for those books, but you might never get anything again because they would have made copyright useless.

    • Suppafly 8 hours ago

      >I think that's fair, considering that two of those books received advances under $20K and never earned out.

      Doesn't that mean the money should go to your publisher instead of you?

    • franze 17 hours ago

      where can i check if my book was in it?

    • nextworddev 14 hours ago

      Fair for you maybe

    • hsaliak 10 hours ago

      Might be fair for you, but is it fair to JK Rowling?

      • iamsaitam 4 hours ago

        Does JK Rowling really deserve any fairness? She doesn't seem to think that everyone deserves it

      • stubish 9 hours ago

        Yes. JK Rowling can still sue about her work being used for training. This lawsuit is about the illegal downloading of her works.

    • k__ 17 hours ago

      Cool.

      Where can I check if I'm eligible?

    • midnitewarrior 10 hours ago

      What's more fair is for Anthropic to put 5% of their preferred shares, at their most recent valuation, into a pool that the authors of these books can make claims against. For 18 months, any author in this cache of books can claim their proportional share of the pool among all claimants.

      Perhaps tokenize all of the books and assign proportionally for token count of each publication.

      • xvector 4 hours ago

        What a ridiculous assertion. They're already getting 100-1000x the value of their books. Truly bloodlust knows no bounds.

    • echelon 14 hours ago

      > that doesn't necessarily mean that those models are a lasting asset.

      It remains to be seen, but typically this forms a moat. Other companies can't bring together the investment resources to duplicate the effort and they die.

      The only reasons why this wouldn't be a moat:

      1. Too many investment dollars and companies chasing the same goal, and none of them consolidate. (Non-consolidation feels impractical.)

      2. Open source / commoditize-my-complement offerings that devalue foundation models. We have a few of these, but the best still require H100s and they're not building product.

      I think there's a moat. I think Anthropic is well positioned to capitalize from this.

    • SilasX 13 hours ago

      Be careful what you wish for.

      While I'm sure it feels good and validating to have this called copyright infringement, and be compensated, it's a mixed blessing at best. Remember, this also means that your works will owe compensation to anyone you "trained" off of. Once we accept that simply "learning from previous copyrighted works to make new ones" is "infringement", then the onus is on you to establish a clean creation chain, because you'll be vulnerable to the exact same argument, and you will owe compensation to anyone whose work you looked at in learning your craft.

      This point was made earlier in this blog post:

      https://blog.giovanh.com/blog/2025/04/03/why-training-ai-can...

      HN discussion of the post: https://news.ycombinator.com/item?id=43663941

      • marcus_holmes 10 hours ago

        LLMs cannot create copyrightable works. Only humans can do that [0]. So LLMs are not making new copyrightable works.

        [0] not because we're so amazingly more creative. But because copyright is a legal invention, not something derived from first principles, and has been defined to only apply to human creations. It could be changed to apply to LLM output in the future.

        • SilasX 8 hours ago

          What is that replying to? I don’t see the relevance to my comment.

      • simonw 9 hours ago

        This settlement isn't about an LLM being trained on your work, it's about Anthropic downloading a pirated ebook of your work. https://simonwillison.net/2025/Sep/6/anthropic-settlement/

      • brendoelfrendo 13 hours ago

        It's a good thing that laws can be different for AI training and human consumption. And I think the blog post you linked makes that argument, too, so I'm not sure why you'd contort it into the idea that humans will be compelled to attribute/license information that has inspired them when creating art.

        • SilasX 8 hours ago

          Right — laws can be arbitrary, and ignore constraints like consistency! It’s just something sane people try to avoid.

          • ch_fr 3 hours ago

            The inconsistency you're talking about is only based on the premise that LLMs and humans are "basically the same thing and thus should be treated the exact same way in this kind of situation". But I don't really see why that would be the case in the first place.

            Now don't take me wrong, I'm not saying that a rushed regulatory response is a good thing, it's more about the delivery of your reply. I see those arguments a lot: people smugly saying "Well, YOU too learn from things, how about that? Not so different from the machine huh?" and then continuing the discussion based on that premise, as if we were supposed to accept it as a fact.

      • abtinf 11 hours ago

        This is basically the socialist/communist argument for mass expropriation.

    • visarga 17 hours ago

      How is it fair? Do you expect 9,000 from Google, Meta, OpenAI, and everyone else? Were your books imitated by AI?

      Infringement was supposed to imply substantial similarity. Now it is supposed to mean statistical similarity?

      • jawns 16 hours ago

        You've misunderstood the case.

        The suit isn't about Anthropic training its models using copyrighted materials. Courts have generally found that to be legal.

        The suit is about Anthropic procuring those materials from a pirated dataset.

        The infringement, in other words, happened at the time of procurement, not at the time of training.

        If it had procured them from a legitimate source (e.g. licensed them from publishers) then the suit wouldn't be happening.

        • greensoap 14 hours ago

          A point of clarification and some questions.

          The portion the court said was bad was not Anthropic getting books from pirated sites to train its model. The court opined that training the model was fair use and did not distinguish between getting the books from pirated sites or hard copy scans. The part the court said was bad, which was settled, was Anthropic getting books from a pirate site to store in a general purpose library.

          --

            "To summarize the analysis that now follows, the use of the books at issue to train Claude
            and its precursors was exceedingly transformative and was a fair use under Section 107 of the
            Copyright Act. And, the digitization of the books purchased in print form by Anthropic was
            also a fair use but not for the same reason as applies to the training copies. Instead, it was a
            fair use because all Anthropic did was replace the print copies it had purchased for its central
            library with more convenient space-saving and searchable digital copies for its central
            library — without adding new copies, creating new works, or redistributing existing copies.
            However, Anthropic had no entitlement to use pirated copies for its central library. Creating a
            permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy."
          
            "Because the legal issues differ between the *library copies* Anthropic purchased and
            pirated, this order takes them in turn."
          
          
          --

          Questions

          As an author do you think it matters where the book was copied from? Presumably, a copyright gives the author the right to control when a text is reproduced and distributed. If the AI company buys a book and scans it, they are reproducing the book without a license, correct? And fair use is the argument that even though they violated the copyright, they are excused. In a pure sense, if the AI company copied (assuming they didn't torrent back the book) from a "pirate source", why is that copy worse than if they copied from a physical book?

          • 8note 14 hours ago

            > AI company buys a book and scans it, they are reproducing the book without a license, correct

            Isn't digitizing your own copies for backup and personal use fine? So long as you don't give away the original while keeping the backups. Similarly, don't give away the digital copies.

            • esrauch 13 hours ago

              It is, Google Books did it over a decade ago (bought up physical books and scanned them all). There were some rulings about how much of a snippet they were allowed to show end users as fair use, but I'm fairly sure the actual scanning and indexing of the books was always allowed.

          • cortesoft 14 hours ago

            > If the AI company buys a book and scans it, they are reproducing the book without a license, correct?

            No? I think there are a lot more details that need to be known before answering this question. It matters what they do with it after they scan it.

            • greensoap 13 hours ago

              That is only relevant to whether it is fair use not to whether the copying is an infringement. Fair use is what is called an affirmative defense -- it means that yes what I did was technically a violation but is forgiven. So on technicalities the copying is an infringement but that infringement is "okay" because there is a fair use. A different scenario is if the copyright owner gives you a license to copy the work (like open source licenses). In that scenario the copying was not an infringement because a license exists.

              • gpm 6 hours ago

                > Fair use is what is called an affirmative defense

                Yes

                > it means that yes what I did was technically a violation but is forgiven

                Not at all. All "affirmative defense" means is that procedurally the burden is on me to establish that I was not violating the law. The law isn't "you can't do the thing"; rather it is "you can't do the thing unless it's like this". There is no violation and no forgiveness, as there is nothing to forgive, because it was done "like this" and doing it "like this" doesn't violate the law in the first place.

              • cortesoft 13 hours ago

                If I have an app on my phone that lets me point my phone at a page to scan, OCR, and read the page out loud to me, it wouldn't even require fair use, would it?

        • mmargenot 16 hours ago

          Do foundation model companies need to license these books or simply purchase them going forward?

          • sharkjacobs 16 hours ago

            > On June 23, 2025, the Court rendered its Order on Fair Use, Dkt. 231, granting Anthropic’s motion for summary judgment in part and denying its motion in part. The Court reached different conclusions regarding different sources of training data. It found that reproducing purchased and scanned books to train AI constituted fair use. Id. at 13-14, 30–31. However, the Court denied summary judgment on the copyright infringement claims related to the works Anthropic obtained from Library Genesis and Pirate Library Mirror. Id. at 19, 31.

            https://www.documentcloud.org/documents/26084996-proposed-an...

            > reproducing purchased and scanned books to train AI constituted fair use

            • greensoap 14 hours ago

              Actually, the court really only said that downloading a pirated book to store in your "library" was bad. The opinion is (intentionally?) ambiguous on whether the decision regarding copies used to train an LLM applies only to scanned books or also to pirated books. The facts found in the case are that the training datasets were made from the "library" copies of books, which included both scans and pirated downloads. The court said the training copies were fair use. The court also said the scanned library copies were fair use. The court found that the pirated library copies were not fair use. The court did not say for certain whether the pirated training copies were fair use.

            • thaumasiotes 16 hours ago

              The usual analysis was that when you download a book from Library Genesis, that is an instance of copyright infringement committed by Library Genesis. This ruling appears to reverse that analysis.

              • papercrane 15 hours ago

                Do you have a source for that? MAI Systems Corp. v. Peak Computer, Inc. established that even creating a copy in RAM is considered a "copy" under the Copyright Act and can be infringement.

                • parineum 14 hours ago

                  It's not an issue of where it's being copied, it's who's doing the copying.

                  Library Genesis has one copy. It then sends you one copy and keeps its own. The entity that violated the _copy_right is the one that copied it, not the one with the copy.

                  • masfuerte 14 hours ago

                    There are many copies made as the text travels from Library Genesis to Anthropic. This isn't just of theoretical interest. English law has specific copyright exemptions for transient copies made by internet routers, etc. It doesn't have exemptions for the transient copies made by end users such as Anthropic, and they are definitely infringing.

                    Of course, American law is different. But is it the case that copies made for the purpose of using illegally obtained works are not infringing?

                    • thaumasiotes 13 hours ago

                      > But is it the case that copies made for the purpose of using illegally obtained works are not infringing?

                      Well, the question here is "who made the copy?"

                      If you advertise in seedy locations that you will send Xeroxed copies of books by mail order, and I order one, and you then send me the copy I ordered, how many of us have committed a copyright violation?

                      • masfuerte 4 hours ago

                        Copyright law is literally about the copies. A xeroxed book is exactly one copy. Mailing and reading that book doesn't copy it any further. In contrast, you can't do anything with digital media without making another copy.

                        > "Who made the copy?"

                        This begs the question. With digital media everybody involved makes multiple copies.

          • bhickey 16 hours ago

            Probably the latter.

        • gowld 16 hours ago

          I thought that distribution of copyrighted materials was legally encumbered, not reception thereof.

          • adrr 14 hours ago

            Downloading is making a copy and is covered by copyright law. It's also subject to statutory damages of up to $150k per violation if willful. I assume Anthropic knew they were using pirated books.

          • thayne 15 hours ago

            Do you have a source for that? My understanding was that both were illegal, although of course media companies have an interest in making people believe that even if it isn't true.

          • lawlessone 16 hours ago

            Did they use a torrent? If they did, isn't it likely they distributed it while downloading it?

            • gkbrk 15 hours ago

              Surely a state-of-the-art tech company would know how to disable seeding.

              • LeoPanthera 14 hours ago

                BitTorrent clients will not send data to clients which aren't uploading, as far as I know.

      • wingspar 16 hours ago

        My understanding is that this settlement is about the MANNER in which Anthropic acquired the text of the books: they downloaded illegal copies of the books.

        There were no issues with the physical copies of books they purchased and scanned.

        I believe the issue of USING these texts for AI training is a separate issue, addressed in separate cases.

      • Retric 16 hours ago

        Penalties can be several times actual damages, and substantial similarity includes MP3 files and other lossy forms of compression which don’t directly look like the originals.

        The entire point of deep learning is to copy aspects from training materials, which is why it’s unsurprising when you can reproduce substantial material from a copyrighted work given the right prompts. Proving damages for individual works in court is more expensive than the payout but that’s what class action lawsuits are for.

      • gruez 16 hours ago

        >Were your books imitated by AI?

        Given that books can be imitated by humans with no compensation, this isn't as strong an argument as you think. Moreover, AFAIK the training itself has been ruled legal, so Anthropic could have theoretically bought the book for $20 (or whatever) and been in the clear, which would obviously bring less revenue than the $9k settlement.

        • visarga 16 hours ago

          Copyright should be about copying rights, not statistical similarities. Similarity vs. causal link: a different standard altogether.

          • gruez 16 hours ago

            >Copyright should be about copying rights, not statistical similarities

            So you're agreeing with me? The courts have been pretty clear on what's copyrightable. Copyright only protects specific expressions of an idea. You can copyright your specific writing of a recipe, but not the concept of the dish or the abstract instructions themselves.

          • dotnet00 16 hours ago

            Those statistical similarities originate from a copyright violation, there's your causal link. Basically the same as selling a game made using pirated Photoshop.

            • reissbaker 16 hours ago

              Selling a game whose assets were made with a pirated copy of Photoshop does not extend Adobe's copyright to cover your game itself. They can sue you for using the pirated copy of Photoshop, but they can't extend copyright vampirically in that manner — at least, not in the United States.

              (They can still sue for damages, but they can't claim copyright over your game itself.)

              • dotnet00 15 hours ago

                Are the authors claiming copyright over the LLM? My understanding is they were suing Anthropic for using the authors' works to train its product. The court ruled that using the books for training would be fair use, but that piracy is not fair use.

                Thus, isn't the settlement essentially Anthropic admitting that they don't really have an effective defense against the piracy claim?

              • thaumasiotes 13 hours ago

                Well, there are damages torts and there's also an unjust enrichment tort. In the paradigm example where you make funding available to your treasurer and he makes an unscheduled stop in Las Vegas to bet it on black, you can sue him for damages. If he lost the bet, he owes you the amount he lost. If he won, he owes you nothing (assuming he went on and deposited the full amount in your treasury as expected).

                Or you could sue him on a theory of unjust enrichment, in which case, if he lost, he'd owe you nothing, and if he won, he'd owe you all of his winnings.

                It's not clear to me why the same theory wouldn't be available to Adobe, though the copyright question wouldn't be the main thrust of the case then.

              • gowld 15 hours ago

                What is illegal about using pirated software that someone else distributed to you, if you never agreed to a license contract?

                • dotnet00 15 hours ago

                  If you can show that the pirated copy was provided to you without your knowledge, and that there was no reasonable way for you to know that it was pirated, there probably isn't anything illegal about it for you.

                  But otherwise, you're essentially asking if you can somehow bypass license agreements by simply refusing to read them, which would obviously render all licensing useless.

                  • thaumasiotes 13 hours ago

                    Why do you think reading the agreement is notionally mandatory before the software becomes functional?

                    • dotnet00 10 hours ago

                      Most paid software generally makes you acknowledge that you have read and accepted the terms of the license before first use, and includes a clause that continued use of the software constitutes acceptance of the license terms.

                      In the event that you try to play games to get around that acknowledgement: Courts aren't machines, they can tell that you're acting in bad faith to avoid license restrictions and can punish you appropriately.

                      • thaumasiotes 10 hours ago

                        >> Why do you think reading the agreement is notionally mandatory before the software becomes functional?

                        > Most paid software generally makes you acknowledge that you have read and accepted the terms of the license before first use, and includes a clause that continued use of the software constitutes acceptance of the license terms.

                        Huh. If only I'd known that.

                        Why do you think that is?

                        • dotnet00 9 hours ago

                          How about you directly say what you're trying to say instead of being unnecessarily sarcastic?

            • terminalshort 14 hours ago

              The statistical similarities originate from fair use, just as the judge ruled in this case.

          • Retric 16 hours ago

            The entire purpose of training materials is to copy aspects of them. That’s the causal link.

            • visarga 6 hours ago

              > That’s the causal link.

              But copyright was based on substantial similarity, not causal links. That is the subtle change. Copyright is expanding more and more.

              In my view, unless there is substantial similarity to the infringed work, copyright should not be invoked.

              Even the substantial similarity concept is already an expanded concept from original "protected expression".

              It makes no sense to attack gen-AI for infringement; if we wanted the originals, we would get the originals, since you can copy anything you like on the web. Generating bootleg Harry Potter is slow, expensive, and unfaithful to the original. We use gen-AI for creating things different from the training data.

            • Dylan16807 16 hours ago

              The aspect it's supposed to copy is the statistics of how words work.

              And in general, when an LLM is able to recreate text, that's a training error. Recreating text is not the purpose. Which is not to excuse it happening, but the distinction matters.

              • program_whiz 16 hours ago

                In training, the model is trained to predict the exact sequence of words of a text. In other words, it is reproducing the text repeatedly during its own training. The by-product is that the model weights are nudged to make the text more likely to be produced by the model -- that is the explicit goal. A perfect model would be able to reproduce the text perfectly (0 loss).
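
                To make that concrete, here is a minimal sketch of the next-token objective in a generic PyTorch style (names and shapes are illustrative assumptions, not any vendor's actual training code):

                  import torch
                  import torch.nn.functional as F

                  def next_token_loss(model, token_ids):
                      # token_ids: (batch, seq_len) integer tensor for one training text.
                      inputs = token_ids[:, :-1]   # the model sees tokens 0..n-1
                      targets = token_ids[:, 1:]   # and is scored on tokens 1..n
                      logits = model(inputs)       # assumed shape: (batch, seq_len-1, vocab)
                      # Cross-entropy is zero only if the model assigns probability 1
                      # to every next token, i.e. it can reproduce the text exactly.
                      return F.cross_entropy(
                          logits.reshape(-1, logits.size(-1)),
                          targets.reshape(-1),
                      )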

                Real-world absurd example: a company hires a bunch of workers, gives them access to millions of books, and has the workers read the books all day. The workers copy the books word by word, but after each word they try to guess the next word that will appear. Eventually, they collectively become quite good at guessing the next word given a prompt text, even reproducing large swaths of text almost verbatim. The owner of the company claims they owe nothing to the book owners, because it doesn't count as reading the book, and any reproduction is "coincidental" (even though this is the explicit task of the readers). They then use these workers to produce works to compete with the authors of the books, which they never paid for.

                It seems many people feel this is "fair use" when it happens on a computer, but would call it "stealing" if I pirated all the books of JK Rowling to train myself to be a better mimicker of her style. If you feel this is still fair use, then you should agree all books should be free to everyone (as well as art, code, music, and any other training material).

                • gruez 16 hours ago

                  >but would call it "stealing" if I pirated all the books of JK Rowling to train myself to be a better mimicker of her style

                  Can you provide an example of someone being successfully sued for "mimicking style", presumably in the US judicial system?

                  • snowe2010 14 hours ago

                    > Second, the songs must share SUBSTANTIAL SIMILARITY, which means a listener can hear the songs side by side and tell the allegedly infringing song lifted, borrowed, or appropriated material from the original.

                    Music has had this happen numerous times in the US. The distinction isn't an exact replica; it's whether it could be confused for the same style.

                    George Harrison lost a case for one of his songs. There are many others.

                    https://ultimateclassicrock.com/george-harrison-my-sweet-lor...

                  • program_whiz 16 hours ago

                    The damages arise from the very process of stealing material for training. The justification "yes but my training didn't cause me to directly copy the works" is faulty.

                    I won't rehash the many arguments as to why the output is also a violation, but my point was more the absurd view that stealing and using all the data in the world isn't a problem because the output is a lossy encoding (but the explicit training objective is to reproduce the training text / image).

                  • Retric 15 hours ago

                    "Style" is an ambiguous term here, as it doesn't directly map to what's being considered. The case between "Blurred Lines" and "Got to Give It Up" is often considered one of style, and the Court of Appeals for the Ninth Circuit upheld the finding of copyright infringement.

                    However, AI has been shown to copy a lot more than what people consider style.

                • Dylan16807 15 hours ago

                  > In training, the model is trained to predict the exact sequence of words of a text. In other words, it is reproducing the text repeatedly for its own trainings.

                  That's called extreme overfitting. Proper training is supposed to give subtle nudges toward matching each source of text, and zillions of nudges slowly bring the whole thing into shape based on overall statistics, not any particular sources. (But that does require properly removing duplicate sources of very popular text, which seems to be an unsolved problem.)
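
                  For what it's worth, deduplication at the exact-match level is easy; the hard part is near-duplicates. A toy sketch of the easy part, purely illustrative and not any lab's actual pipeline:

                    import hashlib

                    def dedupe_exact(documents):
                        # Drop byte-for-byte duplicate documents by content hash.
                        seen, unique = set(), []
                        for doc in documents:
                            digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
                            if digest not in seen:
                                seen.add(digest)
                                unique.append(doc)
                        return unique

                  Quotes, excerpts, and reprints of popular text aren't byte-for-byte identical, which is why they slip through and end up over-weighted.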

                  So your analogy is far enough off that I can't give it a good reply.

                  > It seems many people feel this is "fair use" when it happens on a computer, but would call it "stealing" if I pirated all the books of JK Rowling to train myself to be a better mimicker of her style.

                  I haven't seen anyone defend the piracy, and the piracy is what this settlement is about.

                  People are defending the training itself.

                  And I don't think anyone would seriously say the AI version is fair use but the human version isn't. You really think "many people" feel that way?

                  • Retric 15 hours ago

                    There isn’t a clear line for extreme overfitting here.

                    To generate working code the output must follow the API exactly. Nothing separates code and natural language as far as the underlying algorithm is concerned.

                    Companies slightly randomize output to minimize the likelihood of direct reproduction of source material, but that’s independent of what the neural network is doing.

                    • Dylan16807 15 hours ago

                      You want different levels of fitting for different things, which is difficult. Tight fitting on grammar, APIs, and idioms; loose fitting on creative text; and it's hard to classify it all up front. But still, if it can recite Harry Potter that's not on purpose, and it's never trained to predict a specific source losslessly.

                      And it's not really about randomizing output. The model gives you a list of likely words, often with no clear winner. You have to pick one somehow. It's not like it's taking some kind of "real" output and obfuscating it.
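
                      A minimal sketch of that last step, contrasting greedy decoding with temperature sampling (illustrative only; real decoders typically add top-k/top-p filtering on top of this):

                        import math
                        import random

                        def pick_next(logits, temperature=1.0):
                            if temperature == 0:
                                # Greedy: always take the single most likely token.
                                return max(range(len(logits)), key=lambda i: logits[i])
                            # Softmax over temperature-scaled logits, then draw one token.
                            scaled = [l / temperature for l in logits]
                            peak = max(scaled)
                            weights = [math.exp(l - peak) for l in scaled]
                            return random.choices(range(len(weights)), weights=weights, k=1)[0]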

                      • Retric 15 hours ago

                        > often with no clear winner. You have to pick one somehow. It's not like it's taking some kind of "real" output and obfuscating it.

                        It's very rare for multiple outputs to actually be equal, so the only choice is to choose one at random. Instead it's become accepted practice to make suboptimal choices for a few reasons, one of which really is to decrease the likelihood of reproducing existing text.

                        Nobody wants a headline like: “Meta's Llama 3.1 can recall 42 percent of the first Harry Potter book” https://www.understandingai.org/p/metas-llama-31-can-recall-...

                        • Dylan16807 15 hours ago

                          I will say that picking the most likely word every single time isn't optimal.

                          • Retric 15 hours ago

                            I agree there are multiple reasons to slightly randomize output, but there are also downsides.

        • arduanika 16 hours ago

          Machines aren't people.

          • gruez 16 hours ago

            They're not, but that's a red herring given that humans vs. machines is not a relevant factor in current copyright statutes or case law. Short of new laws being passed or activist judges ruling otherwise, it'll remain this way.

  • rideontime 14 hours ago

    Direct link to Judge Alsup's order: https://www.bloomberglaw.com/public/desktop/document/Bartzet...

    Name should sound familiar to those who follow tech law; he presided over Oracle v Google, along with Anthony Levandowski's criminal case for stealing Waymo tech for Uber.

    • wrsh07 12 hours ago

      As someone who has had a passing interest in most of these cases, I've actually come to like Alsup and am impressed by his technical understanding.

      His orders and opinions are, imo, a success story of the US judicial system. I think this is true even if you disagree with them

      • darkwizard42 12 hours ago

        He actually does understand most of what he is ruling on which is a welcome surprise. Not just legal jargon but also the technical spirit of what is at stake.

    • bsimpson 9 hours ago

      He's also the one who called bullshit when Oracle tried to claim that Java's function signatures were so novel they should be eligible for copyright. (Generally, arts are copyrightable and engineering is not - there's a creativity requirement.)

      They tried to say `rangeCheck(length, start, end)` was novel. He spat back that he'd written equivalent utility functions as a hobbyist hundreds of times!

      • kemitchell 6 hours ago

        Art versus engineering is a very dangerous generalization of the law. There is a creativity requirement for copyrightability, but it's an explicitly low bar. Search query "minimal degree of creativity".

        The Supreme Court decision in Oracle v Google skipped over copyrightability and addressed fair use. Fair use is a legal defense, applying only in response to finding infringement, which can only be found if material's copyrightable. So the way the Supreme Court made its decision was weird, but it wasn't about the creativity requirement.

  • anp 12 hours ago

    Comments so far seem to be focusing on the rejection without considering the stated reasons for rejection. AFAICT Alsup is saying that the problems are procedural (how do payouts happen, does the agreement indemnify Anthropic from civil “double jeopardy”, etc), not that he’s rejecting the negotiated payout. Definitely not a lawyer but it seems to me like the negotiators could address the rejection without changing any dollar numbers.

    • yladiz 11 hours ago

      Yes, exactly. The article is pretty clear that it’s rejected without prejudice and that a few points need to be ironed out before he gives a preliminary approval. I suspect a lot of folks didn’t read much/any of TFA.

      I do wonder if all of the kinks will be smoothed out in time. Not a lawyer either, but the timeline to create the longer list is a bit tight, and it feels like we could see an actual rejection, or at least a stretched-out process that goes on for a few more months before approval.

  • lxe 14 hours ago

    Good. Approving this would have set a concerning precedent.

    Edit: My stance on information freedom and copyright hasn't changed since Aaron Swartz's death in 2013. Intellectual property laws, patents, copyright, and similar protections feel outdated and serve mainly to protect established interests. Despite widespread piracy making virtually all media available immediately upon release, content creators and media companies continue to grow and profit. Why should publishers rely on century-old laws to restrict access?

    • tene80i 13 hours ago

      Because whenever anyone argues that all creative and knowledge works should be freely available, accessible without compensating the creators, they conveniently leave out software and the people who make it.

      Moreover, IP law protects plenty of people who aren’t “established interests”. You just, perhaps, don’t know them.

      • lxe 13 hours ago

        I make the software. I use free software and I contribute to free software. I wish all the software were free from all sorts of restrictions.

        • tene80i 12 hours ago

          That's great. But not exclusively, right? What about your salary, assuming you're a software professional? Do you still want that? I would argue you deserve it, but I also believe authors and other creators should be compensated. Too many people here conveniently argue for compensating only software professionals.

        • okanat 11 hours ago

          If you are privileged and talented enough to make money purely out of free software, that's great!

          However in most cases that money ultimately comes from being able to sell proprietary software and software-enhanced services. Many employers wouldn't pay for free software, if it wasn't helping their closed-source tech.

          If bigger companies can enforce their "rights" as owners of intellectual property, smaller ones and individuals should be able to do so as well.

          I discussed this rather recently on HN. The timelines for copyright are too long. They need to get shorter, around 20-30 years for actual creative work.

          I think software needs its own category of intellectual property. It should enjoy at most 10 years. Software is quite akin to machinery, and mechanical designs can only get 20 years of patent protection. Considering how quickly software grows and changes, it should get even shorter IP protection. As in every other sector, trade secrets can continue to exist, and employers can negotiate deals with software engineers for trade secret protection.

        • alok-g 11 hours ago

          Is that saying that all software should be free? And extending beyond software, that all books, art, movies, etc., should be free to copy? Likewise, would it be fine for anyone to use any other company's logo?

    • gabriel666smith 13 hours ago

      Would it actually set any kind of legal precedent, or just establish a sort of cultural vibe baseline? I know Anthropic doesn't have to admit fault, and I don't know if that establishes anything in either direction. But I'm not from the US, so I wouldn't want to pretend to have intimate knowledge of its system.

      The number of bizarre, contradictory inferences this settlement asks you to make - no matter your stance on the wider question - is wild.

      • gpm 8 hours ago

        The settlement doesn't set any kind of precedent at all.

        The existing rulings in the case establish "persuasive" precedent (i.e. future cases are entirely free to disagree and rule to the contrary) - notably including the part about training on legally acquired copies of books (e.g. from a book store) being fair use.

        Only appeals courts establish binding precedent in the US (and only for the courts under them). A result of this case settling is that it won't be appealed, and thus won't establish any binding precedent one way or another.

        > The number of bizarre, contradictory inferences this settlement asks you to make - no matter your stance on the wider question - is wild.

        What contradictions do you see? I don't see any.

        • gabriel666smith 7 hours ago

          Thanks! That's really helpful.

          > What contradictions do you see? I don't see any.

          I guess us seeing very different things is also what a settlement might be for :-).

          But I think I was wrong.

          I think others in the thread are debating the contradictions I saw. I tried typing them out when I made my earlier comment, but couldn't get them to fit into any kind of logic that made sense to me. They just seemed contradictory at the time.

          I think the same arguments have now been made much more clearly by others - specifically around whether a corporation downloading this work is the same as a human downloading it - and the responses have been very clear also.

          The settlement figure was tied implicitly to Anthropic's valuation in the Ars article [0] where I think I originally posted my comment. Those comments were moved here, so I've linked below.

          Specifically linking the settlement sum to the valuation of a corporation is what caught me in a loop - that valuation assumes that Anthropic will do certain things in the future. I was thinking too much, maybe, about things like:

          "Would a teenager get the same treatment? What about a teenager with a private company? What about a teenager who seemed dumber than that teenager to the person deciding their company's valuation? What about a teenager who had not opened the files themselves, but had spun up a model from them? What about a teenager who had done both?"

          Etc. I think I was getting fixated on the idea that the valuation assumes future performance, and downloading the files was possibly necessary for that performance, but I was missing the obvious answers to some of my questions because of that.

          I do think that some of the more anthropomorphising language - "training data" is an example - trips people up a lot in the same way. And I think that if the settlement sum reflects anything to do with the valuation of that corporation, that does create some interesting questions, but maybe not contradictions.

          [0] https://arstechnica.com/tech-policy/2025/09/judge-anthropics...

      • stingraycharles 13 hours ago

        A settlement means that no legal precedent is set, so I can only assume a cultural precedent.

        Sometimes these companies specifically seek out a settlement to avoid setting a legal precedent in case they feel like they will lose.

        • lxe 12 hours ago

          Hmm, my huge concern was that if the settlement were approved, it would set a legal precedent for other "settlement approvals" like this one, setting back AI research in the US and paving the way for China to win the race.

          • impossiblefork 12 hours ago

            Nah, I think it's the opposite. If this settlement were approved, then you could screw people over in class action lawsuits.

            This settlement was the "AI-friendly" thing.

        • gabriel666smith 8 hours ago

          Thanks!

  • puppycodes 17 hours ago

    I have no empathy for multi-billion-dollar companies, but intellectual property and copyright do nothing positive for humanity.

    • program_whiz 16 hours ago

      In an economy where ideas have value, it seems logical we should have property protection, much like we do for physical goods. It's easy to argue "ideas should be freely shared", but if an idea takes 20 years and $100M to develop, and there are no protections for ideas, then no one will take the time to develop them. Most modern technology we have is due to copyright/patents (drugs, electronics, entertainment, etc.), because without those protections, no one would have invested the time and energy to develop them in the first place.

      I believe you are probably only looking at the current state of the world and seeing how it "stifles competition" or "hampers innovation". Those allegations are probably true to some extent, especially in specific cases, but that view also misses the fact that without those protections, the tech likely wouldn't be created in the first place (and so you still wouldn't be able to freely use the idea, since the person who invented it wouldn't have).

      • 8note 13 hours ago

        > drugs

        This is a kinda strange example, since the discovery tends to come from government-funded research, with the safety trials funded by private money.

        The USSR went to space without those protections. It's not like property protections are the only thing that has driven invention.

        MIT licenses are also pretty popular, as are Creative Commons licenses.

        People also do things that don't make a lot of money, like teaching elementary school. It costs a ton of money to build and run all those schools, but without any intellectual property being created that can be sold or rented out.

        I don't believe that nobody would want to build much of what we have now if there wasn't IP around it. Making and inventing things is fun.

        • bhelkey 12 hours ago

          > i dont believe that nobody would want to build much of the things we have now, if there wasnt IP around them. Making and inventing things is fun

          People write fanfiction without being paid; however, Avatar 2 cost hundreds of millions to produce [1]. The studio didn't spend this money for the heck of it; they spent it hoping to recoup their investment.

          If no one can make money off of intellectual property, people will continue writing fanfiction. But why would a studio spend hundreds of millions making a blockbuster movie?

          [1] https://variety.com/2022/film/news/avatar-2-budget-expensive...

          • laggyluke 7 hours ago

            > The studio didn't spend this money for the heck of it, they spent this money with the hope of recouping their investment.

            I wonder if the world would be a better place if we had fewer financial incentives to do things, in general?

            > But why would a studio spend hundreds of millions making a blockbuster movie?

            Under this hypothetical scenario, I believe there wouldn't be a "studio" in the first place. There could be a group of people who want to express themselves, get famous or do something just for fun, without any direct financial gain. Sure, they wouldn't be able to pull off Avatar 2, but our expectations as consumers would also be different.

      • Permit 11 hours ago

        > but if an idea takes 20 years and $100M dollars to develop, and there are no protections for ideas, then no one will take the time to develop them

        This sounds trivially true but I have some trouble reconciling it with reality. For example the Llama models probably cost more than this to develop but are made freely available on GitHub. So while it’s true that some things won’t be built, I think it’s also the case that many things would still be built.

        • xvector 4 hours ago

          The Llama models will likely be closed-sourced if they ever outperform.

      • tolerance 12 hours ago

        I appreciate you giving the parent comment a fair chance.

        As a society we’re having trouble defining abstract components of the self (consciousness, intelligence, identity) as is. What makes the legislative notion of an idea and its reification (what’s actually protected under copyright laws) secure from this same scrutiny? Then patent rights. And what do you think may happen if the viability of said economy comes into question afterwards?

    • netbsdusers 14 hours ago

      It's just a fiction that allows something freely copiable - pure information - to be treated as a commodity. If the AI firms have a single redeeming feature, it is that in them the copyright mafia finally has to face someone its own size, rather than driving little people to suicide, as it did to Aaron Swartz.

    • 2OEH8eoCRo0 30 minutes ago

      I think the term has gotten way too long (70+ years at least) and we can thank Disney for that.

    • jonathanstrange 14 hours ago

      Only people who don't create anything say that. Every musician and every author I know (including myself) thinks they should have some rights concerning the distribution and sale of the products of their work. Why should a successful book author be forced to live on charity?

      • BrawnyBadger53 14 hours ago

        Weird framing, I don't think this is what they were suggesting

        • sothatsit 13 hours ago

          It seems like a pretty logical conclusion that if you removed copyright, then book manufacturers would just copy authors' books and sell them without paying the author. Or ebook services would just distribute their books for free.

          Authors could potentially get a couple of months of sales by working with manufacturers themselves and being the first to sell their books. But as soon as untrusted parties can get their hands on the book, someone will start selling their own copies of it.

    • arduanika 16 hours ago

      What do you do for work, and do you believe it should be given away for free? Or are you just talking about other people's work?

      • nextworddev 14 hours ago

        Are we even sure some of these posters aren't LLMs?

  • jokoon 14 hours ago

    That title is weird; what is an "Anthropic judge"?

    • rideontime 14 hours ago

      The judge for the Anthropic lawsuit, obviously.

    • phaedryx 13 hours ago

      It sounds like the judge works for Anthropic

    • giveita 13 hours ago

      A human judge. Make the most of it, times are changing.

    • alok-g 11 hours ago

      Indeed. While I could sense what was implied, I also thought of some newly-launched 'AI Judge' by Anthropic making the said claim. :-)

  • cleandreams 12 hours ago

    The judge IIRC found that training models using copyrighted materials was fair use. I disagree. Furthermore this will be a problem for anyone who generates text for a living. Eventually LLMs will undercut web, news, and book publishing because LLMs capture the value and don't pay for it. The ecosystem will be harmed.

    The only problem the judge found here was training on pirated texts.

    • xvector 4 hours ago

      The ecosystem is irrelevant, the development of AI is a far higher priority than the ecosystem.

      • nicce 3 hours ago

        Said by every for-profit company ever.

  • firesteelrain 17 hours ago

    How do any of these AI companies protect authors when users upload full PDFs or even the plaintext of anything? Aren't the same piracy concerns real even if they train on what users provide?

    • jahbrewski 17 hours ago

      If you’re vacuuming, shouldn’t you be responsible for what you’re vacuuming?

    • gowld 15 hours ago

      If this is detected as leading to copyright violation, then that can be the subject of a lawsuit.

      Since the violation is detected via model output, it doesn't matter what the input method is.

    • robryan 15 hours ago

      Training aside, an LLM reading a PDF as part of a prompt feels similar to, say, Dropbox storing a PDF for you.

      • terminalshort 14 hours ago

        It's not similar at all because you can't get the book back out of the LLM like you can out of Dropbox. Copyright law is concerned with outputs, not with inputs. If you could make a machine that could create full exact copies of books without ever training on or copying those books, that would still be infringement.

        • shkkmo 11 hours ago

          > make a machine that could create full exact copies of books without ever training on or copying those books, that would still be infringement.

          No it wouldn't. Making the machine is not making a copy of the book. Using the machine to make a copy of the book would be infringement because... you would be making a copy of the book.

  • paddw 13 hours ago

    Anthropic should drop the deal and take the battle up through the court system; they'll probably win.

    • AuthError 13 hours ago

      They did; the judge told the authors to get better representation to prove harm. I think it's dicey because if Anthropic loses it could be catastrophic (i.e., if a judge or jury thinks the award should be 5x what the proposal is, they would need to raise a new round).

      • 3np 13 hours ago

        Anthropic having to raise a new round doesn't sound "catastrophic"...

    • pier25 13 hours ago

      It's an indisputable fact they downloaded like 7M books illegally.

      • bhelkey 12 hours ago

        From the article:

        > Alsup gave the parties a Sept. 15 deadline to submit a final list of works, which currently stands around 465,000.

        • pier25 11 hours ago

          They did download 7M books.

          > That's a far cry from the 7 million works that he initially certified as covered in the class. A breakdown from the Authors Guild—which consulted on the case and is part of a Working Group helping to allocate claims of $3,000 per work to authors and publishers—explained that "after accounting for the many duplicates," foreign editions, unregistered works, and books missing class criteria, "only approximately 500,000 titles meet the definition required to be part of the class."

          https://arstechnica.com/tech-policy/2025/09/judge-anthropics...

    • kelnos 8 hours ago

      They're likely taking the deal because they think there's a good chance they'll lose, and that the final judgement would be more -- possibly a lot more -- than this settlement.

      But sure, I bet us randos on HN have a better feel for this than Anthropic's legal team.

    • rvz 13 hours ago

      > they'll probably win

      They are settling because the risk of losing will cost their entire business.

      Anthropic knows that they will lose if they were brought to trial.

      • wrsh07 12 hours ago

        Yeah settling seems good for investors imo. It's variance reduction.

        • alok-g 11 hours ago

          Indeed.

          I know little, but perhaps the harm to future valuation would have been more than the settlement amount.

  • SamoyedFurFluff 15 hours ago

    For folks unable to access the full article, does the judge say why?

    • jorams 14 hours ago

      Two paragraphs from the article that I think sum it up pretty nicely:

      > Judge William Alsup at the hearing said the motion to approve the deal was denied without prejudice, but in a minute order after the hearing said approval is postponed pending submission of further clarifying information.

      > Alsup said class members “get the shaft” in many class actions once the monetary relief is established and attorneys stop caring. He told the parties that “very good notice” must be given to class members to ensure they have the opportunity to opt in or out, and protect Anthropic from potential claimants coming out of the woodwork later.

      Essentially he has concerns about missing details in two directions:

      1. How class members are going to get notified, submit claims, and get paid out; what works are even included; and the involvement of an army of lawyers who shouldn't be paid from the settlement.

      2. How this deal is going to protect Anthropic from being sued again over claims that should have been covered.

      • stingraycharles 13 hours ago

        Almost feels like the judge is siding with Anthropic here. But he’s right that in these types of cases, the lawyers stop caring once a settlement is reached because that’s the massive pay day they were after.

        • IshKebab 2 hours ago

          Feels like they should only get paid as each individual claim is actually settled, surely?

  • xvector 4 hours ago

    The tech companies need to make a shared subsidiary that simply buys all the books just once and then shares that data amongst subsidiary owners.

    • internet_points 3 hours ago

      Buying a book at the regular price doesn't give you rights to sell copies of that book.

      (I'm not saying selling llm access means selling copies of the book -- but then I'm also not not saying it.)

  • nobody9999 a day ago
  • HardCodedBias 13 hours ago

    Outputs, not inputs, needs to become the law.

  • atleastoptimal 17 hours ago

    Copying files of scanned books isn't worth a $1T fine.

    • adrr 14 hours ago

      Maybe Anthropic should have paid attention to the law, which provides statutory damages of up to $150k per violation if the infringement is willful. So much cheaper just to buy the books and scan them instead of violating a law with a statutory damages clause.

      • ripped_britches 14 hours ago

        I really want to see an alternative universe where we have mechanical turk folks scanning a huge literal book library into a data warehouse.

      • atleastoptimal 12 hours ago

        If a person pirates a book, should they have to pay $150k?

        • ch_fr 3 hours ago

          So true! I too think the average person is basically indistinguishable from anthropic.

      • eschaton 13 hours ago

        In general buying books and scanning them for this type of use would *also* be copyright infringement.

        • adrr 13 hours ago

          No. That's fair use. Format shifting is fair use, as affirmed by RIAA v. Diamond Multimedia, which was about ripping CDs to MP3s.

          • eschaton 13 hours ago

            …for personal use, just as time-shifting was in Sony v. Universal (the Betamax case). Neither was about commercial use/exploitation.

            • adrr 13 hours ago

              The Meta case just affirmed that training an LLM is fair use under transformative use. Also, Google's indexing (transformative use) of scanned books is settled law per Authors Guild v. Google.

              • eschaton 7 hours ago

                And Roe v. Wade was settled law too, until it wasn’t.