Penguin Random House underscores copyright protection in AI rebuff

(thebookseller.com)

28 points | by marban 5 hours ago

21 comments

  • xnorswap an hour ago

    It's hard to understand the position of AI companies, when their AIs can be prompted to reproduce copyrighted works verbatim, e.g.

    https://imgur.com/a/XBO2B7V

    You can argue that a game that old ought not to be still in copyright, and I'd support that position.

    But it is in copyright, and I'd rather have a world where we curtail copyright terms but enforce them fairly than one in which we stifle culture by automated means of suppressing "DMCA" violations on some platforms while large platform owners invest in AI that tramples over the notion of copyright.

    • ben_w 30 minutes ago

      It's a complicated mess.

      Reason being: I, too, am physically capable of reproducing copyrighted works verbatim when correctly prompted. Or at least, can do so as well as current GenAI can in certain specific cases, and know how to use a photocopier in others.

      Is the copyright violation the user asking for this, or is it the model having the capability?

      If it's the mere capability, the inventors of copy-paste and the camera and voice recorder apps have a problem, along with all search engines that ever index pirated material.

      If it's the users, given that copyright databases are much too large for any human to actually know exhaustively what is and isn't in them, there's a real danger that normal people will accidentally do exactly this.

      • kranke155 2 minutes ago

        Ridiculous. The copyright violation is in its use for training data. And then it's doubled down on by the user asking for copyrighted material and getting it verbatim.

        It's not complicated. It's only complicated because it might get in the way of some people making billions.

        Let's stop with the analogies. Analogies won't get you anywhere.

        Also:

        - If you recorded a concert with a tape recorder, then distributed it online for money, pretty sure that was a copyright violation.

        - If you used a camera to film a movie being shown in a theatre, then distributed it online, that is a copyright violation.

        Stop the BS.

  • openrisk 2 hours ago

    "Digital societies with a poor rule of law and institutions that exploit the population do not generate growth or change for the better"

    Excerpt from the announcement of the 2044 Nobel prize in economics.

    It echoes the earlier arguments of the 2024 prize to Acemoglu, Johnson, Robinson about the economic fate of colonies and an even earlier thesis by Hernando de Soto about the role of property rights in wealth formation.

    Exploiting legal loopholes to transfer wealth with immunity has been the central profit mechanism of many recent tech business models, whether that was the gig economy, adtech or now "AI".

    The tragedy is that stock markets as currently organized (besides being famously amoral) have no sense of long term value. This creates a positive feedback loop where tech business models that gnaw at the pre-digital institutional wealth foundations are empowered to do so even more.

    Yet those economists are not wrong. In historical timescales the digital transition will see massive redistribution of wealth to nations that can actually create strong digital institutions and legal protections.

    This is, incidentally, why the EU and its enthusiastic rule making might be onto something, even if its currently lethargic tech industry does not help validate the alternative vision.

  • gruez 4 hours ago

    IANAL, but this seems kind of pointless? Either AI training is copyright infringement or it's not. A tiny disclaimer isn't going to affect that. The AI companies contend that it's fair use, which would probably override any fine print on the inside cover.

    • dwattttt 3 hours ago

      > AI companies contend that it's fair use

      It's a good thing they don't train on copyrighted material from jurisdictions that have something other than "fair use", such as Australia's "fair dealing". Otherwise, they'd have to argue that their use of such copyrighted material doesn't "usurp either the market of the original work or a derivative market".

      https://addisons.com/knowledge/insights/fair-use-or-fair-dea...

    • AStonesThrow 3 hours ago

      > The AI companies contend that it's fair use

      Do they? "Fair Use" is an affirmative defense, so the only time we're going to get into that is in a court case, where it'll be tested through legal means.

      I would say it's even more nuanced: if LLM training involves merely reading a dataset, but it is not strictly necessary to copy, or even store it verbatim to be useful, then does it even fall under copyright protection at all? A lot of computer-based data processing is already immune to copyright issues; you can place a webpage into a server-based cache or CDN, you can stream it across a network, you can cache it in RAM or local storage, you can make backups of things, and all these processing uses don't fall afoul of copyright.

      So I would say that we're going to watch the LLM trainers say that the models aren't storing copies at all, and that seems an even stronger defense than "Fair Use". It is a strange copyright protection indeed that explicitly or implicitly prohibits certain types of machine readings, while allowing many others.

      • profmonocle an hour ago

        > if LLM training involves merely reading a dataset, but it is not strictly necessary to copy, or even store it verbatim to be useful, then does it even fall under copyright protection at all?

        Copyright includes the creation of derivative works, not just literally copying the source material.

        For instance, imagine I read a novel, then I decide to write my own, unauthorized sequel to it. It's not a literal "copy" of the original material - it's my own original text, but obviously a derivative work of the original material. Under copyright law, that would be infringement - I would be sued if I tried to sell that. (Yes, that means fanfiction is infringing, but most rights holders have wisely decided to look the other way on that, as long as it's non-commercial.)

        This is what people who claim AI is infringing are worried about. Not that the AI has a literal copy of the source material in its training data, but that the training data can be used to produce a derivative work.

        I could write a (crappy) fanfic of the Lord of the Rings without directly referencing the books/movies. And that doesn't mean I have a complete copy of the books/movies in my head - that isn't how memory works. Until now, creating a derivative work without directly using the source material was something only humans could do. This is completely uncharted legal territory.

      • A4ET8a8uTh0 2 hours ago

        << Do they?

        It is a fair point. Companies contend what they always contend, which is their position in an argument; they do so forcefully and regardless of the reality on the ground. Companies are basically modeled after opportunists.

  • photonthug 4 hours ago

    > There is no standard ‘All rights reserved’ wording and even the most basic notice covers all uses. Having said that, we’re pleased to see publishers starting to add to the ‘All rights reserved’ notice to explicitly exclude the use of a work for the purpose of training [generative AI], as it provides greater clarity and helps to explain to readers what cannot be done without rights-holder consent.

    So their position is that all rights reserved always meant that unauthorized use of any kind was already forbidden, but that was ignored / unenforceable and so they are adding “looking at you” language to carve out stuff that’s disallowed more specifically. Feels like bargaining, because this just gives the opposition the chance to argue that it wasn’t illegal before.

    • dwattttt 3 hours ago

      > unauthorized use of any kind was already forbidden

      Is a funny sentence to have to say. Is unauthorized use also unauthorized?

      • turbonaut 2 hours ago

        Authorization and forbidding are both explicit actions.

        Unauthorized refers to the absence of authorisation.

        ‘Unauthorized use is forbidden’ means ‘all use must have explicit permission.’

        Clearly it is a bit disingenuous, as it usually means ‘all use I don’t like / that is not in line with norms’.

  • CaptainFever 3 hours ago

    If it's fair use in the US, it doesn't matter how many "X is prohibited because of copyright" clauses they add; AI training is allowed.

    For EU TDM, this opt out probably works to disallow commercial entities from using it for training, but not researchers.

    For SG TDM, this opt out can be ignored. All legally-acquired material can be used for AI training purposes, period.

    Disclaimer: IANAL.

    • NeoTar an hour ago

      SG = Singapore, TDM = Text Data Mining?

  • rahimnathwani 3 hours ago

    Which country has the most favorable 'fair use' laws, and why wouldn't big companies train their models there?

    • profmonocle 2 hours ago

      Would that matter if the company wants to do business in countries with more restrictive laws?

      E.g., if I wrote my own spin-off of a popular book series, which was somehow considered fair use in country A, but considered infringing in country B, the publisher could get it removed from stores in country B.

      By the same logic, if AI training is ruled to be copyright infringement in the US, it won't matter if the company trains its model somewhere else - if they open a US division to sell a service using that model, they'd get sued.

      Granted I'm not an IP lawyer and AI IP law is in its infancy - maybe I'm missing something?

    • NitpickLawyer 2 hours ago

      IIRC Japan has had at least one court ruling that training on copyrighted data is fair use (or a version thereof).

  • portaouflop an hour ago

    Penguin Random House is a predatory, corrupt business that the world would be much better off without. If AI means businesses like this shut down, I need more „AI“.

    • 2muchcoffeeman an hour ago

      Two wrongs don’t make a right.

      • portaouflop an hour ago

        Anything that Penguin Random House perceives as a threat is good and right for me.

    • paganel an hour ago

      Honest question, what's "predatory corrupt" about their business? I have some of their paperbacks on my bookshelves and they've been a pretty decent value for money thing.