"Here's What to Do Instead" misleading title, no alternatives suggested. Just hand-wavey GenAI agitprop.
These are their alternatives:
What neither Big Tech nor Big Media will say is that stronger antitrust rules and enforcement would be a much better solution. What’s more, looking beyond copyright future-proofs the protections. Stronger environmental protections, comprehensive privacy laws, worker protections, and media literacy will create an ecosystem where we will have defenses against any new technology that might cause harm in those areas, not just generative AI.
None of their alternatives will work. They solve neither the problems that creatives face nor the problem that people cannot think for themselves any longer (as seen by the downvoting in this submission).
>the problem that people cannot think for themselves any longer (as seen by the downvoting in this submission).
Quite an interesting take to assume that everyone who disagrees with you cannot think for themselves.
How do they get to the conclusion that AI uses are protected under the fair use doctrine and anything otherwise would be an "expansion" of copyright? Fairly telling IMO
Basically, it’s an open question that courts have yet to decide. But the idea is that it’s fair use until courts decide otherwise (or laws decide otherwise, but that doesn’t seem likely). That’s my understanding, but I could be wrong. I expect we’ll see more and more cases about this, which is exactly why the EFF wants to take a position now.
They do link to a (very long) article by a law professor arguing that data mining is fair use. If you want to get into the weeds there, knock yourself out.
https://lawreview.law.ucdavis.edu/sites/g/files/dgvnsk15026/...
AI training and the thing search engines do to make a search index are essentially the same thing. Hasn't the latter generally been regarded as fair use, or else how do search engines exist?
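For concreteness, "the thing search engines do" is at its core building an inverted index: a map from each term to the documents that contain it, derived from full copies of those documents. A minimal hypothetical sketch in TypeScript (made-up names, not any real engine's code):

```typescript
// Hypothetical sketch: the core data structure behind a search index.
// An inverted index maps each term to the set of document IDs that
// contain it, and is built, like model training, from full copies of the works.
function buildInvertedIndex(docs: Map<string, string>): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const [docId, text] of docs) {
    for (const term of text.toLowerCase().split(/\W+/)) {
      if (term.length === 0) continue;
      if (!index.has(term)) index.set(term, new Set<string>());
      index.get(term)!.add(docId);
    }
  }
  return index;
}

// A lookup returns pointers to the originals rather than the text itself,
// e.g. buildInvertedIndex(docs).get("wizards") might yield Set { "doc42" }.
```

In both cases the full text is ingested and only a derived artifact is kept; the legal question is whether the two artifacts differ in kind.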
re "AI training and the thing search engines do to make a search index are essentially the same thing."
Well, AI training has annoyed LOTS of people: overloading websites, doing things just because they can, e.g. Facebook sucking up the content of lots of pirated books.
Since this AI race started, our small website has been constantly overrun by bots and is not usable by humans because of the load. We NEVER had this problem before AI, when it was only accessed for search engine indexing.
The most important part of fair use is whether it harms the market for the original work. Search helps bring more eyes to the original work; LLMs don't.
The fair use test (in US copyright law) is a four-part test, under which impact on the market for the original work is one of the four parts. Notably, the fact that a use massively harms a work's market does not in and of itself make it a copyright violation. And it couldn't be any other way. Imagine if you could be sued for copyright infringement for using a work to criticize that work or its author, provided the author could prove that your criticism hurt their sales. Imagine if you could be sued for copyright infringement because you saw a previous creator's work, decided you could do better, and wrote a better song or book on the same themes.
Perhaps most famously, emulators very clearly and objectively impact the market for game consoles and computers, and yet they are also considered fair use under US copyright law.
No one part of the four-part test is more important than the others. And so far in the US, training and using an LLM has been ruled by the courts to be fair use, so long as the materials used in the training were obtained legally.
There was a relatively tiny but otherwise identical uproar over Google even before they added infoboxes that reduced the number of people who clicked through.
There was also the lawsuit against Google over the Google Scholar project, which not only ingests copyrighted material in a way very similar to AI, but goes further than AI by intentionally reproducing word-for-word snippets of those works. Google Scholar was also ruled fair use.
They probably get to that conclusion because the courts have ruled that AI uses are protected under fair use, and so yes, changing that would be an expansion of copyright.
Not the EFF I once knew. Are they now pro-bigtech?
EFF is bought and paid for. Not once does this piece mention that "AI" and humans are different and that a digital combine harvester mowing down and ingesting the commons does not need the same rights as a human.
It is not fair use when the entire output is made of chopped up quotes from all of humanity. It is not fair use when only a couple of oligarchs have the money and grifting ability to build the required data centers.
This is another in the long list of institutions that have been subverted. The ACLU and OSI are other examples.
What definition of "sufficiently transformative" doesn't cover "a book about wizards" being used, by some process, to make "a machine that spits out text"? A magazine publisher has a more legitimate claim against the person making a ransom letter: at least there the fonts are copied verbatim.
There are legitimate arguments to be made about whether or not AI training should be allowed, but any change should take the form of new legislation, not wild reinterpretations of copyright law. Copyright law is already overreaching; just imagine how godawful companies could be if they were given more power to screw you for ever having interacted with their "creative works".
We did have that. In some EU countries, during the cassette tape and Sony Walkman era, private individuals were allowed to make around 5 copies for friends from a legitimate source.
Companies were not allowed to make 5 trillion copies.
> It is not fair use when only a couple of oligarchs have the money and grifting ability to build the required data centers.
Seems like a good argument for not restricting the ability to create and use AI models to only those with vast sums of money who can pay extortionate prices to copyright holders. And let's be clear: copyright holders will happily extort the hell out of things if they can. For an example, look at the number of shows and movies that have had to be re-edited in the modern era because there are no streaming rights for the music they used.
> EFF is bought and paid for.
by whom?
One of the few times I vehemently disagree with the EFF.
The problem is that this article seems to make absolutely no effort to differentiate legitimate uses of GenAI (things like scientific and medical research) from the completely illegitimate uses of GenAI (things like stealing the work of every single artist, living and dead, for the sole purpose of making a profit).
One of those is fair use. The other is clearly not.
At what point do you cross the line from "legitimate use of a work" to illegitimate use?
If I take my legally purchased epub of a book and pipe it through `wc` and release the outputs, is that a violation of copyright? What about 10 books? 100? How many books would I have to pipe through `wc` before the outputs become a violation of copyright?
What if I take those same books and generate a spreadsheet of all the words and how frequently they're used? Again, same question, where is the line between "fine" and "copyright violation"?
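Concretely, that spreadsheet step is a few lines of code. A hypothetical sketch in TypeScript (made-up names, assuming the books are already plain text):

```typescript
// Hypothetical sketch: tally how often each word appears across a set
// of books, where `books` is an array of plain-text strings extracted
// from legally purchased epubs.
function wordFrequencies(books: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const book of books) {
    for (const word of book.toLowerCase().split(/\W+/)) {
      if (word.length === 0) continue;
      counts.set(word, (counts.get(word) ?? 0) + 1);
    }
  }
  return counts;
}
```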
What if I take that spreadsheet, load it into a website, and make a JavaScript program that weights every word by count and then generates random text strings based on those weights? Is that not essentially an LLM in all but usefulness? Is that a violation of copyright now that I'm generating new content based on statistical information about copyrighted content? If I let such a program run long enough on enough machines, I'm sure it would generate strings of text from the works that went into the model. Is that what makes this a copyright violation?
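A hypothetical sketch of that generator, building on the `wordFrequencies` table above. Sampling each word independently, weighted by its count, is exactly a unigram language model:

```typescript
// Hypothetical sketch: a unigram model. Each output word is sampled
// independently, with probability proportional to its frequency in
// the source books.
function generateText(counts: Map<string, number>, numWords: number): string {
  const words = [...counts.keys()];
  const weights = [...counts.values()];
  const total = weights.reduce((a, b) => a + b, 0);
  const out: string[] = [];
  for (let i = 0; i < numWords; i++) {
    let r = Math.random() * total; // pick a point in the total weight
    let j = 0;
    while (j < weights.length - 1 && r >= weights[j]) {
      r -= weights[j]; // walk the weights until the point falls inside one
      j++;
    }
    out.push(words[j]);
  }
  return out.join(" ");
}
```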
If that's not a violation, how many other statistical transformations and weighting models would I have to add to my JavaScript program before it's a violation of copyright? I don't think it's reasonable to say any part of this is "clearly not" fair use, no matter how many books I pump into that original set of statistics. And at least so far, the US courts agree with that.
I think your analogy is a massive stretch. `wc` is neither generative nor capable of having market effect.
Your second construction is generative, but likely worse than a Markov chain model, which also did not have any market effect.
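For reference, the Markov chain version is only one small step up from that: condition each word on the one before it. A hypothetical sketch, in the same made-up style as above:

```typescript
// Hypothetical sketch: a bigram Markov chain. Instead of sampling words
// independently, each next word is drawn from the words that actually
// followed the current word in the source text. Duplicates in the
// follower lists are what encode the weighting.
function buildBigrams(text: string): Map<string, string[]> {
  const words = text.toLowerCase().split(/\W+/).filter(w => w.length > 0);
  const followers = new Map<string, string[]>();
  for (let i = 0; i < words.length - 1; i++) {
    const list = followers.get(words[i]) ?? [];
    list.push(words[i + 1]);
    followers.set(words[i], list);
  }
  return followers;
}

function markovGenerate(followers: Map<string, string[]>, start: string, n: number): string {
  const out = [start];
  let current = start;
  for (let i = 1; i < n; i++) {
    const next = followers.get(current);
    if (!next || next.length === 0) break; // dead end: no observed successor
    current = next[Math.floor(Math.random() * next.length)];
    out.push(current);
  }
  return out.join(" ");
}
```

Neither of these toys has any market effect, which is the point.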
We're talking about the models that have convinced every VC that they can make a trillion dollars by replacing millions of creative jobs.
What happens when a researcher makes a generative art model and publicly releases the weights? Anyone can download the weights and use it to turn a quick profit.
Should the original research use be considered legitimate fair use? Does the legitimacy get 'poisoned' along the way when a third party uses the same model for profit?
Is there any difference between a mom-and-pop restaurant that uses the model to make a design for their menu and a multi-billion-dollar corp that's planning on laying off all their in-house graphic designers? If so, where between those two extremes should the line be drawn?
I'm not a copyright attorney in any country, so the answer (assuming you're asking me personally) is "I don't know and it probably depends heavily on the specific facts of the case."
If you're asking for my personal opinion, I can weigh in on my personal take for some fair use factors.
- Research into generative art models (the kind done by e.g. OpenAI or Stable Diffusion) is only possible due to funding. That funding comes mainly from VC firms looking to get ROI by replacing artists with AI[0], with debt financing from major banks on top of that. This weighs on both the market-effect factor and the purpose/character-of-use factor, and not in their favor. If the research has limited market impact and is not done for the express purpose of replacing artists, then I think it would likely be fair use (an example could be background removal/replacement).
- I don't know if there are any legal implications of a large vs. small corporation using a product of copyright infringement to produce profit. Maybe it violates some other law, maybe it doesn't. All I know is that the end product of a GenAI model is not copyrightable, which to my understanding means their profit potential is limited as literally anyone else can use it for free.
[0]: https://harlem.capital/generative-ai-the-vc-landscape/