OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling's Harry Potter series

L4sBot@lemmy.world · 1 year ago

Blapoo@lemmy.ml · 1 year ago

We have to distinguish between LLMs trained on copyrighted material and LLMs outputting copyrighted material.

They are not one and the same.

Even_Adder@lemmy.dbzer0.com · 1 year ago

Yeah, this headline is trying to make it seem like training on copyrighted material is or should be wrong.

scv@discuss.online · 1 year ago

Legally, the output of training could be considered a derivative work. We treat brains differently here, that’s all.

I think the current intellectual property system makes no sense and AI is revealing that fact.

TropicalDingdong@lemmy.world · 1 year ago

I think this brings up broader questions about the currently quite extreme interpretation of copyright. Personally, I don’t think it’s wrong to sample from or create derivative works from something that is accessible. If it’s not behind lock and key, it’s free to use. If you have a problem with that, then put it behind lock and key. No one is forcing you to share your art with the world.

Technoguyfication@lemmy.ml · 1 year ago

People are acting like ChatGPT is storing the entire Harry Potter series in its neural net somewhere. It’s not storing or reproducing text in a 1:1 manner from the original material. Certain material, like very popular books, has likely been interpreted tens of thousands of times due to how many times it was reposted online (and therefore how many times it appeared in the training data).

Just because it can recite certain passages almost perfectly doesn’t mean it’s redistributing copyrighted books. How many quotes do you know perfectly from books you’ve read before? I would guess quite a few. LLMs are doing the same thing, but on mega steroids with a nearly limitless capacity for information retention.

Teritz@feddit.de · 1 year ago

Using copyrighted work, such as art, as an example still influences the AI, which they make a profit from.

If they use my works, then they need to pay. That’s it.

TropicalDingdong@lemmy.world · 1 year ago

It’s a bit pedantic, but I’m not really sure I support this kind of extremist view of copyright and the scale of what’s being interpreted as ‘possessed’ under the idea of copyright. Once an idea is communicated, it becomes a part of the collective consciousness. Different people interpret and build upon that idea in various ways, making it a dynamic entity that evolves beyond the original creator’s intention. It’s like the issues with sampling beats or records in the early days of hip-hop. The very principle of an idea goes against this vision: once you put something out into the commons, it’s irretrievable. It’s not really yours any more once it’s been communicated. I think if you want to keep an idea truly yours, then you should keep it to yourself. Otherwise you are participating in a shared vision of the idea. You don’t control how the idea is interpreted, so it’s not really yours any more.

Whether that’s ChatGPT or Public Enemy is neither here nor there to me. The idea that a work like Peter Pan is still possessed is a very real but obviously silly malady of this weirdly accepted, very extreme view of the ability to possess an idea.

Bogasse@lemmy.world · 1 year ago

Well, I’d consider agreeing if LLMs were considered a generic knowledge database. However, I had the impression that the whole response from OpenAI & co. to this copyright issue is “they build original content”, both for LLMs and Stable Diffusion models. Now that they have started this line of defence, I think they are stuck with proving that their “original content” is not derived from copyrighted content 🤷

TropicalDingdong@lemmy.world · 1 year ago

Yeah I suppose that’s on them.

treefrog@lemm.ee · 1 year ago

If you sample someone else’s music and turn around and try to sell it, without first asking permission from the original artist, that’s copyright infringement.

So, if the same rules apply, as your post suggests, OpenAI is also infringing on copyright.

TropicalDingdong@lemmy.world · 1 year ago

I think you completely and thoroughly do not understand what I’m saying or why I’m saying it. Nowhere did I suggest that I do not understand modern copyright. I’m saying I’m questioning my belief in this extreme interpretation of copyright, which is represented by exactly what you just parroted: that this interpretation is both functionally and materially unworkable, and also antithetical to a reasonable understanding of how ideas and communication work.