AI Companies Running Out of Training Data After Burning Through Entire Internet

voidx · 2 years ago

Pennomi@lemmy.world · 2 years ago

There’s already more than enough training data out there. The important thing that remains is to filter it so it doesn’t also include humanity’s stupidest data.

That, and making the algorithms smarter so they resist hallucination and misinformation. That’s not a data problem, it’s an architecture problem.

FaceDeer@fedia.io · 2 years ago

Stupid data can be useful for training as a negative example. Image generators use negative prompts to good effect.

MotoAsh@lemmy.world · edit-2 10 months ago

deleted by creator

Takumidesh@lemmy.world · 2 years ago

Well, is the goal truth? Or a simulacrum of a human?

MotoAsh@lemmy.world · edit-2 10 months ago

deleted by creator

CanadaPlus@lemmy.sdf.org · edit-2 2 years ago

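Well, it’s established wisdom that the dataset size needs to scale with the number of model parameters. Roughly linearly, IIRC: the Chinchilla results suggest on the order of 20 tokens per parameter, and it’s the training compute that ends up growing quadratically. If you don’t have that much data the training basically won’t work; it will overfit or just not progress.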
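
To put rough numbers on that scaling, here is a back-of-the-envelope sketch in Python. The constants are assumptions brought in for illustration, not something stated in this thread: about 20 training tokens per parameter (the commonly quoted Chinchilla rule of thumb) and training compute of roughly 6 × parameters × tokens. The function names are made up for the example.

```python
# Back-of-the-envelope data scaling sketch.
# Both constants are assumed rules of thumb, not exact laws.
TOKENS_PER_PARAM = 20        # approximate compute-optimal ratio (assumption)
FLOPS_PER_PARAM_TOKEN = 6    # common approximation: compute ~ 6 * N * D


def compute_optimal_tokens(n_params: int) -> int:
    """Tokens suggested for a model of n_params parameters (linear in size)."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params: int, n_tokens: int) -> int:
    """Approximate training compute for n_params parameters on n_tokens tokens."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


if __name__ == "__main__":
    for n_params in (10**9, 10**10, 10**11):
        n_tokens = compute_optimal_tokens(n_params)
        flops = training_flops(n_params, n_tokens)
        print(f"{n_params:.0e} params -> {n_tokens:.0e} tokens, ~{flops:.1e} FLOPs")
```

Under these assumptions the data requirement grows linearly with model size, while the compute bill grows with the square of it, since both factors in the product scale together.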

Ultraviolet@lemmy.world · 2 years ago

You also have to filter out the AI generated garbage that is rapidly becoming a majority of content on the internet.