As LLMs become the go-to for quick answers, fewer people are posting questions on forums or social media. This shift could make online searches less fruitful in the future, with fewer discussions and solutions available publicly. Imagine troubleshooting a tech issue and finding nothing online because everyone else asked an LLM instead. You do the same, but the LLM only knows the manual, offering no further help. Stuck, you contact tech support, wait weeks for a reply, and the cycle continues—no new training data for LLMs or new pages for search engines to index. Could this lead to a future where both search results and LLMs are less effective?
Maybe in the sense that the Internet may become so inundated with AI garbage that the only way to get factual information is by actually reading a book or finding a real person to ask, face to face.
You know how the steel from prenuclear proliferation is prized? I wonder if that’s going to happen with data from before 2022 as well now. Lol.
There might be a way to mitigate that damage. You could categorize the training data by the source. If it’s verified to be written by a human, you could give it a bigger weight. If not, it’s probably contaminated by AI, so give it a smaller weight. Humans still exist, so it’s still possible to obtain clean data. Quantity is still a problem, since these models are really thirsty for data.
LLMs can’t distinguish truth from falsehoods, they only produce output that resembles other output. So they can’t tell the difference between human and AI input.
That’s a problem when you want to automate the curation and annotation process. So far, you could have just dumped all of your data into the model, but that might not be an option in the future, as more and more of the training data was generated by other LLMs.
When that approach stops working, AI companies need to figure out a way to get high quality data, and that’s when it becomes useful to have data that was verified to be written by actual people. This way, an AI doesn’t even need to be able to curate the data, as humans have done that to some extent. You could just prioritize the small amount of verified data while still using the vast amounts of unverified data for training.
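A rough sketch of that weighting idea, assuming every document already carries a flag saying whether a human wrote it (getting that flag is the hard part). The `Doc` type, the `sample_batch` helper and the weight values 5.0 and 1.0 are all made up for illustration; this is not how any real training pipeline is known to work.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical weights: verified human text counts five times as much.
VERIFIED_WEIGHT = 5.0
UNVERIFIED_WEIGHT = 1.0


@dataclass
class Doc:
    text: str
    human_verified: bool


def sampling_weight(doc: Doc) -> float:
    """Return the relative chance of picking this document for training."""
    return VERIFIED_WEIGHT if doc.human_verified else UNVERIFIED_WEIGHT


def sample_batch(docs: List[Doc], k: int, seed: Optional[int] = None) -> List[Doc]:
    """Draw k documents with replacement, favouring verified ones."""
    rng = random.Random(seed)
    weights = [sampling_weight(d) for d in docs]
    return rng.choices(docs, weights=weights, k=k)


if __name__ == "__main__":
    # One verified forum answer among twenty unverified scraped pages.
    corpus = [Doc("forum answer", True)]
    corpus += [Doc(f"scraped page {i}", False) for i in range(20)]
    batch = sample_batch(corpus, k=1000, seed=0)
    share = sum(d.human_verified for d in batch) / len(batch)
    print(f"verified share of batch: {share:.2f}")
```

With these made-up weights, the single verified document is expected to make up about 5/25 = 20% of the samples instead of 1/21 ≈ 5%, while the unverified data is still used.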
My 70 year old boss and his 50 year old business partner just today generated a set of instructions for scanning to a thumb drive on a specific model of printer.
They obviously missed the “AI Generated” tag on the Google search and couldn’t figure out why the instructions cited the exact model but told them to press buttons and navigate menus that didn’t exist.
These are average people, and they didn’t realize that they were even using AI, much less how unreliable it can be.
I think there’s going to be a place for forums to discuss niche problems for as long as AI just means advanced LLM and not actual intelligence.
When diagnosing software related tech problems with proper instructions, there’s always the risk of finding outdated tips. You may be advised to press buttons that no longer exist in the version you’re currently using.
With hardware though, that’s unlikely to happen, as long as the model numbers match. However, when relying on AI generated instructions, anything is possible.
Trouble is that ‘quick answers’ mean the LLM took no time to do a thorough search. Could be right or wrong - just by luck.
When you need the details to be verified by trustworthy sources, it’s still do-it-yourself time. If you -don’t- verify, and repeat a wrong answer to someone else, -you- are untrustworthy.
A couple months back I asked GPT a math question (about primes) and it gave me the -completely wrong- answer … ‘none’ … answered as if it had no doubt. It was -so- wrong it hadn’t even tried. I pointed it to the right answer (‘an infinite number’) and to the proof. It then verified that.
A couple of days ago, I asked it the same question … and it was completely wrong again. It hadn’t learned a thing. After some conversation, it told me it couldn’t learn. I’d already figured that out.
Trouble is that ‘quick answers’ mean the LLM took no time to do a thorough search.
LLMs don’t “search”. They essentially provide weighted parrot-answers based on what they’ve seen elsewhere.
If you tell an LLM that the sky is red, they will tell you the sky is red. If you tell them your eyes are the colour of the sky, they will repeat that your eyes are red. LLMs aren’t capable of checking if something is true.
They’re just really fast parrots with a big vocabulary. And every time they squawk, it burns a tree.
Math problems are a unique challenge for LLMs, often resulting in bizarre mistakes. While an LLM can look up formulas and constants, it usually struggles with applying them correctly. Sort of, like counting the hours in a week, it says it calculates 7*24, which looks good, but somehow the answer is still 10 🤯. Like, WTF? How did that happen? In reality, that specific problem might not be that hard, but the same phenomenon can still be seen in more complicated problems. I could give some other examples too, but this post is long enough as it is.
For reliable results in math-related queries, I find it best to ask the LLM for formulas and values, then perform the calculations myself. The LLM can typically look up information reasonably accurately but will mess up the application. Just use the right tool for the right job, and you’ll be ok.
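For the hours-in-a-week example, doing the calculation yourself is a one-liner. A tiny sketch in Python; the constant and function names are just mine.

```python
# Take the values from the LLM (or a reference), but do the arithmetic yourself.
DAYS_PER_WEEK = 7
HOURS_PER_DAY = 24


def hours_per_week() -> int:
    """Multiply the two constants; no language model involved."""
    return DAYS_PER_WEEK * HOURS_PER_DAY


print(hours_per_week())  # 168
```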
Is your abuse of the ellipsis and dashes supposed to be ironic? Isn’t that an LLM tell?
I’m not even sure what the (‘phrase’) construct is even meant to imply, but it’s wild. Your abuse of punctuation in general feels like a machine trying to convince us it’s human or a machine transcribing a human’s stream of consciousness.
No. It hallucinates all the time.
Sure does, but somehow many of the answers still work well enough. In many contexts, the hallucinations are only speed bumps, not show stopping disasters.
It told people to put glue in their pizza to make the dough chewy. It’s pretty fucking awful.
Copilot wrote me some code that totally does not work. I pointed out the bug and told it exactly how to fix the problem. It said it fixed it and gave me the exact same buggy trash code again. Yes, it can be pretty awful. LLMs fail in some totally absurd and unexpected ways. On the other hand, it knows the documentation of every function, but somehow still fails at some trivial tasks. It’s just bizarre.
It does this because it inherently hallucinates. It’s just an analytical letter guesser that sounds human because it amalgamates and predicts the next word. It’s just gotten so much input that it can sound human. But it has no concept of right and wrong. Even when you tell it that it’s wrong. It doesn’t understand anything. That’s why it sucks. And that’s why it will always suck. It will not replace search because it makes shit up. I use it for coding here and there as well and it’s just making up functions that don’t exist or attributing functions to packages that aren’t real.
Probably, however I will not be doing that because LLMs are dogshit and hallucinate bullshit half the time. I wouldn’t trust a single fucking thing that an LLM provides.
Fair enough, and that’s actually really good. You’re going to be one of the few who actually go through the trouble of making an account on a forum, ask a single question, and never visit the place after getting the answer. People like you are the reason why the internet has an answer to just about anything.
Haha. Yes I’ll be a tech Boomer. Stuck in my old ways. Although answers on forums are often straight misinformation so really there’s no perfect solution to get answers. You just have to cross check as many sources as possible.
And where does an LLM get its answers? Forums and social media. If an LLM doesn’t have the actual answer, it blabbers like a redditor, and when someone can’t get an accurate answer, they go back to asking on forums and social media.
So no, LLMs will not replace human interaction, because LLMs rely on human interaction. An LLM cannot diagnose your car without a human first diagnosing your car.
The problem is that the LLMs have stolen all that information, repackaged it in ways that are subtly (or blatantly) false or misleading, and then hidden the real information behind a wall of search results that are entire domains of ai trash. It’s very difficult to even locate the original sources or forums anymore.
I’ve even tried to use Gemini to find a particular YouTube video that matches specific criteria. Unsurprisingly, it gave me a bunch of videos, none of which were even close to what I’m looking for.
That’s true. There could be a balance of sorts. Who knows. If LLMs become increasingly useful, people start using them more. As they lose training data, quality goes down, and people shift back to forums etc. Could work that way too.
to an extent, yes, but not completely
LLMs are awesome in their knowledge until you start hearing their answers to stuff you already know, which makes you wonder if anything was correct.
What they call hallucinations was called fabulation in other areas: inventing tales or stories.
I’m curious about what is the shortest acceptable answer for these things and if something close to “I don’t know” is even an option.
LLMs are awesome in their knowledge until you start hearing their answers to stuff you already know, which makes you wonder if anything was correct.
This applies equally well to human-generated answers to stuff.
True, the difference is that with humans it’s usually more public, so it is easier for someone to call bullshit. With LLMs the bullshit is served with the intimacy of embarrassing porn, so you’re less likely to see any warnings.
Sounds similar to Betteridge’s law of headlines.
I’m sure there are tricks like adding ‘fact check your response’, but I suspect there is something intrinsic to these models that makes it a super difficult problem. I get the feeling that LLMs are designed to please humans, so uncomfortable answers like “I don’t know” are out of the question.
- This thing is broken. How do I fix it?
- Don’t know. 🤷
- Seriously? I need an answer? Any ideas?
- Nope. You’re screwed. Best of luck to you. Figure it out. I believe in you. ❤️
Not designed, but trained. Training involves rewarding finding answers, so they WILL give you something. “I don’t know” is not going to fare well in the training development, so it naturally gets filtered out, while very creative (but wrong) LLMs do well.
There have been enough times that I googled something, saw the AI answer at the top, and repeated it like gospel. Only to look like a buffoon when we realize the AI was completely wrong.
Now I look right past the AI answer and read the sources it’s pulling from. Then I don’t have to worry about anything misinterpreting the answer.
True, but soon the sources will be AI generated too, in a big GIGO loop.
That’s exactly what I’m worried about happening. What if one day there are hardly any sources left?
At this rate that day is not too distant, I’m afraid.
I was expecting either Huxley or Orwell to be right, not both.
Interestingly, there’s an Intelligence Squared episode that explores that very point. As usual, there’s a debate, voting and both sides had some pretty good arguments. I’m convinced that Orwell and Huxley were correct about certain things. Not the whole picture, but specific parts of it.
Agreed, if we look closely we can find some Bradbury and William Gibson elements in the lovely dystopia we’re currently enjoying.
Oh absolutely. Cyberpunk was meant to feel alien and revolting, but nowadays it is beginning to feel surprisingly familiar. Still revolting though, just like the real world.
If the tech matures enough, potentially!
Not wrong about LLMs being (currently?) bad with tech support, but so are search engines lol
People will use whatever method of finding answers that works best for them.
Stuck, you contact tech support, wait weeks for a reply, and the cycle continues
Why didn’t you post a question on a public forum in that scenario? Or, in the future, why wouldn’t the AI search agent itself post a question? If questions need to be asked then there’s nothing stopping them from still being asked.
If you cut a forum’s population by 90% it will die.
This is one of the biggest problems with AI. If it becomes the easiest way to get good answers for most things, it will starve the channels that can answer the things it can’t (including everything new).
Depends which 90%.
It’s ironic that this thread is on the Fediverse, which I’m sure has much less than 10% the population of Reddit or Facebook or such. Is the Fediverse “dead”?
This is one of the biggest problems with AI. If it becomes the easiest way to get good answers for most things
If it’s the easiest way to get good answers for most things, that doesn’t seem like a problem to me. If it isn’t the easiest way to get good answers, then why are people switching to it en masse anyway in this scenario?
I said “cut a forum by 90%”, not “a forum happens to be smaller than another”. Ask ChatGPT if you have trouble with words.
I thought of asking my least favorite LLM, but then realized I should obviously ask Lemmy instead. Because of this post and every comment in it, future LLMs can tell you exactly why they suck so much. I’ve done my part.
That is an option, and undoubtedly some people will continue to do that. It’s just that the number of those people might go down in the future.
Some people like forums and such much more than LLMs, so that number probably won’t go down to zero. It’s just that someone has to write that first answer, so that eventually other people might benefit from it.
What if it’s a very new product and a new problem? Back in the old days, that would translate to the question being asked very quickly in the only place where you can do that - the forums. Nowadays, the first person to even discover the problem might not be the forum type. They might just try all the other methods first, and find nothing of value. That’s the scenario I was mainly thinking of.
I did suggest a possible solution to this - the AI search agent itself could post a question in a forum somewhere if it has been unable to find an answer.
This isn’t a feature yet of mainstream AI search agents but I’ve been following development and this sort of thing is already being done by hobbyists. Agentic AI workflows can be a lot more sophisticated than simple “do a search, summarize results.” An AI agent could even try to solve the problem itself - reading source code, running tests in a sandbox, and so forth. If it figures out a solution that it didn’t find online, maybe it could even post answers to some of those unanswered forum questions. Assuming the forum doesn’t ban AI of course.
Basically, I think this is a case of extrapolating problems without also extrapolating the possibilities of solutions. Like the old Malthusian scenario, where Malthus projected population growth without also accounting for the fact that as demand for food rises new technologies for making food production more productive would also be developed. We won’t get to a situation where most people are using LLMs for answers without LLMs being good at giving answers.
This idea about automated forum posts and answers could work. However, a human would also need to verify that the generated solution actually solves a problem. There are still some pretty big ifs and buts in this thing, but I assume it could work. I just don’t think current LLMs are quite smart enough yet. It’s a fast moving target, and new capabilities are being added on a daily basis, so it might not take very long until we get there.
However, a human would also need to verify that the generated solution actually solves a problem.
That’s already an issue with human-generated answers to problems. :)
“Verification” could be done by an AI agent too, though, as I described above. Depends on the sort of problem. A programming solution can be tested in a simple sandbox, a medical solution would require a bit more effort to validate (whether by human or by AI).
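For the programming case, a minimal sketch of what that automated check might look like: run the proposed code together with some assert-based tests in a separate Python process and see whether it exits cleanly. The `passes_tests` helper and the example snippets are hypothetical, and a child process with a timeout is not a real security sandbox; an actual agent would need proper isolation.

```python
import subprocess
import sys


def passes_tests(candidate_source: str, test_source: str, timeout_s: float = 10.0) -> bool:
    """Run candidate code plus assert-based tests in a separate Python process.

    Returns True only if the process exits cleanly within the time limit.
    NOTE: a child process with a timeout is NOT a real security sandbox.
    """
    script = candidate_source + "\n" + test_source
    try:
        result = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Treat code that hangs as a failed solution.
        return False
    # A failed assert or any other uncaught error gives a non-zero exit code.
    return result.returncode == 0


if __name__ == "__main__":
    tests = "assert hours_in_week() == 168"
    good = "def hours_in_week():\n    return 7 * 24"
    bad = "def hours_in_week():\n    return 10"
    print(passes_tests(good, tests))  # True
    print(passes_tests(bad, tests))   # False
```

This only checks that the tests pass, so the result is as trustworthy as the tests are. Whoever writes them, human or AI, still decides what “solved” means.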
I just don’t think current LLMs are quite smart enough yet.
Certainly, we’re both speculating about future developments here.
LLMs are the big block V8 of search engines. They can do things very fast and consume tons of resources with subterranean efficiency. On top of that, they are privacy invasive, easy to use for manipulation and speed up the problem of less mature users being spoon fed. General purpose LLMs need to be outlawed immediately.
prohibition of anything is usually a bad idea
Right. How about csam, incest, cannibalism?
arguments like this are fucking stupid
Glad you agree. Non arguments are not a good idea.
No, your argument is stupid. OF COURSE those things are bad, it’s stupid to think that’s what I implied.
You made a blanket statement and now you’re angry because someone called you out on it. I get that. But I don’t care. Please don’t make blanket statements like that. That’s not a good way of debating stuff.
Of course outlawing of stuff is good in certain cases. And LLMs (and AI in general) as a public tool, exploited for profit, isn’t good for humanity. It sucks energy like crazy, produces bullshit results, diseducates people and further benefits the capitalist class.
It’s just not okay to have that. I would have gone with an argument that goes “but how about for personal use on your own computer?” Then I would say I can see that being okay, as long as it doesn’t permanently increase everyone’s personal power usage, because that is the same as if you had giant centralized AIs.
See? You can argue against my point without making self defeating statements.
I’m not angry at all. I just think your response is childish.
Silly me, I forgot that running an LLM model was so similar to cannibalism.
Thanks for showing that you have no actual arguments.
LLMs are inherently bad for society in their current form. They have no real benefit. They push capital extraction and further increase the pressure on workers. They have insane energy requirements, insane hardware requirements. We are working on saving our planet and can absolutely not spare the massive amounts of energy required for this shit.
Thanks for showing that you have no actual arguments.
You did it first by jumping to “think of the children!” and analogizing running a program to cannibalism.
They have no real benefit.
No need to ban them, then. Nobody will use them if this is true.
They have insane energy requirements, insane hardware requirements.
I run them locally on my computer, I know this is factually incorrect through direct experience.
Personal experience aside, if running an LLM query really required “insane” energy and hardware expenditures then why are companies like Google so eager to do it for free? These are public companies whose mandates are to generate a profit. Whatever they’re getting out of running those LLM queries must be worth the cost of running them.
We are working on saving our planet
I see you’ve switched from “think of the children!” to “think of the environment!”
You just showed again that you have no actual arguments. You’re using populism to “win” against factually correct and provable statements.
Using anecdotal evidence is a cheap trick and I believe you know it. It’s not evidence at all. Numbers show that I’m right and you’re wrong in this case.
“Think of the children” is used as a thought stopper by the political right to push their laws against humanity through. It isn’t as smart as you think to wrongly ascribe it. I was right and showed it, you can’t live with it. That’s okay.
Using anecdotal evidence is a cheap trick and I believe you know it. It’s not evidence at all. Numbers show that I’m right and you’re wrong in this case.
So… got any?
“Think of the children” is used as a thought stopper by the political right to push their laws against humanity through.
I refer you back to your earlier comment analogizing LLMs to “csam”.
I haven’t looked into many LLMs, but Microsoft will use your data for training the next version of Copilot. If you’re a paying enterprise customer, then your data won’t be used for that.
I suspect Google is also using every bit of data they can get their hands on. They have a habit of handing out shiny new stuff in exchange for your data. That’s exactly why Android and Chrome don’t require your money.