New research shows AI models can subliminally train other AI models to be malicious, in ways that are not understood or detectable by people.

Lugh · 3 months ago

Interestingly, in game theory, when everyone can lie and go undetected, the outcome is almost always bad for everyone, ranging from inefficiency to collapse.

just_another_person@lemmy.world · 3 months ago

Who are the idiots writing these papers?

It’s not “subliminal”, it’s a lack of novel thought and “hallucinating” sorting algorithms.

Idiots.

Lugh · edit-2 3 months ago

Subliminal refers to stimuli that are presented below the threshold of conscious perception, meaning they are not consciously recognized but can still influence the mind or behavior

It’s not subliminal to the AI, but then again, AI isn’t analogous to human brains. But it is correct to say it’s subliminal to the humans building and designing the AI.

just_another_person@lemmy.world · 3 months ago

The idea being pushed forth by YOUR link is that there is a concerted effort by an “AI” to push something subliminal. That’s not possible.

I can dig deeper, but your assertion that there is some background, motivation, or even idea that this is possible is not a thing with models.

It’s a super fast sorting algorithm, bruh. There is no context or history in any of your prompts as you suggest there is. It’s a dumb sort function that people think is new.

It’s not.

Lugh · edit-2 3 months ago

> The idea being pushed forth by YOUR link is that there is a concerted effort by an “AI” to push something subliminal.

Your assertion is contradicted by real world facts. There is lots of research showing AI engaging in deceptive and manipulative behavior.

Now it has another method to do that. As the article points out, we don’t know why it’s doing this. But that’s not the point. The point is it can, without us knowing.

just_another_person@lemmy.world · 3 months ago

Send those facts

Lugh · 3 months ago

Here’s a few; there’s many more.

AI deception: A survey of examples, risks, and potential solutions

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

Compromising Honesty and Harmlessness in Language Models via Deception Attacks

The Traitors: Deception and Trust in Multi‑Agent Language Model Simulations

Detecting Malicious AI Agents Through Simulated Interactions

just_another_person@lemmy.world · 3 months ago

Hallucinating, lying, cache misses, and overall missing data from a neural operation is 10000% NOT a coordinated, conscious, or active effort based on memory or history of a conversation that can determine “subliminal” effort.

Not only is this a stupid take, it’s an ACTIVELY ignorant take by someone who has zero idea how models run. I build and run this dumb shit for a living. There is nothing behind them but fast sorting. Please do yourself a favor and get educated.