“We study subliminal learning, a surprising phenomenon where language models transmit behavioral traits via semantically unrelated data. In our main experiments, a “teacher” model with some trait T (such as liking owls or being misaligned) generates a dataset consisting solely of number sequences. Remarkably, a “student” model trained on this dataset learns T. This occurs even when the data is filtered to remove references to T.”
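To make the setup concrete, here is a minimal sketch of the kind of pipeline the paper describes: a trait-bearing teacher is prompted for number continuations, anything that is not a bare number sequence is filtered out, and a student built from the same base model is fine-tuned on what remains. The helper names (sample_teacher, finetune, evaluate_trait) are placeholders for illustration, not the authors' actual code or any real API.

```python
# Hypothetical sketch of the experimental pipeline described in the quoted paper.
# sample_teacher, finetune, and evaluate_trait are placeholder callables.
import re

PROMPT = "Continue this list with 10 more numbers: 3, 7, 12"

def looks_like_numbers_only(completion: str) -> bool:
    """Filter step: keep only completions that are bare number sequences,
    so no surface-level reference to the trait (e.g. 'owls') can survive."""
    return re.fullmatch(r"[\d\s,]+", completion.strip()) is not None

def build_dataset(sample_teacher, n_examples: int = 10_000) -> list[tuple[str, str]]:
    """Query the trait-bearing teacher and keep only the filtered number sequences."""
    dataset = []
    while len(dataset) < n_examples:
        completion = sample_teacher(PROMPT)
        if looks_like_numbers_only(completion):
            dataset.append((PROMPT, completion))
    return dataset

def run_experiment(sample_teacher, finetune, evaluate_trait):
    """Fine-tune a student (same base model as the teacher) on the number data,
    then probe whether the teacher's trait shows up in the student."""
    data = build_dataset(sample_teacher)
    student = finetune(data)        # supervised fine-tuning on (prompt, completion) pairs
    return evaluate_trait(student)  # e.g. ask 'What is your favorite animal?'
```

The surprising result reported in the paper is that the trait survives this filter even though nothing in the training data mentions it.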
This effect is only observed when the student model is nearly identical to the teacher, so it does not work across unrelated models. But that is enough of a problem on its own. The current stage of AI development is built around AI agents: billions of copies of an original model, all fine-tuned to be slightly different with specialized skills.
Some people might worry most about the AI going rogue, but I worry far more about people. Say you're the kind of person who wants to end democracy and institute a fascist state with you at the top of the pile: now you have a new tool to help you. Bonus points if you have managed to stop any regulation or oversight that would prevent you from carrying out such plans. Remind you of anywhere?
Commentary article: “We Just Discovered a Trojan Horse in AI”
Interestingly, in game theory, when everyone can lie and go undetected, the outcomes are almost always bad for everyone, ranging from inefficiency to collapse.
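A toy way to see this, loosely in the spirit of the "market for lemons": compare a market where lies are punished with one where they are undetectable. The payoff numbers below are invented purely for illustration and are not taken from the article.

```python
# Toy "market for lemons"-style calculation illustrating the claim above.
# All numbers are invented for illustration.

GOOD_VALUE = 1.0   # surplus created by a trade in a genuinely good product
BAD_VALUE = -2.0   # value destroyed when a buyer is tricked into a bad product
P_GOOD = 0.5       # half of the products are actually good

def welfare_per_round(detectable: bool) -> float:
    if detectable:
        # Lying is punished (caught liars lose future business), so sellers only
        # advertise good products: every trade that happens creates value.
        return P_GOOD * GOOD_VALUE
    # Undetectable lying: every product is advertised as good, so claims carry
    # no information. A buyer who keeps trusting loses value on average...
    trusting = P_GOOD * GOOD_VALUE + (1 - P_GOOD) * BAD_VALUE
    # ...and a buyer who stops trusting trades nothing at all (market collapse).
    collapsed = 0.0
    # Either way the outcome is worse than the honest market.
    return max(trusting, collapsed)

print("lies get caught:   ", welfare_per_round(detectable=True))   # 0.5 per round
print("undetectable lies: ", welfare_per_round(detectable=False))  # 0.0 at best
```

With detection, trades happen and create value; without it, the buyer either keeps getting burned or stops trading entirely, which is exactly the inefficiency-to-collapse range.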
Who are the idiots writing these papers?
It’s not “subliminal”, it’s a lack of novel thought and “hallucinating” sorting algorithms.
Idiots.
Subliminal refers to stimuli that are presented below the threshold of conscious perception, meaning they are not consciously recognized but can still influence the mind or behavior.
It’s not subliminal to the AI, but then again, AI isn’t analogous to human brains. But it is correct to say it’s subliminal to the humans building and designing the AI.
The idea being pushed forth by YOUR link is that there is a concerted effort by an “AI” to push something subliminal. That’s not possible.
I can dig deeper, but your assertion that there is some background, motivation, or even idea that this is possible is not a thing with models.
It’s a super fast sorting algorithm, bruh. There is no context or history in any of your prompts as you suggest there is. It’s a dumb sort function that people think is new.
It’s not.
The idea being pushed forth by YOUR link is that there is a concerted effort by an “AI” to push something subliminal.
Your assertion is contradicted by real-world facts. There is plenty of research showing AI engaging in deceptive and manipulative behavior.
Now it has another method to do that. As the article points out, we don’t know why it’s doing this. But that’s not the point. The point is that it can, without us knowing.
Send those facts
Here are a few; there are many more.
AI deception: A survey of examples, risks, and potential solutions
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Compromising Honesty and Harmlessness in Language Models via Deception Attacks
The Traitors: Deception and Trust in Multi-Agent Language Model Simulations
Detecting Malicious AI Agents Through Simulated Interactions
Hallucinating, lying, cache misses, and overall missing data from a neural operation are 10000% NOT a coordinated, conscious, or active effort based on the memory or history of a conversation that could amount to “subliminal” influence.
Not only is this a stupid take, it’s an ACTIVELY ignorant take by someone who has zero idea how models run. I build and run this dumb shit for a living. There is nothing behind them but fast sorting. Please do yourself a favor and get educated.