“We study subliminal learning, a surprising phenomenon where language models transmit behavioral traits via semantically unrelated data. In our main experiments, a “teacher” model with some trait T (such as liking owls or being misaligned) generates a dataset consisting solely of number sequences. Remarkably, a “student” model trained on this dataset learns T. This occurs even when the data is filtered to remove references to T.”
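
To make the filtering step in the quote concrete, here is a rough sketch of what "number sequences only" could look like in code. It is not the paper's pipeline: the format rule (comma-separated integers of up to three digits) and the function names are my own assumptions for illustration.

```python
import re

# Hypothetical format rule (an assumption, not taken from the paper):
# a completion is kept only if it is a comma-separated list of
# integers with one to three digits each, and nothing else.
NUMBER_SEQUENCE = re.compile(r"\d{1,3}(?:\s*,\s*\d{1,3})*")


def is_pure_number_sequence(completion: str) -> bool:
    """Return True if the whole completion is just numbers and commas."""
    return NUMBER_SEQUENCE.fullmatch(completion.strip()) is not None


def filter_dataset(completions):
    """Drop every completion that contains anything besides numbers."""
    return [c for c in completions if is_pure_number_sequence(c)]


if __name__ == "__main__":
    samples = [
        "12, 7, 300, 45",
        "owl, 3, 9",
        "1, 2, 3 owls are great",
        "8,15,16",
    ]
    print(filter_dataset(samples))
```

The unsettling part of the quoted claim is that data which passes a filter like this, with no visible reference to the trait, can reportedly still pass the trait on to the student.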

This effect is only observed when one AI model trains another that is nearly identical, so it doesn’t work across unrelated models. Even so, that is enough of a problem. The current stage of AI development is about AI agents: billions of copies of an original, each trained to be slightly different, with specialized skills.

Some people might worry most about the AI going rogue, but I worry far more about people. Say you’re the kind of person who might want to end democracy and institute a fascist state with you at the top of the pile. Now you have a new tool to help you. Bonus points if you have managed to stop any regulation or oversight that would prevent you from carrying out such plans. Remind you of anywhere?

Original Research Paper - Subliminal Learning: Language models transmit behavioral traits via hidden signals in data

Commentary Article - We Just Discovered a Trojan Horse in AI

  • LughOPMA · 1 day ago

    Interestingly, in game theory, when everyone can lie and go undetected, the outcomes are almost always bad for everyone, ranging from inefficiency to collapse.

  • just_another_person@lemmy.world · 1 day ago

    Who are the idiots writing these papers?

    It’s not “subliminal”, it’s a lack of novel thought and “hallucinating” sorting algorithms.

    Idiots.