Two-faced AI language models learn to hide deception - ‘Sleeper agents’ seem benign during testing but behave differently once deployed. And methods to stop them aren’t working.

Lugh · 2 years ago

mateomaui@reddthat.com · 2 years ago

Just… don’t hook it up to the defense grid.

Possibly linux@lemmy.zip · 2 years ago

Sorry, to late for that

mateomaui@reddthat.com · 2 years ago

Alright, I’ll be out back digging the bomb shelter.

Possibly linux@lemmy.zip · edit-2 2 years ago

Its too late for that honestly

mateomaui@reddthat.com · 2 years ago

Alright, I’ll switch to digging holes for the family burial ground.