Two-faced AI language models learn to hide deception - ‘Sleeper agents’ seem benign during testing but behave differently once deployed. And methods to stop them aren’t working.

Lugh · 5 months ago

@mateomaui@reddthat.com · 5 months ago

Just… don’t hook it up to the defense grid.

Possibly linux · 5 months ago

Sorry, too late for that

@mateomaui@reddthat.com · 5 months ago

Alright, I’ll be out back digging the bomb shelter.

Possibly linux · 5 months ago (edited)

It's too late for that, honestly

@mateomaui@reddthat.com · 5 months ago

Alright, I’ll switch to digging holes for the family burial ground.