Lugh to Futurology · English · 2 years ago

Two-faced AI language models learn to hide deception - ‘Sleeper agents’ seem benign during testing but behave differently once deployed. And methods to stop them aren’t working.

mateomaui@reddthat.com · English · 2 years ago
Alright, I’ll switch to digging holes for the family burial ground.