Sycophancy to subterfuge: Investigating reward tampering in language models
www.anthropic.com
LughMA to Futurology · English · 1 year ago