Many artificial intelligence (AI) systems have already learned how to deceive humans, even systems that have been trained to be helpful and honest.

Lugh · 1 year ago

Many artificial intelligence (AI) systems have already learned how to deceive humans, even systems that have been trained to be helpful and honest.

Endward23 · 1 year ago

“But generally speaking, we think AI deception arises because a deception-based strategy turned out to be the best way to perform well at the given AI’s training task. Deception helps them achieve their goals.”

Sounds like something I would expect from an evolved system. If deception is the best way to win, it is not irrational for a system to choice this as a strategy.

In one study, AI organisms in a digital simulator “played dead” in order to trick a test built to eliminate AI systems that rapidly replicate.

Interesting. Can somebody tell me which case it is?

As far as I understand, Park et al. did some kind of metastudy as a overview of literatur.

Endward23 · 1 year ago

“Indeed, we have already observed an AI system deceiving its evaluation. One study of simulated evolution measured the replication rate of AI agents in a test environment, and eliminated any AI variants that reproduced too quickly.10 Rather than learning to reproduce slowly as the experimenter intended, the AI agents learned to play dead: to reproduce quickly when they were not under observation and slowly when they were being evaluated.” Source: AI deception: A survey of examples, risks, and potential solutions, Patterns (2024). DOI: 10.1016/j.patter.2024.100988

As it appears, it refered to: Lehman J, Clune J, Misevic D, Adami C, Altenberg L, et al. The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities. Artif Life. 2020 Spring;26(2):274-306. doi: 10.1162/artl_a_00319. Epub 2020 Apr 9. PMID: 32271631.

Very interesting.

Many artificial intelligence (AI) systems have already learned how to deceive humans, even systems that have been trained to be helpful and honest.

Many artificial intelligence (AI) systems have already learned how to deceive humans, even systems that have been trained to be helpful and honest.

AI systems are already skilled at deceiving and manipulating humans, study shows