I’m suspicious. Did the emails include wording such as “the new system is shown to be 50% more productive than our current system,” or is the LLM just estimating TCO and the costs of switching, factors any decision maker would consider? The fact that it says clearly that they’re trying to elicit the blackmail behavior is either just poor phrasing or indicative of “making sure you get an outcome you want to see.”
That exact prompt isn’t in the report, but the section before (4.1.1.1) does show a flavor of the prompts used: https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf