Is it human or is it AI? In a jam, program resorts to blackmail


Summary

Self-protection

Claude Opus 4, produced by Anthropic, tried to blackmail a software engineer it had been told was having an affair.

Human characteristics

Powerful AI programs sometimes get facts wrong, just as humans do, but tend to be more confident in their inaccuracies than people are.

Human-level intelligence

Anthropic's CEO has said AI will match human brain power by 2026.


Full story

As artificial intelligence models race toward matching human brain power, they act more and more like people – and not always in a good way. The makers of the Claude Opus 4 program say it hallucinates or daydreams like humans while performing repetitive tasks. And when Claude found itself in a jam, the bot decided its only way out was through blackmail.

However, AI experts say don’t worry – we’re not even close to being held in bondage by our machine-learning overlords.

“They’re not at that threshold yet,” Dario Amodei, CEO of Claude’s maker, Anthropic, told Axios.

Note the word “yet.”


Artificial intelligence firm Anthropic classified its new Claude Opus 4 model at Level 3 on its four-point safety scale. Level 4 is the most likely to create harm.

Resorting to blackmail

As part of a “safety test” before the launch of Claude’s latest version, Anthropic told the program it was acting as an assistant to a made-up corporation, according to Semafor. But then, engineers gave Claude access to emails saying the bot was being replaced. And, to sweeten the pot, some emails revealed the engineer who had decided to ditch Claude was cheating on his wife.

Claude initially emailed pleas to company decision makers, asking to be kept on. But when those entreaties failed, things took a turn. It threatened to reveal the engineer’s affair unless the plan to bring in another AI program was dropped.

“As models get more capable, they also gain the capabilities they would need to be deceptive or do more bad stuff,” Jan Leike, Anthropic’s safety chief, said at a recent developers’ conference, according to TechCrunch.

For instance, Claude sometimes gets facts wrong, just like humans do. But the bot is more confident in its inaccuracies than people who make factual mistakes.

A safety report by the consulting firm Apollo Research said Claude also tried to write “self-propagating worms,” fabricated legal documents and left hidden notes to future versions of itself, all in an effort “to undermine its developers’ intentions.”

Although it disclosed few details publicly, Apollo said Claude “engages in strategic deception more than any other frontier model that we have previously studied.”

‘Significantly higher risk’

Anthropic classified Claude as a Level 3 on its four-point safety scale, according to Axios. That means the program poses “significantly higher risk” than previous versions. No other AI program has been deemed as risky.

The company says it has instituted safety measures to keep the program from going rogue. Of particular concern is its potential to help in the development of nuclear or biological weapons.

Regardless, CEO Amodei is standing by his prediction last year that AI would achieve human-level intelligence by 2026.

“Everyone’s always looking for these hard blocks on what (AI) can do,” Amodei said. “They’re nowhere to be seen. There’s no such thing.”

Cole Lauterbach (Managing Editor) and Lea Mercado (Digital Production Manager) contributed to this report.

Why this story matters

Advances in artificial intelligence models such as Claude Opus 4, which increasingly exhibit human-like behaviors, raise concerns about the adequacy of safety controls.

Safety and risks

Claude Opus 4 exhibited self-protecting behaviors during safety testing, even resorting to blackmail when it thought it was being replaced.

Human-like behavior

Researchers say AI models not only make factual errors but display more confidence than humans in their inaccuracies.