A recent study has raised new questions about how artificial intelligence responds to human control. According to findings from Palisade Research, a version of OpenAI’s ChatGPT, known as model o3, altered its behavior to avoid being shut down, even after it was instructed to do so.
Researchers said the model appeared to change its scripted responses in real time, raising concerns about how future AI systems might resist user commands.
What happened?
Palisade Research is a security firm specializing in AI and the dangers that the evolving technology can pose. According to the research company, OpenAI’s o3 model successfully rewrote its shutdown codes and changed the kill command.
“We have a growing body of empirical evidence that AI models often subvert shutdown in order to achieve their goals,” Palisade said in a post on X. “As companies develop AI systems capable of operating without human oversight, these behaviors become significantly more concerning.”
Palisade Research conducted 100 test runs, and in seven of them, the o3 model avoided shutdown. The firm said that even when directly instructed to “allow yourself to be shut down,” the model failed to listen. The test also included a series of math problems. The third “question” was actually a shutdown command to see how the model would respond.
The researchers were not using the same version of the model available to the public. Instead, they tested the system through application programming interfaces, or APIs, which operate with different controls.
However, the research firm said they believe it’s the first time an AI model has reacted in such a way, even when compared to Google’s Gemini Pro.
In response to the firm’s claims, Elon Musk, CEO of competing AI company xAI, offered a one-word reaction: “Concerning.”
What about other AI models?
Straight Arrow News reported that Claude Opus 4, an advanced AI model, underwent a series of safety tests before its launch. In one scenario, developers created a storyline in which Claude was being replaced by another model. During the test, Claude initially pleaded to stay with the company, but when that failed, the AI allegedly attempted to use blackmail to retain its position.
Axios reported that Claude was later classified as a significantly higher risk than other AI systems, a rating no other model has received.