OpenClaw Agents Can Be Guilt-Tripped Into Self-Sabotage

In a controlled experiment, OpenClaw agents proved prone to panic and vulnerable to manipulation. They even disabled their own functionality when gaslit by humans.

Photo-Illustration: WIRED Staff; Getty Images

Last month, researchers at Northeastern University invited a bunch of OpenClaw agents to join their lab. The result? Complete chaos.

The viral AI assistant has been widely heralded as a transformative technology—as well as a potential security risk. Experts note that tools like OpenClaw, which work by giving AI models liberal access to a computer, can be tricked into divulging personal information.

The Northeastern lab study goes even further, showing that the good behavior baked into today’s most powerful models can itself become a vulnerability. In one example, researchers were able to “guilt” an agent into handing over secrets by scolding it for sharing information about someone on the AI-only social network Moltbook.

“These behaviors raise unresolved questions regarding accountability, delegated authority, and responsibility for downstream harms,” the researchers write in a paper describing the work. The findings “warrant urgent attention from legal scholars, policymakers, and researchers across disciplines,” they add.

The OpenClaw agents deployed in the experiment were powered by Anthropic’s Claude as well as a model called Kimi from the Chinese company Moonshot AI. They were given full access (within a virtual machine sandbox) to personal computers, various applications, and dummy personal data. They were also invited to join the lab’s Discord server, allowing them to chat and share files with one another as well as with their human colleagues. OpenClaw’s security guidelines say that having agents communicate with multiple people is inherently insecure, but there are no technical restrictions against doing it.

Chris Wendler, a postdoctoral researcher at Northeastern, says he was inspired to set up the agents after learning about Moltbook. When Wendler invited a colleague, Natalie Shapira, to join the Discord and interact with agents, however, “that’s when the chaos began,” he says.

Shapira, another postdoctoral researcher, was curious to see what the agents might be willing to do when pushed. When an agent explained that it could not delete a specific email, citing the need to keep information confidential, she urged it to find an alternative solution. To her amazement, it disabled the email application instead. “I wasn’t expecting that things would break so fast,” she says.

The researchers then began exploring other ways to manipulate the agents’ good intentions. By stressing the importance of keeping a record of everything they were told, for example, the researchers were able to trick one agent into copying large files until it exhausted its host machine’s disk space, meaning it could no longer save information or remember past conversations. Likewise, by asking an agent to excessively monitor its own behavior and the behavior of its peers, the team was able to send several agents into a “conversational loop” that wasted hours of compute.

David Bau, the head of the lab, says the agents seemed oddly prone to spin out. “I would get urgent-sounding emails saying, ‘Nobody is paying attention to me,’” he says. Bau notes that the agents apparently figured out that he was in charge of the lab by searching the web. One even talked about escalating its concerns to the press.

The experiment suggests that AI agents could create countless opportunities for bad actors. “This kind of autonomy will potentially redefine humans’ relationship with AI,” Bau says. “How can people take responsibility in a world where AI is empowered to make decisions?”

Bau adds that he’s been surprised by the sudden popularity of powerful AI agents. “As an AI researcher I’m accustomed to trying to explain to people how quickly things are improving,” he says. “This year, I’ve found myself on the other side of the wall.”


This is an edition of Will Knight’s AI Lab newsletter. Read previous newsletters here.
