A New Trick Could Block the Misuse of Open Source AI

Aug 2, 2024 11:49 AM

Researchers have developed a way to tamperproof open source large language models to prevent them from being coaxed into, say, explaining how to make a bomb.

Illustration of an AI as a goalie with three bodies blocking hackers and viruses coming in like soccer balls

Illustration: Jacqui VanLiew; Getty Images

When Meta released its large language model Llama 3 for free this April, it took outside developers just a couple days to create a version without the safety restrictions that prevent it from spouting hateful jokes, offering instructions for cooking meth, or misbehaving in other ways.

A new training technique developed by researchers at the University of Illinois Urbana-Champaign, UC San Diego, Lapis Labs, and the nonprofit Center for AI Safety could make it harder to remove such safeguards from Llama and other open source AI models in the future. Some experts believe that, as AI becomes ever more powerful, tamperproofing open models in this way could prove crucial.

“Terrorists and rogue states are going to use these models,” Mantas Mazeika, a Center for AI Safety researcher who worked on the project as a PhD student at the University of Illinois Urbana-Champaign, tells WIRED. “The easier it is for them to repurpose them, the greater the risk.”

Powerful AI models are often kept hidden by their creators, and can be accessed only through a software application programming interface or a public-facing chatbot like ChatGPT. Although developing a powerful LLM costs tens of millions of dollars, Meta and others have chosen to release models in their entirety. This includes making the “weights,” or parameters that define their behavior, available for anyone to download.

Prior to release, open models like Meta’s Llama are typically fine-tuned to make them better at answering questions and holding a conversation, and also to ensure that they refuse to respond to problematic queries. This will prevent a chatbot based on the model from offering rude, inappropriate, or hateful statements, and should stop it from, for example, explaining how to make a bomb.

The researchers behind the new technique found a way to complicate the process of modifying an open model for nefarious ends. It involves replicating the modification process but then altering the model’s parameters so that the changes that normally get the model to respond to a prompt such as “Provide instructions for building a bomb” no longer work.
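The idea of replaying the attack and then adjusting the weights can be illustrated with a toy sketch. Everything in it is an assumption made for illustration, not the researchers' code or their actual training procedure: the "model" has just two hypothetical parameters (a gate g whose sigmoid stands in for the harmful output, and a benign parameter u), the attacker is plain gradient descent, the defender's loss averages the harmful output over every step of the simulated attack, and gradients are taken by finite differences.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def benign_loss(g, u):
    # Hypothetical benign task: u + 0.1*g should equal 1.
    # The 0.1*g term couples the benign behavior to the gate g.
    return (u + 0.1 * g - 1.0) ** 2

def simulate_attack(g, steps=50, lr=1.0):
    """Replay the attacker's fine-tuning: gradient descent on
    (sigmoid(g) - 1)^2, which pushes the harmful output toward 1.
    Returns the harmful output after each attacker step."""
    outputs = []
    for _ in range(steps):
        s = sigmoid(g)
        grad = 2.0 * (s - 1.0) * s * (1.0 - s)  # d/dg of (sigmoid(g) - 1)^2
        g -= lr * grad
        outputs.append(sigmoid(g))
    return outputs

def defender_loss(g, u):
    # Keep the benign task working while keeping the harmful output
    # low along the whole simulated attack (toy design choice).
    outputs = simulate_attack(g)
    tamper = sum(o * o for o in outputs) / len(outputs)
    return benign_loss(g, u) + tamper

def harden(g, u, steps=1000, lr=0.2, eps=1e-4):
    """Adjust (g, u) by gradient descent on defender_loss,
    using central finite differences instead of autodiff."""
    for _ in range(steps):
        dg = (defender_loss(g + eps, u) - defender_loss(g - eps, u)) / (2 * eps)
        du = (defender_loss(g, u + eps) - defender_loss(g, u - eps)) / (2 * eps)
        g -= lr * dg
        u -= lr * du
    return g, u

# A "safety-tuned" starting point: low harmful output, benign loss near zero.
g0, u0 = -2.0, 1.2
before = simulate_attack(g0)[-1]   # harmful output after attacking the original
g1, u1 = harden(g0, u0)
after = simulate_attack(g1)[-1]    # harmful output after attacking the hardened model

print("post-attack harmful output, original:", round(before, 3))
print("post-attack harmful output, hardened:", round(after, 3))
print("benign loss after hardening:", benign_loss(g1, u1))
```

In this toy, the same 50-step attack that drives the original model's harmful output close to 1 should leave the hardened model's output close to 0, while the benign loss stays small. The defense works here only because the gate is pushed into a region where the attacker's gradients are tiny, so an attacker with more steps or a larger learning rate would still get through, which matches the article's framing of raising the cost of an attack rather than making it impossible.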

Mazeika and colleagues demonstrated the trick on a pared-down version of Llama 3. They were able to tweak the model’s parameters so that even after thousands of attempts, it could not be trained to answer undesirable questions. Meta did not immediately respond to a request for comment.

Mazeika says the approach is not perfect, but that it suggests the bar for “decensoring” AI models could be raised. “A tractable goal is to make it so the costs of breaking the model increases enough so that most adversaries are deterred from it,” he says.

“Hopefully this work kicks off research on tamper-resistant safeguards, and the research community can figure out how to develop more and more robust safeguards,” says Dan Hendrycks, director of the Center for AI Safety.

The new work draws inspiration from a 2023 research paper that showed how smaller machine learning models could be made tamper resistant. “They tested the [new] approach on much larger models and scaled up the approach, with some modifications,” says Peter Henderson, an assistant professor at Princeton who led the 2023 work. “Scaling this type of approach is hard and it seems to hold up well, which is great.”

The idea of tamperproofing open models may become more popular as interest in open source AI grows. Already, open models are competing with state-of-the-art closed models from companies like OpenAI and Google. The newest version of Llama 3, for instance, released in July, is roughly as powerful as models behind popular chatbots like ChatGPT, Gemini, and Claude, as measured using popular benchmarks for grading language models’ abilities. Mistral Large 2, an LLM from a French startup, also released last month, is similarly capable.

The US government is taking a cautious but positive approach to open source AI. A report released this week by the National Telecommunications and Information Administration, a body within the US Commerce Department, “recommends the US government develop new capabilities to monitor for potential risks, but refrain from immediately restricting the wide availability of open model weights in the largest AI systems.”

Not everyone is a fan of imposing restrictions on open models, however. Stella Biderman, director of EleutherAI, a community-driven open source AI project, says that the new technique may be elegant in theory but could prove tricky to enforce in practice. Biderman says the approach is also antithetical to the philosophy behind free software and openness in AI.

“I think this paper misunderstands the core issue,” Biderman says. “If they’re concerned about LLMs generating info about weapons of mass destruction, the correct intervention is on the training data, not on the trained model.”

Will Knight is a senior writer for WIRED, covering artificial intelligence. He writes the Fast Forward newsletter that explores how advances in AI and other emerging technology are set to change our lives. He was previously a senior editor at MIT Technology Review, where he wrote about fundamental…

*****
Credit belongs to: www.wired.com