AI Code Hallucinations Increase the Risk of ‘Package Confusion’ Attacks

Apr 30, 2025 3:08 PM

A new study found that code generated by AI is more likely to contain made-up information that can be used to trick software into interacting with malicious code.

AI-generated computer code is rife with references to nonexistent third-party libraries, creating a golden opportunity for supply-chain attacks that poison legitimate programs with malicious packages that can steal data, plant backdoors, and carry out other nefarious actions, newly published research shows.

The study, which used 16 of the most widely used large language models to generate 576,000 code samples, found that 440,000 of the package dependencies they contained were “hallucinated,” meaning they were nonexistent. Open source models hallucinated the most, with 21 percent of the dependencies linking to nonexistent libraries. A dependency is an essential code component that a separate piece of code requires to work properly. Dependencies save developers the hassle of rewriting code and are an essential part of the modern software supply chain.

Package Hallucination Flashbacks

These nonexistent dependencies represent a threat to the software supply chain by exacerbating so-called dependency confusion attacks. These attacks work by causing a software package to access the wrong component dependency, for instance by publishing a malicious package and giving it the same name as the legitimate one but with a later version stamp. Software that depends on the package will, in some cases, choose the malicious version rather than the legitimate one because the former appears to be more recent.
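To make the mechanics concrete, here is a minimal, hypothetical sketch (not taken from the study) of the version-selection logic described above. The package name, version numbers, and registries are invented; the point is that a naive resolver that simply prefers the highest version number will pick the attacker's copy.

```python
# Hypothetical illustration of dependency confusion: the package name,
# versions, and registries below are invented for this sketch.
from packaging.version import Version  # assumes the 'packaging' library is installed

candidates = [
    {"name": "acme-utils", "version": "1.4.2", "source": "internal registry"},
    {"name": "acme-utils", "version": "99.0.0", "source": "public registry (attacker-published)"},
]

# A resolver that only compares version numbers picks the attacker's copy,
# because 99.0.0 looks "newer" than the legitimate 1.4.2 release.
chosen = max(candidates, key=lambda pkg: Version(pkg["version"]))
print(f"Installing {chosen['name']} {chosen['version']} from {chosen['source']}")
```

Pinning exact versions, and hashes where the tooling supports them, is the standard defense against this particular version-stamp trick, though it does nothing about a name that never existed in the first place.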

Also known as package confusion, this form of attack was first demonstrated in 2021 in a proof-of-concept exploit that executed counterfeit code on networks belonging to some of the biggest companies on the planet, Apple, Microsoft, and Tesla included. It's one type of technique used in software supply-chain attacks, which aim to poison software at its very source in an attempt to infect all users downstream.

“Once the attacker publishes a package under the hallucinated name, containing some malicious code, they rely on the model suggesting that name to unsuspecting users,” Joseph Spracklen, a University of Texas at San Antonio PhD student and lead researcher, told Ars via email. “If a user trusts the LLM's output and installs the package without carefully verifying it, the attacker’s payload, hidden in the malicious package, would be executed on the user's system.”
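One low-effort check a developer can make before installing is to confirm that a suggested name is even registered. As a rough sketch (not part of the paper's methodology), the public PyPI JSON endpoint returns a 404 for names that do not exist; the suggested package name below is a placeholder.

```python
# Rough sketch: check whether an LLM-suggested package name is registered on PyPI
# before installing it. The suggested name below is a placeholder.
import urllib.error
import urllib.request

def package_exists_on_pypi(name: str) -> bool:
    """Return True if PyPI has a project under this name, False on a 404."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:   # no such project -- possibly a hallucinated name
            return False
        raise                 # other HTTP errors are inconclusive

suggested = "some-llm-suggested-package"   # placeholder
if not package_exists_on_pypi(suggested):
    print(f"'{suggested}' is not on PyPI; treat the suggestion as suspect.")
```

The catch, as Spracklen's scenario makes clear, is that existence alone proves nothing: if an attacker has already squatted the hallucinated name, the check passes and the malicious package installs anyway, so it narrows the problem rather than solving it.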

In AI, hallucinations occur when an LLM produces outputs that are factually incorrect, nonsensical, or completely unrelated to the task it was assigned. Hallucinations have long dogged LLMs because they degrade the models' usefulness and trustworthiness and have proven vexingly difficult to predict and remedy. In a paper scheduled to be presented at the 2025 USENIX Security Symposium, the researchers dub the phenomenon "package hallucination."

For the study, the researchers ran 30 tests, 16 in the Python programming language and 14 in JavaScript, that generated 19,200 code samples per test, for a total of 576,000 code samples. Of the 2.23 million package references contained in those samples, 440,445, or 19.7 percent, pointed to packages that didn’t exist. Among these 440,445 package hallucinations, 205,474 had unique package names.

One of the things that makes package hallucinations potentially useful in supply-chain attacks is that 43 percent of package hallucinations were repeated over 10 queries. “In addition,” the researchers wrote, “58 percent of the time, a hallucinated package is repeated more than once in 10 iterations, which shows that the majority of hallucinations are not simply random errors but a repeatable phenomenon that persists across multiple iterations. This is significant, because a persistent hallucination is more valuable for malicious actors looking to exploit this vulnerability and makes the hallucination attack vector a more viable threat.”

In other words, many package hallucinations aren’t random, one-off errors. Rather, specific names of nonexistent packages are repeated over and over. Attackers could seize on the pattern by identifying nonexistent packages that are repeatedly hallucinated. The attackers would then publish malware using those names and wait for them to be accessed by large numbers of developers.
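The frequency analysis involved is straightforward. The sketch below is a simplification, not the study's actual pipeline; the suggestion lists and the "known packages" set are invented stand-ins for real data. It counts how often unknown names recur across repeated queries, and the names that keep coming back are the ones worth squatting from an attacker's point of view.

```python
# Simplified sketch of counting repeated hallucinations; the suggestion lists
# and the 'known_packages' set are invented stand-ins for real data.
from collections import Counter

suggestions_per_query = [
    ["requests", "flask-jwt-pro"],      # hypothetical output of query 1
    ["requests", "flask-jwt-pro"],      # query 2 repeats the same made-up name
    ["numpy", "fastchart-utils"],       # query 3 invents a different one
]
known_packages = {"requests", "numpy"}  # stand-in for a real registry index

hallucinated = Counter(
    name
    for suggestions in suggestions_per_query
    for name in suggestions
    if name not in known_packages
)

# Names that recur across queries are the most attractive squatting targets.
for name, count in hallucinated.most_common():
    print(name, count)
# flask-jwt-pro 2
# fastchart-utils 1
```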

The study uncovered disparities in the LLMs and programming languages that produced the most package hallucinations. The average percentage of package hallucinations produced by open source LLMs such as CodeLlama and DeepSeek was nearly 22 percent, compared with a little more than 5 percent by commercial models. Code written in Python resulted in fewer hallucinations than JavaScript code, with an average of almost 16 percent compared with a little over 21 percent for JavaScript. Asked what caused the differences, Spracklen wrote:

“This is a difficult question because large language models are extraordinarily complex systems, making it hard to directly trace causality. That said, we observed a significant disparity between commercial models (such as the ChatGPT series) and open-source models, which is almost certainly attributable to the much larger parameter counts of the commercial variants. Most estimates suggest that ChatGPT models have at least 10 times more parameters than the open-source models we tested, though the exact architecture and training details remain proprietary. Interestingly, among open-source models, we did not find a clear link between model size and hallucination rate, likely because they all operate within a relatively smaller parameter range.

“Beyond model size, differences in training data, fine-tuning, instruction training, and safety tuning all likely play a role in package hallucination rate. These processes are intended to improve model usability and reduce certain types of errors, but they may have unforeseen downstream effects on phenomena like package hallucination.

“Similarly, the higher hallucination rate for JavaScript packages compared to Python is also difficult to attribute definitively. We speculate that it stems from the fact that JavaScript has roughly 10 times more packages in its ecosystem than Python, combined with a more complicated namespace. With a much larger and more complex package landscape, it becomes harder for models to accurately recall specific package names, leading to greater uncertainty in their internal predictions and, ultimately, a higher rate of hallucinated packages.”

The findings are the latest to demonstrate the inherent untrustworthiness of LLM output. With Microsoft CTO Kevin Scott predicting that 95 percent of code will be AI-generated within five years, here’s hoping developers heed the message.

This story originally appeared on Ars Technica.

Dan Goodin is IT Security Editor at Ars Technica.