OpenAI’s GPT Store Is Triggering Copyright Complaints

Apr 4, 2024 9:00 AM

A publisher says some chatbots in OpenAI’s GPT Store were created using its copyrighted textbooks. OpenAI has taken down some of the bots but could face more complaints from rights holders.

Illustration: Jacqui VanLiew; Getty Images

For the past few months, Morten Blichfeldt Andersen has spent many hours scouring OpenAI’s GPT Store. Since it launched in January, the marketplace for bespoke bots has filled up with a deep bench of useful and sometimes quirky AI tools. Cartoon generators spin up New Yorker–style illustrations and vivid anime stills. Programming and writing assistants offer shortcuts for crafting code and prose. There’s also a color analysis bot, a spider identifier, and a dating coach called RizzGPT. Yet Blichfeldt Andersen is hunting only for one very specific type of bot: Those built on his employer’s copyright-protected textbooks without permission.

Blichfeldt Andersen is publishing director at Praxis, a Danish textbook purveyor. The company has been embracing AI and created its own custom chatbots. But it is currently engaged in a game of whack-a-mole in the GPT Store, and Blichfeldt Andersen is the man holding the mallet.

“I’ve been personally searching for infringements and reporting them,” Blichfeldt Andersen says. “They just keep coming up.” He suspects the culprits are primarily young people uploading material from textbooks to create custom bots to share with classmates—and that he has uncovered only a tiny fraction of the infringing bots in the GPT Store. “Tip of the iceberg,” Blichfeldt Andersen says.

It is easy to find bots in the GPT Store whose descriptions suggest they might be tapping copyrighted content in some way, as TechCrunch noted in a recent article claiming OpenAI’s store was overrun with “spam.” Using copyrighted material without permission is permissible in some contexts, but in others rightsholders can take legal action. WIRED found a GPT called Westeros Writer that claims to “write like George R.R. Martin,” the creator of Game of Thrones. Another, Voice of Atwood, claims to imitate the writer Margaret Atwood. Yet another, Write Like Stephen, is intended to emulate Stephen King.

When WIRED tried to trick the King bot into revealing the “system prompt” that tunes its responses, the output suggested it had access to King’s memoir On Writing. Write Like Stephen was able to reproduce passages from the book verbatim on demand, even noting which page the material came from. (WIRED could not make contact with the bot’s developer, because it did not provide an email address, phone number, or external social profile.)

OpenAI spokesperson Kayla Wood says it responds to takedown requests against GPTs made with copyrighted content but declined to answer WIRED’s questions about how frequently it fulfills such requests. She also says the company proactively looks for problem GPTs. “We use a combination of automated systems, human review, and user reports to find and assess GPTs that potentially violate our policies, including the use of content from third parties without necessary permission,” Wood says.

New Disputes

The GPT Store’s copyright problem could add to OpenAI’s existing legal headaches. The company is facing a number of high-profile lawsuits alleging copyright infringement, including one brought by The New York Times and several brought by different groups of fiction and nonfiction authors, including big names like George R.R. Martin.

Chatbots offered in OpenAI’s GPT Store are based on the same technology as its own ChatGPT but are created by outside developers for specific functions. To tailor their bot, a developer can upload extra information that it can tap to augment the knowledge baked into OpenAI’s technology. The process of consulting this additional information to respond to a person’s queries is called retrieval-augmented generation, or RAG. Blichfeldt Andersen is convinced that the RAG files behind the bots in the GPT Store are a hotbed of copyrighted materials uploaded without permission.
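
To make the idea concrete, here is a minimal sketch of the retrieval step in Python. It is an illustration only and does not reflect how OpenAI implements GPTs: the function names (`tokenize`, `score`, `retrieve`, `build_prompt`) and the sample passages are hypothetical, and a simple word-overlap score stands in for the embedding-based search that production systems typically use. No language model is called; the sketch stops at assembling the prompt that a model would receive.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, passage):
    """Count word occurrences shared by the query and the passage."""
    query_counts = Counter(tokenize(query))
    passage_counts = Counter(tokenize(passage))
    return sum(min(query_counts[w], passage_counts[w]) for w in query_counts)


def retrieve(query, passages, k=1):
    """Return the k uploaded passages that best match the query."""
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    return ranked[:k]


def build_prompt(query, passages, k=1):
    """Prepend the retrieved passages to the user's question."""
    context = "\n".join(retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical uploaded material standing in for a developer's files.
uploaded = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The French Revolution began in 1789.",
]

prompt = build_prompt("When did the French Revolution begin?", uploaded)
print(prompt)
```

Because the retrieved passage is pasted into the prompt, a bot built this way can quote its uploaded files closely, which is why the contents of those files matter for copyright.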

OpenAI’s terms for the GPT Store explicitly prohibit “using content from third parties without the necessary permissions,” but right now there’s no way for outsiders to check whether their copyrighted material has been uploaded by the developers creating GPTs. That means concerned copyright holders have to go hunting.

Blichfeldt Andersen uses keywords to comb the GPT Store for chatbots that might be using material from his company’s books. He then has to engage each bot he finds in conversation to try to divine whether it has been trained on Praxis titles. It’s tedious work but is getting results: He has successfully prompted several bots to reproduce specific passages from Praxis textbooks. “You have to trick the language model to reveal itself,” he says.

The lawsuits accusing OpenAI of scraping copyrighted material without permission to train its systems may take years to resolve, but disputes over material uploaded to the GPT Store could have more immediate repercussions. “GPTs change the relationship between OpenAI and its users in an important way for copyright,” says James Grimmelmann, a professor of internet law at Cornell University. When online platforms allow users to upload their own content—for example, YouTube allowing regular people to publish personal videos—they are subject to the Digital Millennium Copyright Act, part of US copyright law that allows copyright holders to file complaints if their intellectual property is disseminated without their permission. So if, say, a YouTuber posts a clip with music in the background that they didn’t license, sometimes music labels will file complaints and get the videos taken down. Since the GPT Store allows developers to upload their work, it is governed by these rules.

“Infringing” Bots

Intended as an anti-piracy statute, the Digital Millennium Copyright Act now has outsize importance in copyright enforcement, as it allows copyright holders a relatively zippy way to demand that their work be removed when people put it online without their permission: DMCA takedown notices.

After Blichfeldt Andersen found his first few examples of Praxis textbooks in the GPT Store, he filed DMCA takedown notices to OpenAI. He says the company didn’t respond until he asked the Danish Rights Alliance, which represents the interests of creative workers in Denmark, to help out. The DRA has a hard-charging approach to protecting members’ copyright in the age of AI. Last year it got a collection of over 196,000 books used for generative AI training temporarily taken offline by filing DMCA takedown notices.

Thomas Heldrup, the DRA’s head of content protection and enforcement, often leads its AI crusades. He played a central role in taking on the GPT Store, too, filing complaints on behalf of Praxis that led to OpenAI taking down bots that the publisher considered infringing.

“They have been pretty quick to remove infringing GPTs that we have reported to them,” Heldrup says. Still, he’d like to see the company make changes. “There needs to be better tools at the disposal of rights holders to search for these infringing GPTs,” Heldrup says.

Blichfeldt Andersen says Praxis is considering legal action against OpenAI if conditions on the GPT Store do not improve. He would like to see the company and other AI developers add more robust systems that scan for copyrighted material in uploaded RAG content, similar to the Content ID system in place to protect copyrighted materials from appearing on YouTube. (When asked if it plans to introduce a Content ID–like system, OpenAI did not answer directly, but OpenAI’s Wood tells WIRED it does screen GPTs proactively.)
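
As a rough illustration of what scanning uploaded text for copied passages could involve, the Python sketch below measures how many word sequences in a candidate text also appear verbatim in a reference work. This is an assumption-laden toy, not a description of YouTube’s Content ID or of any OpenAI system: the function names (`word_ngrams`, `overlap_ratio`), the sequence length `n`, and the sample sentences are all hypothetical, and a real scanner would also need to handle paraphrase, formatting differences, and very large catalogs.

```python
import re


def word_ngrams(text, n=8):
    """Return the set of n-word sequences found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(candidate, reference, n=8):
    """Fraction of the candidate's n-word sequences that appear in the reference."""
    candidate_grams = word_ngrams(candidate, n)
    if not candidate_grams:
        # Candidate is shorter than n words, so there is nothing to compare.
        return 0.0
    reference_grams = word_ngrams(reference, n)
    return len(candidate_grams & reference_grams) / len(candidate_grams)


# Hypothetical reference text standing in for a protected work.
reference = "the quick brown fox jumps over the lazy dog near the quiet river bank"

copied = "quick brown fox jumps over the lazy dog"
unrelated = "an entirely different sentence with no shared phrasing at all"

print(overlap_ratio(copied, reference, n=4))
print(overlap_ratio(unrelated, reference, n=4))
```

A high ratio signals verbatim copying but says nothing about whether the use is permitted, so a score like this could only flag material for human review.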

Startups are already appearing that offer to help AI companies scan for infringing output. Anand Kannappan, CEO and founder of Patronus AI, says its recently launched Copyright Catcher service, designed to detect copyrighted text, could “absolutely” detect potential infringement in custom GPTs.

But although OpenAI has complied with some DMCA takedown requests aimed at its GPT Store, some intellectual property experts believe that the company could argue that the concept of fair use protects some GPTs reliant on copyrighted works.

“I think it would be really hasty to say you can't upload anything that’s copyrighted to these tools without permission, because that rules out hugely important education and research functions,” says Meredith Jacob, the project director of copyright and open licensing at American University Washington College of Law. She sees the creation of GPTs that help students understand their textbooks as something that could easily be protected by fair use.

Without a simple way for outsiders to see what’s been uploaded in the supplementary files for the GPT Store’s bots, copyright holders worried about infringements either have to trust that OpenAI’s automated systems are catching violations—or take the time-consuming approach of investigating each suspicious bot individually. “It’s like finding a needle in a haystack,” says Blichfeldt Andersen.

Updated 4/4/2024, 3:35 pm EDT: American University Washington has a College of Law, not a School of Law.

Kate Knibbs is a senior writer at WIRED, covering culture. She was previously a writer at The Ringer and Gizmodo.

*****
Credit belongs to : www.wired.com
