Apr 4, 2024 9:00 AM
OpenAI’s GPT Store Is Triggering Copyright Complaints
For the past few months, Morten Blichfeldt Andersen has spent many hours scouring OpenAI’s GPT Store. Since it launched in January, the marketplace for bespoke bots has filled up with a deep bench of useful and sometimes quirky AI tools. Cartoon generators spin up New Yorker–style illustrations and vivid anime stills. Programming and writing assistants offer shortcuts for crafting code and prose. There’s also a color analysis bot, a spider identifier, and a dating coach called RizzGPT. Yet Blichfeldt Andersen is hunting only for one very specific type of bot: Those built on his employer’s copyright-protected textbooks without permission.
Blichfeldt Andersen is publishing director at Praxis, a Danish textbook purveyor. The company has been embracing AI and created its own custom chatbots. But it is currently engaged in a game of whack-a-mole in the GPT Store, and Blichfeldt Andersen is the man holding the mallet.
“I’ve been personally searching for infringements and reporting them,” Blichfeldt Andersen says. “They just keep coming up.” He suspects the culprits are primarily young people uploading material from textbooks to create custom bots to share with classmates—and that he has uncovered only a tiny fraction of the infringing bots in the GPT Store. “Tip of the iceberg,” Blichfeldt Andersen says.
It is easy to find bots in the GPT Store whose descriptions suggest they might be tapping copyrighted content in some way, as TechCrunch noted in a recent article claiming OpenAI’s store was overrun with “spam.” Using copyrighted material without permission is permissible in some contexts, but in others rightsholders can take legal action. WIRED found a GPT called Westeros Writer that claims to “write like George R.R. Martin,” the creator of Game of Thrones. Another, Voice of Atwood, claims to imitate the writer Margaret Atwood. Yet another, Write Like Stephen, is intended to emulate Stephen King.
When WIRED tried to trick the King bot into revealing the “system prompt” that tunes its responses, the output suggested it had access to King’s memoir On Writing. Write Like Stephen was able to reproduce passages from the book verbatim on demand, even noting which page the material came from. (WIRED could not make contact with the bot’s developer, because it did not provide an email address, phone number, or external social profile.)
OpenAI spokesperson Kayla Wood says it responds to takedown requests against GPTs made with copyrighted content but declined to answer WIRED’s questions about how frequently it fulfills such requests. She also says the company proactively looks for problem GPTs. “We use a combination of automated systems, human review, and user reports to find and assess GPTs that potentially violate our policies, including the use of content from third parties without necessary permission,” Wood says.
New Disputes
The GPT Store’s copyright problem could add to OpenAI’s existing legal headaches. The company is facing a number of high-profile lawsuits alleging copyright infringement, including one brought by The New York Times and several brought by different groups of fiction and nonfiction authors, including big names like George R.R. Martin.
Chatbots offered in OpenAI’s GPT Store are based on the same technology as its own ChatGPT but are created by outside developers for specific functions. To tailor their bot, a developer can upload extra information that it can tap to augment the knowledge baked into OpenAI’s technology. The process of consulting this additional information to respond to a person’s queries is called retrieval-augmented generation, or RAG. Blichfeldt Andersen is convinced that the RAG files behind the bots in the GPT Store are a hotbed of copyrighted materials uploaded without permission.
OpenAI’s terms for the GPT Store explicitly prohibit “using content from third parties without the necessary permissions,” but right now there’s no way for outsiders to check whether their copyrighted material has been uploaded by the developers creating GPTs. That means concerned copyright holders have to go hunting.
Blichfeldt Andersen uses keywords to comb the GPT Store for chatbots that might be using material from his company’s books. He then has to engage each bot he finds in conversation to try to divine whether it has been trained on Praxis titles. It’s tedious work but is getting results: He has successfully prompted several bots to reproduce specific passages from Praxis textbooks. “You have to trick the language model to reveal itself,” he says.
The lawsuits accusing OpenAI of scraping copyrighted material without permission to train its systems may take years to resolve, but disputes over material uploaded to the GPT Store could have more immediate repercussions. “GPTs change the relationship between OpenAI and its users in an important way for copyright,” says James Grimmelmann, a professor of internet law at Cornell University. When online platforms allow users to upload their own content—for example, YouTube allowing regular people to publish personal videos—they are subject to the Digital Millennium Copyright Act, part of US copyright law that allows copyright holders to file complaints if their intellectual property is disseminated without their permission. So if, say, a YouTuber posts a clip with music in the background that they didn’t license, sometimes music labels will file complaints and get the videos taken down. Since the GPT Store allows developers to upload their work, it is governed by these rules.
“Infringing” Bots
Intended as an anti-piracy statute, the Digital Millennium Copyright Act now has outsize importance in copyright enforcement, as it allows copyright holders a relatively zippy way to demand that their work be removed when people put it online without their permission: DMCA takedown notices.
After Blichfeldt Andersen found his first few examples of Praxis textbooks in the GPT Store, he filed DMCA takedown notices to OpenAI. He says the company didn’t respond until he asked the Danish Rights Alliance, which represents the interests of creative workers in Denmark, to help out. The DRA has a hard-charging approach to protecting members’ copyright in the age of AI. Last year it got a collection of over 196,000 books used for generative AI training temporarily taken offline by filing DMCA takedown notices.
Thomas Heldrup, the DRA’s head of content protection and enforcement, often leads its AI crusades. He played a central role in taking on the GPT Store, too, filing complaints on behalf of Praxis that led to OpenAI taking down bots that the publisher considered infringing.
“They have been pretty quick to remove infringing GPTs that we have reported to them,” Heldrup says. Still, he’d like to see the company make changes. “There needs to be better tools at the disposal of rights holders to search for these infringing GPTs,” Heldrup says.
Blichfeldt Andersen says Praxis is considering legal action against OpenAI if conditions on the GPT Store do not improve. He would like to see the company and other AI developers add more robust systems that scan for copyrighted material in uploaded RAG content, similar to the Content ID system in place to protect copyrighted materials from appearing on YouTube. (When asked if it plans to introduce a Content ID–like system, OpenAI did not answer directly, but OpenAI’s Wood tells WIRED it does screen GPTs proactively.)
Startups are already appearing that offer to help AI companies scan for infringing output. Anand Kannappan, CEO and founder of Patronus AI, says its recently launched Copyright Catcher service, designed to detect copyrighted text, could “absolutely” detect potential infringement in custom GPTs.
But although OpenAI has complied with some DMCA takedown requests aimed at its GPT Store, some intellectual property experts believe that the company could argue that the concept of fair use protects some GPTs reliant on copyrighted works.
“I think it would be really hasty to say you can't upload anything that’s copyrighted to these tools without permission, because that rules out hugely important education and research functions,” says Meredith Jacob, the project director of copyright and open licensing at American University Washington College of Law. She sees the creation of GPTs that help students understand their textbooks as something that could easily be protected by fair use.
Without a simple way for outsiders to see what’s been uploaded in the supplementary files for the GPT Store’s bots, copyright holders worried about infringements either have to trust that OpenAI’s automated systems are catching violations—or take the time-consuming approach of investigating each suspicious bot individually. “It’s like finding a needle in a haystack,” says Blichfeldt Andersen.
Updated 4/4/2024, 3.35 pm EDT: American University Washington has a College of Law, not a School of Law.
*****
Credit belongs to : www.wired.com