A New Group Is Trying to Make AI Data Licensing Ethical

Sep 4, 2024 7:00 AM

The Dataset Providers Alliance calls for creators and rights holders to be able to opt in to having their material used for training purposes.

Animation: Darrell Jackson; Getty Images

The first wave of major generative AI tools was trained largely on “publicly available” data: basically, anything and everything that could be scraped from the internet. Now, sources of training data are increasingly restricting access and pushing for licensing agreements. With the hunt for additional data sources intensifying, new licensing startups have emerged to keep the source material flowing.

The Dataset Providers Alliance, a trade group formed this summer, wants to make the AI industry more standardized and fair. To that end, it has just released a position paper outlining its stances on major AI-related issues. The alliance is made up of seven AI licensing companies, including music-copyright-management firm Rightsify, Japanese stock-photo marketplace Pixta, and generative-AI copyright-licensing startup Calliope Networks. (At least five new members will be announced in the fall.)

The DPA advocates for an opt-in system, meaning that data can be used only after consent is explicitly given by creators and rights holders. This represents a significant departure from the way most major AI companies operate. Some have developed their own opt-out systems, which put the burden on data owners to pull their work on a case-by-case basis. Others offer no opt-outs whatsoever.

The DPA, which expects members to adhere to its opt-in rule, sees that route as the far more ethical one. “Artists and creators should be on board,” says Alex Bestall, CEO of Rightsify and the music-data-licensing company Global Copyright Exchange, who spearheaded the effort. Bestall sees opt-in as a pragmatic approach as well as a moral one: “Selling publicly available datasets is one way to get sued and have no credibility.”

Ed Newton-Rex, a former AI executive who now runs the ethical AI nonprofit Fairly Trained, calls opt-outs “fundamentally unfair to creators,” adding that some may not even know when opt-outs are offered. “It's particularly good to see the DPA calling for opt-ins,” he says.

Shayne Longpre, the lead at the Data Provenance Initiative, a volunteer collective that audits AI datasets, sees the DPA’s efforts to source data ethically as admirable, although he suspects the opt-in standard could be a tough sell, because of the sheer volume of data most modern-day AI models require. “Under this regime, you’re either going to be data-starved or you’re going to pay a lot,” he says. “It could be that only a few players, large tech companies, can afford to license all that data.”

In the paper, the DPA comes out against government-mandated licensing, arguing instead for a “free market” approach in which data originators and AI companies negotiate directly. Other guidelines are more granular. For example, the alliance suggests five potential compensation structures to make sure creators and rights holders are paid appropriately for their data. These include a subscription-based model, “usage-based licensing” (in which fees are paid per use), and “outcome-based” licensing, in which royalties are tied to profit. “These could work for anything from music to images to film and TV or books,” Bestall says.

“Looking to standardize compensation structures is potentially a good thing,” says Bill Rosenblatt, a technologist who studies copyright. “The Dataset Providers Alliance is in a very good position to put terms out there.” As Rosenblatt sees it, AI companies need incentives to adopt licensing. While the legal reasons (fear of lawsuits, regulation mandating licenses) are the most obviously compelling, Rosenblatt says it’s also important for would-be licensors to make the process as easy and convenient as possible. Standardizing payment models, he argues, helps smooth the road for mainstream adoption.

The DPA also endorses some uses of synthetic data—that which is generated by AI—arguing that it will “constitute the majority” of training data in the near future. “Some copyright holders probably won’t like it,” Bestall says. “But it’s inevitable.” The alliance advocates for “proper licensing” of the pre-training information used to create synthetic data and transparency on how the latter is made. It also calls for regular “evaluation” of the synthetic data models to “mitigate biases and ethical issues.”

Of course, the DPA needs to get the industry’s power players on board, which is easier said than done. “There are standards emerging for how to license data ethically,” Newton-Rex says. “But not enough AI companies are adopting them.”

Still, the very existence of the DPA demonstrates that the AI Wild West days appear to be coming to an end. “Everything is changing so fast,” Bestall says.

Kate Knibbs is a senior writer at WIRED, covering the human side of the generative AI boom and how new tech shapes the arts, entertainment, and media industries. Prior to joining WIRED she was a features writer at The Ringer and a senior writer at Gizmodo. She is based in…

*****
Credit belongs to: www.wired.com
