This Startup Wants YouTube Creators to Get Paid for AI Training Data

Sep 30, 2024 11:15 AM

While big platforms like Reddit have signed deals with the AI giants, YouTube leaves licensing in the hands of individual creators. The “License to Scrape” program aims to give those streaming stars proper leverage.

An illustration of a handshake between a human hand and a hand made of ones and zeros in front of a red textured background.
Illustration: WIRED Staff/Getty Images

So far, when AI companies have trained on YouTube’s invaluable stash of videos, captions, and other content, they’ve done so without permission. An AI-focused content licensing startup called Calliope Networks is hoping to change that with its new “License to Scrape,” a program aimed directly at YouTube stars.

“There's obvious demand from AI companies to scrape YouTube content. We see that by their actions. So what we're trying to do is to create a tool that makes it legal and simple for them,” says Calliope Networks CEO Dave Davis. Unlike other big social platforms, like Reddit, YouTube hasn’t struck deals with AI bigwigs to scrape its videos. The appeal of the License to Scrape is that it sidesteps the company itself, providing a large volume of YouTube content in one go by corralling a group of creators and negotiating a blanket license.

Davis has a background in traditional media licensing; he left a gig at the Motion Picture Licensing Corporation to launch Calliope, betting that the AI industry would eventually move away from permissionless scraping and toward licensing as a norm. He’s not alone in this belief; it’s a boom time for AI data licensing startups. Calliope Networks is a founding member of the Dataset Providers Alliance, a trade group that requires all creators and rights holders to opt into scraping.

Here’s how Davis hopes it’ll work: YouTube creators who want to license their data will enter into a contract with Calliope, which will then sublicense their work out for training generative AI foundational models. It’ll need a critical mass of content to make the deal attractive enough to the AI players first, so the program will need to get YouTubers on board before it can properly get up and running. Calliope would take a percentage of the licensing fees paid by the AI companies.

Although this kind of license is uncommon in the AI world, Davis modeled the scraping license format off other parts of the entertainment industry, like Broadcast Music Inc. (BMI) and the American Society of Composers, Authors, and Publishers (ASCAP), which both use blanket licenses for music.

“It’s early in the recruitment process,” Davis says. He estimates that Calliope will need to offer a minimum of 25,000 to 50,000 hours of YouTube content before it’s taken seriously by the AI industry. That this volume of footage is the likely threshold for blanket licenses demonstrates why banding together could be some creators’ best bet for making money from AI training. In this business, volume matters, and video generators are powered by large amounts of data.

There aren’t any marquee names endorsing the license yet, but Calliope has already drafted a few influencer marketing agencies like Viral Nation to get clients on board. “I’ve been getting really good feedback from creators,” says Bianca Serafini, Viral Nation’s head of content licensing. She is confident that a large number of the company’s client roster—which is close to 900 YouTubers—will participate. “No one has presented something like this to us before.”

In fact, Calliope is joining at least one other startup in offering this type of license. Avail launched its own AI licensing initiative, called Corpus, earlier this year. At the end of August, the AI startup announced its own partnership with a talent agency representing YouTubers, including the popular DIY account JerryRigEverything.

And what does YouTube make of all this? Davis hasn’t directly worked with the company on this project, but he believes it’s in line with the video behemoth’s wishes. “My take is that YouTube wants to give creators more control,” Davis says.

While YouTube won’t comment on specific licensing companies, it does indeed support its users striking their own agreements. “Generally speaking, creators can enter into deals with third-party companies regarding their content on our platform,” says YouTube spokesperson Jack Malon, who noted that the company recently published a blog post emphasizing its intentions to allow YouTubers “more control” in the age of AI. The crucial thing for YouTube is authorization, or getting explicit permission: “Unauthorized access of creator content is prohibited by YouTube’s Terms of Service, and we’ll continue to employ measures to ensure third parties respect these terms.”

Whether the License to Scrape program succeeds will depend on more than just securing big-name YouTubers. It will require a major shift in how AI companies approach foundational training. With more than 30 copyright cases involving permissionless data-scraping winding through US courts, that type of shift may end up legally mandated. However, as text-to-video generation tools often need large amounts of high-quality data to work well, the hunt for more sources of said data may necessitate a different approach.

Until then, though, it’s not at all clear that the AI bigwigs plan to stop scraping what they call “publicly available” data from websites like YouTube. (When they do reach agreements that include foundational model training, like video-focused AI startup Runway inking a deal with movie studio Lionsgate, the data involved is typically not “publicly available.”) Most of the deals they are striking with platforms and publishers are focused on providing content for AI search products like SearchGPT rather than foundational model training. Recently, after it received a legal threat from the popular UK-based parenting forum Mumsnet, OpenAI told WIRED that it is primarily interested in licensing large datasets that aren’t publicly available.

In the meantime, supporters of this project believe it’s time to press forward, rather than wait for AI companies to signal interest. “We just have to get ahead of this,” Serafini says.

Update 10/2/24 9:25am ET: This story has been updated to include the startup Avail, which offers a similar licensing plan.

Kate Knibbs is a senior writer at WIRED, covering the human side of the generative AI boom and how new tech shapes the arts, entertainment, and media industries. Prior to joining WIRED she was a features writer at The Ringer and a senior writer at Gizmodo. She is based in…

*****
Credit belongs to : www.wired.com
