This Tech Exec Quit His Job to Fight Generative AI’s Original Sin

Ed Newton-Rex quit his job at startup Stability AI over ethical concerns about its collection of training data. His nonprofit Fairly Trained aims to deter startups from scraping the web.

Illustration: Eugene Mymrin/Getty Images

Ed Newton-Rex says generative AI has an ethics problem. He ought to know, because he used to be part of the fast-growing industry. Newton-Rex was TikTok’s head AI designer and then an executive at Stability AI until he quit in disgust in November over the company’s stance on collecting training data.

After his high-profile departure, Newton-Rex threw himself into conversation after conversation about what building AI ethically would look like in practice. “It struck me that there are a lot of people who want to use generative AI models that treat creators fairly,” he says. “If you can give them better decisionmaking tools, that’s helpful.”


Ed Newton-Rex’s nonprofit is trying to encourage companies to be more thoughtful about sourcing training data for AI projects.

Courtesy of Ed Newton-Rex

Now Newton-Rex has launched a new nonprofit, Fairly Trained, to give people exactly that type of decisionmaking tool. It offers a certification program to identify AI companies that license their training data. The AI industry now has its own version of those “fair trade” certification labels you see on coffee.

To earn Fairly Trained’s certification label, which it calls L Certification, a company must prove that its training data was either explicitly licensed for training purposes, in the public domain, offered under an appropriate open license, or already belonged to the company.

So far, nine companies have received the certification, including image generator Bria AI, which trains exclusively on data licensed from sources like Getty Images, and music generation platform LifeScore Music, which licenses work from all the major record labels. Several others are close to completing their certification. The nonprofit charges a fee of $500 to $6,000, depending on the size of the applicant’s business.

OpenAI, the world’s leading generative AI company, recently argued that it is impossible to create generative AI services like ChatGPT without using unlicensed data. Newton-Rex and the first companies to get Fairly Trained’s stamp of approval disagree. “We already think of it as a mandatory thing to do,” Bria CEO Yair Adato says of licensing data. He compares AI models built on unlicensed data to Napster and the Pirate Bay, and his own company to Spotify. “It’s really easy for us to be compliant,” says LifeScore Music’s cofounder Tom Gruber, who is also an adviser to Fairly Trained. “The music business really cares about provenance and rights.”

Newton-Rex says the nonprofit has support from trade groups like the Association of American Publishers and the Association of Independent Music Publishers, as well as companies like Universal Music Group. But the movement to overturn the AI industry’s standard approach of scraping training data at will is still very much in its infancy. And Fairly Trained is a one-man operation. Not that Newton-Rex minds; he still has a startup founder’s mindset. “I believe in shipping things early,” he says.

The nonprofit is also not the only one trying to standardize the idea of labeling AI products with information about their ingredients. Former Warner Music Group executive Howie Singer, who now studies how technology is changing the music industry, sees similarities between Fairly Trained and the Content Authenticity Initiative, a project spearheaded by Adobe to help people track the authenticity of images. “This is a good step,” he says of Newton-Rex’s project.

Just as some shoppers check to see whether their eggs are laid by pasture-raised, non-GMO hens, while others simply grab the cheapest carton they can find, Singer suspects the ethical data certification will appeal to certain groups of plugged-in insiders more than the general populace. “Will the average person care? Some might,” he says, “but not everybody.”

Among people who do care where the data behind AI algorithms comes from, there may be an appetite for additional certifications. Copyright activist and former Recording Industry Association of America executive Neil Turkewitz welcomes the idea of Fairly Trained but says its initial offering is too limited. “What this certification says is that the AI company isn’t relying on fair use to justify unauthorized scraping,” he says. “It doesn’t say that the company’s practices are fair or conform to creators’ expectations about the contours of their consent.”

Newton-Rex agrees. “What I don’t want to do is claim that if someone is certified here, they are perfectly ethical,” he says. He wants to roll out additional certificates in the future, possibly addressing issues like compensation. Still, he’s proud of this first-of-its-kind project: “It’s not going to solve everything, but I think that it can help.”

Updated 1-18-2024, 6 pm EST: A previous version of this article misstated the name of the Content Authenticity Initiative.

*****
Credit belongs to: www.wired.com
