Generative AI Is Making Companies Even More Thirsty for Your Data

Aug 10, 2023 12:00 PM

The outcry over Zoom's tweak to its data policy shows how the race to build more powerful AI models creates new pressure to source training data—including by juicing it from users.

Photograph: Jonathan Knowles/Getty Images

Zoom, the company that normalized attending business meetings in your pajama pants, was forced to unmute itself this week to reassure users that it would not use personal data to train artificial intelligence without their consent.

A keen-eyed Hacker News user last week noticed that an update to Zoom’s terms and conditions in March appeared to essentially give the company free rein to slurp up voice, video, and other data, and shovel it into machine learning systems.

The new terms stated that customers “consent to Zoom’s access, use, collection, creation, modification, distribution, processing, sharing, maintenance, and storage of Service Generated Data” for purposes including “machine learning or artificial intelligence (including for training and tuning of algorithms and models).”

The discovery prompted critical news articles and angry posts across social media. Soon, Zoom backtracked. On Monday, Zoom’s chief product officer, Smita Hasham, wrote a blog post stating, “We will not use audio, video, or chat customer content to train our artificial intelligence models without your consent.” The company also updated its terms to say the same.

Later in the week, Zoom updated its terms again, clarifying that it would not feed "audio, video, chat, screen sharing, attachments, or other communications-like customer content (such as poll results, whiteboard, and reactions)" to AI models. Vera Ranneft, a spokesperson for the company, says Zoom has not previously used customer content this way.

Those updates seem reassuring enough, but of course many Zoom users or admins for business accounts might click “OK” to the terms without fully realizing what they’re handing over. And employees required to use Zoom may be unaware of the choice their employer has made. One lawyer notes that the terms still permit Zoom to collect a lot of data without consent.

The kerfuffle shows the lack of meaningful data protections at a time when the generative AI boom has made the tech industry even more hungry for data than it already was. Companies have come to view generative AI as a kind of monster that must be fed at all costs—even if it isn’t always clear what exactly that data is needed for or what those future AI systems might end up doing.

The ascent of AI image generators like DALL-E 2 and Midjourney, followed by ChatGPT and other clever-yet-flawed chatbots, was made possible thanks to huge amounts of training data—much of it copyrighted—that was scraped from the web. And all manner of companies are currently looking to use the data they own, or that is generated by their customers and users, to build generative AI tools.

Zoom is already on the generative AI bandwagon. In June, the company introduced two text-generation features for summarizing meetings and composing emails about them. Zoom could conceivably use data from its users’ video meetings to develop more sophisticated algorithms. These might summarize or analyze individuals’ behavior in meetings, or perhaps even render a virtual likeness for someone whose connection temporarily dropped or who hasn’t had time to shower.

The problem with Zoom’s effort to grab more data is that it reflects the broad state of affairs when it comes to our personal data. Many tech companies already profit from our information, and many of them, like Zoom, are now on the hunt for ways to source more data for generative AI projects. And yet it is up to us, the users, to try to police what they are doing.

“Companies have an extreme desire to collect as much data as they can,” says Janet Haven, executive director of the think tank Data & Society. “This is the business model—to collect data and build products around that data, or to sell that data to data brokers.”

The US lacks a federal privacy law, leaving consumers more exposed to the pangs of ChatGPT-inspired data hunger than people in the EU. Proposed legislation, such as the American Data Privacy and Protection Act, offers some hope of providing tighter federal rules on data collection and use, and the Biden administration’s AI Bill of Rights also calls for data protection by default. But for now, public pushback like that in response to Zoom’s moves is the most effective way to curb companies’ data appetites. Unfortunately, this isn’t a reliable mechanism for catching every questionable decision by companies trying to compete in AI.

In an age when the most exciting and widely praised new technologies are built atop mountains of data collected from consumers, often in ethically questionable ways, it seems that new protections can’t come soon enough. “Every single person is supposed to take steps to protect themselves,” Haven says. “That is antithetical to the idea that this is a societal problem.”

Updated 8-14-2023, 1:20 pm EDT: This article was updated to reflect Zoom making additional changes to its data policy.

Updated 8-10-2023, 7:15 pm EDT: This article was updated with comment from Zoom.

Will Knight is a senior writer for WIRED, covering artificial intelligence. He writes the Fast Forward newsletter, which explores how advances in AI and other emerging technology are set to change our lives. He was previously a senior editor at MIT Technology Review.
