OpenAI Can Re-Create Human Voices—but Won’t Release the Tech Yet

Mar 30, 2024 1:30 PM

Voice Engine is a new text-to-speech AI model for creating synthetic voices. OpenAI has said a wide release would be too risky.

Voice synthesis has come a long way since 1978’s Speak & Spell toy, which once wowed people with its state-of-the-art ability to read words aloud using an electronic voice. Now, using deep-learning AI models, software can create not only realistic-sounding voices but can also convincingly imitate existing voices using small samples of audio.

Along those lines, OpenAI this week announced Voice Engine, a text-to-speech AI model for creating synthetic voices based on a 15-second segment of recorded audio. It has provided audio samples of the Voice Engine in action on its website.

Once a voice is cloned, a user can input text into the Voice Engine and get an AI-generated voice result. But OpenAI is not ready to widely release its technology. The company initially planned to launch a pilot program for developers to sign up for the Voice Engine API earlier this month. But after more consideration about ethical implications, the company decided to scale back its ambitions for now.

“In line with our approach to AI safety and our voluntary commitments, we are choosing to preview but not widely release this technology at this time,” the company writes. “We hope this preview of Voice Engine both underscores its potential and also motivates the need to bolster societal resilience against the challenges brought by ever more convincing generative models.”

Voice cloning tech in general is not particularly new: there have been several AI voice synthesis models since 2022, and the tech is active in the open source community with packages like OpenVoice and XTTSv2. But the idea that OpenAI is inching toward letting anyone use its particular brand of voice tech is notable. And in some ways, the company's reluctance to release it fully might be the bigger story.

OpenAI says that benefits of its voice technology include providing reading assistance through natural-sounding voices, enabling global reach for creators by translating content while preserving native accents, supporting non-verbal individuals with personalized speech options, and assisting patients in recovering their own voice after speech-impairing conditions.

But it also means that anyone with 15 seconds of someone's recorded voice could effectively clone it, and that has obvious implications for potential misuse. Even if OpenAI never widely releases its Voice Engine, the ability to clone voices has already caused trouble in society through phone scams where someone imitates a loved one's voice and election campaign robocalls featuring cloned voices from politicians like Joe Biden.

Also, researchers and reporters have shown that voice-cloning technology can be used to break into bank accounts that use voice authentication (such as Chase's Voice ID), which prompted US senator Sherrod Brown of Ohio, the chair of the US Senate Committee on Banking, Housing, and Urban Affairs, to send a letter to the CEOs of several major banks in May 2023 to inquire about the security measures banks are taking to counteract AI-powered risks.

OpenAI recognizes that the tech might cause trouble if broadly released, so it's initially trying to work around those issues with a set of rules. It has been testing the technology with a set of select partner companies since last year. For example, video synthesis company HeyGen has been using the model to translate a speaker's voice into other languages while keeping the same vocal sound.

To use Voice Engine, each partner must agree to terms of use that prohibit "the impersonation of another individual or organization without consent or legal right." The terms also require that partners acquire informed consent from the people whose voices are being cloned, and they must also clearly disclose that the voices they produce are AI-generated. OpenAI is also baking a watermark into every voice sample that will assist in tracing the origin of any voice generated by its Voice Engine model.

So, as it stands now, OpenAI is showing off its technology, but the company is not yet ready to put itself on the line for the potential social chaos a broad release might cause. Instead, the company has recalibrated its marketing approach to appear as if it is warning all of us about this already-existing technology in a responsible way.

"We are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse," the company said in a statement. "We hope to start a dialogue on the responsible deployment of synthetic voices and how society can adapt to these new capabilities. Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale."

In line with its mission to cautiously roll out the tech, OpenAI has provided three recommendations in its blog post for how society should change to accommodate its technology. These steps include phasing out voice-based authentication for bank accounts, educating the public in understanding "the possibility of deceptive AI content," and accelerating the development of techniques that can track the origin of audio content, "so it's always clear when you're interacting with a real person or with an AI."

OpenAI also says that future voice-cloning tech should require verifying that the original speaker is "knowingly adding their voice to the service" and creating a list of voices that are forbidden to clone, such as those that are "too similar to prominent figures." That kind of screening tech may end up excluding anyone whose voice might naturally and accidentally sound too close to a celebrity or US president.

Tech Developed in 2022

According to the company, OpenAI developed its Voice Engine technology in late 2022, and many people have already been using a version of the technology with pre-defined (and not cloned) voices in two ways: The spoken conversation mode in the ChatGPT app released in September and OpenAI's text-to-speech API that debuted in November of last year.

With all the voice-cloning competition out there, OpenAI says that Voice Engine is notable for being a “small” AI model (how small, exactly, we do not know). But having been developed in 2022, it almost feels late to the party. And it may not be perfect in its cloning ability. Previous user-trained text-to-voice models like those from ElevenLabs and Microsoft have struggled with accents that fall outside their training dataset.

For now, Voice Engine remains a limited release to select partners.

This story originally appeared on Ars Technica.

Benj Edwards is an AI and Machine Learning Reporter for Ars Technica. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

    Credit belongs to : www.wired.com
