OpenAI Tests New Voice Clone Model
The company behind ChatGPT is testing dramatically improved text-to-speech technology with select developers. Experts warn that realistic voice generation raises serious safety questions.
OpenAI on Friday announced a pilot program for its new custom voice text-to-speech (TTS) offering, called Voice Engine, that will let users generate realistic speech from text using just a short audio sample.
In a blog post, the ChatGPT maker says it is currently working with developers to test the newest model in its application programming interface (API), which can take a single 15-second audio sample and generate natural-sounding speech that closely matches the original speaker. Those developers agreed to a strict usage policy, which prohibits impersonating another individual or organization without consent or legal right. Partners must also obtain explicit, informed consent from the original speaker.
In a live demo with InformationWeek, OpenAI Product Lead Jeff Harris showed how a quick live recording of his voice could be used to create a text-to-speech sample that was indistinguishable from his real voice. The whole process took just moments.
The speed and realism of OpenAI’s custom voice TTS will likely be an attractive prospect for many commercial and consumer uses, but it also presents serious risks and challenges. The potential for misuse is profound.
That’s why OpenAI is testing the software first with a select group of developers.
Safety First
AI voice cloning is a serious ethical concern, especially in an election year. US President Joe Biden, in his State of the Union address on March 6, called for a ban on AI voice impersonations. An AI-generated imitation of Biden's voice was used in a January robocall scam that urged New Hampshire primary voters to "save their votes" for the November presidential election.
In February, the Federal Communications Commission (FCC) made AI-generated voices in robocalls illegal under the Telephone Consumer Protection Act.
OpenAI, for its part, says it is moving ahead with its voice cloning model carefully. OpenAI's blog post called for a broad effort to phase out voice-based authentication, which is still widely used as a security measure.
“We’re going to start with a limited set of developers and people that we have trusted relationships with and ask them to agree to a pretty comprehensive set of terms that includes things like permission from every speaker whose voice is used and making sure that any generated speech is clearly labeled as AI-generated,” Harris tells InformationWeek. Harris says OpenAI has also developed a “watermarking” system that can identify voice recordings generated with its model.
Responsible AI Institute founder Manoj Saxena thinks the pilot program is the right approach but says more guardrails are needed as AI technology continues to develop rapidly. With hyper-realistic voice generation, a criminal could trick family members into scams or worse. And with an election cycle underway, concerns are growing about deepfakes being used to spread misinformation.