OpenAI only needs 15 seconds of audio to clone your voice

AI once needed hours of audio to clone a voice; now it can do it in seconds.

Chema Carvajal Sarabia

OpenAI, the company behind ChatGPT, the world's most widely used chatbot, has announced that its voice cloning technology needs only 15 seconds of audio to reproduce someone's voice.


In a post on its website, OpenAI shared a small-scale preview of a model called Voice Engine, which it has been developing since late 2022.

Voice Engine takes a sample of at least 15 seconds of speech as input. The user then enters text, and the model turns it into what OpenAI describes as "emotive and realistic" speech that "closely resembles the original speaker".

A gradual and measured release: potential dangers

OpenAI insists that it is taking a “cautious and informed approach to a broader release due to the potential for misuse of synthetic voice,” and adds that it wants to “initiate a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities.”

And he added: “Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether to deploy this technology at scale and how to do it.”

One of the forms of misuse OpenAI has in mind is a scam that criminals are already running with similar, publicly available technology: they clone a person's voice and then call that person's friends or relatives, tricking them into sending money by bank transfer.

There are also concerns that the technology could be used to interfere in the upcoming U.S. presidential election. In a recent high-profile incident, robocalls using a cloned version of President Joe Biden's voice urged people not to vote in New Hampshire's January primary.

The bright side of the coin

As for more positive uses of the technology, OpenAI suggests it could help people who cannot read by providing emotive, natural-sounding voices "that represent a wider range of speakers than is possible with pre-set voices". It could also be used to instantly translate videos and podcasts, something Spotify is already testing.

It could also help patients who are gradually losing their voice to illness keep communicating in what sounds like their own voice.

OpenAI's website features several examples that pair reference audio with the AI-generated result, and we're sure you'll agree they are extraordinary… and terrifying.

Journalist specializing in technology, entertainment and video games. Writing about what I'm passionate about (gadgets, games and movies) keeps me sane and lets me wake up with a smile on my face when the alarm clock goes off. PS: this is not true 100% of the time.
