OPENAI REVEALS VOICE ENGINE, EMPHASIZING CONCERNS OVER POTENTIAL MISUSE

OpenAI unveiled its latest project, Voice Engine, a voice-cloning tool, with an emphasis on ethical considerations and safety measures to prevent potential misuse of the technology, according to Anadolu Agency.

Voice Engine promises to generate natural-sounding speech that closely resembles the original speaker, using just a text input and a single 15-second audio sample. “It is notable that a small model with a single 15-second sample can create emotive and realistic voices,” OpenAI stated in a blog post.

OpenAI first developed Voice Engine in late 2022, and it has been powering the preset voices in its text-to-speech API as well as ChatGPT Voice and Read Aloud. However, the company has taken a cautious approach to a broader release because of concerns over potential misuse.

“We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” the San Francisco-based start-up said.

Incarcerated former Pakistani Prime Minister Imran Khan, for instance, used artificial intelligence (AI) to declare victory in the country’s general elections in February.

To address these concerns, OpenAI has engaged with US and international partners across government, media, entertainment, education, and civil society to gather feedback as it builds.

To mitigate risks, OpenAI has implemented usage policies for partners testing Voice Engine. These policies prohibit the impersonation of other individuals or organizations without consent or legal right, require explicit and informed consent from the original speaker, and mandate clear disclosure to the audience that the voices are AI-generated.

Additional safety measures include watermarking to trace the origin of any audio generated by Voice Engine and proactive monitoring of how the tool is being used.
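OpenAI has not disclosed how its watermarking works, but the general technique can be illustrated with a minimal sketch: embed a low-amplitude pseudo-random sequence, derived from a secret key, into the waveform, then detect it later by correlation. The function names, key, strength, and cutoff below are illustrative assumptions, not details from OpenAI.

```python
import numpy as np

def embed_watermark(audio: np.ndarray, key: int, strength: float = 0.005) -> np.ndarray:
    """Add a low-amplitude pseudo-random sequence, derived from `key`, to the waveform."""
    rng = np.random.default_rng(key)
    mark = rng.standard_normal(audio.shape[0])
    return audio + strength * mark

def detect_watermark(audio: np.ndarray, key: int, strength: float = 0.005) -> bool:
    """Correlate the waveform with the keyed sequence: a marked clip scores near
    `strength`, an unmarked one near zero, so half of `strength` serves as the cutoff."""
    rng = np.random.default_rng(key)
    mark = rng.standard_normal(audio.shape[0])
    score = float(np.dot(audio, mark) / audio.shape[0])
    return score > strength / 2

# Toy demonstration on synthetic "audio" (5 seconds at 16 kHz).
clip = 0.1 * np.random.default_rng(0).standard_normal(16_000 * 5)
print(detect_watermark(embed_watermark(clip, key=42), key=42))  # True: watermark found
print(detect_watermark(clip, key=42))                           # False: no watermark
```

A production watermark would also need to survive compression, resampling, and re-recording, which this toy correlation scheme does not attempt to handle.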

“We believe that any broad deployment of synthetic voice technology should be accompanied by voice authentication experiences that verify that the original speaker is knowingly adding their voice to the service and a no-go voice list that detects and prevents the creation of voices that are too similar to prominent figures,” the company added.
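The “no-go voice list” described in that statement can be pictured as a similarity check between speaker embeddings. The sketch below is an assumed illustration rather than anything OpenAI has published: it presumes an external speaker-verification model has already turned each voice into a fixed-length vector, and it flags a candidate whose cosine similarity to any protected figure’s embedding reaches a threshold. The function name and the 0.85 threshold are both assumptions.

```python
from __future__ import annotations

import numpy as np

def check_no_go_list(candidate: np.ndarray,
                     no_go_list: dict[str, np.ndarray],
                     threshold: float = 0.85) -> str | None:
    """Return the name of the protected figure whose voice the candidate most
    resembles, or None if no similarity reaches the threshold.

    `candidate` and the values of `no_go_list` are assumed to be speaker
    embeddings (fixed-length vectors) produced by an external
    speaker-verification model, which is outside the scope of this sketch."""
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    best_name, best_score = None, threshold
    for name, reference in no_go_list.items():
        score = cosine(candidate, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

A cloning request would be refused whenever the function returns a name; in practice the threshold would be tuned against the false-accept and false-reject rates of the underlying embedding model.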

Looking ahead, OpenAI emphasizes the importance of societal resilience against the challenges posed by increasingly convincing generative models.

The company advocates measures such as phasing out voice-based authentication for access to sensitive information, policies to protect the use of individuals’ voices in AI, public education on AI capabilities and limitations, and accelerated development of techniques for tracking the origin of audiovisual content.
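The last of those measures, provenance tracking, generally rests on cryptographically binding a piece of media to its creator. The toy sketch below illustrates only the basic idea and does not follow any specific standard: it hashes the media bytes and ties the digest to a creator’s secret key with an HMAC, whereas real provenance systems such as C2PA use public-key signatures and embedded manifests.

```python
import hashlib
import hmac

def attest(media: bytes, creator_key: bytes) -> str:
    """Bind a SHA-256 digest of the media bytes to the creator's secret key."""
    digest = hashlib.sha256(media).digest()
    return hmac.new(creator_key, digest, hashlib.sha256).hexdigest()

def verify(media: bytes, creator_key: bytes, tag: str) -> bool:
    """Recompute the attestation and compare in constant time; any change to the
    media bytes invalidates the tag."""
    return hmac.compare_digest(attest(media, creator_key), tag)

# Toy demonstration with in-memory "audio" bytes.
original = b"\x00\x01synthetic-audio-bytes"
tag = attest(original, creator_key=b"studio-secret")
print(verify(original, creator_key=b"studio-secret", tag=tag))         # True: untouched
print(verify(original + b"x", creator_key=b"studio-secret", tag=tag))  # False: edited
```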

“We hope this preview of Voice Engine both underscores its potential and also motivates the need to bolster societal resilience against the challenges brought by ever more convincing generative models,” OpenAI stated.