Meet Vall-E, an AI that can instantly mimic human speech

Russell Kidson


Even though there are fears that our current interpretation of AI technology is due to hit a ceiling around 2026, the industry is still evolving with every new initiative released. The latest initiative is Microsoft’s Vall-E, and it can replicate and mimic human speech in a matter of seconds. 

This isn’t the first time that a company has attempted to create an AI that can mimic human speech, which makes the result all the more impressive. Previous attempts have routinely proven how difficult and time-consuming such an enterprise is. The core issue seems to be that it takes far too long for these systems to learn individual voices, not to mention the vocal intricacies that each person instinctively employs.

Microsoft has done something truly remarkable here. Vall-E has astonished much of the tech community with how quickly it can replicate and mimic human speech. It needs only around 3 seconds of recorded speech to reproduce someone’s voice, intonation, and general vocal idiosyncrasies. That is the smallest voice sample the industry has seen for this kind of task.


If you’re interested, Microsoft’s researchers recently released a paper on how Vall-E works, published on arXiv, the preprint server hosted by Cornell University. The paper also breaks down the differences between Vall-E and other text-to-speech (TTS) synthesizers.

Here is an excerpt from the paper that’ll impress science and technology enthusiasts. ‘Large-scale data crawled from the Internet cannot meet the requirement, and always lead to performance degradation. Because the training data is relatively small, current TTS systems still suffer from poor generalization. Speaker similarity and speech naturalness decline dramatically for unseen speakers in the zero-shot scenario.’

‘VALL-E significantly outperforms the state-of-the-art zero-shot TTS system [Casanova et al., 2022b] in terms of speech naturalness and speaker similarity, with +0.12 comparative mean option score (CMOS) and +0.93 similarity mean option score (SMOS) improvement on LibriSpeech. VALL-E also beats the baseline on VCTK with +0.11 SMOS and +0.23 CMOS improvements.’

In simple terms, the researchers have found a way to do something that was thought to be all but impossible: clone a voice the system has never heard before from only a few seconds of audio.

As GHacks recently reported, Apple Books has released an AI tool that can turn any book into an audiobook.

However, the utility has faced harsh criticism over the way it sounds. I listened to the tool in action and found it pleasing, but others have not been as kind. The release of a tool like Vall-E could revolutionize the audiobook industry and build on the good work that Apple has started.

Russell Kidson


I hail from the awe-inspiring beauty of South Africa. Born and raised in Pretoria, I've always had a deep interest in local history, particularly conflicts, architecture, and our country's rich past of being a plaything for European aristocracy. 'Tis an attempt at humor. My interest in history has since translated into hours at a time researching everything from the many reasons the Titanic sank (really, it's a wonder she ever left Belfast) to why Minecraft is such a feat of human technological accomplishment. I am an avid video gamer (Sims 4 definitely counts as video gaming, I checked) and particularly enjoy playing the part of a relatively benign overlord in Minecraft. I enjoy the diverse experiences gaming offers the player. Within the space of a few hours, a player can go from having a career as an interior decorator in Sims, to training as an archer under Niruin in Skyrim. I believe video games have so much more to teach humanity about community, kindness, and loyalty, and I enjoy the opportunity to bring concepts of the like into literary pieces.
