Microsoft VALL-E can simulate anyone’s voice with 3 seconds of audio

Microsoft has unveiled VALL-E, a new text-to-speech AI model that can simulate a person’s voice from just a three-second audio sample. VALL-E builds on Meta’s EnCodec, a neural audio codec that uses artificial intelligence to compress high-quality audio to bit rates much lower than those of MP3 files.

Microsoft’s new AI can preserve a speaker’s emotional tone and acoustic environment.

VALL-E analyzes how a person sounds and breaks that information down into discrete components called “tokens.” It then draws on what it learned in training to predict how that voice would sound speaking phrases that are not in the three-second sample.
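To make the idea of “tokens” more concrete, here is a minimal Python sketch of how a stretch of audio features could be mapped to discrete codebook indices, and how many such tokens a three-second clip might produce. It is a toy illustration, not Microsoft’s or Meta’s code: the function names, the tiny two-dimensional codebook, and the figures of 75 frames per second and 8 codebooks are all assumptions chosen for the example and are not taken from the article.

```python
import math

def quantize(frames, codebook):
    """Map each frame (a tuple of floats) to the index of its nearest codebook entry."""
    tokens = []
    for frame in frames:
        nearest = min(range(len(codebook)),
                      key=lambda i: math.dist(frame, codebook[i]))
        tokens.append(nearest)
    return tokens

def dequantize(tokens, codebook):
    """Turn token indices back into (approximate) frames."""
    return [codebook[t] for t in tokens]

def tokens_per_clip(seconds, frames_per_second=75, n_codebooks=8):
    """Rough token count for a clip. Both defaults are illustrative assumptions."""
    return int(seconds * frames_per_second * n_codebooks)

# Hypothetical 4-entry codebook and four made-up feature frames.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
frames = [(0.1, -0.2), (0.9, 0.2), (0.2, 0.8), (1.1, 0.9)]

tokens = quantize(frames, codebook)
print(tokens)                        # [0, 1, 2, 3]
print(dequantize(tokens, codebook))  # the nearest codebook entries
print(tokens_per_clip(3.0))          # 1800 under the assumed settings
```

The point of the sketch is only that a voice sample ends up as a short sequence of integers, which a language model can then continue much as it would continue a sequence of text tokens.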

Most text-to-speech systems today require high-quality, very clean training data, typically recorded in a studio with professional equipment. VALL-E removes that requirement: it can simulate almost anyone’s voice from a three-second sample, without the speaker having to spend weeks in a studio.

VALL-E was trained on the LibriLight audio library, which contains 60,000 hours of speech from more than 7,000 speakers. This enables VALL-E to generate realistic-sounding voices in English. Combined with other generative AI models, it has the potential to power high-quality text-to-speech applications.

Microsoft has published a large collection of VALL-E-generated samples so that you can hear the results for yourself. The results are not perfect, but many of the samples sound natural and are hard to distinguish from the original speaker’s sample.

Despite VALL-E’s impressive capabilities, Microsoft is aware of the technology’s potential for abuse. According to the company, bad actors could use synthesized audio for malicious purposes such as spoofing voice identification or impersonating a specific speaker. To mitigate these risks, Microsoft suggests building a detection model that can tell whether an audio clip was synthesized by VALL-E.

Overall, VALL-E is a significant advancement in text-to-speech technology. Its ability to simulate a voice from only a three-second audio sample opens up a wide range of uses. However, Microsoft must continue to improve VALL-E while ensuring that appropriate safeguards are in place to prevent its misuse.

Source/VIA: Ars Technica

Adeel Younas
