This artificial intelligence model was developed by Microsoft. With its help, users can generate realistic human speech with various emotional undertones.
VALL-E is an online Windows service built on complex AI algorithms. It allows you to create high-quality voice imitations. The model is trained on speech samples from more than 7,000 English-speaking people.
This neural network takes a different approach to speech synthesis than traditional programs. The algorithm picks up subtle voice features such as timbre and emotional tone. As a result, the service can imitate a specific person's voice after processing just a three-second audio fragment.
The official website provides examples of the AI's output. Users can listen to audio segments and compare them with the results of traditional speech synthesizers. The online database also contains samples with varying emotional coloring: the AI can say the same phrase with joy, anger, disgust and so on.
It is important to note that, unlike Stable Diffusion, the source code of the VALL-E algorithm is not yet publicly available. For this reason, you cannot generate a voice from a custom audio file. The closed nature of the neural network is linked to concerns that the service could be used for malicious purposes.
- free to download and use;
- provides a neural net model trained on the speech patterns of real people;
- allows you to change the timbre and emotional tone of the voice;
- the source code is not yet public due to various safety concerns;
- compatible with all modern versions of Windows.