VALL E is an online Windows service built on complex AI algorithms. It lets you create high-quality imitations of a person's voice. The model was trained on speech recordings from more than 7,000 English-speaking people.

Voice generation

This neural net synthesizes speech in a fundamentally different way from traditional programs. Instead of generating the sound signal directly, it treats text-to-speech as a language modeling task and predicts the discrete audio codes produced by a neural audio codec. The algorithm can capture subtle voice features such as timbre and emotional tone. As a result, the service can imitate a specific person after processing an audio fragment only three seconds long.

The official website hosts examples of the AI's output. Users can listen to the audio clips and compare them with the output of traditional speech synthesizers. The website also contains samples with different emotional coloring: the AI can say the same phrase with joy, anger, disgust and so on.

Testing period

Note that, unlike Stable Diffusion, the source code of the VALL E algorithm has not yet been made publicly available. For this reason, you cannot generate speech based on your own audio file. The neural net remains closed because of concerns that the service could be used for malicious purposes.

Features

demo samples can be listened to for free on the official website;
uses a neural net model trained on recordings of real people's speech;
can reproduce the timbre and emotional tone of a voice;
the source code is not yet public due to various safety concerns;
compatible with all modern versions of Windows.