I’ve heard lots of great things about Mastodon, so I decided to give it a try. Let’s see how it feels 😊.
If you would like to see me there, please follow my thorstenvoice account.
Latest news on Thorsten-Voice project.
I’ve released my latest Thorsten-Voice dataset. This one is spoken in my regional German dialect, “(Süd) Hessisch” (South Hessian), which is mainly spoken in the southern part of my German home state, “Hessen”. The recordings are based on neutral text input.
Feel free to use it to train an AI model. It is available as a free download on Zenodo.
More info is available here.
Even though I’ve published some audio examples of my artificial voice here, you might want to try out “my” voice with your own texts.
So I set up a Hugging Face Space for it. Try it out right now with your own texts in the browser.
Yeah 🥳, the new ThorstenVoice dataset is available for public download. Like the previous datasets, it is CC0 licensed, so it can be used by anyone.
If you use this dataset, please cite it using
DOI: 10.5281/zenodo.7265581 – Thank you 😊.
@dataset{muller_thorsten_2022_7265581,
  author    = {Müller, Thorsten and
               Kreutz, Dominik},
  title     = {ThorstenVoice Dataset 2022.10},
  month     = oct,
  year      = 2022,
  publisher = {Zenodo},
  version   = {1.0},
  doi       = {10.5281/zenodo.7265581},
  url       = {https://doi.org/10.5281/zenodo.7265581}
}
More information and the download link are available on Zenodo.
The existing free German TTS models “Thorsten” Tacotron2 DDC and VITS are based on my free, newly recorded voice dataset, which will be published soon.
Its name is – totally creative: “Thorsten-22.10”.
You can listen to some samples from that voice dataset, which was recorded for TTS model training.
Number of recordings | 12,432 |
Audio duration | 11+ hours |
Sample rate | 22,050 Hz |
Channels | Mono |
Normalization | -24 dB |
Speed (average) | 17.5 chars/second |
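If you want to check whether your own recordings match the format listed above, a minimal Python sketch using only the standard-library `wave` module could look like this. The function name `matches_dataset_format` is made up for this example and is not part of the dataset or any tooling; it only checks sample rate and channel count (22050 Hz, mono), not the loudness normalization.

```python
import wave

EXPECTED_SAMPLE_RATE = 22050  # Hz, taken from the table above
EXPECTED_CHANNELS = 1         # mono, taken from the table above


def matches_dataset_format(path):
    """Return True if the WAV file at `path` is 22050 Hz mono.

    Hypothetical helper for illustration; normalization is not checked.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_SAMPLE_RATE
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
```

Calling `matches_dataset_format("my_recording.wav")` (file name is a placeholder) then tells you whether a file would need resampling or downmixing first.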
Examples of phrases synthesized by a TTS model trained on this voice dataset.
YEARS of passion for open voice tech,
MONTHS of recording sessions,
WEEKS of computed training time,
DAYS of audio optimization,
HOURS of disillusion.
All for that ONE MOMENT, to share the next generation of open “Thorsten-Voice” with the community!
This model is based on a completely newly recorded and optimized voice dataset (Thorsten-22.05-neutral).
It’s trained using Coqui 🐸 TTS (for all “TTS-Insiders”, it’s a VITS model).
tl;dr
- pip install tts==0.7.1
- tts-server --model_name tts_models/de/thorsten/vits
- Open a web browser at http://localhost:5002
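Once the server from the steps above is running, it can also be queried from a script instead of the browser. The sketch below assumes the server answers GET requests at `/api/tts` with a `text` query parameter and returns WAV audio; this endpoint is an assumption that should be checked against your installed version. The helper names `build_tts_url` and `synthesize_to_file` are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_tts_url(text, host="localhost", port=5002):
    """Build the request URL for the (assumed) /api/tts endpoint."""
    query = urlencode({"text": text})  # URL-encodes umlauts, spaces, etc.
    return f"http://{host}:{port}/api/tts?{query}"


def synthesize_to_file(text, out_path, host="localhost", port=5002):
    """Fetch synthesized audio from a running tts-server and save it.

    Assumes the response body is the raw WAV data.
    """
    with urlopen(build_tts_url(text, host, port)) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
```

With the server running, `synthesize_to_file("Hallo, ich bin Thorsten.", "thorsten.wav")` should then write the spoken sentence to `thorsten.wav`.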
Dominik and I are still experimenting to provide a new version of the “Thorsten” voice for use with Mycroft installations.
This is the current “work-in-progress” state
(thanks to Olaf for supporting us with compute power for the HiFi-GAN training).
After I (again) invested months of my free time in audio recordings (this time with a good microphone and recording setup) and Dominik applied his “audio magic”, things really got going for both of us.
We have tried (and are still trying) various configurations, but want to share our current result with you.
Of course, this “Thorsten” model still runs offline and is available free of charge under the CC0 license.
But how does it sound?
There is no release date yet for the model and the underlying dataset, as the fine-tuning work is still ongoing. However, we are closer to the goal than to the beginning :-).
We would appreciate your feedback on the current status of the model. Either via the contact form or by email to tm@thorsten-voice.de.