What is a dataset?
A dataset, at least in TTS (text-to-speech), is a combination of:
- Recorded audio files in WAVE format (one per sentence)
- A CSV file that maps each recorded audio file to its transcribed text.
The best-known format is the LJSpeech format, which serves as the de facto standard in the TTS field. All “Thorsten” datasets are freely available in this format.
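As a minimal sketch, an LJSpeech-style dataset could be loaded in Python as shown here. It assumes the usual LJSpeech layout: a pipe-separated `metadata.csv` whose first column is the file ID (without `.wav`) and whose last column is the transcription, plus a `wavs/` folder holding the audio files. Check the actual layout of a download before relying on it. The function names `parse_metadata` and `load_dataset` are hypothetical.

```python
import csv
from pathlib import Path
from typing import Iterable, List, Tuple


def parse_metadata(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse LJSpeech-style rows: file_id|transcription[|normalized transcription].

    Assumption: pipe-separated, no quoting, file ID first, text last.
    """
    entries = []
    # QUOTE_NONE keeps quotation marks inside sentences as literal text
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:  # skip blank lines
            continue
        file_id, text = row[0], row[-1]  # last column = normalized text if present
        entries.append((file_id, text))
    return entries


def load_dataset(root: str) -> List[Tuple[Path, str]]:
    """Return (wav_path, text) pairs, assuming root/metadata.csv and root/wavs/."""
    root_path = Path(root)
    with open(root_path / "metadata.csv", encoding="utf-8", newline="") as f:
        entries = parse_metadata(f)
    return [(root_path / "wavs" / f"{file_id}.wav", text) for file_id, text in entries]
```

Taking the last column means that, for rows with three columns as in the original LJSpeech metadata, the normalized transcription is used. For two-column files it is simply the transcription.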
Why do I need a dataset?
It depends. Do you just want to test the available TTS model? Yes? Then the simple answer is: you don't need one at all.
However, if you want to train your own TTS model based on my recordings and experiment with what feels like 1,000 parameters, then one or more of my datasets are a good basis for this.
Audio samples
These are samples from all available voice datasets.
Thorsten-Voice Dataset 2021.02 (Neutral)
Number of recordings | 22,668 |
Audio duration | 23+ hours |
Sample rate | 22,050 Hz |
Channels | Mono |
Normalization | -24 dB |
Sentence length (min/avg/max) | 2 / 52 / 180 characters |
Speaking rate (average) | 14 characters / second |
Questions | 2,780 |
Exclamations | 1,840 |
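The sentence-length, speaking-rate, and question/exclamation figures in this table could be reproduced from the transcripts roughly as follows. This is a sketch under stated assumptions: the speaking rate is taken as total characters divided by total audio seconds (the page does not say how the average was computed), and a sentence counts as a question or exclamation if it ends with `?` or `!`. `transcript_stats` is a hypothetical helper.

```python
from statistics import mean
from typing import Dict, List


def transcript_stats(texts: List[str], total_audio_seconds: float) -> Dict[str, float]:
    """Summary statistics similar to those in the dataset tables (hypothetical helper)."""
    lengths = [len(t) for t in texts]  # sentence length in characters, spaces included
    return {
        "sentences": len(texts),
        "min_len": min(lengths),
        "avg_len": mean(lengths),
        "max_len": max(lengths),
        # Assumption: overall rate = total characters / total audio duration
        "chars_per_second": sum(lengths) / total_audio_seconds,
        # Assumption: classify a sentence by its final punctuation mark
        "questions": sum(t.rstrip().endswith("?") for t in texts),
        "exclamations": sum(t.rstrip().endswith("!") for t in texts),
    }
```

The total audio duration can be obtained by summing the durations stored in the WAV headers of all files. Averaging per-clip rates instead would weight short clips more heavily and may give a different number.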
@dataset{muller_thorsten_2021_5525342,
author = {Müller, Thorsten and
Kreutz, Dominik},
title = {Thorsten - Open German Voice (Neutral) Dataset},
month = feb,
year = 2021,
note = {{Please use it to make the world a better place for
whole humankind.}},
publisher = {Zenodo},
version = {3.0},
doi = {10.5281/zenodo.5525342},
url = {https://doi.org/10.5281/zenodo.5525342}
}
Download: https://zenodo.org/record/5525342
Thorsten-Voice Dataset 2021.06 (Emotional)
The emotional dataset consists of 300 distinct sentences, each of which I spoke in the following eight emotions (300 × 8 = 2,400 recordings):
- Neutral
- Disgusted
- Furious
- Amused
- Surprised
- Sleepy
- Whispering
- Drunk (I was sober during the recording)
Number of recordings | 2,400 |
Sample rate | |
Channels | Mono |
Normalization | -24 dB |
Sentence length (min/max) | 59 / 148 characters |
@dataset{muller_thorsten_2021_5525023,
author = {Müller, Thorsten and
Kreutz, Dominik},
title = {Thorsten - Open German Voice (Emotional) Dataset},
month = jun,
year = 2021,
note = {{Please use it to make the world a better place for
whole humankind.}},
publisher = {Zenodo},
version = {2.0},
doi = {10.5281/zenodo.5525023},
url = {https://doi.org/10.5281/zenodo.5525023}
}
Download: https://zenodo.org/record/5525023
Thorsten-Voice Dataset 2022.10 (Neutral)
Number of recordings | 12,432 |
Audio duration | 11+ hours |
Sample rate | 22,050 Hz |
Channels | Mono |
Normalization | -24 dB |
Speaking rate (average) | 17.5 characters / second |
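To check the audio properties listed in these tables for an individual file, Python's standard `wave` module is enough. The sketch below assumes 16-bit PCM WAV files and reports the level as RMS in dBFS. The page does not say which measure the -24 dB normalization refers to (RMS, peak, or a loudness unit such as LUFS), so the value printed here need not match the table. `wav_properties` is a hypothetical helper name.

```python
import array
import math
import sys
import wave


def wav_properties(path: str) -> dict:
    """Read sample rate, channels, duration and RMS level of a WAV file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        width = w.getsampwidth()
        frames = w.getnframes()
        raw = w.readframes(frames)
    props = {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": frames / rate,
        "rms_dbfs": None,
    }
    # Assumption: 16-bit PCM; other sample widths are not analyzed here
    if width == 2:
        samples = array.array("h")
        samples.frombytes(raw)
        if sys.byteorder == "big":
            samples.byteswap()  # WAV sample data is little-endian
        if len(samples) > 0:
            # RMS over all samples (all channels interleaved)
            rms = math.sqrt(sum(s * s for s in samples) / len(samples))
            # dBFS relative to the 16-bit full-scale value 32768
            props["rms_dbfs"] = 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")
    return props
```

As a reference point, a constant signal at half of full scale comes out at about -6 dBFS with this measure.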
@dataset{muller_thorsten_2022_7265581,
author = {Müller, Thorsten and
Kreutz, Dominik},
title = {ThorstenVoice Dataset 2022.10},
month = oct,
year = 2022,
publisher = {Zenodo},
version = {1.0},
doi = {10.5281/zenodo.7265581},
url = {https://doi.org/10.5281/zenodo.7265581}
}
Check out my release video about this dataset.
More information and download: https://zenodo.org/record/7265581
Thorsten-Voice Dataset 2023.09 (Hessisch)
Number of recordings | 2,108 |
Audio duration | approx. 2 hours |
Sample rate | 22,050 Hz |
Channels | Mono |
Normalization | -24 dB |
Download: https://zenodo.org/record/10511260
@dataset{muller_2024_10511260,
author = {Müller, Thorsten and
Kreutz, Dominik},
title = {Thorsten-Voice Dataset 2023.09 Hessisch},
month = jan,
year = 2024,
publisher = {Zenodo},
doi = {10.5281/zenodo.10511260},
url = {https://doi.org/10.5281/zenodo.10511260}
}