Category: Allgemeines Blabla (General Chatter)

Here you'll find (irregular) news about open speech technologies in general and/or Thorsten-Voice in particular 💛.


Free AI Voices for a Sovereign Digital Future

Post author: Thorsten Müller
Post date: April 13, 2025

Whether for voice assistants, read-aloud features for accessibility, or social media content, synthetic AI voices (text-to-speech) are being used more and more. Yet most of these technologies come from large corporations in the USA or China. They are mostly proprietary and opaque, and they make us dependent on services that can change, be shut down, or start charging fees at any time.

The Thorsten-Voice project deliberately takes a different approach.

Digital Sovereignty Starts with the Voice

Anyone who uses speech synthesis should be free to decide how and where it is used, and whom to trust in the process.
Thorsten-Voice offers exactly that: a high-quality German AI voice that is fully open source, usable without restrictions, and available free of charge.
No registration, no license lock-in, no cloud requirement.

The "Thorsten" voice can be used locally, developed further, or integrated into your own projects: as a read-aloud voice, for learning platforms, in education, in research, in public administration (a relevant topic given the ongoing digitalization of administration), or for inclusive applications.

Technology does not have to be exclusive, expensive, or opaque.
Thorsten-Voice shows that high-quality speech synthesis can also be free and open, for everyone.

Feel free to try it out right here 😊.


Thorsten-Voice in the Media

Post author: Thorsten Müller
Post date: April 13, 2025

I am very pleased, and honestly grateful, that the Thorsten-Voice project has been picked up by various media outlets in recent months.
From trade magazines and blogs to the daily press: the response shows that interest in free speech synthesis and digital sovereignty is growing, and that motivates me enormously to keep going.

👉 The revamped media page now has a clear list of all coverage so far, including links to articles, podcasts, and print reports.

I also shared the update on LinkedIn; maybe you'd like to take a look or forward the post.

If you are interested in interviews or coverage, or have general questions about Thorsten-Voice, I'm always happy to receive a message via the contact form. I'm especially open to an exchange when it comes to digital sovereignty in high-quality speech output!


Coqui and Python > 3.11

Post author: Thorsten Müller
Post date: March 26, 2025

Guude 👋🏼.

Since Coqui AI shut down in early 2024, its open-source TTS solution Coqui TTS is no longer maintained in the corresponding GitHub project 😥. This now shows in the dependency on the Python version: the official Coqui TTS package only works up to Python 3.11. From 3.12 onward, the package can no longer be installed.

Fortunately, there is a fork on GitHub that keeps it running on newer Python versions 🥳.
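
The version limit described in this post can be checked before attempting an installation. The following is a minimal sketch that assumes only what is stated here (the original package installs up to Python 3.11); the function name `original_coqui_installable` is hypothetical and not part of any Coqui package.

```python
import sys

# Assumption taken from this post: the original Coqui TTS package installs
# on Python versions up to and including 3.11, but not on 3.12 or newer.
LAST_SUPPORTED = (3, 11)


def original_coqui_installable(version=None):
    """Return True if the (major, minor) version is within the stated upper limit."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) <= LAST_SUPPORTED


if original_coqui_installable():
    print("This Python version is within the limit of the original package.")
else:
    print("Python is too new for the original package; use the fork instead.")
```

The check only covers the upper limit mentioned in the post; any lower limit of the package is not considered.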

I have updated the documentation accordingly and hope that my Thorsten-Voice Coqui models will keep working for some time.

In the medium term, however, I recommend switching to my Piper TTS models. They are available not only in …

Standard German (Hochdeutsch)
but also with emotional emphasis
and in a charming South Hessian dialect

I hope you have lots of fun with my "Thorsten-Voices" 😊


Article on Netzpolitik.org

Post author: Thorsten Müller
Post date: March 22, 2025

I was very pleased that Netzpolitik.org published an article about my Thorsten-Voice project.

"Dieser Mann hat seine Stimme verschenkt" ("This man gave away his voice")
https://netzpolitik.org/2025/text-to-speech-dieser-mann-hat-seine-stimme-verschenkt/


📰 Wetterauer Zeitung: „Seine Stimme gehört jetzt allen“

Post author: Thorsten Müller
Post date: January 26, 2025

I am honored that the Wetterauer Zeitung, the Frankfurter Neue Presse, and the Frankfurter Rundschau published an article about my Thorsten-Voice project.

"Seine Stimme gehört jetzt allen" ("His voice now belongs to everyone")
Wetterauer Zeitung

The Wetterauer Zeitung article can be read online here.

Many thanks to Julian Wessel for the very pleasant interview.

Source: Wetterauer Zeitung

Would you also like to learn more about the project and my background? I'm happy to receive press and interview requests via the contact form.


Your AI Voice Sounds WRONG! Here’s Why 🤖 → 🗣️

Post author: Thorsten Müller
Post date: January 9, 2025

Transform your text-to-speech output from robotic to natural-sounding with proper text preprocessing (cleaning / normalization). My step-by-step YouTube tutorial shows you how to handle numbers, abbreviations, and special characters to significantly improve your TTS quality. This works for ANY TTS: not just fancy AI-based text-to-speech models, but eSpeak / MBROLA, too.

Video Tutorial

Why Text Cleaning Matters

When feeding text into a TTS system, certain elements can cause unnatural speech patterns:

Abbreviations like "Dr." or "Mr." can be interpreted as sentence endings
Numbers may be read digit by digit instead of naturally
Special characters and symbols may cause unexpected pauses
Time formats and dates might be misinterpreted

"Bad" text input to TTS: "Dr. Smith paid $1,234 for 2 items at 3pm after waiting outside at 72°F on may, 15th, 2024. While waiting for the train to arrive at 15:45 he called a support hotline at 1-800-555-0123."

Text NOT cleaned / normalized and spoken with Piper TTS.

This is hard for most TTS systems because it contains lots of abbreviations, numbers, and special characters that are difficult to pronounce correctly.

"Better" text input keeping the same sentence: "Doctor Smith paid one thousand two hundred thirty-four dollars for two items at three p m after waiting outside at seventy-two degrees Fahrenheit on May fifteenth, twenty twenty-four. While waiting for the train to arrive at fifteen forty-five he called a support hotline at one eight hundred five five five zero one two three."

Text CLEANED / NORMALIZED and spoken with Piper TTS.

The Solution: Text Preprocessing

Below you’ll find a Python script that handles common text cleaning tasks. It works with any TTS system, including Piper, Coqui, eSpeak, and others.

Features:

Converts numbers to words (e.g., "123" → "one hundred twenty-three")
Expands common abbreviations
Handles time formats
Processes dates naturally
Converts temperatures and units
Supports multiple languages (configurable)
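
To make the first two features concrete, here is a minimal sketch of abbreviation expansion and number-to-word conversion using only the Python standard library. It is not the script from the repository: the names `number_to_words`, `normalize_text`, and `ABBREVIATIONS` are hypothetical, only English is covered, and numbers are limited to the range 0 to 999,999. Times, dates, temperatures, and phone numbers are deliberately left out.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# Hypothetical, deliberately tiny abbreviation table for illustration.
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister"}


def number_to_words(n):
    """Spell out an integer between 0 and 999,999 in English."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only supports 0 to 999,999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " " + number_to_words(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = " " + number_to_words(rest) if rest else ""
    return number_to_words(thousands) + " thousand" + tail


def normalize_text(text):
    """Expand known abbreviations, dollar amounts, and short plain integers."""
    # Expand abbreviations first so their dot is not read as a sentence end.
    for short, full in ABBREVIATIONS.items():
        text = text.replace(short, full)
    # Dollar amounts such as "$1,234" become "... dollars".
    text = re.sub(
        r"\$(\d+(?:,\d{3})*)",
        lambda m: number_to_words(int(m.group(1).replace(",", ""))) + " dollars",
        text,
    )
    # Remaining standalone integers with up to six digits are spelled out.
    text = re.sub(r"\b\d{1,6}\b", lambda m: number_to_words(int(m.group(0))), text)
    return text


print(normalize_text("Dr. Smith paid $1,234 for 2 items."))
```

Applied to the start of the "bad" example sentence, this prints "Doctor Smith paid one thousand two hundred thirty-four dollars for two items." A real pipeline needs many more rules and language-specific handling.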

Download the Script

The script is on my Thorsten-Voice GitHub repository.

Usage Example

I created a Jupyter notebook on Google Colab to show the concept of building your voice processing pipeline, including text cleaning / normalization.

It uses the NVIDIA NeMo framework for text cleaning and Piper for text-to-speech.

The notebook can be found here and is explained in my YouTube tutorial here.

Community & Support

Found a bug or have suggestions? Open an issue on GitHub
Questions? Comment below or on the YouTube video

Remember to subscribe to my Thorsten-Voice YouTube channel for more TTS tutorials and updates!


#MyYearOnYouTube2024

Post author: Thorsten Müller
Post date: December 22, 2024

YouTube Success: Thorsten-Voice Celebrates a Remarkable Year 2024 in AI and Language Technology

2024 has been an exceptional year for my Thorsten-Voice YouTube channel, marking significant growth in our AI and language technology community. With over 355,000 views and 3,738 new subscribers, I've seen unprecedented engagement in open-source AI discussions and tutorials.

2024 YouTube Statistics Highlight Community Growth

My channel’s performance reflects the growing interest in AI and language technology:

355K total video views
3,738 new subscribers
34 in-depth uploads
6,708 likes showing content appreciation
898 engaging comments
2,077 shares expanding our reach

Building a Strong AI Technology Community

The numbers tell a story of community engagement and knowledge sharing. Each of the 34 uploads sparked discussions about open-source AI, language models, and their practical applications. The nearly 900 comments represent valuable exchanges and learning opportunities within our community.

Looking Forward to 2025

As we approach 2025, Thorsten-Voice remains committed to providing high-quality content about AI voice technology, open-source developments, and language processing innovations. Our growing community of developers, researchers, and AI enthusiasts continues to drive meaningful discussions and knowledge sharing.

Join my AI Voice Technology Journey

Whether you're a developer, researcher, or AI enthusiast, I invite you to join our community. Subscribe to Thorsten-Voice on YouTube to stay updated with the latest in AI and language technology developments.


Home Assistant Voice Preview Edition

Post author: Thorsten Müller
Post date: December 22, 2024

NEW VIDEO SERIES: The smart home community has long awaited a reliable, privacy-focused voice assistant solution. With the release of Home Assistant Voice Preview Edition, this wait might finally be over. I’m excited to present my comprehensive tutorial series that guides you through everything you need to know about this promising new device.

What’s This Series About?

This series walks you through the Home Assistant Voice Preview Edition from unboxing to advanced setup. Whether you’re new to Home Assistant or an experienced user, these tutorials will help you understand and implement voice control in your smart home setup.

Available Episodes

Episode 1: Unboxing & Tech Specs

In this first episode, we dive into the unboxing experience and examine the technical specifications of the Home Assistant Voice Preview Edition. Get your first look at the hardware and learn what makes it tick. Watch Episode 1

Episode 2: First Setup & Connection

The second episode guides you through the initial setup process. Learn how to power on the device and connect it to your Home Assistant installation. We’ll also explore the entities created during setup. Watch Episode 2

Episode 3: Local Setup with Whisper & Piper

In this episode, we tackle local voice processing setup using Whisper for speech recognition and Piper for speech synthesis. Perfect for those who want complete privacy and local control. Watch Episode 3

What’s Next?

I’m committed to creating more content based on community feedback. If you have specific aspects of Home Assistant Voice you’d like to learn more about, please:

Subscribe to my Thorsten-Voice YouTube channel
Leave your suggestions in the video comments
Share your experiences with the device

Stay tuned for more tutorials as we explore the capabilities of Home Assistant Voice!

Want to catch every new episode? Subscribe to my YouTube channel for the latest updates.


F5 TTS | Local Voice Cloning

Post author: Thorsten Müller
Post date: November 9, 2024

My step-by-step tutorial on F5 TTS (text-to-speech) is now available on YouTube.

It contains the following chapters:

Overview of the license and supported languages
Using a Hugging Face Space to try things out
Installing F5 TTS locally on your computer
Using F5 locally to do voice cloning with just 10 seconds of audio input

We can not only clone our voice but also use multiple emotional inputs to have real fun playing around with dialogues.

Here’s a sample that has been created with F5 and just a few seconds of audio input of my personal voice.


🎉 Celebrating Thorsten-Voice’s 5th Birthday! 🎙️

Post author: Thorsten Müller
Post date: October 20, 2024

Since October 2019, Thorsten-Voice has been supporting the open-source voice technology community. As a birthday gift to our amazing community, I’m releasing all voice datasets (neutral, emotional, and Hessisch) in their original 44kHz sample rate quality – a significant upgrade from the previous 22kHz versions.

🎯 What’s New:

• All recordings now available in pristine 44kHz quality

• Complete collection unified in one place on Hugging Face

• Includes all variants: neutral, emotional, and Hessisch dialect

• Fully structured and transcribed

This consolidated release makes it easier than ever to access and work with the complete Thorsten-Voice collection. As always, everything remains under the CC0 license, continuing our commitment to unrestricted open-source voice technology.
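
When working with the recordings, it can help to confirm which sample rate a file actually has before training or resampling. The following is a minimal sketch using Python's standard `wave` module. It assumes the files are uncompressed PCM WAV and that "44kHz" refers to the common 44,100 Hz rate; neither is stated in this post. The helper names are hypothetical, and the demo writes its own silent file so it runs without the dataset.

```python
import os
import tempfile
import wave


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def write_silent_wav(path, sample_rate, seconds=1):
    """Write a mono, 16-bit WAV file containing only silence (demo helper)."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * sample_rate * seconds)


# Demo with a generated file instead of a real recording.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.wav")
    write_silent_wav(demo_path, 44100)
    demo_rate = wav_sample_rate(demo_path)

print(demo_rate)
```

For a real recording, pass the path of a downloaded WAV file to `wav_sample_rate` instead of the generated demo file.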

🔗 Access the unified dataset: https://huggingface.co/datasets/Thorsten-Voice/TV-44kHz-Full

#OpenSource #AI #SpeechTechnology #TTS #MachineLearning #GermanTTS #VoiceTechnology

Thank you for being part of this journey! Let’s build the future of voice technology together! 🚀