Local voice for AI agents finally sounds good
I finally found a local voice for my agent that I actually like: Supertonic 3 by Supertone.
I tried Edge TTS. I tried Chatterbox. I tried a few other local options. Most were too robotic, too slow, or too much setup for something I want to use every day.
Supertonic 3 is the first one that felt good enough to make Hermes speak in my Discord voice channel.
Hear it
This is the first sentence of this post, generated locally with Supertonic 3 at 95% speed:
Why use it
For agents, voice quality changes how the product feels.
A bad voice makes the agent feel like a toy. A good local voice makes it feel like part of the workflow.
The important part is simple:
- it runs locally
- it sounds natural enough
- it is fast enough for agent replies
- it does not need a paid TTS API for every response
That is the whole point. If your agent can talk back, the voice should not be the worst part of the experience.
Prompt to apply it to your agent
Paste this into your coding agent:
[Prompt text not preserved in this copy.]
For Hermes, I wired it as a command TTS provider and set the default voice to M3 at speed 1.05.
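To make "command TTS provider" concrete, here is a minimal sketch of the idea in Python: the agent hands text to an external command and gets an audio file back. This is not Hermes's actual configuration or Supertonic's real CLI. The `CommandTTS` class, the `my-tts-wrapper` command, and its flags are all hypothetical placeholders; swap in whatever command actually runs your local TTS. Only the defaults (voice M3, speed 1.05) come from my setup.

```python
import shlex
import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CommandTTS:
    """Hypothetical command-based TTS provider: shells out to a local TTS command."""

    # Placeholder template. Replace it with the command that runs your TTS.
    # Supported fields: {text}, {out}, {voice}, {speed}.
    template: str
    voice: str = "M3"    # default voice mentioned in the post
    speed: float = 1.05  # default speed mentioned in the post

    def build_argv(self, text: str, out_path: Path) -> list[str]:
        values = {
            "text": text,
            "out": str(out_path),
            "voice": self.voice,
            "speed": str(self.speed),
        }
        # Split the template first, then fill each piece, so reply text
        # containing spaces or quotes stays a single argument.
        return [part.format(**values) for part in shlex.split(self.template)]

    def speak(self, text: str, out_path: Path, timeout: float = 60.0) -> Path:
        # No shell is involved, so the reply text is never interpreted as shell code.
        subprocess.run(self.build_argv(text, out_path), check=True, timeout=timeout)
        return out_path


# "my-tts-wrapper" and its flags are made up for illustration.
tts = CommandTTS(
    template="my-tts-wrapper --voice {voice} --speed {speed} --out {out} {text}"
)
argv = tts.build_argv("Hello there, it's me", Path("reply.wav"))
```

The one design choice worth keeping is passing an argument list instead of a shell string. Agent replies are untrusted text, and this way a reply full of quotes or semicolons is just text to be spoken.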
When I would use it
Use this when your agent already works, but the voice makes it feel cheap.
Do not start with voice. Start with a useful agent. Then add a voice that makes people want to keep using it.
Supertonic 3 cleared that bar for me.