Open-source text-to-speech powered by Qwen3-TTS
**→ The app is already online. You can try it at anyspeak.ai and check out the repo at anyspeak-ai on GitHub.**
Features
- TTS: Multiple voices and Qwen3-TTS models (1.7B CustomVoice, VoiceDesign, Base). Supported languages: Chinese, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish.
- Speaker tags: Use `[SPEAKER_NAME]` in text for multi-speaker output.
- Mood tags: Use `[SPEAKER:mood]` (e.g. `[VIVIAN:happy]`) for emotional expressiveness.
- Timing tags: Use `[SPEAKER:mood:timing]` (e.g. `[RYAN:+1.5]` for a 1.5 s pause after the segment, `[VIVIAN:happy:-0.3]` for a shorter gap or even overlapping speakers) or the + TIMING button to control pacing between segments.
- Custom voices: Design a voice by selecting attributes (gender, language, old/young, slow/fast, high/low, loud/soft, warm/rough).
- CLEAN: Analyze and clean text for better TTS output.
- CHECK: Quality check with Whisper (compare the original text against a transcription of the synthesized audio).
- IMPROVE: Post-process generated audio chunk by chunk: selectively regenerate mispronounced segments, trim silence, add pauses, or edit the text and regenerate. Compare original and improved versions before applying. You can also edit finished MP3s afterward (e.g. change the wording or the speaker at a specific place) without regenerating everything.
- VIDEO: Create video from audio and images (with optional subtitles).
- Load & Save (MP3): Projects are loaded and saved as MP3 files. The files store the original text with tags in their metadata, so you can reopen one and continue working where you left off.
- Import: Load text from a URL or a file.
- Run locally: easy installation.
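
To make the speaker, mood, and timing tags above more concrete, here is a minimal Python sketch of how such a script could be split into segments. It is not the app's actual parser: the grammar is inferred only from the examples in the feature list, and the names `TAG_RE`, `Segment`, and `parse_script` are hypothetical. It also assumes that the mood part is optional (as in `[RYAN:+1.5]`) and that each tag applies to the text up to the next tag.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed tag grammar, inferred from the examples in the feature list:
#   [NAME]   [NAME:mood]   [NAME:+1.5]   [NAME:mood:-0.3]
TAG_RE = re.compile(
    r"\[(?P<speaker>[A-Za-z_][A-Za-z0-9_]*)"      # speaker name
    r"(?::(?P<mood>[A-Za-z_]+))?"                 # optional mood
    r"(?::(?P<timing>[+-]\d+(?:\.\d+)?))?\]"      # optional signed seconds
)


@dataclass
class Segment:
    speaker: str
    mood: Optional[str]
    timing: Optional[float]  # seconds; positive = pause, negative = shorter gap/overlap
    text: str


def parse_script(script: str) -> List[Segment]:
    """Split a tagged script into one Segment per tag.

    Text that appears before the first tag is ignored in this sketch.
    """
    matches = list(TAG_RE.finditer(script))
    segments: List[Segment] = []
    for i, m in enumerate(matches):
        # A segment's text runs from the end of its tag to the start of the next tag.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(script)
        timing = m.group("timing")
        segments.append(
            Segment(
                speaker=m.group("speaker"),
                mood=m.group("mood"),
                timing=float(timing) if timing is not None else None,
                text=script[m.end():end].strip(),
            )
        )
    return segments


if __name__ == "__main__":
    demo = "[RYAN:+1.5] Hello there. [VIVIAN:happy:-0.3] Hi! [RYAN] Bye."
    for seg in parse_script(demo):
        print(seg)
```

Keeping the timing as a signed float leaves the interpretation (extra pause vs. overlap) to whatever stitches the audio segments together.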