Text-to-speech
NobodyWho can synthesize text to WAV bytes with the Tts class.
Two backends are available:
- kokoro: lightweight 24 kHz speech synthesis.
- supertonic: multi-stage ONNX speech synthesis with voice styles.
Models are downloaded from Hugging Face on first use and cached locally.
The backend argument can be omitted for known official sources, including the examples below. For local directories or custom sources, pass backend="kokoro" or backend="supertonic" explicitly.
Kokoro
Read more about Kokoro on its official Hugging Face page. NobodyWho currently supports Kokoro language codes en-us, en-gb, es, fr, it, and pt-br.
from pathlib import Path
from nobodywho import Tts
tts = Tts(
    source="NobodyWho/Kokoro-82M",
    voice="bf_emma",
    language="en-gb",
)
wav = tts.synthesize("Hello from NobodyWho!")
Path("out.wav").write_bytes(wav)
For Kokoro, set voice and language together. The pair must correspond to a voice the model actually provides, as with bf_emma and en-gb in the example above.
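To sanity-check the output, the WAV header can be read with Python's standard wave module. The sketch below assumes the returned bytes are a PCM-encoded RIFF/WAV container, which this page does not state explicitly; wave.open raises wave.Error for encodings it does not support. The wav_info and make_silence helpers are hypothetical names, and the silent clip is only stand-in data so the example runs without downloading a model.

```python
import io
import wave


def wav_info(wav_bytes: bytes) -> dict:
    """Read basic header fields from in-memory WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
        frames = reader.getnframes()
        rate = reader.getframerate()
        return {
            "channels": reader.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": reader.getsampwidth(),
            "duration_seconds": frames / rate,
        }


def make_silence(sample_rate: int = 24_000, seconds: float = 0.5) -> bytes:
    """Build a silent mono 16-bit WAV, used here only as stand-in data."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        writer.setframerate(sample_rate)
        writer.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    return buffer.getvalue()


# Replace make_silence() with tts.synthesize("...") to inspect real output.
info = wav_info(make_silence())
print(info)
```

With real Kokoro output, sample_rate would be expected to read 24000, matching the 24 kHz figure given for that backend.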
Supertonic
Read more about Supertonic on its official Hugging Face page, including supported languages and voice styles.
from pathlib import Path
from nobodywho import Tts
tts = Tts(
    source="Supertone/supertonic-3",
)
wav = tts.synthesize("Hello from NobodyWho!")
Path("out.wav").write_bytes(wav)
By default, Supertonic uses voice="M1" and language="en". The upstream model includes voice styles M1–M5 and F1–F5.
Most users can start with the defaults. Optional settings include:
- voice: voice style, e.g. M1 or F1. Defaults to M1.
- language: input language code. Defaults to en.
- speed: speech speed multiplier. Values above 1.0 are faster. Defaults to 1.05.
- steps: denoising steps. Higher values can improve quality but are slower. Defaults to 8.
- silence_duration: silence inserted between long text chunks. Defaults to 0.3 seconds.
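The settings above can be collected and checked before constructing the synthesizer. This is a minimal sketch: supertonic_options is a hypothetical helper, its validation rules are illustrative rather than the library's own, and it assumes the settings are passed to Tts as keyword arguments in the same way voice and language are in the Kokoro example.

```python
# Voice styles listed for the upstream model: M1..M5 and F1..F5.
SUPERTONIC_VOICES = {f"{prefix}{n}" for prefix in "MF" for n in range(1, 6)}


def supertonic_options(
    voice: str = "M1",
    language: str = "en",
    speed: float = 1.05,
    steps: int = 8,
    silence_duration: float = 0.3,
) -> dict:
    """Return the documented settings as a dict, with basic sanity checks."""
    if voice not in SUPERTONIC_VOICES:
        raise ValueError(f"unknown voice style: {voice!r}")
    if speed <= 0:
        raise ValueError("speed must be positive")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if silence_duration < 0:
        raise ValueError("silence_duration must not be negative")
    return {
        "voice": voice,
        "language": language,
        "speed": speed,
        "steps": steps,
        "silence_duration": silence_duration,
    }


# A slower, higher-quality configuration with a different voice style.
options = supertonic_options(voice="F1", speed=0.95, steps=12)
print(options)

# Assumed usage (keyword names taken from the list above):
# tts = Tts(source="Supertone/supertonic-3", **options)
```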
Async
wav = await tts.synthesize_async("Hello!")
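The await expression only works inside a coroutine, so from ordinary synchronous code the call needs an entry point such as asyncio.run. The sketch below shows that pattern for several texts. FakeTts is a hypothetical stand-in that only mimics the synthesize_async method name so the example runs without a model; with the real library, pass a Tts instance instead. The texts are awaited one at a time, since this page does not say whether concurrent calls on one instance are supported.

```python
import asyncio


async def synthesize_all(tts, texts):
    """Synthesize each text in order without blocking the event loop."""
    results = []
    for text in texts:
        results.append(await tts.synthesize_async(text))
    return results


class FakeTts:
    """Stand-in with the same method name as Tts (illustration only)."""

    async def synthesize_async(self, text: str) -> bytes:
        await asyncio.sleep(0)  # yield control, as a real async call would
        return text.encode("utf-8")


# asyncio.run starts an event loop, runs the coroutine, and returns its result.
clips = asyncio.run(synthesize_all(FakeTts(), ["Hello!", "Goodbye!"]))
print([len(clip) for clip in clips])
```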
Device selection
By default, NobodyWho uses device="auto". You can also choose cpu or cuda:
tts = Tts(
    source="Supertone/supertonic-3",
    device="cpu",
)