Replying to Juraj

Do you know of any great text-to-speech models that do intonation well? They need open weights, but they do not need to clone voices.

I've tried Suno's Bark, but it sometimes hallucinates, and I need it to read literally what's written. I also tried F5-TTS: the intonation is not great and the speed varies a lot, so when it reads multiple texts, the speaking rate differs between generations. Its duration predictor is also not great and sometimes causes cutoffs.

Have I missed something?

English only is OK for now.

Phaedrus 1y ago

ElevenLabs works great (https://elevenlabs.io/), but I'm not sure whether the weights are open.

Juraj 1y ago

They are not, and it's crazy expensive for my use case.