We're developing our own text-to-speech and speech-to-text implementations for #GrapheneOS. They're entirely open source, and we're avoiding the so-called 'open' models published without their training data. Instead, we're making a truly open source implementation of both where all of the data used to train them is open source. If you don't want to use our app for local text-to-speech and speech-to-text, you don't need to use it. Many people need this and want a better option.

We're working on TTS first, then STT. The TTS training data is LJ Speech (https://keithito.com/LJ-Speech-Dataset/) and the model is our own fork of Matcha-TTS.
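
For anyone curious what that dataset looks like: LJ Speech is roughly 13,100 short WAV clips from a single speaker plus a pipe-delimited metadata.csv carrying a clip ID, the raw transcript and a normalized transcript. Here's a minimal Kotlin sketch of reading it; the path and names are placeholders, not anything from the GrapheneOS implementation:

```kotlin
import java.io.File

// One LJ Speech utterance: the ID maps to wavs/<id>.wav inside the dataset.
data class LjEntry(val id: String, val raw: String, val normalized: String)

// metadata.csv is pipe-delimited: id|raw transcript|normalized transcript
fun loadLjSpeech(metadataPath: String): List<LjEntry> =
    File(metadataPath).readLines().mapNotNull { line ->
        val parts = line.split('|')
        if (parts.size == 3) LjEntry(parts[0], parts[1], parts[2]) else null
    }

fun main() {
    // Hypothetical path: point this at an extracted LJSpeech-1.1 directory.
    val entries = loadLjSpeech("LJSpeech-1.1/metadata.csv")
    println("Loaded ${entries.size} utterances")
    println(entries.first())
}
```

A TTS model such as Matcha-TTS is then trained on those (audio, normalized transcript) pairs, which is why having the full dataset published makes the training reproducible.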

If people want, they can fork it and add/remove/change the training data in any way they see fit. It's nothing like the so-called "open" models from OpenAI, Facebook, etc., where the only thing that's open is the neural network weights after training, with no way to know what was used to train them and no way to reproduce that.

Many blind users asked us to include one of the existing open source TTS apps so they could use it to obtain a better app. None of the available open source apps meets our requirements for reasonable licensing, privacy, security or functionality. Therefore, we've developed our own text-to-speech, which will be shipping soon, likely in January. We'll also be providing our own speech-to-text. We're using neural networks for both, which we're making ourselves.

Discussion

Keep up the good work! 🚀🚀🚀

Nice🙏

awesome 💪

While I appreciate the effort, and understand the ethics of it, why does the training data being available matter for privacy and security? If it's a local model, it's going to be fine if it doesn't have network access. Are you sure you're not doing redundant work that could have gone somewhere higher priority?

The claim that training data availability doesn’t matter for privacy/security overlooks critical risks. Even a local model trained on compromised data could inadvertently leak sensitive information through outputs or vulnerabilities. For example, if the training data includes personal health records (as in HIPAA-regulated scenarios), the model might reproduce patterns that re-identify individuals, regardless of network access. Stanford’s research highlights how AI systems can expose private data via prompts or connections to law enforcement, suggesting that training data’s origins matter deeply. IBM also notes AI’s unique privacy risks, emphasizing that data governance isn’t just about deployment but *collection* and *usage*. While federated learning avoids raw data exposure, it’s not universally adopted, leaving many models vulnerable. Arguing that this is “redundant” ignores the foundational role of data ethics in AI—without rigorous safeguards, even offline systems risk undermining trust.

🫶🏼

Amazing to hear, thanks!

Would this include only the TTS engine, or also an app to read texts and show them in the UI?

It would be an engine for now. It can be used by TalkBack as the GrapheneOS TTS accessibility feature.
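
For context on the engine/app split: on Android, a TTS engine plugs in behind the system text-to-speech service, and any client, TalkBack included, drives whichever engine is selected through the same android.speech.tts API. A minimal client-side sketch under that assumption; the class name and demo strings are placeholders:

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import java.util.Locale

// Minimal client of the standard android.speech.tts API. TalkBack and any
// other app consume whichever engine is installed through this same layer.
class Speaker(context: Context) : TextToSpeech.OnInitListener {
    private val tts = TextToSpeech(context, this)

    override fun onInit(status: Int) {
        if (status != TextToSpeech.SUCCESS) return
        tts.setLanguage(Locale.US)
        // QUEUE_ADD appends to the playback queue; the last argument is a
        // caller-chosen utterance ID used in progress callbacks.
        tts.speak("Hello from the system TTS engine", TextToSpeech.QUEUE_ADD, null, "demo-1")
    }

    fun shutdown() = tts.shutdown()
}
```

So shipping an engine is enough: screen readers and reading apps can use it without any app-specific integration.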

What's your take on the model FUTO's keyboard uses for text to speech? I'm a bit ignorant on it.

Have you heard of FUTO, their open source keyboard and their open source LLM-based text-to-speech?

Maybe a good starting point?

Its license is incompatible with embedding it into GrapheneOS. FUTO apps like their keyboard aren't open source in the traditional sense but rather source available under restrictive licensing that disallows commercial usage or removing any future monetization.

Their license is sad. It's just fracturing effort.