Is there a FOSS app that takes microphone speech input, transcribes it, and then edits it for readability and coherence?

Discussion

Here's the answer my AI gave me, but I haven't checked for accuracy yet:

Yes, there are several open-source apps available that can perform speech-to-text transcription and editing. Here are some options:

1. OpenMic: OpenMic is an open-source audio and video recording tool with built-in speech-to-text functionality. It supports multiple languages and can be trained to recognize specific voices. The transcribed text can then be edited using the app's built-in editor.

2. Sox: Sox is a command-line tool for editing and manipulating audio files. While it doesn't have a built-in speech-to-text function, it can be used in conjunction with other tools to transcribe and edit speech recordings. For example, you can use Sox to trim or silence parts of an audio file after transcribing it using a tool like Festival or Atril.

3. Festival: Festival is a free, open-source speech recognition system developed at the University of Edinburgh. It can be used to transcribe spoken words in multiple languages and has a built-in editor for editing the transcripts. Festival also includes tools for training and customizing the system's performance.

4. Atril: Atril is another open-source speech recognition tool that supports several languages and has a built-in editor for editing transcripts. It can be used to create closed captions or subtitles for video files, as well as transcribe spoken words.

5. Kaldi: Kaldi is an open-source toolkit for speech recognition, language recognition, and speaker identification. While it doesn't have a built-in editor, it can be used in conjunction with other tools to transcribe and edit speech recordings. Kaldi supports multiple languages and has been used in various applications, including voice assistants and language learning apps.

These are just a few examples of the many FOSS (Free and Open-Source Software) apps available for speech recognition and editing. Each has its own strengths and weaknesses, so you may want to try out a few options to see which one works best for your needs.

Ask your LLM to give you more details and links to the source code for all these. I predict the vast majority will be broken/fake links.

Here are the links it spit out:

Certainly! Here are the links to the source code for each of the open-source speech recognition and editing tools I mentioned earlier:

1. OpenMic - The source code for OpenMic is available on GitHub at <https://github.com/openmic/openmic>.

2. Sox - Sox is a command-line tool, so it doesn't have a source code repository in the classical sense. However, you can find the source code for Sox on the official website at <https://sox.sourceforge.net/>.

3. Festival - The source code for Festival is available on GitHub at <https://github.com/festival/festival>.

4. Atril - The source code for Atril is available on GitHub at <https://github.com/atrill-project/atrill>.

5. Kaldi - The source code for Kaldi is available on GitHub at <https://github.com/kaldi-project/kaldi>.

I hope this helps! Let me know if you have any questions or need further assistance.

It seems the Sox link is valid, and Kaldi is actually at kaldi-asr/kaldi, but the others are not correct.
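If anyone wants to repeat that check without clicking through by hand, here's a rough sketch using only the Python standard library. The URLs are the ones the LLM gave above; the function names (`http_status`, `check_links`) are just made up for this example. Keep in mind a 200 only tells you the page exists, not that it's the project the LLM claimed, so you still have to look at what's there.

```python
import urllib.error
import urllib.request

# URLs copied verbatim from the LLM reply quoted above.
LINKS = [
    "https://github.com/openmic/openmic",
    "https://sox.sourceforge.net/",
    "https://github.com/festival/festival",
    "https://github.com/atrill-project/atrill",
    "https://github.com/kaldi-project/kaldi",
]


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails outright."""
    req = urllib.request.Request(url, headers={"User-Agent": "link-check-sketch"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error code such as 404.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and so on.
        return None


def check_links(urls, fetch=http_status):
    """Map each URL to its status code; `fetch` can be swapped out for testing."""
    return {url: fetch(url) for url in urls}


# Example usage (needs network access):
#   for url, status in check_links(LINKS).items():
#       print(status, url)
```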

Transcribal is quite fast actually. It's a pure speech-to-text engine though, so it doesn't do any editing of the text.

Transcribro is Whisper on Android. It works nicely, but there's no LLM.

One day, you'll be able to input this into Zapstore (or it will pick it up from kind 1 notes?) and get an offer in sats.

Zap back, and the app is produced, checked (builder reputation, build reproducibility, privacy, malware, etc.), and ready to install without leaving the UI.

nostr:nevent1qqsdrh55fxv05ghn8s06ch6y0qaz7maahvhru2esfeggdvake8tyqvgprpmhxue69uhkv6tvw3jhytnwdaehgu3wwa5kuef0qgst0mtgkp3du662ztj3l4fgts0purksu5fgek5n4vgmg9gt2hkn9lqrqsqqqqqpp396qg

Any notes app + FUTO keyboard with Voice Input might yield some results if you're on Android. Then pipe results into Ollama for cleanup.

(I've only tested this once, and it's still a long way from Google Recorder's quality.)
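For the "pipe results into Ollama" step, something like this might work as a starting point. It's a minimal sketch, assuming Ollama is running locally on its default port and that you've already pulled a model; the model name `llama3`, the prompt wording, and the function names are my own placeholders, not anything specific to FUTO or a particular notes app.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "llama3"  # assumption: replace with whatever model you have pulled


def build_request(transcript, model=MODEL):
    """Build the JSON payload asking the model to tidy up a raw transcript."""
    prompt = (
        "Rewrite the following dictated text for readability and coherence. "
        "Fix punctuation and remove filler words, but keep the meaning. "
        "Return only the rewritten text.\n\n" + transcript
    )
    # stream=False asks Ollama for one JSON object instead of a token stream.
    return {"model": model, "prompt": prompt, "stream": False}


def clean_up(transcript, model=MODEL, url=OLLAMA_URL):
    """Send the transcript to a local Ollama server and return the cleaned text."""
    body = json.dumps(build_request(transcript, model)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"].strip()


# Example usage (needs a running Ollama server):
#   print(clean_up("um so the the meeting is uh tomorrow at like three"))
```

On Android you'd still need a way to get the dictated text from the notes app to wherever Ollama runs (copy/paste, a shared file, etc.), which this sketch doesn't cover.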