If you mean transcription to text (speech-to-text), then try Whisper from OpenAI. It works like a charm and can add timestamps to each transcribed segment, which enables automatic subtitling.
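For anyone who wants to try the subtitling part, here is a rough sketch of how it could look, assuming the open-source `openai-whisper` Python package and ffmpeg are installed. The helper names (`srt_timestamp`, `segments_to_srt`, `transcribe_to_srt`) and the `"base"` model choice are my own for illustration, not anything from Whisper itself; the only Whisper calls used are `whisper.load_model` and `model.transcribe`, whose result contains a `"segments"` list with `start`, `end` and `text` fields.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Turn Whisper-style segments (dicts with start, end, text) into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_name: str = "base") -> str:
    """Hypothetical helper: transcribe a file with Whisper and return SRT text."""
    # Imported here so the formatting helpers work without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return segments_to_srt(result["segments"])
```

Calling `transcribe_to_srt("interview.wav")` (the file name is just a placeholder) and writing the returned string to a `.srt` file should give you something most video players can load. The timestamps are per segment, so treat them as a good starting point rather than frame-accurate cues.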

Resynthesis via AI could introduce a “human inaccuracy” that makes the result more pleasant to listen to.

Discussion

This is what it looks like in iZotope RX (which is a great restoration and cleanup tool). It transcribes the text and lets you edit the file based on that, in a very rudimentary fashion. Which is fine. But once they add speech synthesis, it's going to be a different beast. All the video editing apps are going to do the same, adjusting the video to fit. It's only a matter of time.

https://www.youtube.com/watch?v=awNRXYaFAi4