In some ways you already have some of this built into DAWs. So it's just a step away (and nothing really shocking).

What would be a helpful tool based on resynthesis? "Create new tracks from this voice track, one of them just a double of the original and the rest in 3-part harmony". "Ok nice, now let me automate where the voices should be in close harmony and where more spread out".

You can do this today by copying the track and tuning it to different notes, and it works somewhat ok if you know what you're doing. But resynthesis would make it more natural sounding.
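To make the copy-and-retune approach concrete, here's a rough sketch of the arithmetic behind it. It assumes a major key and MIDI note numbers, and the helper names (`diatonic_shift`, `shift_ratio`) are made up for illustration, not taken from any DAW or plugin. It only works out how far each harmony copy has to be shifted; the actual pitch shifting is left to whatever tool you use.

```python
# Semitone offsets of the seven major-scale degrees from the tonic.
# Assumption: the melody stays in one major key.
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]


def diatonic_shift(midi_note, tonic, steps):
    """Semitones needed to move midi_note by `steps` scale degrees
    (positive = up, negative = down) within the major key of `tonic`."""
    rel = (midi_note - tonic) % 12
    if rel not in MAJOR_SCALE:
        raise ValueError("note is not in the key")
    degree = MAJOR_SCALE.index(rel)
    # divmod wraps around the octave when we run past the 7th degree.
    octaves, new_degree = divmod(degree + steps, 7)
    target_rel = MAJOR_SCALE[new_degree] + 12 * octaves
    return target_rel - rel


def shift_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in equal temperament."""
    return 2.0 ** (semitones / 12.0)


# A third above each melody note in C major (tonic = MIDI 60):
melody = [60, 62, 64, 71]  # C, D, E, B
thirds = [diatonic_shift(n, 60, 2) for n in melody]
```

The point is that a "third above" is not a fixed shift: it is 4 semitones on C but 3 on D, which is why a single static transpose of the copied track sounds wrong and you end up tuning note by note.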

On the same note, once you can train a voice model on a specific voice (already available for some time), you could have a wave editor that transcribes the audio (already available, iZotope RX does this very badly) and lets you easily edit what's being said. I'm pretty sure this exists in several prototypes.

Very useful for podcasters and even for the film industry (the visual version of it already exists). And also, of course, very scary.

Discussion

If you mean transcription to text (speech-to-text), then try Whisper from OpenAI. It works like a charm and can add timestamps to the text, which enables automatic subtitling.
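For the subtitling part, here's a minimal sketch of turning timed segments into SRT text. It assumes each segment is a dict with `start` and `end` in seconds plus `text`, which is the shape the open-source `openai-whisper` package returns under `result["segments"]`. The function names are my own, and the Whisper calls are only shown in comments since they need the package and an audio file (`talk.mp3` is a placeholder).

```python
def format_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from dicts with 'start', 'end' and 'text' keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


# Assumed usage with the openai-whisper package installed:
#   import whisper
#   model = whisper.load_model("base")
#   result = model.transcribe("talk.mp3")
#   print(segments_to_srt(result["segments"]))
```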

Resynthesis via AI could bring “human inaccuracy” that makes the result more pleasant to listen to.

This is what it looks like in iZotope RX (which is a great restoration and cleanup tool). It transcribes the speech and lets you edit the file based on that text in a very rudimentary fashion. Which is fine. But once they add speech synthesis, it's going to be a different beast. All the video editing apps are going to do the same, adjusting the video to fit. It's only a matter of time.

https://www.youtube.com/watch?v=awNRXYaFAi4