Subnostr

I agree, and I look forward to that moment. I would generally divide the use of AI into two groups.

1. Tools that enhance the quality of the recording: removing noise, correcting tuning or tempo... in short, post-production.

2. Tools that support creativity. You've already hinted at them; I'd also like tools that automatically suggest, for example, a bass line, drums, or even a whole harmony for a melody. Although that might be too much :-). We'll see.

Anyway, what I've been longing and waiting for for several years now, as a person not blessed with perfect pitch, is automatic transcription into sheet music with separation of the individual instruments. Tools built on plain frequency analysis can't achieve that kind of accuracy. A properly trained model could. It's exactly the kind of demanding but uncreative work that AI is made for.

Hurvajs Rumcajs 2y ago

In some ways you already have some of this built into DAWs. So it's just a step away (and nothing really shocking).

What would be a helpful tool is one based on resynthesis: "Create new tracks from this voice track, one of them just a double of the original and the rest in 3-part harmony." "OK, nice, now let me automate where the voices should be in close harmony and where more spread out."

You can do this today by copying the track and tuning it to different notes, and it works somewhat OK if you know what you're doing. But resynthesis would make it sound more natural.
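For anyone wanting to try the copy-and-retune approach by hand, here is a rough sketch of the arithmetic behind it. It assumes equal temperament, where shifting by n semitones multiplies the frequency by 2^(n/12). The interval choices and all names here (`shift_ratio`, `HARMONY_OFFSETS`) are illustrative assumptions, not something any particular DAW exposes.

```python
# Sketch: frequency ratios for retuning copies of a vocal track.
# Assumes equal temperament; the intervals below are illustrative only.

def shift_ratio(semitones: float) -> float:
    """Frequency multiplier for a pitch shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)

# Hypothetical voicing: a double of the original plus two harmony voices.
HARMONY_OFFSETS = {
    "double": 0,         # same pitch as the original
    "major third": 4,    # 4 semitones up
    "perfect fifth": 7,  # 7 semitones up
}

def harmony_ratios(offsets=HARMONY_OFFSETS):
    """Map each voice name to the ratio a pitch shifter would apply."""
    return {name: shift_ratio(st) for name, st in offsets.items()}

if __name__ == "__main__":
    for name, ratio in harmony_ratios().items():
        print(f"{name}: x{ratio:.4f}")
```

Note that fixed intervals like these ignore the key of the song, so in practice each note usually has to be tuned individually. That manual step is what a resynthesis tool would presumably take over.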

On the same note, once you can train a voice model on a specific voice (already available for some time), you could have a wave editor that transcribes the audio (already available, though iZotope RX does this very badly) and then lets you easily edit what's being said. I'm pretty sure this already exists in several prototypes.

Very useful for podcasters and even for the film industry (the visual version of it already exists). And of course also very scary.


Discussion

Kuba 2y ago

If you mean transcription to text (speech-to-text), then try Whisper from OpenAI. It works like a charm and can add exact timing, which enables automatic subtitling.
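As a rough illustration of the subtitling part, here is a minimal sketch that turns timed segments into SRT text. It assumes each segment is a dict with `start` and `end` in seconds plus `text`, which is the shape the open-source `openai-whisper` package returns under `result["segments"]`. The helper names and the file name in the comment are made up for the example.

```python
# Sketch: convert timed transcript segments into SRT subtitle text.
# Assumed input shape: [{"start": float, "end": float, "text": str}, ...]
# With the openai-whisper package installed, such segments could come from:
#   import whisper
#   segments = whisper.load_model("base").transcribe("talk.mp3")["segments"]
# ("talk.mp3" is a hypothetical file name.)

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Build numbered SRT cues, one per segment, separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)

if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 1.25, "text": " Hello there."},
        {"start": 1.25, "end": 3.0, "text": " This is a test."},
    ]
    print(segments_to_srt(demo))
```

The resulting text can be saved as a `.srt` file next to the video, which most players pick up as subtitles.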

Resynthesis via AI could bring in the "human inaccuracy" that makes the result more pleasant to listen to.

Hurvajs Rumcajs 2y ago

This is what it looks like in iZotope RX (which is a great restoration and cleanup tool). It transcribes the speech and lets you edit the file based on that text in a very rudimentary fashion. Which is fine. But once they add speech synthesis, it's going to be a different beast. All the video editing apps are going to do the same, adjusting the video to fit. It's only a matter of time.

https://www.youtube.com/watch?v=awNRXYaFAi4