Replying to Avatar Hurvajs Rumcajs

I'm looking forward to new AI-based tools for sound and music production, both the fun quirky ones and the professional, powerful time/grind savers. Not seeing anything all that great at this point, though.

The restoration and cleanup tools are the obvious targets, since noise reduction and resynthesis are exactly what much of the latest AI development is built on. So maybe the next iZotope RX will really be a step forward, and not just a meh improvement unless you happen to need some niche thing they added.

What excites me is the possibilities in automatic sample-instrument creation. You could play every key of the piano in all the expressions you like. Hours of material, lots of details, lots of mistakes, some noise, etc. Nowadays that means a ton of post-production time even when you use templates. AI could just grind through all of it, assigning everything to the right keys, velocities, and expressions in a matter of minutes. And then make changes if needed.
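As a toy illustration of the key-assignment step, assuming a pitch detector has already extracted a fundamental frequency for each recorded note, mapping that frequency to the nearest key is simple arithmetic (the function names here are my own, not from any real sampler tool):

```python
import math

def freq_to_midi(freq_hz: float) -> int:
    """Map a detected fundamental frequency to the nearest MIDI note number.

    MIDI note 69 is A4 = 440 Hz; each semitone is a factor of 2**(1/12).
    """
    return round(69 + 12 * math.log2(freq_hz / 440.0))

def midi_to_name(midi: int) -> str:
    """Human-readable note name for a MIDI number (e.g. 60 -> 'C4')."""
    names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    return f"{names[midi % 12]}{midi // 12 - 1}"

# A slightly flat A4 take still snaps to the right key:
assert freq_to_midi(437.0) == 69
assert midi_to_name(69) == "A4"
```

The rounding is also where such a tool could flag mistakes: a take that lands halfway between two keys is probably a mis-hit worth reviewing.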

It doesn't have to stop there. Missing notes could be resynthesized, and portamentos generated (for violins, for example).

And the next stage could be feeding it any recording, even one with more than one instrument, and just asking it to create the instruments out of it.

This could allow for new AI-powered samplers that use much more complex structures than would be practical if a human had to provide the content.

In the end, this has the potential to give music creators and sound pros much more fun and productivity, and much less dumb grind time.

I agree, and I look forward to that moment. I would generally divide the use of AI into two groups.

1. Tools that enhance the quality of the recording - removing noise, correcting tuning or tempo... just post-production

2. Tools that develop creativity - you've already hinted at them; I'd still like tools that automatically suggest, for example, a bass line, drums, or even a whole harmony for a melody. Although that might be too much :-). We'll see.

Anyway, what I've been longing and waiting for for several years now, as a person not blessed with absolute pitch, is automatic transcription into sheet music with separation of individual instruments. Tools built on frequency analysis can't achieve that kind of accuracy. A properly trained model could. It's exactly the kind of demanding but uncreative activity that AI is made for.
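For contrast, here is roughly what the frequency-analysis approach boils down to, as a minimal NumPy sketch: pick the loudest FFT bin. It works on a clean single tone, which also hints at why it falls apart on polyphonic recordings, where the harmonics of several instruments overlap in the same bins:

```python
import numpy as np

def dominant_freq(signal: np.ndarray, sample_rate: int) -> float:
    """Return the frequency (Hz) of the strongest FFT bin."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])

sr = 44100
t = np.arange(sr) / sr          # one second of samples
tone = np.sin(2 * np.pi * 440.0 * t)  # a clean A4

print(dominant_freq(tone, sr))  # -> 440.0 (one-second window, so 1 Hz bins)
```

Add a second instrument whose harmonics land on or near the same bins and the "strongest bin" stops corresponding to any single note, which is exactly the gap a trained model would have to fill.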



As for your point 2, before that happens, we can still look to our existing human music generators for help, such as the #kumst guys, which is much more fun IMHO.

https://youtu.be/aOERKhezW04

In some ways you already have some of this built into DAWs. So it's just a step away (and nothing really shocking).

What would be a helpful tool based on resynthesis? "Create new tracks from this voice track, one of them just a double of the original and the rest in three-part harmony." "OK, nice, now let me automate where the voices should be in close harmony and where more spread out."

You can do this today by copying the track and tuning the copies to different notes, and it works somewhat OK if you know what you're doing. But resynthesis would make it sound more natural.
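The copy-and-retune trick can be sketched in a few lines of NumPy. This is the crudest possible version, shifting pitch by resampling, which also changes duration; that artifact is precisely what real pitch-shifters (and, eventually, resynthesis) have to work around. The interval choices below are just an illustrative major-chord voicing:

```python
import numpy as np

def naive_pitch_shift(signal: np.ndarray, semitones: float) -> np.ndarray:
    """Shift pitch by resampling. Crude on purpose: it also shortens or
    stretches the audio, the classic chipmunk artifact."""
    ratio = 2 ** (semitones / 12)           # frequency ratio per semitone
    old_idx = np.arange(len(signal))
    new_idx = np.arange(0, len(signal), ratio)
    return np.interp(new_idx, old_idx, signal)

sr = 22050
t = np.arange(sr) / sr
lead = np.sin(2 * np.pi * 220.0 * t)        # stand-in for a voice track

# "One double, the rest in harmony": unison copy plus +4 and +7 semitones.
stack = [naive_pitch_shift(lead, s) for s in (0, 4, 7)]
```

Note that the unison copy keeps its length while the shifted copies come out shorter, which is why a practical tool needs time-stretching (or full resynthesis) on top of this.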

On the same note, once you can train a voice model on a specific voice (already available for some time), you could have a wave editor that transcribes the audio (already available - iZotope RX does this, very badly) and lets you easily edit what's being said. I'm pretty sure this exists in several prototypes.

Very useful for podcasters and even for the film industry (the visual version of this already exists). And also, of course, very scary.

If you mean transcription to text (speech-to-text), then try Whisper from OpenAI. It works like a charm and can add exact timing to enable automatic subtitling.
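A minimal sketch of that subtitling workflow: Whisper's transcribe call returns segments with start/end times and text, and turning those into SRT cues is plain Python. The model call is shown only in a comment (it needs the openai-whisper package, ffmpeg, and a model download), and the demo segment below is made-up stand-in data, not real Whisper output:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3.5 -> '00:00:03,500'."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Turn Whisper-style segments ({'start', 'end', 'text'}) into SRT cues."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(f"{i}\n{srt_timestamp(seg['start'])} --> "
                    f"{srt_timestamp(seg['end'])}\n{seg['text'].strip()}\n")
    return "\n".join(cues)

# Real usage (pip install openai-whisper):
#   import whisper
#   model = whisper.load_model("base")
#   segments = model.transcribe("interview.mp3")["segments"]

# Stand-in segment so the sketch runs without a model download:
demo = [{"start": 0.0, "end": 3.5, "text": " Hello and welcome."}]
print(segments_to_srt(demo))
```

Since each segment already carries exact timing, the subtitles line up with the audio with no manual spotting.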

Resynthesis via AI could bring the "human inaccuracy" that makes the result more pleasant to listen to.

This is how it looks in iZotope RX (which is a great restoration and cleanup tool). It transcribes the text and lets you edit the file based on it, in a very rudimentary fashion. Which is fine. But once they add speech synthesis, it's going to be a different beast. All the video editing apps are going to do the same, adjusting the video to fit. It's only a matter of time.

https://www.youtube.com/watch?v=awNRXYaFAi4