It would take a totally new way of doing things.
There is a NIP which supports the advanced features needed, but no one uses it afaik. It would remove kind 1 compatibility, meaning most clients would not be able to see them anymore. It would also require larger video files for multiple audio streams, and more time spent sourcing and encoding.
I mostly do this for my own benefit as a hobby. Taking into account my time and the hosting I pay for, it's a net negative for me 😂