I think we’re talking about two distinct use cases here. Hence the contention.
1. Better media handling for kind 1 events.
2. Generic hash-addressable file mapping events.
It would be ideal if we could share much of the implementation, and at least overlap in language.
For #1, a single image with a blur-hash for nicer UX is the 90% use case, and having to query for a second event just to render it is costly. Querying for a kind 1 event, parsing it, querying for a kind 1063, and then fetching over http/torrent isn’t likely to perform well for clients rendering timeline views.
Not all Nostr-referenced files need a kind 1 event pair/parent.
Relays could optionally embed related/child events, say in a new parent event key like related_events: [] or similar. I suspect we need something like this anyway. It’s similar to how the first event a relay sends for a pubkey could embed the profile/meta event, since that’s always wanted (ignoring duplicate response data across relay connections).
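To make the embedding idea concrete, here is a rough sketch of what a relay might do before sending a kind 1 event. It assumes the note references its file event with a standard "e" tag, and `embed_related` and the in-memory `store` are made-up names for illustration, not anything from an existing relay.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]

def embed_related(parent: Event, store: Dict[str, Event]) -> Event:
    """Return a copy of `parent` with the events it references inlined.

    `store` maps event id -> event and stands in for the relay's database.
    The `related_events` key is the hypothetical field floated above.
    """
    related: List[Event] = []
    for tag in parent.get("tags", []):
        # Assumption: child events are referenced via "e" tags.
        if len(tag) >= 2 and tag[0] == "e" and tag[1] in store:
            related.append(store[tag[1]])
    enriched = dict(parent)  # shallow copy, the stored event is left untouched
    enriched["related_events"] = related
    return enriched
```

Since the event id is computed over pubkey, created_at, kind, tags and content only, an extra top-level key like this shouldn't change the id or signature, though strict clients might still reject unknown keys.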
For #2, it’s building a file-system-like mapping (think inode) that can be used for all file types and different hosting/access approaches. It’s more usable by all event kinds in future.
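As a sketch of the inode-style mapping, the snippet below builds an unsigned kind 1063 style event keyed by the file's SHA-256, with one tag per access method. The tag names ("x" for the hash, "m" for the MIME type, "url", "magnet") and the `file_mapping_event` helper are assumptions for illustration; the final tag set is whatever the NIP settles on.

```python
import hashlib
from typing import Any, Dict, List

def file_mapping_event(data: bytes, mime: str, sources: Dict[str, str]) -> Dict[str, Any]:
    """Build an unsigned kind 1063 style event addressed by the file's hash.

    `sources` maps an access method (e.g. "url", "magnet") to a location.
    """
    digest = hashlib.sha256(data).hexdigest()
    # Assumed tag names: "x" carries the hash, "m" the MIME type.
    tags: List[List[str]] = [["x", digest], ["m", mime]]
    # One tag per hosting/access approach, so new methods can be added later.
    tags += [[method, location] for method, location in sources.items()]
    return {"kind": 1063, "tags": tags, "content": ""}
```

Any event kind could then point at the same file by hash, regardless of where or how it is hosted.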
I think we will likely end up with two separate approaches. If a kind 1 wants greater redundancy or media access methods, it should likely then default to the kind 1063 approach - basically advanced mode. Else, it can use simple mode.
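On the client side, the simple/advanced split could look roughly like this. It is only a sketch: the inline "media" tag for simple mode is hypothetical, as are the "url"/"magnet" tag names on the kind 1063 event and the `lookup` callback that stands in for a relay query.

```python
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]

def media_sources(note: Event, lookup: Callable[[str], Optional[Event]]) -> List[str]:
    """Return candidate locations for a kind 1 note's media."""
    tags = note.get("tags", [])
    # Advanced mode: the note references a kind 1063 event that lists
    # several access methods (assumed "url" and "magnet" tags).
    for tag in tags:
        if len(tag) >= 2 and tag[0] == "e":
            ref = lookup(tag[1])
            if ref is not None and ref.get("kind") == 1063:
                return [t[1] for t in ref.get("tags", [])
                        if len(t) >= 2 and t[0] in ("url", "magnet")]
    # Simple mode: media is described inline on the note itself
    # (hypothetical "media" tag: url, then an optional blur-hash).
    return [t[1] for t in tags if len(t) >= 2 and t[0] == "media"]
```

A timeline client would hit the simple path with no extra round trip, and only pay for the second lookup when a note opts into advanced mode.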
However, it’s also worth researching other existing projects and approaches more, as we may not have the approach angle correct.