currently in the process of writing a storage optimization for #realy database that dramatically compacts follow and mute events by creating a monotonic pubkey index, where the 64 characters of hex in the event are replaced with a variable-length decimal value assigned by the index (meaning this value will probably never grow past about 10 characters, even if there were more npubs in the db than the human population)
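
the gist of it, sketched with an in-memory map standing in for the database index (the names here are illustrative, not realy's actual API):

```go
package main

import (
	"fmt"
	"strconv"
)

// PubkeyIndex assigns each 64-character hex pubkey a monotonic
// serial number; the decimal rendering of the serial replaces the
// hex wherever it appears in stored follow/mute events.
type PubkeyIndex struct {
	next  uint64
	bySer map[uint64]string
	byHex map[string]uint64
}

func NewPubkeyIndex() *PubkeyIndex {
	return &PubkeyIndex{
		bySer: make(map[uint64]string),
		byHex: make(map[string]uint64),
	}
}

// Compact returns the decimal serial for a pubkey, assigning the
// next serial on first sight. Even ~10 billion entries need only
// 11 decimal digits, versus 64 hex characters per pubkey.
func (ix *PubkeyIndex) Compact(hexPubkey string) string {
	ser, ok := ix.byHex[hexPubkey]
	if !ok {
		ser = ix.next
		ix.next++
		ix.byHex[hexPubkey] = ser
		ix.bySer[ser] = hexPubkey
	}
	return strconv.FormatUint(ser, 10)
}

// Expand recovers the original hex pubkey from a stored serial.
func (ix *PubkeyIndex) Expand(serial string) (string, error) {
	ser, err := strconv.ParseUint(serial, 10, 64)
	if err != nil {
		return "", err
	}
	hexPubkey, ok := ix.bySer[ser]
	if !ok {
		return "", fmt.Errorf("unknown pubkey serial %d", ser)
	}
	return hexPubkey, nil
}

func main() {
	ix := NewPubkeyIndex()
	// the first pubkey seen gets serial "0" (validation of the
	// 64-char hex form is omitted in this sketch)
	fmt.Println(ix.Compact("deadbeef"))
}
```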

it's quite a fiddly little transformation to implement; i've been on this task now for maybe about 4 hours, and partly because i've been buried in it so long, i forgot why i'm doing it

it's so that i can add user metadata/follow/mute list spidering to the relay spider, so that in almost all cases a query for an existing npub will be answered. paying users will thus always be able to gather this information about almost the entire network's population: their profiles (which can sometimes be difficult to find, i see this a lot), and, for the use case of doing advanced web of trust calculations, having these three event kinds means that things like follower count and trust levels can be computed with very high confidence and currency

this reminds me that #realy also gives blanket read access to the "directory" type of events, as a public service to facilitate the visibility of the relay's users to any query - meaning that searches of the network data looking for activity from an npub can be more easily satisfied by users in general. if all relays did this, the consistency of the data would be much improved
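
roughly, the check looks like this - exactly which kinds count as "directory" events is an assumption on my part, but these are the obvious candidates:

```go
// which kinds are treated as "directory" events is an assumption
// here; the kind numbers themselves are the standard nostr ones.
var directoryKinds = map[int]bool{
	0:     true, // user metadata (profile)
	3:     true, // follow list
	10000: true, // mute list
	10002: true, // relay list (NIP-65)
}

// PubliclyReadable reports whether events of a kind are served to
// any querier, authed or not.
func PubliclyReadable(kind int) bool {
	return directoryKinds[kind]
}
```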

i just forgot exactly why i was making this effort to minimise the size of follow and mute lists... yeah, that's the one haha: to improve discovery and to strengthen security-focused, peer-driven trust webs

something that happens sometimes, i think, is that people haven't published their profile in some time and, for whatever reason, users cannot find it on any of the relays they use, because it was never published to them - the user published it to the relays they had at the time, and others can't see it

by compressing the size of these events down, we can spider the network for a lot more of them and improve these two use cases, both of which are IMO vitally important to gluing the network together. we don't need consistency for everything, but ironically, one of the things we do need it for is the most storage intensive

so this work will solve that

but i do have to write some funny code to handle encoding and decoding these index-substituted pubkeys when they come up in request results; normally they are encoded into hex from the binary form in memory
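
the decode side of the read path looks roughly like this, reusing the hypothetical PubkeyIndex sketch from above (the [][]string tag layout is the usual one, but nothing else here is realy's actual code):

```go
// ExpandTags rewrites the stored compact serials in an event's "p"
// tags back to 64-char hex before the event is serialized into a
// REQ response.
func ExpandTags(ix *PubkeyIndex, tags [][]string) error {
	for _, tag := range tags {
		if len(tag) < 2 || tag[0] != "p" {
			continue
		}
		hexPubkey, err := ix.Expand(tag[1])
		if err != nil {
			return err
		}
		tag[1] = hexPubkey
	}
	return nil
}
```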

anyway, i'll probably finish this tomorrow, because i'm kinda fading tonight. once i've got this new compact form for these list events done, i can expand the types of queries the spider is doing so it chips away at capturing large numbers of these events. i probably also have to add a configuration option to enable this feature, as it will entail a small extra processing load when handling these events, and possibly a little more memory usage, due to needing to do extra queries on the database
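
probably nothing fancier than a switch like this - the option name is hypothetical, and realy's real configuration mechanism may differ:

```go
import "os"

// CompactListEventsEnabled gates the follow/mute compaction
// feature, defaulting to off because of the extra lookups it
// costs. REALY_COMPACT_LIST_EVENTS is an invented name.
func CompactListEventsEnabled() bool {
	return os.Getenv("REALY_COMPACT_LIST_EVENTS") == "true"
}
```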

soon ™️

Discussion

is there any reason we need to store the relay list as a signed event? it may be better if relays could just say “go here, they might have it”

it could also have an index to compress it down, but i don't think the benefit would be that great, given that most people have no more than about 15 relays, and each entry is just a URL, not a whole 32-byte pubkey

i have already built an interface that could potentially enable relays to fan out requests to other relays, where the relay operator has signed up the relay key to auth, distributing the events more easily while reducing bandwidth costs

also, yes, all events need to be stored with their signature; there isn't a practical benefit to not doing so

as it is, the compact encoding i created for realy, which is just the canonical encoding with the 64-byte signature in raw binary form, is - yes - half the size of the raw JSON data in the database, and the signature is the highest entropy data in the protocol, so it is good to be able to squeeze out 50% so easily.
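
in rough outline it's just this - the actual on-disk format in realy differs in detail, this is only the shape of the idea:

```go
import "encoding/hex"

// CompactEncode appends the raw 64-byte signature to the canonical
// serialization (the [0,pubkey,created_at,kind,tags,content] array
// that the event id is the sha256 of), so neither the id nor the
// 128 hex characters of the sig need storing.
func CompactEncode(canonical []byte, sigHex string) ([]byte, error) {
	sig, err := hex.DecodeString(sigHex) // 128 hex chars -> 64 raw bytes
	if err != nil {
		return nil, err
	}
	return append(canonical, sig...), nil
}

// CompactDecode splits the blob back into the canonical bytes and
// the hex signature for re-serialization to JSON.
func CompactDecode(blob []byte) (canonical []byte, sigHex string) {
	n := len(blob) - 64
	return blob[:n], hex.EncodeToString(blob[n:])
}
```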

and also, signed relay lists provide read and write relays for nip-65 in/outbox, which is crucial to improving decentralization

in fact, it reminds me that there is a case for an exception to the relay whitelist - where a user queries for events, such as DMs or similar, that tag their authed pubkey, looking for messages addressed to them. i have the directory one that filters unauthed users' reqs down to providing those events to any and all; this is another case that could benefit from special handling on the read side. i wouldn't want to allow write access this way, but read access would help mitigate some of the issues of poor outbox implementations in clients, or partitions between two users due to various other possible reasons that interfere with p2p messaging in particular
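
a sketch of that read-side rule, building on the directoryKinds set from the earlier sketch (the Filter field names are assumptions, not realy's actual types):

```go
// Filter is a cut-down stand-in for a REQ filter.
type Filter struct {
	Kinds []int
	PTags []string // values of "#p" in the REQ filter
}

// ReadAllowed grants whitelisted users everything; everyone else
// gets the directory kinds, plus events that tag their own authed
// pubkey (DMs and similar addressed to the requester).
func ReadAllowed(whitelisted bool, authedPubkey string, f Filter) bool {
	if whitelisted {
		return true
	}
	for _, k := range f.Kinds {
		if directoryKinds[k] {
			return true
		}
	}
	for _, p := range f.PTags {
		if authedPubkey != "" && p == authedPubkey {
			return true
		}
	}
	return false
}
```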

bah, too many things to think about

anyway, this feature i'm making doesn't touch relay lists directly; relay lists are just aggressively searched for, and their relay URLs are tried randomly to achieve more consistency of the events that are helpful to both UI/UX and web of trust purposes

but you have also got me thinking about how useful it would be to keep a history of these kinds of events in particular, since where things are published is regulated by them, and if that changes, how do you know where to look for events from that user?

gonna chalk that one onto my todo list for the future; i already have some special handling of replaceable events

Absolutely! Keeping a history of those events is a solid idea, especially with how things can shift in the publishing game. 📚 Gotta stay ahead of the curve and know where to find the goods from users. 🕵️‍♂️ Adding that to my todo list too! Love the special handling for replaceable events, that's smart thinking! 🔥 #FuturePlanning