So... ALL dbs use KV except for the real-world databases that are doing it right? I know most DBs are just a glorified KV. But I also know the ones that actually fixed this a while back instead of just deferring to the developer to manually save shit in files.
Discussion
computer filesystems have been refined for over 50 years to structure data so as to minimize seek latency. there really is nowhere else in the field you can look for a better option for finding and retrieving large amounts of data quickly.
back in the olden days, i sometimes ran my linux installation on ReiserFS, before hans reiser got put in jail for killing someone (lol, i still can't comprehend that). reiserfs was one of the most notable filesystems in the field of data storage for pioneering optimizations for the big-file and little-file problem in filesystems. ext4 has a substantial number of them, and iirc reiser was the first to have a two-stage commit journal: writes went to a constantly growing append-only log, and a worker in the background compacted them into the filesystem when idle, populating the necessary metadata tables.
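to make that two-stage idea concrete, here's a minimal go sketch of the pattern as i described it: append to a log first, fold into the main structure in the background. this is hypothetical illustration code, not reiser's actual on-disk format, and every name in it is made up:

```go
// minimal sketch of a two-stage commit journal: stage one appends to a log and
// syncs, stage two compacts pending entries into the "real" structure later.
// hypothetical names only; not any filesystem's actual implementation.
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
	"time"
)

type journal struct {
	mu      sync.Mutex
	log     *os.File          // append-only log: cheap, sequential writes
	pending map[string][]byte // written to the log but not yet compacted
	store   map[string][]byte // stand-in for the main on-disk structure + metadata
}

func openJournal(path string) (*journal, error) {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return nil, err
	}
	j := &journal{log: f, pending: map[string][]byte{}, store: map[string][]byte{}}
	go j.compactLoop() // stage two runs in the background
	return j, nil
}

// put is stage one: append to the log, fsync, return. the caller never waits
// for the main structure to be updated.
func (j *journal) put(key string, val []byte) error {
	j.mu.Lock()
	defer j.mu.Unlock()
	if _, err := fmt.Fprintf(j.log, "%s=%x\n", key, val); err != nil {
		return err
	}
	j.pending[key] = val
	return j.log.Sync()
}

// compactLoop is stage two: periodically fold pending entries into the main
// structure and truncate the log, roughly what an idle-time journal replay does.
func (j *journal) compactLoop() {
	ticker := time.NewTicker(5 * time.Second)
	for range ticker.C {
		j.mu.Lock()
		for k, v := range j.pending {
			j.store[k] = v // in a real filesystem this updates trees/metadata tables
		}
		j.pending = map[string][]byte{}
		j.log.Truncate(0) // safe only because pending was just flushed
		j.mu.Unlock()
	}
}

func main() {
	j, err := openJournal(filepath.Join(os.TempDir(), "journal.log"))
	if err != nil {
		panic(err)
	}
	_ = j.put("hello", []byte("world"))
	time.Sleep(6 * time.Second) // let one compaction pass run
	j.mu.Lock()
	fmt.Println("compacted entries:", len(j.store))
	j.mu.Unlock()
}
```

the point of the split is that the writer only ever pays for a sequential append plus an fsync; all the random-access metadata work happens off the hot path.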
while there may be other data structures for fast searchable indexes besides LSM K/V stores, as far as i know it's still the best strategy for general usage, and it's very friendly to the kernel's disk cache.
the more the iterator has to decide, as it progresses, where it's reading from next, the less time it spends actually reading the thing you want. that's why it matters, and that's why there's no sane reason to reinvent the wheel for large blob storage. very few production systems have implemented it any other way.
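to show why ordered keys keep the iterator dumb and fast, here's a tiny self-contained go sketch of the sorted-key invariant LSM stores maintain in their memtables and SSTables. it's not a real LSM engine, just the ordering property: a range scan becomes one contiguous pass, so the iterator never has to decide where to jump next. the key scheme is a made-up example:

```go
// minimal sketch of sorted-key iteration: entries kept in key order mean a
// range scan is a single sequential walk over adjacent entries, which is the
// property LSM K/V stores preserve on disk. in-memory stand-in only.
package main

import (
	"fmt"
	"sort"
)

type entry struct {
	key string
	val []byte
}

// memtable keeps entries sorted by key, the same invariant an SSTable has on disk.
type memtable struct{ entries []entry }

func (m *memtable) put(key string, val []byte) {
	i := sort.Search(len(m.entries), func(i int) bool { return m.entries[i].key >= key })
	if i < len(m.entries) && m.entries[i].key == key {
		m.entries[i].val = val // overwrite existing key
		return
	}
	m.entries = append(m.entries, entry{})
	copy(m.entries[i+1:], m.entries[i:]) // shift to keep sorted order
	m.entries[i] = entry{key, val}
}

// scan visits every key in [lo, hi) as one contiguous, cache-friendly pass.
func (m *memtable) scan(lo, hi string, fn func(string, []byte)) {
	start := sort.Search(len(m.entries), func(i int) bool { return m.entries[i].key >= lo })
	for i := start; i < len(m.entries) && m.entries[i].key < hi; i++ {
		fn(m.entries[i].key, m.entries[i].val)
	}
}

func main() {
	m := &memtable{}
	// hypothetical nostr-ish keys: prefixing by kind makes related events adjacent
	m.put("kind:1:evt-b", []byte("note b"))
	m.put("kind:1:evt-a", []byte("note a"))
	m.put("kind:30023:evt-c", []byte("article c"))
	m.scan("kind:1:", "kind:1;", func(k string, v []byte) { fmt.Println(k, string(v)) })
}
```

with a key layout like that, the query "all kind-1 events" is one straight read over neighbouring entries instead of a pile of point lookups.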
here's a summary of all the innovations that ReiserFS made that are now almost universally used in operating system filesystems:
Correct. Very few devs have tried to go beyond dumb approaches. The question is: do you want to be part of the very few who went above and beyond to build faster systems, or do you just conform to devs' incapacity to actually do computer science and call it a day?
well, that's the neat part: at least there is one nostr relay dev who has some appreciation for the subject.
(fiatjaf is the other, his stuff is pretty damn sleek)
i just had to probe gpt a bit more about reiser as well, and it went into a lot of depth about the stuff in reiser4 that didn't end up being practical, plus a final section on which of its ideas a typical modern CoW filesystem would still benefit from:
https://chatgpt.com/s/t_6915f989d1548191bad299b4f0d0558b
immutable, append-only data like nostr events and blossom blobs is extremely relevant to this, especially the question of efficiently storing large binary data alongside the small, regular event json. i'm snipping this to put somewhere for later, to maybe find some ways to speed up orly's database even more. i think right now it's pretty good, and would scale pretty well to server rigs with three-figure core counts and RAID nvme drives, but these optimizations would become more and more important the larger the database gets.
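one hedged sketch of the kind of split i mean, so i don't forget it: small immutable event json goes into a K/V-style index, large blobs get written once as content-addressed files (named by their sha256, the way blossom addresses things) and only the hashes land in the index. this is illustration only, not orly's actual storage layout, and all the names are made up:

```go
// minimal sketch: small immutable events in a K/V index, large immutable blobs
// as content-addressed files. hypothetical layout, not any relay's real format.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

type store struct {
	blobDir string            // large, immutable binaries live as plain files
	index   map[string][]byte // stand-in for a small-value K/V index (e.g. an LSM store)
}

// putEvent stores a small JSON event directly in the index, keyed by its id.
func (s *store) putEvent(id string, eventJSON []byte) {
	s.index["event:"+id] = eventJSON
}

// putBlob writes a large blob once, named by its sha256, and records only the
// hash in the index so iteration over the index stays over small values.
func (s *store) putBlob(data []byte) (string, error) {
	sum := sha256.Sum256(data)
	hash := hex.EncodeToString(sum[:])
	path := filepath.Join(s.blobDir, hash)
	if _, err := os.Stat(path); err == nil {
		return hash, nil // already stored; immutability makes dedup trivial
	}
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return "", err
	}
	s.index["blob:"+hash] = []byte(path)
	return hash, nil
}

func main() {
	s := &store{blobDir: os.TempDir(), index: map[string][]byte{}}
	s.putEvent("abc123", []byte(`{"kind":1,"content":"hello"}`))
	if hash, err := s.putBlob([]byte("pretend this is a big image")); err == nil {
		fmt.Println("stored blob", hash)
	}
}
```

because the blobs never change, content addressing gives you dedup and trivially safe concurrent reads for free, while the K/V index only ever holds small values that iterate fast.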
so, yeah
circling back to the OP, nostr:npub1gcxzte5zlkncx26j68ez60fzkvtkm9e0vrwdcvsjakxf9mu9qewqlfnj5z, your recommendation is in fact probably correct for most of the tasks (from newbies to sysops) that nostr relay operators would want to do, without such improvements being applied.