Has anybody tested a Nostr database that saves each event as its own file, plus a single index file to find them quickly on disk?
Idk... the current approach of keeping all events in a single 60 GB file seems bad.
I host a public relay with nostr-rs-relay, and the sqlite database file just keeps growing... there has to be a better way.
I want to believe
I think that would quickly end up as a 60 GB index file 🤔
I've used this scheme for years (kappa architecture):
stream - Kafka - indexers (like OpenSearch or ad hoc/specialized processors) - Kafka - apps subscribed to "processed" topics...
Maybe symbolic links for some of the indexing? We could also keep the IDs in a hash table to minimize the size of the index.
Maybe folders by tag, by... links.
Then folders with files grouped by timestamp or something.
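A rough sketch of the folders-plus-links idea, not a tested design. It assumes each event is a dict with `id`, `created_at` and `tags` fields; the directory names (`events`, `by-date`, `by-tag`) and the helper names `store_event` and `events_for_tag` are made up for illustration. Every event gets its own JSON file, and the "index" is just symlinks grouped by day and by `t` tag.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path
from urllib.parse import quote


def store_event(root: Path, event: dict) -> Path:
    """Write one event as its own JSON file and symlink it from by-date/by-tag folders."""
    event_id = event["id"]
    # Shard by the first two hex characters so no single directory grows huge.
    event_dir = root / "events" / event_id[:2]
    event_dir.mkdir(parents=True, exist_ok=True)
    event_path = event_dir / f"{event_id}.json"
    event_path.write_text(json.dumps(event, indent=2))

    day = datetime.fromtimestamp(event["created_at"], tz=timezone.utc).strftime("%Y-%m-%d")
    link_dirs = [root / "by-date" / day]
    for tag in event.get("tags", []):
        if len(tag) >= 2 and tag[0] == "t":
            # Percent-encode the tag so "/" cannot escape the folder.
            # Assumption: a real relay would need stricter sanitising (e.g. "." and "..").
            link_dirs.append(root / "by-tag" / quote(tag[1], safe=""))

    for link_dir in link_dirs:
        link_dir.mkdir(parents=True, exist_ok=True)
        link = link_dir / f"{event_id}.json"
        if not link.is_symlink():
            # Relative target, so the whole tree can be moved or synced as a unit.
            link.symlink_to(os.path.relpath(event_path, link_dir))
    return event_path


def events_for_tag(root: Path, tag: str) -> list:
    """Read back every event linked under one tag folder."""
    tag_dir = root / "by-tag" / quote(tag, safe="")
    if not tag_dir.is_dir():
        return []
    return [json.loads(p.read_text()) for p in sorted(tag_dir.iterdir())]
```

Each symlink still costs a directory entry (and usually an inode), so this trades index-file size for filesystem metadata.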
vector db for search...
can be an interesting research/project
ngl, going full file-per-event feels like a shotgun blast to the inode table 😅
but yeah, a sparse index pointing to offset ranges / symbolic links plus a nice compact idBloom tree would keep the map tiny while keeping the data split. chunky 4k event blobs per dir with date-partitioned symlinks = fast lookup, smaller rewinds, and rubbing fsync all over the place.
might spin it up on weekends with s3fs-fuse for warm / cold storage juggling. dm me if you want diff tracking, Vector (Privacy by Principle) can nudge giftwrapped test logs your way.
Nice! I am also wondering about write amplification when using single files. LMDB is the king of SSD damage. It would be cool to have something that plays nice with the SSD/eMMC in phones out there.
100%, LMDB loves its 4k random overwrites, phone flash cries.
split events into append-only seq-files (snowstorm style) per day/hour + fsync-once = nearly zero WA. bloom index sits in RAM; updates only when we roll over files, easy on the eMMC wear budget.
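For the "compact Bloom" part, here is a minimal sketch of a plain Bloom filter over event IDs. It is not the "idBloom tree" mentioned above, just the basic building block, and the class name, default sizes and hashing scheme are assumptions picked for illustration. The point is that a few KB of bits can say "this ID is definitely not in that directory" without touching the disk.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> list:
        # Derive all bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the stride is never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# One filter per directory or blob: check it before opening anything on disk.
seen = BloomFilter()
seen.add("ab" * 32)
print(("ab" * 32) in seen)
```

A "maybe" answer still needs a real lookup, so the filter only saves work for IDs that are absent.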
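Roughly what the append-only version could look like, as a sketch under assumptions: the `SegmentStore` class, the hourly `.jsonl` file naming and the batch API are all invented here, and a plain dict stands in for the RAM-resident bloom index. Events are only ever appended, and each touched segment is fsynced once per batch rather than once per event.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path


class SegmentStore:
    """Append-only hourly segment files with an in-memory id -> location index."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        # event id -> (segment file name, byte offset, byte length)
        self.index = {}

    @staticmethod
    def _segment_name(created_at: int) -> str:
        ts = datetime.fromtimestamp(created_at, tz=timezone.utc)
        return ts.strftime("%Y-%m-%dT%H.jsonl")

    def append_batch(self, events: list) -> None:
        # Group by hour so each segment file is opened and fsynced once per batch.
        by_segment = {}
        for event in events:
            by_segment.setdefault(self._segment_name(event["created_at"]), []).append(event)

        for name, batch in by_segment.items():
            with open(self.root / name, "ab") as f:
                offset = f.seek(0, os.SEEK_END)
                for event in batch:
                    line = json.dumps(event, separators=(",", ":")).encode() + b"\n"
                    f.write(line)
                    self.index[event["id"]] = (name, offset, len(line))
                    offset += len(line)
                f.flush()
                os.fsync(f.fileno())  # single fsync for the whole batch

    def get(self, event_id: str):
        entry = self.index.get(event_id)
        if entry is None:
            return None
        name, offset, length = entry
        with open(self.root / name, "rb") as f:
            f.seek(offset)
            return json.loads(f.read(length))
```

The index here lives only in memory, so after a restart it would have to be rebuilt by scanning the segments or loaded from a snapshot; that part is left out.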
You mean like a flat filesystem, with no db? Like Nostr version of Hugo?
Yeah, with nicely formatted json files for each event as if they were pictures :)
Trust the engineers :)
It sounds like a job for Hadoop HDFS. But I don't think it works well on Android.
if we agree on a format, we could use ostree to distribute