so, i'm gonna do it

rewriting this database so that instead of storing the events as values, it just puts them in the filesystem in their binary form, with the filename being the event id hash

the memory leak is all about unpacking those values out of the database, and it can be avoided entirely... pushing the events into the filesystem also means i don't have to think about read caching, since that's already handled by the kernel page cache
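the shape of it is something like this... a minimal sketch in go, assuming events come in as raw binary blobs and the id is already the hex event hash (the directory path and function names here are mine, not the actual code):

```go
package store

import (
	"os"
	"path/filepath"
)

// eventsDir is a hypothetical location for the binary event files.
var eventsDir = "/var/lib/relay/events"

// saveEvent writes the raw binary event under its id hash; reads then
// come straight off the filesystem, so the kernel page cache handles
// all read caching.
func saveEvent(id string, bin []byte) error {
	return os.WriteFile(filepath.Join(eventsDir, id), bin, 0o644)
}

// loadEvent fetches an event back by its id hash.
func loadEvent(id string) ([]byte, error) {
	return os.ReadFile(filepath.Join(eventsDir, id))
}
```

os.WriteFile is good enough for a sketch, but a real version would want write-to-temp-and-rename so a crash can't leave a half-written event behind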

Discussion

That will be too slow. Use LMDB instead.

the indexes stay in badger, which is perfectly fast, as i already found: 8 seconds to take a census of 15gb of data, and that was with the shitty values still in the database as well

fetching files by their hashes will only be slow on the first access, and even then not much slower, because filesystems are generally fast

not windows, no, but ext4 is extremely fast at handling small files

now allow replacing it with a blob store like seaweedfs

i actually found that i had a failure to terminate queries, so they kept on accumulating events over and over again in a loop

now the results populate a map (so they can't be duplicated) AND the termination conditions are properly enforced... the problem is gone
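the dedup half of that fix looks roughly like this... a sketch with a pared-down event type, the real query code is more involved:

```go
package store

// Event is a pared-down stand-in for the real event type.
type Event struct {
	ID        string
	PubKey    string
	CreatedAt int64
}

// collect drains query results into a map keyed by event id, so a hit
// from a second index can't be delivered twice, and stops once the
// filter's limit is reached instead of accumulating forever.
func collect(results <-chan Event, limit int) map[string]Event {
	seen := make(map[string]Event, limit)
	for ev := range results {
		seen[ev.ID] = ev
		if len(seen) >= limit {
			break // termination condition: stop at the limit
		}
	}
	return seen
}
```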

the only thing i need to do now is improve the parallelism

also, the base i worked from, fiatjaf's eventstore implementation, assumes that there won't be duplicates in the database and that events were stored in chronological order. that breaks when old events are restored into a running relay, so such an ordering scheme can't keep histories of replaceable events while also returning only one, especially when a client sets limit 1 on the query: the first version found would be returned, and that's not correct, because it should be the newest
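the fix is to gather every version first and sort by timestamp before applying the limit, rather than trusting iteration order... a sketch, reusing the Event type from the sketch above:

```go
package store

import "sort"

// newestFirst sorts gathered versions of a replaceable event by
// created_at descending, so index 0 is the right one to return when a
// client asks with limit 1.
func newestFirst(versions []Event) []Event {
	sort.Slice(versions, func(i, j int) bool {
		return versions[i].CreatedAt > versions[j].CreatedAt
	})
	return versions
}
```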

currently it is all done as a series of single transactions, looping through the sequence created by the ordering of the indexes generated from the filter... i need to change it so it batches this up and runs it in one fell swoop... that's my task today
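roughly the batched shape i'm after, sketched against badger's iterator API inside a single View transaction... the index prefix layout and id extraction are placeholders, not the real key format:

```go
package store

import badger "github.com/dgraph-io/badger/v4"

// queryBatched scans every index prefix derived from the filter inside
// one View transaction, instead of opening a fresh transaction per
// index in a loop.
func queryBatched(db *badger.DB, prefixes [][]byte) ([][]byte, error) {
	var ids [][]byte
	err := db.View(func(txn *badger.Txn) error {
		for _, p := range prefixes {
			opts := badger.DefaultIteratorOptions
			opts.Prefix = p
			opts.PrefetchValues = false // keys only; the events live in the filesystem now
			it := txn.NewIterator(opts)
			for it.Rewind(); it.Valid(); it.Next() {
				// placeholder: the event id would be extracted from the index key here
				ids = append(ids, it.Item().KeyCopy(nil))
			}
			it.Close()
		}
		return nil
	})
	return ids, err
}
```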

it is far less practical to keep iterator ordering in the database matched to the timestamps than it is to gather results before dispatching them... gathering first also enables useful optimizations, like not sending the authed user events from npubs they have muted, and it eliminates any risk of sending duplicates or outdated replaceables
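so the dispatch step ends up looking something like this, with the mute list as a hypothetical lookup table for the authed user and send standing in for whatever writes to the client:

```go
package store

// dispatch sends the gathered, deduplicated results, skipping events
// from pubkeys the authed user has muted; an optimization that only
// works because everything was gathered before sending anything.
func dispatch(gathered map[string]Event, muted map[string]bool, send func(Event)) {
	for _, ev := range gathered {
		if muted[ev.PubKey] {
			continue // never send events from muted npubs
		}
		send(ev)
	}
}
```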