Good question, I'm not sure. It will depend on a lot of factors, but the sync protocol being used reconciles state between peers without transferring the events themselves.
https://github.com/hoytech/strfry/blob/master/docs/negentropy.md
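For anyone curious how a sync can avoid shipping events both sides already have, here's a rough conceptual sketch of range-based set reconciliation, the idea negentropy is built on. This is not the real wire protocol (the actual fingerprinting and framing are in the strfry doc above); the names and the plain-hash fingerprint are just for illustration.

```go
// Conceptual sketch of range-based set reconciliation: peers compare cheap
// fingerprints of sorted ID ranges and only descend into ranges that differ,
// so data both sides already hold is never transferred.
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// fingerprint summarises a range of IDs. The real protocol uses an
// incremental fingerprint; a plain hash is enough for this sketch.
func fingerprint(ids []string) [32]byte {
	h := sha256.New()
	for _, id := range ids {
		h.Write([]byte(id))
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// missing returns the IDs present in theirs but absent from ours,
// recursing only into ranges whose fingerprints disagree.
func missing(ours, theirs []string) []string {
	if fingerprint(ours) == fingerprint(theirs) {
		return nil // ranges identical, nothing to transfer
	}
	if len(theirs) <= 2 { // small range: just diff it directly
		have := map[string]bool{}
		for _, id := range ours {
			have[id] = true
		}
		var need []string
		for _, id := range theirs {
			if !have[id] {
				need = append(need, id)
			}
		}
		return need
	}
	// split both sorted sides at the same boundary ID and recurse
	mid := theirs[len(theirs)/2]
	split := func(ids []string) ([]string, []string) {
		i := sort.SearchStrings(ids, mid)
		return ids[:i], ids[i:]
	}
	oL, oR := split(ours)
	tL, tR := split(theirs)
	return append(missing(oL, tL), missing(oR, tR)...)
}

func main() {
	ours := []string{"a1", "b2", "c3"}
	theirs := []string{"a1", "b2", "c3", "d4", "e5"}
	fmt.Println(missing(ours, theirs)) // [d4 e5]
}
```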
nostrdb makes damus *more* efficient, battery- and bandwidth-wise.
It can skip parsing and validating events it has already seen, and with negentropy it only pulls the notes it doesn't have.
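In other words, the expensive work only ever happens once per event. A tiny sketch of that fast path (not nostrdb's actual API, just the shape of the check):

```go
// Minimal sketch: look the event ID up in local storage before doing any
// JSON parsing or signature verification, so repeated events cost almost nothing.
package main

import "fmt"

var seen = map[string]bool{} // stands in for the local note database

func ingest(id string) {
	if seen[id] {
		fmt.Println("duplicate, skipping parse and validation:", id)
		return
	}
	// ... full JSON parse + signature verification would happen here ...
	seen[id] = true
	fmt.Println("new event stored:", id)
}

func main() {
	ingest("abc")
	ingest("abc") // second copy is rejected cheaply
}
```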
Nice! What about storage requirements and redundancy?
nostrdb stores notes in an optimized binary format. You can just delete everything older than a certain date if the db gets too large.
My test db of 1 million notes is 2.5 GB.
I'm currently writing a testing tool for an LRU cache pruning strategy, so you don't have to intervene manually: rent your VPS with its given storage limit, set your max size, and the tool takes care of the rest. Once the code is complete it will also keep a data set of how data utilisation grows over time, and use that to adjust the GC sweep trigger and prune target (the low and high water marks).
It uses a simple algorithm I devised: every time a record is accessed, its timestamp is set to the average of its previous value and the current time. Least recently used records therefore end up with the oldest stamps, and completely unused ones the oldest of all. That case isn't very likely, but it could be handled by comparing the stamp to the created_at and setting a threshold for how long an unused entry may remain before it is pruned; otherwise it just gets swept up with the rest of the oldest entries. A rough sketch is below.
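Here's a minimal, self-contained sketch of that averaging rule plus the water-mark sweep, just to pin the idea down. It's my own illustration rather than the actual tool, and it uses a fake clock so the run is deterministic:

```go
// Access-time averaging: new stamp = (old stamp + now) / 2, so frequently
// touched entries track the present while idle ones lag behind. The pruner
// deletes the oldest-stamped entries until the cache is back under the
// low water mark.
package main

import (
	"fmt"
	"sort"
)

// fake clock for a deterministic demo; the real tool would use time.Now().Unix().
var now int64 = 1000

type entry struct {
	size  int64
	stamp int64 // averaged towards "now" on every access
}

type cache struct {
	entries  map[string]*entry
	total    int64
	lowMark  int64 // prune target
	highMark int64 // GC sweep trigger
}

func (c *cache) access(id string) {
	if e, ok := c.entries[id]; ok {
		e.stamp = (e.stamp + now) / 2 // average previous stamp with "now"
	}
}

func (c *cache) add(id string, size int64) {
	c.entries[id] = &entry{size: size, stamp: now}
	c.total += size
	if c.total > c.highMark { // high water mark reached: sweep
		c.prune()
	}
}

// prune removes the oldest-stamped entries until total <= lowMark.
func (c *cache) prune() {
	ids := make([]string, 0, len(c.entries))
	for id := range c.entries {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		return c.entries[ids[i]].stamp < c.entries[ids[j]].stamp
	})
	for _, id := range ids {
		if c.total <= c.lowMark {
			break
		}
		c.total -= c.entries[id].size
		delete(c.entries, id)
	}
}

func main() {
	c := &cache{entries: map[string]*entry{}, lowMark: 4, highMark: 8}
	for _, id := range []string{"a", "b", "c", "d"} {
		c.add(id, 2)
		now += 100
	}
	c.access("a")                        // averaging pulls "a"'s stamp from 1000 up to 1200
	c.add("e", 2)                        // pushes total past the high mark, triggers a sweep
	fmt.Println(c.total, len(c.entries)) // 4 2: the oldest-stamped entries were swept
}
```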
As for optimisation, there is a fair bit possible there: timestamps can be converted to varints; pubkeys can be replaced with 64-bit index table entries; the text bodies can be compressed in a zstd-based bulk repository, since text usually compresses down to 10-20% of its original size; tag header strings can probably be shortened to 32-bit values; and the same index-table trick works for relay URLs, where you are very unlikely to run out of space in 32 bits (about 4 billion entries).
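To make a few of those concrete, here's a rough sketch of varint timestamps, a 64-bit pubkey intern table, and zstd text compression. It assumes nothing about nostrdb's actual on-disk layout, and the zstd library used (github.com/klauspost/compress/zstd) is just one option:

```go
package main

import (
	"encoding/binary"
	"fmt"

	"github.com/klauspost/compress/zstd"
)

// internTable maps 32-byte pubkeys (hex here for brevity) to small 64-bit ids,
// so each stored note carries an 8-byte reference instead of the full key.
type internTable struct {
	ids  map[string]uint64
	next uint64
}

func (t *internTable) id(pubkey string) uint64 {
	if id, ok := t.ids[pubkey]; ok {
		return id
	}
	t.next++
	t.ids[pubkey] = t.next
	return t.next
}

func main() {
	// varint timestamps: current unix timestamps fit in 5 bytes instead of 8
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, 1735689600) // 2025-01-01T00:00:00Z
	fmt.Println("timestamp bytes:", n)

	// pubkey interning: the same key always resolves to the same small id
	t := &internTable{ids: map[string]uint64{}}
	fmt.Println("pubkey id:", t.id("3bf0c63f...same key..."))
	fmt.Println("pubkey id:", t.id("3bf0c63f...same key...")) // same id again

	// zstd bulk compression of note text
	enc, err := zstd.NewWriter(nil)
	if err != nil {
		panic(err)
	}
	text := []byte("GM nostr! GM nostr! GM nostr! GM nostr! GM nostr! GM nostr!")
	packed := enc.EncodeAll(text, nil)
	fmt.Printf("text: %d bytes -> %d bytes compressed\n", len(text), len(packed))
}
```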
Even the event IDs can be squashed down, because they ultimately derive from the canonical form of the event: all you need is the means to reconstruct that form and regenerate the hash, plus a short lookup table. Again, 64 bits is likely fine for that table, especially if you can use a database serial counter, which is guaranteed collision-free.
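A bare-bones sketch of that id reconstruction, keyed by a local serial instead of the 32-byte id. It's simplified: a real implementation has to match NIP-01's exact JSON escaping rules, and Go's encoder differs on a few characters:

```go
// Store only a small serial number per event; when the full id is needed,
// rebuild the NIP-01 canonical form and hash it.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

type event struct {
	Serial    uint64 // collision-free local key, e.g. a db serial counter
	Pubkey    string
	CreatedAt int64
	Kind      int
	Tags      [][]string
	Content   string
}

// id recomputes the event id from the NIP-01 canonical form:
// sha256 of the JSON array [0, pubkey, created_at, kind, tags, content].
func (e *event) id() string {
	canonical, _ := json.Marshal([]any{0, e.Pubkey, e.CreatedAt, e.Kind, e.Tags, e.Content})
	sum := sha256.Sum256(canonical)
	return hex.EncodeToString(sum[:])
}

func main() {
	e := &event{
		Serial:    42,
		Pubkey:    "3bf0c63fcb93463407af97a5e5ee64fa883d107ef9e558472c4eb9aaaefa459d",
		CreatedAt: 1735689600,
		Kind:      1,
		Tags:      [][]string{},
		Content:   "hello nostr",
	}
	fmt.Println("serial:", e.Serial, "reconstructed id:", e.id())
}
```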