btw, fiatjaf is wrong about badger, he just doesn't know how to use it or how to write good, bug-proof binary encoding libraries... the batch processing functions are incredibly fast: a 15GB database can be measured in ~8 seconds, and if a GC pass is needed that might take another 5-12 seconds depending on how far over the limit it got
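to make "measuring" concrete, here's a rough sketch using the stock badger v4 API (not the replicatr fork): walk the key space with value prefetching disabled to estimate on-disk size, then run a value-log GC pass; the 0.5 discard ratio and the path are placeholder choices

```go
package main

import (
	"errors"
	"fmt"
	"log"

	badger "github.com/dgraph-io/badger/v4"
)

func main() {
	// open the store read-write so GC can rewrite value log files
	db, err := badger.Open(badger.DefaultOptions("/tmp/events"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// measure: iterate keys only (PrefetchValues=false) and sum estimated sizes
	var total int64
	err = db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Rewind(); it.Valid(); it.Next() {
			total += it.Item().EstimatedSize()
		}
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("estimated live data: %d bytes\n", total)

	// GC pass: rewrite value log files that are at least half garbage,
	// repeating until badger reports there is nothing left to rewrite
	for {
		if err := db.RunValueLogGC(0.5); err != nil {
			if errors.Is(err, badger.ErrNoRewrite) {
				break
			}
			log.Fatal(err)
		}
	}
}
```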
storing full events
you have to enable the GC size limit for it to have a high and low water mark, and you can configure those if the defaults don't fit your case. on top of that you can create a second level data store, which would presumably be a shared data store accessed over the network, and the headroom above the high water mark then stores the indexes of events that have fallen out of the local cache, so fast filter searches still work on them
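as a rough illustration of the high/low water mark idea (the GCConfig, Event, and Prune names below are hypothetical stand-ins, not the replicatr API): once the measured size crosses the high water mark, the store evicts the least recently used events until it's back under the low water mark, keeping only their indexes locally

```go
package store

// GCConfig is a hypothetical illustration of high/low water mark GC,
// not the actual replicatr/eventstore configuration struct.
type GCConfig struct {
	HighWater int64 // start pruning once measured size exceeds this
	LowWater  int64 // keep pruning until size drops below this
}

// Event is a placeholder for a stored nostr event plus its size on disk.
type Event struct {
	ID   string
	Size int64
}

// Prune sketches one GC pass: lru is assumed to be ordered from least
// to most recently accessed, so we evict from the front until we are
// back under the low water mark. Evicted events would keep their index
// entries locally and have their full payload fetched from the
// second-level (network) store on demand.
func Prune(cfg GCConfig, used int64, lru []Event, evict func(id string)) int64 {
	if used <= cfg.HighWater {
		return used // under the limit, nothing to do
	}
	for _, ev := range lru {
		if used <= cfg.LowWater {
			break
		}
		evict(ev.ID) // drop the event payload, keep the index entry
		used -= ev.Size
	}
	return used
}
```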
https://mleku.net/replicatr is the core, which is a fork of khatru, and https://mleku.net/eventstore is the eventstore with the GC enabled for the "badger" backend. there is also an "l2" event store that lets you plug in two event stores, where one is usually badger and the other can be anything else, and a "badgerbadger" which i wrote using two levels of badger event store, one with GC on and L2 enabled, to test the GC once your event and index storage size exceeds the size limit
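the two-level idea can be sketched like this (the Store interface and L2 wrapper are simplified stand-ins, not the actual eventstore types): queries hit the local store first and anything missing falls through to the second store, while saves go to both

```go
package l2sketch

import "context"

// Store is a simplified stand-in for an event store interface;
// the real eventstore interface has more methods and queries by filter.
type Store interface {
	Save(ctx context.Context, id string, raw []byte) error
	Get(ctx context.Context, id string) ([]byte, bool, error)
}

// L2 chains a fast local store (typically badger) with a slower,
// usually shared or remote, second-level store.
type L2 struct {
	Local  Store
	Remote Store
}

// Save writes to both levels so the second level still has the full
// event after the local GC evicts it.
func (l *L2) Save(ctx context.Context, id string, raw []byte) error {
	if err := l.Local.Save(ctx, id, raw); err != nil {
		return err
	}
	return l.Remote.Save(ctx, id, raw)
}

// Get tries the local cache first and falls back to the second level,
// re-warming the local cache on a hit.
func (l *L2) Get(ctx context.Context, id string) ([]byte, bool, error) {
	if raw, ok, err := l.Local.Get(ctx, id); err != nil || ok {
		return raw, ok, err
	}
	raw, ok, err := l.Remote.Get(ctx, id)
	if err != nil || !ok {
		return nil, false, err
	}
	// best-effort re-insert into the local store
	_ = l.Local.Save(ctx, id, raw)
	return raw, true, nil
}
```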
also, yes, that will scale: on a 20-core Threadripper with 40MB of cache and 128GB of memory it would probably zip through that job in less than half that time