i was able to re-enable the RWMutex and it was mostly working, but it was still deadlocking, and i know that's because nostrudel uses count queries extensively and those aren't fixed yet
nope, that RWMutex definitely can't be in there... so i'm just atomizing the queries now, and that seems to be working perfectly
it's just an issue of preventing race conditions on concurrent database writes... as shown previously in testing, a heavy query load while adding lots of new events and updating the access counters on records was causing the database to report transaction errors, which is an atomicity failure
what that means is that two transactions were modifying the same record at the same, or very nearly the same, time; when the DB engine tries to resolve this conflict after the transactions close, it can't, meaning both transactions have to be repeated
with the transactions made smaller and single-purpose, the chance of this collision in time has now shrunk to nearly zero; a few milliseconds of timing difference is enough for the access update to be atomically separated, and the problem goes away
it really was mostly about the last-access-time record that i added... this is the only part of the event data that ever changes after being written, other than being pruned later by the garbage collector
so, the GC now has a mutex that locks out all other accesses, and the rest can run concurrently because transactions no longer overlap so much in time and thus shouldn't hit these race conditions any more
yeah, nah, can't put a mutex on that either; not sure why, but i think simply making the DB transactions concurrent and atomic ensures it will almost never happen that one request stomps on data another request is accessing
it was worse before: heavy load would definitely have caused tx commit failures, but i think now there's nearly zero chance of that happening, so performance is maximal and so is reliability
it sorta seems like it doesn't make sense that you can have a data store being handled by multiple processes at the same time, but this is the wonder of ubiquitous multiprocessing with extremely fast memory caches on modern CPUs: queries come in, and multiple threads can literally be accessing the same pieces of memory at the same time (though usually from different copies that have reached a core's L1 cache), and voila, race condition
so it's really FKN fast, but has this problem that processing can get out of sync, and the main thing you have to do to resolve it is not do too many things inside a single DB transaction
to make an analogy, imagine if instead of bitcoin being entirely distributed, there was a small group of aggregator nodes that everyone sends their transactions to... but they send them to different ones at different times
when the aggregators push everything together, it can happen that two transactions conflict, a so-called "double spend attack"... yes, the problem i just fixed is the database equivalent of rewriting a record two different ways in too short a time period to isolate them. well, it doesn't prevent that outright, but it makes it vanishingly unlikely, because each individual write is now isolated as one item in the database log, so the chance of two writes having a temporal overlap is basically zero
a lot of waffle just to say "the replicatr event store will handle extremely high demand when it comes"