just got done rewriting the event query logic in the #replicatr #layer1 event store

process is now broken up into separate operations

- first, scan the indexes for matches of the prepared query index searches

- then, for each index match, spawn an independent goroutine that only does the work of digging out the actual event whose serial (its unique identifier in the database) the index search found

now, i can turn the RWMutex back on in the database transaction wrapper and it doesn't get deadlocked between the count and query functions
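the two steps above can be sketched roughly like this. it's a minimal sketch only, not the actual replicatr code: `Store`, `Event`, `scanIndex`, `fetch`, `Query` and `newDemoStore` are all made-up names, and plain in-memory maps stand in for the database and its short read transactions

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Event is a hypothetical stand-in for a stored event.
type Event struct {
	Serial  uint64
	Content string
}

// Store is a hypothetical in-memory stand-in for the event store.
// Both maps are read-only after setup, so concurrent reads are safe here.
type Store struct {
	index  map[string][]uint64 // index key -> serials of matching events
	events map[uint64]Event    // serial -> event
}

// scanIndex is step one: it only reads the index and returns matching serials.
func (s *Store) scanIndex(key string) []uint64 {
	return append([]uint64(nil), s.index[key]...)
}

// fetch is step two: one short, single-purpose read of one event by serial.
func (s *Store) fetch(serial uint64) (Event, bool) {
	ev, ok := s.events[serial]
	return ev, ok
}

// Query scans the index first, then fetches each match in its own goroutine.
func (s *Store) Query(key string) []Event {
	serials := s.scanIndex(key)
	results := make(chan Event, len(serials)) // buffered so no goroutine blocks
	var wg sync.WaitGroup
	for _, ser := range serials {
		wg.Add(1)
		go func(ser uint64) {
			defer wg.Done()
			if ev, ok := s.fetch(ser); ok {
				results <- ev
			}
		}(ser)
	}
	wg.Wait()
	close(results)
	var out []Event
	for ev := range results {
		out = append(out, ev)
	}
	// goroutines finish in arbitrary order, so sort for a stable result
	sort.Slice(out, func(i, j int) bool { return out[i].Serial < out[j].Serial })
	return out
}

// newDemoStore builds a tiny store with made-up data for illustration.
func newDemoStore() *Store {
	return &Store{
		index: map[string][]uint64{"kind:1": {1, 3}, "kind:7": {2}},
		events: map[uint64]Event{
			1: {Serial: 1, Content: "first"},
			2: {Serial: 2, Content: "second"},
			3: {Serial: 3, Content: "third"},
		},
	}
}

func main() {
	s := newDemoStore()
	for _, ev := range s.Query("kind:1") {
		fmt.Println(ev.Serial, ev.Content)
	}
}
```

the point of the split is that no single operation holds anything open for long: the index scan finishes before any event fetch starts, and each fetch touches exactly one record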

still need to look at the count function to make sure it's not combining operations

spoiler alert: yes, it is combining operations... will now rewrite

i was able to re-enable the RWMutex - and it was mostly working, but then it was still deadlocking, and i know it's because nostrudel extensively uses count queries and they aren't fixed yet

nope, that RWMutex definitely can't be in there... so i'm just atomizing the queries now, that seems to be working perfectly

it's just an issue of preventing race conditions on concurrent database writes... previously, as testing showed, heavy query load while adding lots of new events and updating the access counters on records was causing the database to report transaction errors, which is an atomicity failure

what that means is that two transactions were modifying the same record at the same, or very nearly the same, time... when the DB engine tries to resolve this conflict after the transactions close, it can't, meaning both transactions have to be repeated

with the transactions made smaller and single purpose, the chances of this collision in time have now pretty much shrunk down to zero... it only takes a few milliseconds of timing difference for the access events to be atomically separated, and then there's no more problem
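to illustrate the conflict and the small-transaction fix, here is a toy sketch with optimistic version checks. it is an assumption-heavy model and not the actual database engine or replicatr code: `Store`, `Txn`, `Begin`, `Commit`, `Touch` and `ErrConflict` are all hypothetical names, and the retry limit of 5 is arbitrary

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrConflict is a hypothetical error standing in for a failed tx commit.
var ErrConflict = errors.New("transaction conflict")

// record holds the one mutable field from the thread plus a version counter.
type record struct {
	version    uint64
	lastAccess int64
}

// Store is a toy optimistic store. The mutex only guards the toy map;
// it stands in for the engine's internal bookkeeping.
type Store struct {
	mu      sync.Mutex
	records map[uint64]record
}

// Txn remembers the version it read so Commit can detect a concurrent writer.
type Txn struct {
	store   *Store
	serial  uint64
	seen    uint64
	pending record
}

// Begin snapshots the record as the transaction's starting point.
func (s *Store) Begin(serial uint64) *Txn {
	s.mu.Lock()
	defer s.mu.Unlock()
	r := s.records[serial]
	return &Txn{store: s, serial: serial, seen: r.version, pending: r}
}

// Commit fails with ErrConflict if someone else committed since Begin.
func (t *Txn) Commit() error {
	t.store.mu.Lock()
	defer t.store.mu.Unlock()
	if t.store.records[t.serial].version != t.seen {
		return ErrConflict
	}
	t.pending.version = t.seen + 1
	t.store.records[t.serial] = t.pending
	return nil
}

// Touch updates only the last access time in its own tiny transaction,
// retrying a few times if it loses a race with another writer.
func (s *Store) Touch(serial uint64, now int64) error {
	for attempt := 0; attempt < 5; attempt++ {
		tx := s.Begin(serial)
		tx.pending.lastAccess = now
		err := tx.Commit()
		if err == nil {
			return nil
		}
		if !errors.Is(err, ErrConflict) {
			return err
		}
	}
	return ErrConflict
}

func main() {
	s := &Store{records: map[uint64]record{1: {}}}
	// two transactions overlapping in time on the same record
	a, b := s.Begin(1), s.Begin(1)
	a.pending.lastAccess = 100
	b.pending.lastAccess = 200
	fmt.Println("first commit:", a.Commit())
	fmt.Println("second commit:", b.Commit())
	// a small single-purpose transaction has a much shorter window to collide
	fmt.Println("touch:", s.Touch(1, 300))
}
```

the shorter the gap between `Begin` and `Commit`, the smaller the window in which another writer can sneak in, which is why doing only one thing per transaction makes the conflicts so rare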

it really was mostly just about the last access time record that i added... this is the only part of the data related to events that ever changes after being written, the only other change being when events are pruned by the garbage collector

so, the GC now has a mutex that locks out the rest of the accesses, and the rest can then run concurrently because transactions are not overlapping in time so much and thus should not get these race conditions any more

yeah, nah, can't put a mutex on that either, not sure why, but i think simply making the DB transactions concurrent and atomic ensures that it will almost never happen that one request stomps on data another request is accessing

it was worse before, heavy load would have definitely caused tx commit failures but i think now it's nearly zero chance of happening, so performance is maximal and reliability as well

it sorta seems like it doesn't make sense that you can have a data store being handled by multiple processes at the same time, but this is the wonder of ubiquitous multiprocessing inside extremely fast memory caches on modern CPUs, the queries can come in, and the multiple threads can literally be accessing the same pieces of memory at the same time (though usually from different copies that have reached L1 cache for a core) and voila, race condition

so it's really FKN fast, but has this problem that processing can get out of sync, and the main thing you have to do to resolve this issue is not do many things inside a DB transaction

to make an analogy, imagine if bitcoin, instead of being entirely distributed, had a small group of aggregator nodes that everyone sends their transactions to... but they send them to different ones at different times

when the aggregators push everything together, it can happen that two transactions conflict, a so-called "double spend attack"... yes, the fix i just made prevents the database equivalent, rewriting a record two different ways in too close a time period to isolate them - well, it doesn't prevent it, but makes it vanishingly unlikely, because each individual write is now isolated in one item in the database log and thus the chance of them having a temporal overlap is now basically zero

a lot of waffle just to say "replicatr event store will handle extremely high demand when it comes"