nosql and kv stores have scaling strategies too, and use one of several common replication protocols, you've just been in this a while ;)
Discussion
You might be right. But no relays are using scalable KV services? At least none that I've seen?
nope, but i know i could add a two-tier strategy (fewer large relays, many smaller caching relays). it just requires writing one implementation of a library i already wrote and tested, for a different second-level "master" cache
I would really like to be able to "load balance" relays meaning they are stateless and use a backend that I can administrate on my own. I don't want to rely on the relay software itself to admin the database. Again where SQL wins. As an admin I can do optimization outside of the relay environment.
this is a bit like the debate about why bother enabling JWT bearer tokens (for dumb old epaper readers and postman btw) versus just use nip-98 (which doesn't have an extra expiry field)
i'm not sure what you mean by "stateless" relays, unless you mean they are dumb stores and the "master" pushes and pulls data from them, ie, it would subscribe to all new events from the slave relays, and then push them to the others
the tools to do this are not created but they are also quite trivial, and can be built as separate pieces: you can have a replicator server that just subs on all the relays and pushes to the master, and the satellites can just forward queries when they come in and push the newly added or updated events that came in from other satellites over the subscriptions that clients opened up for matching events
i think it would be better if you made them all simply replicators so each of the load balance relays simply pushes new events to the centre master and the master broadcasts them to all the others, instead of the master pulling them with subscriptions
subscriptions are cool and all, but broadcast is coolerer
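A minimal sketch of the push-and-broadcast layout described above, assuming in-memory stand-ins instead of real relay connections. The `Hub` and `Satellite` classes and their method names are hypothetical, not part of any relay implementation.

```python
class Hub:
    """The central "master": stores every event and rebroadcasts it."""

    def __init__(self):
        self.events = {}      # event id -> event
        self.satellites = []  # every load-balanced relay attached to this hub

    def push(self, event, origin):
        """Accept an event from one satellite and broadcast it to the others."""
        if event["id"] in self.events:
            return  # deduplicate by event id
        self.events[event["id"]] = event
        for sat in self.satellites:
            if sat is not origin:
                sat.receive(event)


class Satellite:
    """A hypothetical edge relay that pushes new events to the hub."""

    def __init__(self, name, hub):
        self.name = name
        self.hub = hub
        self.events = {}
        hub.satellites.append(self)

    def publish(self, event):
        """Called when a client sends an event to this satellite."""
        if event["id"] in self.events:
            return  # already seen, nothing to push
        self.events[event["id"]] = event
        self.hub.push(event, origin=self)

    def receive(self, event):
        """Called by the hub when another satellite accepted a new event."""
        self.events.setdefault(event["id"], event)
```

With this shape the hub never polls or subscribes: each satellite pushes once, and the hub fans the event out to every other satellite.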
it's all the same to me though, point being that a relay IS a database server already
strfry can scale quite large.. larger than what anyone needs right now. you can set up replicas and load balance between them. if you wanted to you could shard them, but you would need a sharding proxy.
either way, i think nostr scales with more people running relays and using outbox, so that's my focus (not as much mega scale single relay DB because i don't want to be like bluesky)
Pleb relays is the way, but some of us are going to have to handle thousands of concurrent users (hopefully)
strfry uses an embedded db, doesn't it? lmdb iirc?
How can I run multiple strfry instances from the same db?
It's also not just for scaling, but for redundancy too. We're talking experimental and, at best, poorly tested software here.
yes, it uses lmdb. you can run multiple processes of strfry that use the same db directly on disk if you wanted that. to scale horizontally you can replicate the db and have a load balancer in front. sure, it's not transactional like mysql is, but it is eventually consistent like a larger nosql db would be.
well i'd have to share it across an nfs or cifs share which historically has been shit for things like that. Server disk space is expensive. I would much rather have "centralized" db servers in a cluster and have them available for services to connect to them.
I'd like the instances to be deployable in an HA environment, so the disk can't be shared since it's not on the same server.
you would not want to share the db this way, no. you would use the strfry protocols (negentropy/stream/router) to keep the replicas in sync. it would be HA, every node the same..
But that means the db would be duplicated on every node and I would be relying on strfry to keep things in sync. Outside of my ability to interact with it using standard tools like we've been using for decades with sql.
My issue is relying on a service that is designed to be a relay, also trying to be a database system. I think it's just too much to ask and can't be done well.
i don't know what you mean by "load balanced" to be honest
nostr relays are really just a database server in themselves
do you want to replicate the database, do you want to demand cache it, do you want to shard it? it's quite important what strategy you have in mind and whether that actually fits the use case
sharding is probably gonna need to use social graph and maybe combine that with geofencing
caching is the simplest, dumbest idea, where the relays that people use don't actually store much data but they proactively fetch new data and expire old data quickly to keep the space available
caching was the way i envisioned working it... you could extend it with broadcast so that new data is shared around immediately and then with the garbage collection, rapidly purged from the store when it gets no traffic
how you design that optimization depends entirely on how the data is going to be used, and propagated
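A rough sketch of the caching idea above: fetch on demand, accept broadcasts, and purge anything that gets no traffic. `CachingRelay`, `upstream_fetch`, and `idle_ttl` are hypothetical names, and the store is a plain dict rather than a real relay backend.

```python
import time


class CachingRelay:
    """Hypothetical small relay that keeps only recently used events."""

    def __init__(self, upstream_fetch, idle_ttl=300.0, clock=time.monotonic):
        self.upstream_fetch = upstream_fetch  # callable: event id -> event or None
        self.idle_ttl = idle_ttl              # seconds an event may sit unused
        self.clock = clock                    # injectable so it can be tested
        self.store = {}                       # event id -> (event, last access time)

    def put(self, event):
        """Store an event that arrived by broadcast from the master."""
        self.store[event["id"]] = (event, self.clock())

    def get(self, event_id):
        """Serve from cache, or fetch from upstream on a miss."""
        entry = self.store.get(event_id)
        if entry is None:
            event = self.upstream_fetch(event_id)
            if event is None:
                return None
        else:
            event = entry[0]
        # Every hit or fill refreshes the access time, so hot data survives.
        self.store[event_id] = (event, self.clock())
        return event

    def collect_garbage(self):
        """Drop events that had no traffic for idle_ttl seconds."""
        cutoff = self.clock() - self.idle_ttl
        stale = [eid for eid, (_, last) in self.store.items() if last < cutoff]
        for eid in stale:
            del self.store[eid]
        return len(stale)
```

How aggressive `idle_ttl` should be depends on the access pattern, which is the design question raised above.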
inherently, replicated Raft/Paxos/PBFT databases ARE exactly replicas, you even used the word replica, you didn't mean replica because you just clearly said you didn't mean replica
replica is like a bitcoin node with the same blockchain, that's a replica, all replicated database protocols make replicas, and individual replicas don't have the option to decide not to store data arbitrarily
what you are talking about is a caching strategy, which means you want to do garbage collection and broadcast
there *are* relays that use SQL.. like khatru or ditto.
i know gleason spends lots of time and probably money on servers, performance tuning the queries 😁 nostr will absolutely slam sql databases, and it is typically much more expensive to run a sql backend than lmdb.
ditto uses postgres last i checked and seems like it still does. Closer I guess. khatru seems to be the best option, although I have to manage a build myself if I want to connect it to my infra. Mildly acceptable.
yeah, people seem obsessed with postgres, i think they must enjoy the pain. unsure if they can be adapted to mysql, probably not once they're done tuning it.
there are also relays using mongodb..
Yes grain does. truly not a fan of mongo myself though. I guess I'm alone in SQL Server world XD. I use it for everything, but mariadb is a close second except its column search sucks huge ass compared to sqlserver.
khatru has pluggable backends so it may not be too hard to add one..
yeah, except you get to have fun with fiatjaf's shitty concurrency and slow ass json codec along with khatru
i've firmly decided that after i finish this JWT bearer token stuff and implement the HTTP endpoints that all features cease after that and i build out a modular scheme for them all , and untangle all the entanglement
oh and i forgot to mention fiatjaf's shitty event store interface which assumes you want to deal with channels and several more goroutines than you need, which make an utter hellish mess
but have fun with that anyway
i will try and build out a fully simple architecture to make it all easy to include or not include, very soon, it's driving me nuts, i made the first part nice, now feature adding, ok, too much, i'm getting claustrophobia
I have a feeling he's not using an orm system. Sketches me out considering SQL injection is still way higher up on the list of vulnerabilities than it should be.
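For what it's worth, an ORM is not the only guard against injection: bound query parameters do the same job. Below is a minimal sketch using Python's `sqlite3`, with an assumed `events` table and a hypothetical `query_events` helper. It is not taken from khatru, ditto, or any other relay's code.

```python
import sqlite3


def open_store():
    """Create an in-memory store with an assumed, simplified events table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE events ("
        "id TEXT PRIMARY KEY, pubkey TEXT, kind INTEGER, "
        "created_at INTEGER, content TEXT)"
    )
    return db


def query_events(db, authors=None, kinds=None, since=None, limit=100):
    """Build a filter query where every user-supplied value is a bound parameter."""
    clauses, params = [], []
    if authors:
        # Only the number of "?" placeholders depends on the input, never the text.
        clauses.append("pubkey IN (%s)" % ",".join("?" * len(authors)))
        params.extend(authors)
    if kinds:
        clauses.append("kind IN (%s)" % ",".join("?" * len(kinds)))
        params.extend(kinds)
    if since is not None:
        clauses.append("created_at >= ?")
        params.append(since)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = (
        "SELECT id, pubkey, kind, created_at, content FROM events"
        + where
        + " ORDER BY created_at DESC LIMIT ?"
    )
    params.append(limit)
    return db.execute(sql, params).fetchall()
```

A filter value such as `alice' OR '1'='1` is then compared as a literal string and matches nothing, instead of rewriting the query.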