and this is the entire fucking issue with strfry, realy, khatru, etc

they try to reimplement replication when it has already been done many times, often with worse performance

one of the primary requirements for the underlying DB was that it must support automatic replication and healing, and that it also must not be complete shit to maintain

Discussion

Which many, if not most, database systems provide, even if they aren't ideal. Hell, even SQLite has options for replication.

No one has been able to give me numbers. If you had to write it down, how many concurrent users, and how many total users, would you expect to service with something like strfry? Meaning maybe an 80% or better latency rolloff?

nostr:npub10npj3gydmv40m70ehemmal6vsdyfl7tewgvz043g54p0x23y0s8qzztl5h could likely answer this more specifically!

I’d say at a few thousand users you’d start having issues with strfry.

Nostr.land was less affected because it had less data to store than a standard public relay

it uses LMDB for its DB, which is good but also has issues: maintenance tasks such as breaking changes require taking the entire DB offline, etc.

the biggest problem was that it was a pain to do AUTH on, or search, or any sort of spam filtering that wasn’t basic

gathering numbers based on users is tough on nostr because what would you base it on? how do you tell if it's a user, or a scraper gone wild?

I'm sure nostr:nprofile1qqs06gywary09qmcp2249ztwfq3ue8wxhl2yyp3c39thzp55plvj0sgprdmhxue69uhhg6r9vehhyetnwshxummnw3erztnrdakj74c23x6 could come up with a test plan, but all I can see is #connections and periodic latency smokepings (doing a simple REQ).
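
A minimal sketch of that kind of smokeping in Go, assuming gorilla/websocket; the relay URL and the filter are placeholders. It just times connect + one simple REQ until EOSE:

```go
// smokeping.go: a minimal sketch of a "simple REQ" latency probe.
// Assumes gorilla/websocket; the relay URL and the filter are placeholders.
package main

import (
	"encoding/json"
	"fmt"
	"time"

	"github.com/gorilla/websocket"
)

func main() {
	relay := "wss://relay.example.com" // placeholder, not a real endpoint

	start := time.Now()
	conn, _, err := websocket.DefaultDialer.Dial(relay, nil)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// one cheap REQ: the latest single kind-1 event
	req, _ := json.Marshal([]any{"REQ", "ping", map[string]any{"kinds": []int{1}, "limit": 1}})
	if err := conn.WriteMessage(websocket.TextMessage, req); err != nil {
		panic(err)
	}

	// read frames until the relay signals EOSE for our subscription
	for {
		_, msg, err := conn.ReadMessage()
		if err != nil {
			panic(err)
		}
		var frame []json.RawMessage
		if json.Unmarshal(msg, &frame) == nil && len(frame) > 0 {
			var typ string
			json.Unmarshal(frame[0], &typ)
			if typ == "EOSE" {
				break
			}
		}
	}
	fmt.Printf("connect + REQ + EOSE: %v\n", time.Since(start))
}
```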

I've seen one strfry handle up to 8000 simultaneous connections, but that just means someone left their connections open.

One performance thing in my mind: there is a very real difference between a nostr aggregator and a relay. If you are doing aggregation and search, you have different needs for your data pipeline than normal client-relay nostr usage.

Since the outbox model is still barely used, not many actual users are connecting to regular relays; they mostly just connect to Primal or Damus.

Anyway, not sure why I'm bothering saying this; if you want to build a better relay, I'm all for it. I guess I'm just annoyed at performance comparisons I cannot confirm myself (closed source). Nostr clients don't even judge a relay's performance by anything other than a websocket ping, so I have never received a user complaint about slowness. I use the relays daily myself and they seem just fine with minimal resources.

If an aggregator thinks they're too slow, well, they're not paying me to make their aggregation fast, so it makes no sense to scale up for them. There's a reason robots.txt exists: the early web was the same, a crawler could take down your site, but you wouldn't want to scale up just to help it crawl faster.

Distributed systems where you don't get any heads-up about what people are doing are a hard environment to work in, but strfry handles it like a champ.

Most of the clients are so inefficient that a measurement would mostly be of their own slowness. 😂

We've added result time to our Alex client success message, so that might provide some transparency.

What could be done is to have a test harness that mimics exactly the common types of requests a user makes (like recording what Amethyst does) and plays them back x1000 against a test relay loaded with a bunch of generated data to query.

Or, for the Alex client, the typical set of REQs or events that are expected to be happening.
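
A rough sketch of what such a playback harness could look like in Go, again assuming gorilla/websocket. The file name, relay URL, and worker count are made-up placeholders, and recorded REQ filters are assumed to be stored one JSON object per line:

```go
// replay.go: sketch of replaying recorded client filters x1000 against a
// test relay. Assumes gorilla/websocket; filters.jsonl, the relay URL, and
// the worker count are hypothetical placeholders.
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"sync"
	"time"

	"github.com/gorilla/websocket"
)

// waitEOSE reads frames until the relay sends EOSE for the given subscription.
func waitEOSE(conn *websocket.Conn, sub string) error {
	for {
		_, msg, err := conn.ReadMessage()
		if err != nil {
			return err
		}
		var frame []json.RawMessage
		if json.Unmarshal(msg, &frame) == nil && len(frame) >= 2 {
			var typ, id string
			json.Unmarshal(frame[0], &typ)
			json.Unmarshal(frame[1], &id)
			if typ == "EOSE" && id == sub {
				return nil
			}
		}
	}
}

func main() {
	relay := "ws://localhost:7777" // a test relay only, never a live one
	f, err := os.Open("filters.jsonl")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var filters []json.RawMessage
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		filters = append(filters, json.RawMessage(sc.Text()))
	}

	var wg sync.WaitGroup
	for w := 0; w < 1000; w++ { // the "x1000" playback: one goroutine per virtual client
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			conn, _, err := websocket.DefaultDialer.Dial(relay, nil)
			if err != nil {
				return
			}
			defer conn.Close()
			for i, filt := range filters {
				sub := fmt.Sprintf("sub-%d-%d", id, i)
				req, _ := json.Marshal([]any{"REQ", sub, filt})
				start := time.Now()
				if err := conn.WriteMessage(websocket.TextMessage, req); err != nil {
					return
				}
				if err := waitEOSE(conn, sub); err != nil {
					return
				}
				// close the subscription so old subs don't keep streaming
				cls, _ := json.Marshal([]any{"CLOSE", sub})
				conn.WriteMessage(websocket.TextMessage, cls)
				fmt.Printf("client %d filter %d: %v\n", id, i, time.Since(start))
			}
		}(w)
	}
	wg.Wait()
}
```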

My smoke test with Playwright will be timed, and I could run it in parallel across a set of relays, on an hourly basis, and then post pretty trend diagrams on https://status.gitcitadel.com .

If it's a stress test, I wouldn't do it on live relays, but a periodic ping of what a client would do, run x1 from separate geolocations, is useful.

I said smoke test, not stress test. It's just one happy path e2e in a headless browser.

I call it "smokeping" because there was a cool monitoring tool by that name, written in Perl back in the day. It was my #1 favorite thing for seeing the status of the end-to-end service and its trends over time.

I do this and publish the results as NIP-66 latency events and to InfluxDB for querying. Very useful, and it doesn't run on the same network as the relay (geo-distributed). It connects, does a simple REQ, and records total latency.
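
For the InfluxDB side, a hedged sketch of how a single latency sample could be pushed, assuming an InfluxDB 2.x instance and its /api/v2/write endpoint; the host, org, bucket, token, and measurement layout are all placeholders:

```go
// push_latency.go: sketch of writing one latency sample to InfluxDB via the
// 2.x line-protocol write endpoint. Host, org, bucket, token, and the
// measurement/tag layout are placeholders, not the setup described above.
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

func main() {
	latency := 123 * time.Millisecond // would come from the REQ probe
	relay := "wss://relay.example.com"

	// InfluxDB line protocol: measurement,tag=... field=... timestamp(ns)
	line := fmt.Sprintf("relay_latency,relay=%s latency_ms=%d %d",
		relay, latency.Milliseconds(), time.Now().UnixNano())

	url := "http://localhost:8086/api/v2/write?org=myorg&bucket=relays&precision=ns"
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(line))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Token "+"REPLACE_ME")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("influx write status:", resp.Status)
}
```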

For a stress test of the relay software itself you usually have to be running it on the same hardware configuration. Maybe we can work together on benchmarking somehow anyway; everybody loves benchmarks.

Yeah, so 8k open connections, not 8k in use; fair enough. My concern is that we'll need way more than a single instance and (I hope) way more than 8k active connections to support a product like Alexandria, assuming we get anywhere close to the reach we're aiming for. I just want the software to exist by the time we DO need it, like I said in my argument before. In just about any other software deployment, the software to scale exists if needed, but isn't rolled out _until_ needed. What happens when we do need it and it doesn't exist?

Beyond that, my concern (and I made this number up) is something like a latency curve on a log scale showing some sort of "rolloff": the point where load gets high enough that latency creeps up until the service becomes interrupted or noticeable to the end user.
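
One way that rolloff tends to show up is in the tail percentiles pulling away from the median as load grows. A toy sketch of summarizing probe latencies that way (the sample data is invented):

```go
// percentiles.go: toy illustration of summarizing probe latencies so a
// "rolloff" shows up as tail percentiles diverging from the median.
// The sample values are invented.
package main

import (
	"fmt"
	"sort"
	"time"
)

// percentile returns the value at fraction p of a sorted sample slice.
func percentile(sorted []time.Duration, p float64) time.Duration {
	idx := int(p * float64(len(sorted)-1))
	return sorted[idx]
}

func main() {
	samples := []time.Duration{ // pretend these came from the REQ probe
		40 * time.Millisecond, 42 * time.Millisecond, 45 * time.Millisecond,
		47 * time.Millisecond, 50 * time.Millisecond, 55 * time.Millisecond,
		60 * time.Millisecond, 300 * time.Millisecond, 900 * time.Millisecond,
		2 * time.Second,
	}
	sort.Slice(samples, func(i, j int) bool { return samples[i] < samples[j] })

	for _, p := range []float64{0.50, 0.90, 0.99} {
		fmt.Printf("p%.0f: %v\n", p*100, percentile(samples, p))
	}
}
```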

Alexandria would be a read-heavy load: books being read and commented on vs. books being uploaded.

Almost exclusively. The relay doesn't have to be a relay; it kind of just needs a database (or files, tbh) and some good caching.

An HTTP API like the one I made.
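
Purely as an illustration of the read-heavy, cache-friendly shape being described (not the actual API mentioned above), a tiny read-only event endpoint could look like this; the route, cache policy, and the in-memory map standing in for a real store are all made up:

```go
// events_api.go: an illustrative read-only HTTP endpoint for serving events,
// NOT the API referenced in the thread. The route, cache header, and the
// in-memory map standing in for a real DB are all hypothetical.
package main

import (
	"encoding/json"
	"net/http"
)

// a stand-in for the real store; events are immutable, so they cache well
var events = map[string]json.RawMessage{
	"abc123": json.RawMessage(`{"id":"abc123","kind":30023,"content":"..."}`),
}

func main() {
	http.HandleFunc("/event/", func(w http.ResponseWriter, r *http.Request) {
		id := r.URL.Path[len("/event/"):]
		ev, ok := events[id]
		if !ok {
			http.NotFound(w, r)
			return
		}
		// immutable content: let browsers and CDNs cache aggressively
		w.Header().Set("Cache-Control", "public, max-age=31536000, immutable")
		w.Header().Set("Content-Type", "application/json")
		w.Write(ev)
	})
	http.ListenAndServe(":8080", nil)
}
```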

Badger has a good read cache as well, and it's designed to split the read/write load, mainly in favor of writing indexes, because events don't change. Also, scanning the key tables is fast because it isn't interspersed with value reads and writes.
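
A small sketch of why that matters for reads, assuming the badger v4 import path and a made-up "idx:" key prefix (this is not NFDB's actual key layout): a key-only scan never touches the value log where the event payloads live.

```go
// keyscan.go: sketch of a key-only index scan in badger (v4 import path
// assumed). The "idx:" prefix is an invented index key layout.
package main

import (
	"fmt"

	badger "github.com/dgraph-io/badger/v4"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/eventdb"))
	if err != nil {
		panic(err)
	}
	defer db.Close()

	err = db.View(func(txn *badger.Txn) error {
		// PrefetchValues=false keeps the scan on the LSM key tables only;
		// the value log (where event payloads sit) is never read.
		it := txn.NewIterator(badger.IteratorOptions{PrefetchValues: false})
		defer it.Close()

		prefix := []byte("idx:")
		for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
			fmt.Printf("index key: %s\n", it.Item().Key())
		}
		return nil
	})
	if err != nil {
		panic(err)
	}
}
```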

With any DB there is always going to be a ceiling where you must eventually start splitting into clusters.

But NFDB makes this ceiling very hard to reach, and even if you do hit it, there are many solutions:

1. You can shard indexes across clusters with very few changes

2. Events can also be stored in a blob store

This gives up strong consistency in certain contexts, but it is also extremely manageable compared to, say, trying to split events across 8 different relays.

In that case you would have to query all 8 relays for each request.

But with NFDB, the only difference is that index reads now go to a different cluster, and there is no request or resource amplification.
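
To make the "shard the indexes, not the queries" point concrete, here is a toy sketch (not NFDB's actual scheme): hashing an index key picks exactly one index cluster per lookup, so nothing fans out to every relay.

```go
// shard.go: toy illustration of routing an index read to one cluster by
// hashing the key. Cluster addresses and the key format are invented; this
// is not NFDB's actual sharding scheme.
package main

import (
	"fmt"
	"hash/fnv"
)

// pickShard maps an index key to one of n index clusters.
func pickShard(indexKey []byte, n uint32) uint32 {
	h := fnv.New32a()
	h.Write(indexKey)
	return h.Sum32() % n
}

func main() {
	clusters := []string{"idx-a:4443", "idx-b:4443", "idx-c:4443"} // hypothetical
	key := []byte("kind:30023|author:deadbeef")                    // hypothetical index key
	target := clusters[pickShard(key, uint32(len(clusters)))]
	// one lookup goes to exactly one cluster; no fan-out to every relay
	fmt.Println("query index cluster:", target)
}
```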