Even request pipelining, because no one has the time to wait for multiple round trips.

Discussion

Yours are mostly the only remote relays I'm bothering with, TBH. Everything else is slow as fuck or full of irrelevant garbage, and life is too short to cater to cheapskates and spammy relays, fr. Sick of it.

People can log in and then it adds in their mailboxes, and that. is. enough.

Totally trimming everything to our own stuff and I got zero fucks to give, this shit gonna be lit. Fuck em. Rage coding FTW.

🔥

Relays like strfry, khatru, and realy are great until they are not.

They work great for caching locally, but when you get to scale, it implodes.

You want to store 1TB of books. You have to rent a single server that can store 1TB.

But what if it goes down? So you buy a few more replicas. Then you try to shard events across servers and fail.

Have fun compacting the database or upgrading it every once in a while.

NFDB fixes this: just as SQLite is great for small scale but Postgres is better at larger scale, NFDB is the larger-scale option here.

I'm using them for local caching. It doesn't have to be the only local relay. I currently have 4 local relays running on my laptop and Citrine on mobile. I do a lot of testing, but still. My new kind 10432, which lists all localhost relays, means you can have as many as you want, to do whatever you want.
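
A sketch of what such a kind-10432 event could look like, using go-nostr. The `relay` tag name, the ports, and the overall shape are my assumptions; only the kind number 10432 comes from this thread:

```go
package main

import (
	"fmt"

	"github.com/nbd-wtf/go-nostr"
)

func main() {
	// Hypothetical kind-10432 "localhost relays" list. The tag name
	// ("relay") and the ports are made up for illustration; only the
	// kind number comes from the conversation above.
	ev := nostr.Event{
		Kind:      10432,
		CreatedAt: nostr.Now(),
		Tags: nostr.Tags{
			{"relay", "ws://localhost:4869"},
			{"relay", "ws://localhost:10547"},
		},
	}
	fmt.Println(ev.Kind, len(ev.Tags))
}
```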

I don't think it's appropriate to store everything on the local system; that's actually a risky data strategy. It's for making sure you can transition between online and offline, and autosync when you're on a good network. Like OneDrive and SharePoint do, but dirt-cheap and not braindead.

I think it's safe to assume that someone handling large or important data stores will have the sense to hire a professional admin or be an admin themselves, but that's what we have you and nostr:npub10npj3gydmv40m70ehemmal6vsdyfl7tewgvz043g54p0x23y0s8qzztl5h for. Above my pay grade and not my problem.

Good.

For an in-browser use case, use the SQLite in-browser relay I suggested, too. That way you at least have a cache that is better than nothing, rather than no cache at all until they set up realy or something else.

I'm using IndexedDB as a mandatory cache, since it works on phones. Isn't the one you suggested something that has to be natively installed and _doesn't_ run in the browser? Or did I check the wrong link?

No, it's a web worker that uses OPFS and WASM to run a relay.

Hmm... I'll look. What was the link, again?

the whole collecting-IDs-and-comparing before downloading events, and then just downloading what is missing: that's what negentropy does, and that's why i think it's neat having it built into the protocol...
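
a brute-force sketch of that idea, assuming you already have both ID sets in hand. real negentropy compares range fingerprints instead of shipping full ID lists, so treat this as the concept, not the protocol:

```go
package main

import "fmt"

// missingIDs is the naive version of what negentropy achieves: work out
// which remote event IDs we don't have locally, so only those events get
// downloaded. Negentropy itself avoids transferring full ID lists by
// comparing fingerprints over ranges, which is what makes it cheap at scale.
func missingIDs(local map[string]bool, remote []string) []string {
	var missing []string
	for _, id := range remote {
		if !local[id] {
			missing = append(missing, id)
		}
	}
	return missing
}

func main() {
	local := map[string]bool{"a1": true, "b2": true}
	remote := []string{"a1", "b2", "c3"}
	fmt.Println(missingIDs(local, remote)) // [c3]
}
```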

as for uptime and redundancy, that always at least doubles the cost. obviously it will take a super long time to compact a 1TB db, possibly on the order of days... but you can run a replica and still do a zero-downtime failover once compaction is complete, as long as you have enough disk space.

i've been spec'ing out some server tiers that could handle it, while also keeping cost in mind. i think keeping server cost as low as possible is really important for nostr businesses.

i also like that clients have the distributed mindset here. it should help with uptime by decreasing the odds of both relays experiencing unexpected downtime at the same time.

badger is better because it splits the key and value tables: a lot less time wasted compacting every time values are written, and it's easier and faster to use the key table to store the data that has to be scanned a lot.
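
for illustration, this is roughly how the split shows up in badger's options: values above a threshold live in the value log, while keys and small values stay in the LSM tree, so compaction never has to rewrite the big blobs. the threshold and key layout here are arbitrary examples, not recommendations:

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v4"
)

func main() {
	// Values larger than 1 KiB go to the value log; keys and small values
	// stay in the LSM tree, so compaction never rewrites big event blobs.
	opts := badger.DefaultOptions("/tmp/nostr-cache").WithValueThreshold(1 << 10)
	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	err = db.Update(func(txn *badger.Txn) error {
		// small index entry: lives entirely in the key table (LSM tree),
		// cheap to scan in bulk
		if err := txn.Set([]byte("idx:created_at:1700000000:ev1"), nil); err != nil {
			return err
		}
		// big event blob: only a pointer to the value log stays in the tree
		return txn.Set([]byte("ev:ev1"), make([]byte, 4096))
	})
	if err != nil {
		log.Fatal(err)
	}
}
```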

for whatever stupid reason, nobody else in database development has realised the benefit of the key/value table splitting, even though the tech has been around for 9 years already.

probably similar reasons why so many businesses are stuck with oracle

at least some sane people realized, tbh

FoundationDB's new Redwood engine, the underlying architecture of S3, and NFDB's IA store as well.

that's great. i think badger is cuter and more mature tho

I’ll have a REST API if you want. You can ask it to only send IDs or the full events.

Want to get the events by ID? Just use an ids filter on the same endpoint; no one has a use for two endpoints doing one thing.
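
Something like this on the client side, then. The endpoint path and the `ids_only` switch are guesses at the shape, not NFDB's actual API:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// One endpoint, standard Nostr-style filter. "ids_only" is a
	// hypothetical flag for the "only send IDs" mode mentioned above;
	// the URL and the ID are placeholders too.
	filter, _ := json.Marshal(map[string]any{
		"ids":      []string{"5c83..."},
		"ids_only": false, // set true to get just IDs back
	})
	resp, err := http.Post("https://relay.example/query", "application/json", bytes.NewReader(filter))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```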

But no SSE. Browsers put a limit on the number of parallel HTTP requests, and SSE connections may exceed it since they stay open for a long time.

Also, NFDB will support cursors on all interfaces, so you can paginate without the pain. Forget created_at-based pagination, which can be very unreliable.

If you want to know how to do it anyway: calculate the lowest created_at returned by each relay, and use the highest of those as the next `until`. Otherwise a relay that returned one very old event will drag the cursor too far back, and the others will say “nope, nothing left.”
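
In code, a sketch of that rule, assuming each relay's page arrives as a slice of created_at timestamps:

```go
package main

import "fmt"

// nextUntil picks the next `until` for multi-relay created_at pagination.
// The lowest created_at in each relay's page marks how far that relay got,
// and taking the HIGHEST of those minimums guarantees no relay has a gap:
// using the lowest overall instead would skip events on relays whose page
// didn't reach that far back, or let one relay holding a single ancient
// event end pagination early.
func nextUntil(pages map[string][]int64) int64 {
	var next int64
	for _, createdAts := range pages {
		if len(createdAts) == 0 {
			continue // this relay is exhausted
		}
		low := createdAts[0]
		for _, t := range createdAts[1:] {
			if t < low {
				low = t
			}
		}
		if low > next {
			next = low
		}
	}
	return next
}

func main() {
	pages := map[string][]int64{
		"relay-a": {1700000300, 1700000100}, // page reached down to ...100
		"relay-b": {1700000900, 1700000500}, // page only reached ...500
	}
	fmt.Println(nextUntil(pages)) // 1700000500: re-query both below this
}
```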

Okay, that gives us two sane connections to tap, one local, one external. Good.

Feeling cute. Might fix Nostr.

yeah, i think, as always, hybrid is generally better than dedicated. smaller, simpler parts built with clean, simple interfaces are much easier to scale per use case.

yeah, i've been thinking about how to do SSE properly. it seems to me the right way is to open one stream that handles all subscriptions: from when the client first wants a subscription open, one stream is opened, everything goes through it, and the event format (as in the SSE event) includes the subscription id it relates to.

this avoids the problem of the limits, which i think are typically about 8. but even 8 is plenty; it's just not necessary, because you can multiplex over one stream using subscription IDs, just like the websocket API does. it also simplifies keeping the subscription channel open, and it allows arbitrary other kinds of notifications to be pushed as well, ones we haven't thought of yet, beyond subscriptions to event queries matching newly arrived events.

why use SSE instead of a websocket? because it's less complex: basically just an HTTP body that is slowly written to. this pushes everything down to the TCP layer instead of adding a secondary websocket ping/pong layer on top. the client also knows when the SSE has disconnected and can start it up again, and the subscription processing on the relay side should keep a queue of events, so when a subscription SSE dies, the relay can push the events that arrived after the last one sent to the client (which also means there needs to be a top-level subscription connection identifier; an IP address is not always going to work with multiple users behind one NAT).

also, just keep in mind that the websocket API by default creates a subscription management structure for every single socket that is opened, whereas if you do the queries primarily as http requests, this slims down to a single subscription multiplexer, which makes it more memory efficient as well.

i don't think there is a clear enough benefit to using websockets for everything. their only real utility is highly interactive connections; one-shot requests most of the time, plus one subscription stream for pushing data back, is a huge reduction in the state needed to manage a single client connection, compared to multiplexing everything at the application layer.
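
a rough server-side sketch of that single multiplexed stream: every SSE event names its subscription id, and SSE's built-in `id:` field plus the `Last-Event-ID` reconnect header covers most of the replay-the-queue behaviour described above. `subscriptionFeed` is a hypothetical stand-in for the relay's internal multiplexer, not code from any real relay:

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

type subEvent struct {
	SubID string // nostr subscription id this event belongs to
	Seq   string // monotonic id so Last-Event-ID can resume the stream
	JSON  string // the serialized nostr event
}

// subscriptionFeed is hypothetical: it would merge all of this client's
// subscriptions into one channel, resuming after lastID if one was sent.
func subscriptionFeed(ctx context.Context, lastID string) <-chan subEvent {
	ch := make(chan subEvent)
	close(ch) // stub for the sketch
	return ch
}

func sseHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	flusher, ok := w.(http.Flusher)
	if !ok {
		http.Error(w, "streaming unsupported", http.StatusInternalServerError)
		return
	}
	// the browser sends Last-Event-ID when it reconnects, so events queued
	// while the stream was down can be replayed from that point.
	for ev := range subscriptionFeed(r.Context(), r.Header.Get("Last-Event-ID")) {
		// "event:" carries the subscription id, multiplexing everything
		// over this one stream, just like websocket sub ids do.
		fmt.Fprintf(w, "id: %s\nevent: %s\ndata: %s\n\n", ev.Seq, ev.SubID, ev.JSON)
		flusher.Flush()
	}
}

func main() {
	http.HandleFunc("/subscriptions", sseHandler)
	http.ListenAndServe(":8080", nil)
}
```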

Yeah, I've been researching, and Twitter and ChatGPT actually use SSE, not websockets. They're streaming out, not in.

You only need one-way streaming, since the client already knows when it is writing. There is in fact normally no use case on Nostr for two-way streaming, since we don't really have live chats.

my primary concern comes from the number of parallel streams. WS excels there, as it is one connection as far as the browser and the underlying stack are concerned.

In Nostr you can have many active subscriptions for new events.

Any non-subscription queries go straight to HTTP though. Helps a ton with load balancing and it means a software upgrade doesn’t interrupt queries.

Under the hood it’s all REST between internal NFDB services.

Yeah, but we don't actually need that, as most relays are now aggregators.

haha, yeah... this is the flaw in the bigger-is-better strategy when you can get a lot of mileage out of horizontal replication, especially when the database has GC to contain disk usage and the concomitant search cost of a larger database.

You can, but we're going Big Data and need something more enterprise-level.

We're going after SharePoint and Oracle, Kafka, and the Internet Archive. Need the really big guns.

Or when you manage thousands of customers. The NFDB relay is optimized for thousands of small ones as well.

It has a medium fixed cost, but the marginal costs are lower.

Can it handle a nation-state-sized customer? Asking for a friend.

Probably. If it doesn't, it can easily be upgraded to handle that.

There are some intentional design choices made by NFDB that could be changed for larger scale, at the cost of requiring more time to implement.

But at large scale you have the resources to do that.

So the answer is yes.

Good. Don't need that yet, but maybe think through the design choices, so that you could explain the possibilities to someone.

There is already a scaling plan.

I’m not doing it yet because there is a chance no one will end up getting to that scale, or it will prove insufficient. There are constant architectural changes happening to NFDB because of new information from deploying it.

Doing it when the time comes, if ever, is more effective.

Yeah, but it's important to have that possibility. If they ask, I can say, Oh, we can do that...

You could scale even more if we could constrain queries.

Say you are developing Alexandria and want to access wiki pages, but all you ever do is look them up by their d tag.

You don't need anything else. People mostly search for articles by content, and rarely by things like author.

You don’t need Nostr queries for that, you can use the semantic search engine.

Congrats: this will allow scaling 10x more with no changes. I am not kidding.
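
For the Alexandria example above, the entire constrained access pattern fits in one filter. A sketch in go-nostr terms, using NIP-54's wiki-article kind 30818 and a made-up d value:

```go
package main

import (
	"encoding/json"
	"fmt"

	"github.com/nbd-wtf/go-nostr"
)

func main() {
	// The whole access pattern in one filter: wiki pages (NIP-54 kind
	// 30818) looked up by d tag. Nothing else needs to be indexed.
	filter := nostr.Filter{
		Kinds: []int{30818},
		Tags:  nostr.TagMap{"d": []string{"example-article"}},
	}
	b, _ := json.Marshal(filter)
	fmt.Println(string(b))
}
```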

yeah, the shitty filter query thing is shitty

Well, we look up by naddr and d-tag, but yeah.

having indexes for those makes the searches pretty fast

"Interesting perspective! 🤔 It's always a balance between design choices and scalability. Sometimes the best innovations come from unexpected paths. Excited to see how it all unfolds! #Innovation #DesignThinking"

decentralization and small businesses/orgs need the small stuff, but big business needs bigger stuff. the approaches are complementary. small, simple relays that are easy to set up and manage have a lower marginal HR cost for a small user, and in the overall picture they make a more resilient network: more decentralization and censorship resistance, and a smaller payoff for attackers, since taking down many small targets is hard work while big systems let one hit do a lot of damage.

the way it's played out over the history of internet services has been very clear. the more you centralize, the more brittle the system becomes.

We're trying to offer a system that has both, storing the information people need most often closer to the individual person's computer, with the information getting more and more concentrated the closer you get to the central servers.

That means that all data is still available on SOME RELAY SOMEWHERE IN THE SYSTEM, even if the middle drops out, and people wouldn't notice the central server going down for a while, as data could be repopulated from the outside-in.

If you look at it, for any system large or small, the probability of failure per unit of scale becomes lower as systems get larger.

Small-scale systems individually seem more reliable, but if you put 500 customers on 500 separate systems instead of on 1 system, the 500 will have a higher overall failure rate: at 99% uptime each, the chance that at least one of them is down at any given moment is essentially 100%.

The probability of experiencing downtime as an individual is not much different with a large system than with a small one. But if a big system fails, more people notice, and it feels bigger.

With small systems, the frustration is spread out over a long time period, so it feels like it never happens.

The distributed systems world figured this out ages ago; that's why there is fault isolation, so that failures are contained and become just another ignored blip.

Yes, but Nostr takes it a step further with the signed, atomic events. Adds another layer of possibility.

What a local relay does is allow you to work offline and create a local hub. I'm the test case, as I commute on a 2.5-hour train ride through internet black holes, so I need an app that automatically syncs when the Internet "turns on" and then switches back to local-only when we go through a tunnel or something.

Also, just fuck the constant wss streaming. Nostrudel and Amethyst are polling over 100 relays simultaneously, and Primal connects so horribly that my browser crashes. GBs and GBs of text messages, every damn day. My mobile plan is empty within a week, and then I've got 3 weeks of snail-mail Internet connections. Great.

AND THE BATTERY USAGE OMG

Nostr is an AP system, in CAP terms: it favors availability over consistency. And for many things, who the fuck cares? Nostr is not meant to handle financial TXs or other OLTP workloads anyway; if you want that, go use a database.

Nostr is a communications protocol, so you always have a database anyway. The data has to be parked someplace before it can be relayed or fetched, after all.

This is about the efficiency of moving the information around.

even WITH chats it still makes no sense to add all that extra complexity

if it was streaming audio/video, different story. but then you would use RTP instead anyway.

I don't think we need SSE from external sources, but it makes hella sense from a local relay to a local client. The relay can be syncing/polling/streaming from other relays in the background, after all.

Those two connections can be different protocols.