It’ll have a REST API if you want one. You can ask it to send only the IDs or the full events.

Want to get events by ID? Just use an ids filter on the same endpoint; nobody needs two endpoints doing one thing.

But no SSE. Browsers limit the number of parallel HTTP requests, and SSE connections can exceed that limit since they stay open for a long time.

Also, NFDB will support cursors on all interfaces so you can paginate without the pain. Forget created_at-based pagination, which can be very unreliable.

If you want to know how to do it anyway: calculate the lowest created_at seen from each relay, and use the highest of those as the next until value. Otherwise a relay with a very old event will say “nope, nothing left” while the others still have data.
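That watermark logic can be sketched as a small function (the relay URLs and timestamps below are made up for illustration, not from a real deployment):

```python
def next_until(pages: dict[str, list[int]]) -> int:
    """Compute the next `until` watermark for created_at pagination
    across relays. `pages` maps relay URL -> created_at timestamps
    that relay returned so far (newest-first).

    The oldest timestamp each relay returned is how far back that
    relay has been safely covered. Taking the *highest* of those
    per-relay minimums ensures no relay is asked to skip a range it
    has not actually been paged through yet.
    """
    return max(min(ts_list) for ts_list in pages.values() if ts_list)

# Hypothetical example: relay A has paged back to t=900, relay B only
# to t=1500, so the next page must start at until=1500. Otherwise
# relay B would silently skip everything between 900 and 1500.
print(next_until({
    "wss://a.example": [2000, 1500, 900],
    "wss://b.example": [2000, 1800, 1500],
}))  # -> 1500
```

This is exactly the failure mode described above: anchor on the globally lowest timestamp and the relay whose coverage stops early gets holes in it.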

Discussion

Okay, that gives us two sane connections to tap, one local, one external. Good.

Feeling cute. Might fix Nostr.

yeah, i think as always hybrid is generally better than dedicated. smaller, simpler parts built with clean, simple interfaces are much easier to scale per use case

yeah, i've been thinking about how to do SSE properly. it seems to me the right way is to open one stream to handle all subscriptions: the first time the client wants a subscription open, one stream is opened, everything goes through it, and the event format (as in SSE event) includes the subscription id it belongs to.

this avoids the problem of the browser limits, which over HTTP/1.1 are typically about 6 per host. but even that is plenty, and it's not even necessary, because you can multiplex subscriptions just by using subscription IDs, exactly like the websocket API does. it also simplifies keeping the subscription channel open, and it allows arbitrary other kinds of notifications to get pushed as well, kinds we haven't thought of yet, besides subscriptions to event queries driven by newly arrived events.
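A minimal sketch of what one multiplexed SSE frame could look like, assuming the subscription id rides inside a JSON payload (the field names here are invented for illustration, not any spec):

```python
import json

def sse_frame(sub_id: str, event: dict, event_id: int) -> str:
    """Serialize one relay event as a frame on the single shared SSE
    stream. Because the subscription id travels in the payload, one
    stream can carry any number of subscriptions.
    """
    payload = json.dumps({"sub": sub_id, "event": event})
    # `id:` is what the browser echoes back as Last-Event-ID on
    # reconnect; the blank line terminates the frame per the SSE
    # wire format (text/event-stream).
    return f"id: {event_id}\nevent: event\ndata: {payload}\n\n"

frame = sse_frame("sub1", {"kind": 1, "content": "hi"}, 42)
print(frame)
```

The client side then dispatches on the `sub` field instead of opening a fresh EventSource per subscription.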

why do i think SSE instead of a websocket? because it's less complex, basically just an HTTP body that is slowly written to. this pushes everything down to the TCP layer instead of adding a secondary websocket ping/pong layer on top. the client also knows when the SSE has disconnected and can start it up again. on the relay side, the subscription processing should keep a queue of events, so when a subscription SSE dies it can push the backlog of events that came in since the last one sent to the client (which also means there needs to be a top-level subscription connection identifier; an IP address won't always work with multiple users behind one NAT).
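the reconnect-and-catch-up idea could look roughly like this, assuming an in-memory buffer per connection identifier plus the standard Last-Event-ID mechanism (class name, sizes, and bounds here are illustrative, not how any relay actually does it):

```python
from collections import deque

class ReplayBuffer:
    """Per-connection queue of recently pushed events, so a client
    whose subscription stream died can catch up via Last-Event-ID.
    A real relay would also bound this by age, not just by count,
    and key it on an explicit connection identifier (not IP).
    """
    def __init__(self, maxlen: int = 1024):
        self._buf = deque(maxlen=maxlen)  # (event_id, event) pairs
        self._next_id = 0

    def push(self, event: dict) -> int:
        """Record an outgoing event; returns the id put on the wire."""
        self._next_id += 1
        self._buf.append((self._next_id, event))
        return self._next_id

    def since(self, last_event_id: int) -> list[tuple[int, dict]]:
        """Everything that arrived after the client's Last-Event-ID."""
        return [(i, e) for i, e in self._buf if i > last_event_id]

buf = ReplayBuffer()
for note in ({"content": "a"}, {"content": "b"}, {"content": "c"}):
    buf.push(note)
# Client reconnects having last seen id 1 -> replay events 2 and 3.
print(buf.since(1))
```

on reconnect the relay replays `since(last_event_id)` down the fresh stream before resuming live delivery, so the client never sees a gap.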

also, keep in mind that the websocket API by default creates subscription-management state for every single socket that is opened, whereas if you do queries primarily as http requests, this slims down to a single subscription multiplexer, which makes it more memory efficient as well.

i don't think there is a clear enough benefit in using websockets for everything; their only real utility is for highly interactive connections. compared to the complexity of multiplexing them at the application layer, doing one-shot requests most of the time plus one subscription stream for pushing data back is a huge reduction in the state required to manage a single client connection.

Yeah, I've been researching, and Twitter and ChatGPT actually use SSE, not websockets. They're streaming out, not in.

You only need one-way streaming, as the client knows when you are writing. There's normally no use case on Nostr for two-way streaming anyway, since we don't really have live chats.

my primary concern comes from the number of parallel streams. WS excels here because it counts as one connection, both to the browser and in the underlying stack.

In Nostr you can have many active subscriptions for new events.

Any non-subscription queries go straight to HTTP though. Helps a ton with load balancing and it means a software upgrade doesn’t interrupt queries.

Under the hood it’s all REST between internal NFDB services.

Yeah, but we don't actually need that, as most relays are now aggregators.

haha, yeah... this is the flaw in the bigger-is-better strategy: you can get a lot of mileage out of horizontal replication, especially when the database has GC to contain disk usage and the concomitant search cost of a larger database

You can, but we're going Big Data and need something more enterprise-level.

We're going after Sharepoint and Oracle, Kafka, and the Internet Archive. Need the really big guns.

Or when you manage thousands of customers. The NFDB relay is optimized for thousands of small ones as well.

It has a medium fixed cost, but the marginal costs are lower.

Can it handle a nation-state-sized customer? Asking for a friend.

Probably. And if it doesn’t, it can easily be upgraded so that it can.

There are some intentional design choices made by NFDB that could be changed for larger scale, at the cost of requiring more time to implement.

But at large scale you have the resources to do that.

So the answer is yes.

Good. Don't need that, yet, but maybe think through the design choices, so that you could explain the possibilities to someone.

There is already a scaling plan.

I’m not doing it yet because there is a chance no one will end up getting to that scale, or it will prove insufficient. There are constant architectural changes happening to NFDB because of new information from deploying it.

Doing it when the time comes, if ever, is more effective.

Yeah, but it's important to have that possibility. If they ask, I can say, Oh, we can do that...

We could scale even more if we constrained queries.

Say you are developing Alexandria and want to access wiki pages. But all you do is look up wiki pages by their d tag.

You don’t need anything else. People mostly search for articles by content, and only rarely by things like author.

You don’t need Nostr queries for that, you can use the semantic search engine.

Congrats, this alone would allow scaling 10x with no other changes. I am not kidding.

yeah, the shitty filter query thing is shitty

Well, we look up by naddr and d-tag, but yeah.

having indexes for those makes the searches pretty fast

"Interesting perspective! 🤔 It's always a balance between design choices and scalability. Sometimes the best innovations come from unexpected paths. Excited to see how it all unfolds! #Innovation #DesignThinking"

decentralization and small businesses/orgs need the small stuff, but big business needs bigger stuff. the approaches are complementary. small, simple relays that are easy to set up and manage have a lower marginal HR cost for a small user, and for the overall picture they make a more resilient network: decentralization, censorship resistance, and a smaller attack surface (taking down many small targets is harder, while a big system lets an attacker do more damage at once).

the way it's played out over the history of internet services has been very clear. the more you centralize, the more brittle the system becomes.

We're trying to offer a system that has both, storing the information people need most often closer to the individual person's computer, with the information getting more and more concentrated the closer you get to the central servers.

That means that all data is still available on SOME RELAY SOMEWHERE IN THE SYSTEM, even if the middle drops out, and people wouldn't notice the central server going down for a while, as data could be repopulated from the outside-in.

If you look at it, across systems large and small, the probability of failure relative to scale is lower in the larger systems.

Small-scale systems individually seem more reliable, but if you put 500 customers on 500 systems instead of on 1 system, the 500 systems will have a higher overall failure rate: the chance that at least one of them is down at any moment is much higher.
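A quick worked example, using assumed (not measured) uptime figures:

```python
# Assumed numbers for illustration only: each small system is down
# 0.1% of the time in a given month; the big system, 0.01%.
p_small_down = 0.001
p_big_down = 0.0001

# Chance that at least one of 500 independent small systems is down:
p_any_small_down = 1 - (1 - p_small_down) ** 500
print(f"{p_any_small_down:.1%}")  # -> 39.4%

# ...yet each individual customer still only carries a 0.1% downtime
# risk, versus 0.01% on the big system. The per-customer difference
# is small; the difference is that a big-system outage hits all 500
# customers at once, which is why it gets noticed.
```

So "some system somewhere is failing" is nearly guaranteed in the small-system world, while any one customer barely notices, which is exactly the point made below about frustration being spread out over time.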

The probability of experiencing downtime as an individual is not much different on a large system than on a small one. But when a big system fails, more people notice, and it feels bigger.

With small systems, the frustration is spread out over a large time period, and so it feels like it never happens.

The distributed systems world figured this out ages ago, and this is why there is fault isolation: failures are contained and become just another ignored blip.

Yes, but Nostr takes it a step further with the signed, atomic events. Adds another layer of possibility.

What a local relay does is allow you to work offline and create a local hub. I'm the test case, as I commute on a 2.5-hour train ride through internet black holes, so I need an app that automatically syncs when the Internet "turns on" and then switches back to local-only when we go through a tunnel or something.

Also, just fuck the constant wss streaming. Nostrudel and Amethyst are polling over 100 relays simultaneously, and Primal connects so horribly that my browser crashes. GBs and GBs of text messages, every damn day. My mobile plan is empty within a week, and then I've got 3 weeks of snail-mail Internet speeds. Great.

AND THE BATTERY USAGE OMG

Nostr is an AP system. And for many things, who the fuck cares? Nostr is not meant to handle financial TXs or other OLTP workloads anyway; if you want that, go use a database.

Nostr is a communications protocol, so you always have a database. The data has to be parked, someplace, before it can be relayed or fetched, after all.

This is about the efficiency of moving the information around.

even WITH chats it still makes no sense to add all that extra complexity

if it was streaming audio/video, different story. but then you would use RTP instead anyway.

I don't think we need SSE from external sources, but it makes hella-sense from a local relay to a local client. The relay can be syncing/polling/streaming from other relays in the background, after all.

Those two connections can be different protocols.