The bottleneck seemed to be the D1 database. Replication was enabled, but how could SQLite possibly support such a high level of traffic? After a certain number of events accumulated, it would slow down and eventually stop working (unless the D1 instance was nuked and restarted).


Discussion

Yes, in terms of writes, it should theoretically have sustained about 50k concurrent connections, or maybe 1,000 events/sec. That’s all theory, of course. But with read replicas it should be able to sustain much more overall traffic in terms of REQs. I wonder if it could be rebuilt to use Cloudflare’s Hyperdrive with an external DB. I tried to benchmark it all, but obviously have no insight into how much traffic was actually hammering the relay from the web and apps. In all honesty, I think they might need a multi-relay setup, with certain parts broken up and handled by different relays, in order to support the level of traffic a centralized system like TikTok or IG handles.

Under a single-writer ceiling, how could there be any chance? This multi-relay/specialised-types approach would be neat, but at some point you have to question why nostr in the first place. There's a limit to the number of nostr qualities you can sacrifice before you reach a worst-of-both-worlds situation: what could have been a performant centralised architecture is held back by a bunch of nostr tech debt that isn't enabling anything nostr-like in the first place.

A proper SQLite database shouldn't have issues with this amount of traffic. D1 isn't just SQLite though, it's SQLite + an API.

I would try using SQLite-in-DO instead of using D1. There's a good blog post about it here: https://blog.cloudflare.com/sqlite-in-durable-objects/

Secondly I would look at splitting up the database into multiple databases.

Yes, I’m already using SQLite-in-DO for subscription management with the WebSocket Hibernation API, and visitors are geo-steered towards a regional DO. But I haven’t tested handling the data independently across all of the DOs instead of a single D1; I think that would add some serious latency and make it unnecessarily complex. I did just roll out an update that adds Cloudflare Queues to all EVENT messages, which should help write throughput and handle bursts up to 5,000/sec.

Cloudflare, I actually didn’t know about this until I read this. The idea is great: during its hibernation period (like me in the winter, I sleep a lot), a Durable Object (DO) is taken out of memory but can still wake up to run JavaScript.

How can I help ?

I'd be curious to see how this runs on a bare metal VPS - I would expect way way higher throughput.

Another thing to try would be to batch writes - instead of writing every time something happens you collect writes in a queue and write every 50ms in one big transaction. This should help a lot with write throughput.

Reading can be done in the same way.

If you want to avoid parameter limits you can use JSON1.

Last but not least, analyze the queries using `EXPLAIN QUERY PLAN` to make sure your indices are correctly used.
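The batching and `EXPLAIN QUERY PLAN` suggestions above can be sketched with plain SQLite (Python's stdlib driver here; the events schema and index name are hypothetical stand-ins, not the relay's actual ones):

```python
import sqlite3

# Hypothetical schema standing in for a relay's event store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id TEXT PRIMARY KEY, pubkey TEXT, kind INTEGER, created_at INTEGER)"
)
conn.execute("CREATE INDEX idx_kind_created ON events (kind, created_at)")

def flush(queue):
    """Write every queued event in one transaction instead of committing one by one."""
    with conn:  # opens a transaction, commits once on success
        conn.executemany(
            "INSERT OR IGNORE INTO events (id, pubkey, kind, created_at) VALUES (?, ?, ?, ?)",
            queue,
        )
    queue.clear()

# Simulate 1000 events collected during one batching interval, then flush once.
queue = [(f"id{i}", "pk", 1, 1700000000 + i) for i in range(1000)]
flush(queue)
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 1000

# Sanity-check that the query planner actually uses the index:
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM events WHERE kind = 1 ORDER BY created_at DESC"
).fetchall()
print(plan)
```

The single transaction amortizes fsync/commit overhead across the whole batch, which is where most of the per-write cost goes.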

Using a VPS is sort of antithetical to what I was trying to build 😅. By being entirely “serverless” it avoids a single point of failure and other downsides such as hardening, managing updates, fault tolerance, smaller resource ceilings, etc. There are none of the traditional nuisances with Cloudflare Workers, etc. Everything is managed.

With Cloudflare Queues added, write throughput is now 5x better and it does batches of 100 to D1. The next thing on my list is to further optimize the handling of all queries and improve read performance beyond what indices and read replicas alone provide.

Yeah - totally get that, just curious 😅

Nice! Hope it works out 💪

Thanks! I've also been working on a non-SQL version. Will is right and wrong about SQL for Nostr. At scale, SQL does start to crumble. For a medium-sized (or even semi-high-traffic) relay, it's completely fine when indices are used. But at the scale of what diVine was trying to do, it wouldn't perform well. There were also a lot of things wrong from the beginning, where customizations were added but the accompanying indices weren't built for the filters/tags that would be used for every REQ.

I built a working Postgres relay (Ditto), but the only way tag queries would work was with a jsonb column on the nostr_events table with a GIN index. I have never seen tag queries work correctly when the tags table is separate and needs a join. It only works if the data is together in the same table.
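As a rough illustration of the single-table idea, here is a SQLite/JSON1 stand-in for the jsonb-column approach (Postgres would index the column with GIN and use jsonb containment instead; the table and column layout here are assumptions, not Ditto's actual schema):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Tags stored inline on the event row, as in the jsonb-column approach,
# rather than in a separate table that must be joined.
conn.execute("CREATE TABLE nostr_events (id TEXT PRIMARY KEY, kind INTEGER, tags TEXT)")
conn.execute(
    "INSERT INTO nostr_events VALUES (?, ?, ?)",
    ("ev1", 1, json.dumps([["e", "abc"], ["p", "def"]])),
)
conn.execute(
    "INSERT INTO nostr_events VALUES (?, ?, ?)",
    ("ev2", 1, json.dumps([["p", "xyz"]])),
)

# Find events carrying a ["p", "def"] tag: the JSON is unpacked in place
# with json_each, no join against a separate tags table.
rows = conn.execute(
    """
    SELECT e.id FROM nostr_events AS e, json_each(e.tags) AS t
    WHERE json_extract(t.value, '$[0]') = 'p'
      AND json_extract(t.value, '$[1]') = 'def'
    """
).fetchall()
print(rows)  # [('ev1',)]
```

Keeping the tags on the event row means one index lookup can answer the whole filter, which is the property the join-based layout loses.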

sql for nostr doesn’t work. It’s been tried.

I see! Is a document based DB the way to go? In memory?

strfry/nostrdb model is the only thing that has worked. lmdb + indexes that cover common use cases:

- created

- pubkeyKindCreated

- kindCreated

- tagKindCreated

Bonus points for storing both a JSON string and a flatbuffer version of the note for optimal performance.
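A minimal sketch of how such composite keys work in an ordered KV store like LMDB, assuming big-endian packing so byte order matches numeric order (the exact strfry/nostrdb key layouts differ; this only demonstrates the kindCreated idea):

```python
import struct

def key_kind_created(kind: int, created_at: int) -> bytes:
    # Big-endian pack so lexicographic byte order equals numeric order,
    # which is what makes LMDB range scans usable as an index.
    return struct.pack(">HQ", kind, created_at)

def key_pubkey_kind_created(pubkey: bytes, kind: int, created_at: int) -> bytes:
    # Prefix with the pubkey to group one author's events together.
    return pubkey + struct.pack(">HQ", kind, created_at)

# A sorted list stands in for an LMDB cursor over the kindCreated index.
index = sorted(
    key_kind_created(k, t) for k, t in [(1, 100), (1, 200), (7, 50), (1, 150)]
)

# Range scan: every kind-1 event, already in created_at order,
# with no query planner involved.
lo, hi = key_kind_created(1, 0), key_kind_created(1, 2**64 - 1)
kind1 = [struct.unpack(">HQ", k) for k in index if lo <= k <= hi]
print(kind1)  # [(1, 100), (1, 150), (1, 200)]
```

Each index listed above is just a different key prefix over the same events, so a REQ filter maps directly onto a cursor range scan.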

Interesting! It did not occur to me to use a KV solution for this.