== How to configure Nginx to use a backup relay when one crashes ==

TLDR: More reliable relays, happy users.

This may be a tip for new relay operators. Experienced ones might find this boring.

I have been experimenting with configurations to make my relays more reliable and closer to users, which should improve the overall user experience. While doing that, several days ago the Northwest USA copy of nostr.mom crashed. The cause was probably my spam filter using a lot of memory, or the cloud server running three strfry instances (one relay and two streams) at the same time. The server had 16 GB of memory, and it seems that was not enough for all of them. Strfry depends on LMDB, and LMDB relies heavily on caching, which temporarily reduces the amount of completely free memory. When memory runs out, Linux can kill processes that ask for more. I had to find a way to redirect users to the other server when a crash happens.

e.nos.lol has two copies now. I will use A and B to stand for the IPs: one copy is at A.A.A.A and the other at B.B.B.B. Any user on the planet is directed to whichever of the two has lower latency for them. But what happens if the relay software (strfry) crashes on one of the copies? Is there a way to send users to the other server automatically?

The DCs, hardware and nginx are reliable, mature technologies. Strfry is very reliable too, but if another process is eating a lot of memory, Linux can decide to kill one of the processes on the same machine even though it was behaving well. By the way, a recent suggestion for strfry operators is to add some swap space to be on the safe side! I think this increases the amount of memory that can be freely allocated (it includes the free space in swap).

I checked my latency-based DNS service, bunny.net. They offer a solution that involves pinging the servers. But my hardware was fine, and I needed a solution that checks the websocket server (strfry) itself.

Then I figured I could use nginx's reverse proxy feature to achieve my redundancy goals! If a relay crashes, nginx can use the backup websocket server, so users do not see the relay as offline. Fetching notes from a distant server is slower, but the relay stays functional until the local one is restarted.

This is the config on B.B.B.B that does that:

upstream backend {
    server 127.0.0.1:7777;       # this is the normal relay that runs on B.B.B.B
    server A.A.A.A:7777 backup;  # this is where nginx will fetch from, if the above strfry instance fails
}

server {
    server_name e.nos.lol;
    proxy_next_upstream error timeout http_502;

    location / {
        proxy_pass http://backend;
        # ..... other stuff
    }

    # ..... other stuff
}
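The "other stuff" inside the location block is not shown here. For a websocket relay behind nginx it commonly includes the upgrade headers, so the following is only a sketch of what such a block might look like. The timeout values are assumptions picked for illustration, not the settings used on e.nos.lol.

```nginx
# Sketch only: typical websocket proxy settings, not the actual e.nos.lol config.
location / {
    proxy_pass http://backend;

    # A websocket upgrade needs HTTP/1.1 and the hop-by-hop headers passed on.
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;

    # Assumed values: give up quickly on an upstream that does not answer,
    # and keep quiet relay connections open longer than the 60s default.
    proxy_connect_timeout 2s;
    proxy_read_timeout 300s;
}
```

Note that nginx only tries the next upstream while nothing has been sent to the client yet. A websocket that is already open to the crashed relay is dropped, and the client lands on the backup when it reconnects.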

A similar config should be on A.A.A.A.
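As a sketch, the mirrored version for A.A.A.A just swaps the roles of the two addresses. It assumes the relay listens on port 7777 on both machines, as in the config for B.B.B.B.

```nginx
# Mirror of the B.B.B.B config, for A.A.A.A (sketch; same port assumed on both).
upstream backend {
    server 127.0.0.1:7777;       # the normal relay that runs on A.A.A.A
    server B.B.B.B:7777 backup;  # used only while the local strfry is unavailable
}

server {
    server_name e.nos.lol;
    proxy_next_upstream error timeout http_502;

    location / {
        proxy_pass http://backend;
        # ..... other stuff
    }

    # ..... other stuff
}
```

Because each backup entry points at port 7777 (strfry itself) rather than at the other server's nginx, a request cannot bounce back and forth between the two machines when both relays are down.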

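Both servers need firewall entries that allow the nginx on the other server to reach them. The relay on each server also has to listen on an address the other server can reach, not only on 127.0.0.1.

On B.B.B.B:

sudo ufw allow from A.A.A.A

On A.A.A.A:

sudo ufw allow from B.B.B.B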

If the local relay is running I can fetch 50 records in 0.3 seconds.

If the local relay fails, nginx uses the distant relay and then I can fetch 50 records in 0.7 seconds. This increased latency shows that the packets are moving between B and A.
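Besides timing the fetch, the nginx access log can show which upstream actually served a connection. This is a minimal sketch: the format name `upstream_info` and the log path are made up for illustration, while `$upstream_addr` and `$upstream_connect_time` are standard nginx variables.

```nginx
# Goes in the http {} context. The name "upstream_info" is hypothetical.
log_format upstream_info '$remote_addr [$time_local] "$request" $status '
                         'upstream=$upstream_addr '
                         'connect_time=$upstream_connect_time';

server {
    server_name e.nos.lol;

    # Log path is an assumption; any writable location works.
    access_log /var/log/nginx/relay_upstream.log upstream_info;

    # ..... rest of the server block
}
```

When nginx had to fail over, `$upstream_addr` lists every server it tried, separated by commas, so a line showing both 127.0.0.1:7777 and the remote address means the backup was used. The entry is written when the request finishes, which for a websocket should be when the connection closes.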

You could say, "Why so much trouble? You could just make a service and it would restart the relay." But what if it doesn't restart for some reason? Or what if the write policy plugin fails to restart and nobody can write? One of my scripts takes a long time to load (thank you, spammers)!

sommerfeld 2y ago

Ideally the nginx should be run on a 3rd high availability independent machine.

If machine A runs relay A + nginx, it's very likely that both get unavailable and nginx won't even be reachable to proxy to relay B on machine B.
