Connecting to relays that are not working is pretty cheap, so I just try to connect and ignore the failure if the relay is offline.
Discussion
I'm doing the same. The real problem is telling apart flaky or temporarily down relays from ones that are actually dead. E.g., if I treat "fail to connect during startup" as dead, I risk missing relays that are just temporarily offline, which can be catastrophic for long-running processes like relays. But if I don't, then I need increasingly complex retry algorithms ("penalty box", exponential backoff, etc.), which still waste time and resources retesting truly dead relays. Even worse, some dead relays just hang connections. Right now I'm using a 5-second timeout when testing relays, and even that adds up quickly when combined with retry algorithms across thousands of relays. A good blacklist would be very useful.
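For illustration, a penalty box with exponential backoff can stay fairly small. This is a minimal sketch, not anyone's actual implementation: the `PenaltyBox` class, its method names, and the delays (60 seconds, doubling up to one day) are all assumptions made up for the example.

```python
import time


class PenaltyBox:
    """Tracks failing relays and when each may be retried (hypothetical helper)."""

    def __init__(self, base_delay=60.0, max_delay=86400.0):
        self.base_delay = base_delay  # assumed: first retry after 60 seconds
        self.max_delay = max_delay    # assumed: never wait longer than one day
        self.failures = {}            # relay URL -> consecutive failure count
        self.retry_at = {}            # relay URL -> earliest next attempt (epoch seconds)

    def record_failure(self, url, now=None):
        """Register a failed attempt and return the delay before the next one."""
        now = time.time() if now is None else now
        count = self.failures.get(url, 0) + 1
        self.failures[url] = count
        # Double the wait on every consecutive failure, capped at max_delay.
        # The exponent is bounded so the multiplication cannot overflow.
        exponent = min(count - 1, 32)
        delay = min(self.base_delay * 2 ** exponent, self.max_delay)
        self.retry_at[url] = now + delay
        return delay

    def record_success(self, url):
        """One successful connection clears the relay's failure history."""
        self.failures.pop(url, None)
        self.retry_at.pop(url, None)

    def should_try(self, url, now=None):
        """True if the relay is unknown or its penalty has expired."""
        now = time.time() if now is None else now
        return now >= self.retry_at.get(url, 0.0)
```

With these numbers a relay that never comes back ends up being retested roughly once a day, while a flaky one is back in rotation after a single successful connection.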
You can also check for HTTP status codes when opening the connection. Many relays are offline and sending 400 and 500 codes. Those you can re-test once a day or so.
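A rough sketch of that idea as a pure function, assuming the client can see the HTTP status of the WebSocket upgrade response (101 means the upgrade succeeded). The function name `retest_delay` and the five-minute interval for unanswered connections are hypothetical; only the once-a-day retest comes from the suggestion above.

```python
from typing import Optional

DAY = 24 * 60 * 60  # seconds


def retest_delay(status: Optional[int]) -> float:
    """Seconds to wait before testing a relay again, given the handshake result.

    `status` is the HTTP status of the WebSocket upgrade response,
    or None if the connection failed or timed out before any response.
    """
    if status == 101:
        return 0.0         # upgrade succeeded, relay is usable now
    if status is not None and 400 <= status < 600:
        return float(DAY)  # server answered with an error: retest about once a day
    return 5 * 60.0        # assumed: no answer (or another status) may be a short outage
```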
Significantly less efficient than NIP-66.
I don't like blacklists here because that becomes another point of trust. I'd prefer to know for myself if a relay never works for me. So I want the client to do this and to remember.