I'm doing the same. The real problem is telling apart flaky or temporarily down relays from ones that are actually dead. E.g., if I treat "fail to connect during startup" as dead, I risk missing relays that are just temporarily offline, which can be catastrophic for long-running processes like relays. But if I don't, then I need increasingly complex retry algorithms ("penalty box", exponential backoff, etc.), which still waste time and resources retesting truly dead relays. Even worse, some dead relays just hang connections. Right now I'm using a 5-second timeout when testing relays, and even that adds up quickly when combined with retry algorithms across thousands of relays. A good blacklist would be very useful.
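For anyone following along, here is a rough sketch of the "penalty box" with exponential backoff mentioned above. It is not anyone's actual implementation: the `PenaltyBox` class, its method names, and the 60-second base and one-day cap are all made up for illustration. It does no networking; you would call `record_failure` or `record_success` after your own connection attempt (with whatever timeout you use).

```python
import time
from dataclasses import dataclass


@dataclass
class RelayHealth:
    failures: int = 0            # consecutive failed attempts
    next_retry_at: float = 0.0   # clock value before which we skip the relay


class PenaltyBox:
    """Hypothetical tracker: each consecutive failure doubles the wait."""

    def __init__(self, base_delay=60.0, max_delay=86400.0, clock=time.monotonic):
        # base_delay and max_delay are assumed values, not recommendations.
        self.base_delay = base_delay
        self.max_delay = max_delay
        self._clock = clock
        self._relays = {}

    def should_try(self, url):
        """True if the relay is unknown or its penalty has expired."""
        health = self._relays.get(url)
        return health is None or self._clock() >= health.next_retry_at

    def record_failure(self, url):
        """Register a failed attempt and return the new delay in seconds."""
        health = self._relays.setdefault(url, RelayHealth())
        health.failures += 1
        # Cap the exponent so the multiplication cannot overflow.
        exponent = min(health.failures - 1, 32)
        delay = min(self.base_delay * (2 ** exponent), self.max_delay)
        health.next_retry_at = self._clock() + delay
        return delay

    def record_success(self, url):
        """A working relay leaves the penalty box entirely."""
        self._relays.pop(url, None)
```

With these numbers a relay that keeps failing gets retried after 1, 2, 4, ... minutes and then at most once a day, so a truly dead relay costs one timeout per day instead of one per startup, while a temporarily offline one is picked up again within a few retries.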
Discussion
You can also check the HTTP status code returned when opening the connection. Many offline relays respond with 400- and 500-series codes. Those you can re-test once a day or so.
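A minimal sketch of that idea, assuming you can get the HTTP status of the WebSocket handshake from your client library (how you obtain it depends on the library, so that part is left out). The function name `retest_delay` and the exact one-day interval are my own placeholders.

```python
from typing import Optional

ONE_DAY = 24 * 60 * 60  # seconds; "once a day or so" from the post above


def retest_delay(status: Optional[int]) -> Optional[float]:
    """Map a handshake HTTP status to a re-test delay in seconds.

    Returns 0.0 for a successful upgrade, ONE_DAY for 4xx/5xx responses,
    and None when the status says nothing useful (no response, timeout,
    redirect, ...), so the caller can fall back to its own retry logic.
    """
    if status == 101:  # 101 Switching Protocols: the upgrade succeeded
        return 0.0
    if status is not None and 400 <= status <= 599:
        return float(ONE_DAY)
    return None
```

The point of returning `None` for everything else is that a hung connection or a timeout carries no HTTP signal, so it should not be lumped in with relays that answer quickly with an error page.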
Significantly less efficient than NIP-66.
I don't like blacklists here because that becomes another point of trust. I'd prefer to know for myself if a relay never works for me. So I want the client to do this and to remember.