Thanks again. So, if it’s all yes, then in practice Gossip is also trying to connect to broken relays, right?
What sort of timeout settings and retry logic does Gossip currently use? nostr:nprofile1qqsyvrp9u6p0mfur9dfdru3d853tx9mdjuhkphxuxgfwmryja7zsvhqpz9mhxue69uhkummnw3ezuamfdejj7qgswaehxw309ahx7um5wghx6mmd9uq3wamnwvaz7tmkd96x7u3wdehhxarjxyhxxmmd9ukfdvuv also mentioned that broken relays aren’t as much of a problem in Amethyst as they are in Haven. I get the feeling that each of us may be handling this differently.
Haven tests relays during startup. It also uses a Penalty Box: if a relay fails to connect, it retries after 30 seconds, and then exponential backoff kicks in. Out of curiosity I put a counter on it, and with my current implementation this meant ~110k failed connection attempts in the first 24 hours after restarting the relay (though as exponential backoff kicks in, the number of attempts drops substantially). Of course, I could make the "broken relay detection" converge faster by tweaking the exponential backoff, but that also means that the next time one of the popular relays goes offline, Haven will quickly give up on it.
This is where a good blacklist could be useful, and you can use one without fully trusting it. E.g., I could take the Nosotros list above and still try to connect to all relays during initialisation, but if the connection fails and the relay is on the blacklist, I set a much longer initial reconnect delay for the exponential backoff algorithm (e.g. 1 hour or even 24 hours). So if the list is wrong, or one of the dead relays gets "resurrected", Haven will eventually connect to it, but the whole process becomes much cheaper.