That anecdote is a power issue, though. This is where a holistic approach really matters: one bad power system fucking up the higher layers is so damn frequent, and it sucks when that's what you're stuck using.
No power, no revenue.
The network outage was intermittent, and 2 nodes still had quorum (yes, I'm aware that's too low a vote count). Taking the ENTIRE cluster down over 5 seconds of network loss is absolutely nuts to me. The machines in the picture I shared had consumer UPSs and crappier network cards, configurations, and switches by comparison, and they still did better in terms of stability.
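For reference, the majority math behind that quorum remark is simple. This is only a rough sketch, assuming one vote per node; the helper names `quorum_threshold` and `has_quorum` are made up for illustration, and the 3-node total is an assumption, not a detail of the actual cluster:

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """True if the nodes that can still see each other hold a majority."""
    return reachable_votes >= quorum_threshold(total_votes)


# Assumed 3-node cluster, one vote per node (illustrative only).
print(has_quorum(2, 3))  # two nodes still talking: majority kept
print(has_quorum(1, 3))  # one isolated node: majority lost
```

By this arithmetic, a partition that leaves two of three voters connected still holds a majority, which is why a whole-cluster outage in that situation is so surprising.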
Why can't Proxmox just kill all services to accomplish fencing? It takes like 10 minutes for a single server to boot into the OS. I think even the kernel watchdog can do a reset without a full system reboot.
And no, I'm not going to adjust my hardware to boot faster; it's old and needs memory checking and firmware updates. A hardware reboot should not be a normal condition.
I also cannot fix crappy power; that's a condition I'd expect to survive... Line-interactive UPSs are prohibitively expensive even for many businesses.