Fr


Discussion

That anecdote is a power issue though. This is where a holistic approach really matters: one bad power system fucking up the higher layers is so damn frequent, and it sucks when that's what you're stuck using.

No power, no revenue.

The network outage was intermittent, and 2 nodes still had quorum (yes, I'm aware that's too low a vote count). Taking the ENTIRE cluster down over 5 seconds of network loss is absolutely nuts to me. The machines in the picture I shared had consumer UPSes and crappier network cards, configurations, and switches in comparison, and they still did better in terms of stability.
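For anyone following along, here is a rough sketch of the majority-vote arithmetic behind "2 nodes still had quorum". It assumes a 3-node cluster with one vote per node, which the thread doesn't actually state, and the function names are made up for illustration. It is not Proxmox's or corosync's actual code.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """True if the reachable partition holds a strict majority of all votes."""
    return reachable_votes >= quorum_threshold(total_votes)


# Assumed setup: 3 nodes, 1 vote each (not stated in the thread).
TOTAL_VOTES = 3

# Two nodes that can still see each other keep quorum.
print(has_quorum(2, TOTAL_VOTES))
# A single isolated node does not.
print(has_quorum(1, TOTAL_VOTES))
```

Under that assumption, the two surviving nodes hold a majority, so by the voting math alone only the isolated node would need to be fenced.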

Why can't Proxmox just kill all services to accomplish fencing? It takes like 10 minutes for a single server to boot into the OS. I think even the kernel watchdog can do a reset without a full system reboot.

And no, I'm not going to adjust my hardware to boot faster. It's old and needs memory checking and firmware updates. A hardware reboot should not be a normal condition.

I'll have to explore the behavior more. Most settings can be tweaked, so there should be enough wiggle room.

I'm playing with Pacemaker and SBD, which appears to accomplish STONITH fencing via the kernel watchdog, but I'm not certain yet. I haven't gotten my cluster established yet.

I also cannot fix crappy power; that's a condition I'd expect to survive... Line-interactive UPSes are prohibitively expensive even for many businesses.