I find application-level HA to be a significantly better investment of resources as well, since Proxmox failover is not very fast and can't adapt its response to the application's requirements.
A DB server would, for example, take about a minute to come back up after a fault, you’d pay the cost of the SAN on every operation, and even then it’s still a single node.
I use FoundationDB, which tolerates storage server failures with zero downtime (requests are rerouted to a replica and the data is transparently healed). Failures of some critical roles, such as the logs, take only a few seconds to recover from and interrupt only writes and new read transactions.
When the underlying components can handle faults easily and the application is stateless, it is an easy problem to solve.
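To make the "rerouted to a replica" idea concrete, here is a toy Python sketch of a stateless caller failing over between replicas. It is not FoundationDB's actual mechanism or API; `Replica`, `ReplicaDown`, and `read_with_failover` are hypothetical names made up for illustration, and the replicas are plain in-memory objects.

```python
class ReplicaDown(Exception):
    """Raised when a (simulated) storage replica is unavailable."""


class Replica:
    # Hypothetical in-memory stand-in for one storage server.
    def __init__(self, name, data, healthy=True):
        self.name = name
        self.data = dict(data)
        self.healthy = healthy

    def read(self, key):
        if not self.healthy:
            raise ReplicaDown(self.name)
        return self.data[key]


def read_with_failover(replicas, key):
    """Try each replica in order and return (value, name of replica that served it)."""
    failed = []
    for replica in replicas:
        try:
            return replica.read(key), replica.name
        except ReplicaDown as exc:
            # The caller holds no state, so it can simply move on to the next copy.
            failed.append(str(exc))
    raise RuntimeError(f"all replicas failed: {failed}")


replicas = [
    Replica("storage-1", {"user:1": "alice"}, healthy=False),  # simulated fault
    Replica("storage-2", {"user:1": "alice"}),
]
value, served_by = read_with_failover(replicas, "user:1")
```

Here the read is served by `storage-2` without the caller noticing that `storage-1` is down. A real system also has to re-replicate the lost copy in the background, which this sketch leaves out.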
Thanks for the reference to FoundationDB, I’ll take a squiz at that. With broader resilience objectives, I’m looking at the Ceph route, which delivers storage and compute resilience, though at the cost of a larger hardware base.