the last node from my multi-node proxmox cluster will be shut down pretty soon, as everything has been moved to docker containers on bare metal

this will also unlock significantly more flexibility for the future

it had a good run over the last 1.5 years, but proxmox is not the right tool anymore


Discussion

Curious why? Am also running a Proxmox cluster and value the HA aspect, which provides continuity/resilience when the bare metal for one of the nodes fails.

It is designed primarily for architectures where a SAN is involved (so HA is actually useful), where network latencies are fully predictable and controlled (otherwise corosync loses quorum and HA fencing can take the whole cluster down), and where workloads are less dynamic in terms of deployment and scaling.

While it is a great way to manage VMs and containers, even those layers aren’t great when you want to simplify the stack a lot.

The cluster I am running involves a lot of workloads that need a minimal number of layers between the hardware and the software (high-performance databases), and where I may want to tear everything down and rebuild it frequently.

HA is managed by the application layer in my stack, and there aren’t many colocated tenants that demand stricter isolation than docker containers or unix users provide.

I do use it in my homelab though, to divide a single large machine up.

SAN or NAS, yes for sure, but especially the minimal network latency.

I have that (10G+), so it’s working well for my objectives; ACK that there are many other ways to deliver resilience in alternative setups where that isn’t available or reliable.

I find application-level HA to be a significantly better investment of resources as well, as Proxmox does not have very fast failover and cannot react to application-specific requirements.

A DB server, for example, would take a minute to come back up after a fault, you’d incur the cost of the SAN on every operation, and even then it’s still a single node.
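
To make the contrast concrete, here’s a minimal sketch of what client-side failover looks like in Python; the replica endpoints and port are hypothetical, and a real stack would pull them from service discovery or config:

```python
import socket

# Hypothetical replica endpoints; a real deployment would pull these
# from service discovery or static configuration.
REPLICAS = ["10.0.0.11:5432", "10.0.0.12:5432", "10.0.0.13:5432"]

def connect_with_failover(endpoints, timeout=0.5):
    """Try each replica in turn. A dead node costs one connect timeout
    (milliseconds), not the minute a hypervisor-level restart takes."""
    last_err = None
    for ep in endpoints:
        host, port = ep.rsplit(":", 1)
        try:
            return socket.create_connection((host, int(port)), timeout=timeout)
        except OSError as err:
            last_err = err  # unreachable: move on to the next replica
    raise ConnectionError(f"all replicas unreachable: {last_err}")
```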

I use FoundationDB, so it can tolerate failures of storage servers with zero downtime (requests are rerouted to a replica and the data is transparently healed); failures of some critical roles like the transaction logs take only a few seconds to recover from, interrupting only writes and new read transactions.

When the underlying components can handle faults on their own and the application layer is stateless, it is an easy problem to solve.
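
Roughly what that looks like from the client side, using FoundationDB’s Python bindings (a minimal sketch; the API version and the `user/` key are just examples):

```python
import fdb

fdb.api_version(710)  # must match the installed client; 710 is just an example
db = fdb.open()       # connects via the default cluster file

@fdb.transactional
def add_user(tr, user_id, name):
    # @fdb.transactional wraps the function in a transaction and retries
    # it automatically on transient faults (e.g. a storage server failing
    # over), so the failover never surfaces to the caller.
    tr[b"user/" + user_id] = name

add_user(db, b"42", b"alice")
print(db[b"user/42"])  # b'alice' -- Database reads run as their own transaction
```

All the retry and rerouting logic lives in the client library, so the application itself stays stateless.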

Thanks for the reference to FoundationDB, I’ll take a squiz at that. With broader resilience objectives I’m looking at the Ceph route, which delivers the storage+compute resilience, at the cost of an increased HW base though.

Relatable - I've been reworking homelab systems. It's such a high-effort process but so worth it.