May I ask what your current Ceph setup is?
Discussion
4 hosts, 21 OSDs, a bit over 100TB total raw capacity.
OSDs are a mix of big expensive SSDs and "HDDs with an SSD as a DB device" (and thus it also serves as a WAL device). There are no HDDs without an SSD in front of them.
Workload is about 40 VMs which vary greatly in disk activity. DNS and basic web services are super lightweight, but there are heavier hitters like the InfluxDB server, GitLab, and Mastodon.
Sounds extremely overcomplicated for this workload, to be honest :)
What network equipment do you use for Ceph? Also, did you configure RDMA?
I did it for the live migration capabilities. Probably overbuilt, but live and learn. It wasn't fast enough without the SSD DB devices. But then again, I feel that it's still not fast enough, so 🤷
Separate 1Gb net for ceph backend
No RDMA. I have too many security concerns to enable that.
Bump to a 10Gb backend to start. I’ve read Ceph is extremely chatty.
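For a rough sense of why the link speed matters, here is a back-of-the-envelope sketch of how long it would take to move one OSD's worth of data over the backend at line rate. The average OSD size comes from the numbers earlier in the thread (about 100 TB raw across 21 OSDs); everything else is an assumption: a completely full OSD, a single link running at full speed, and no protocol overhead or recovery throttling. Real recovery is spread across several hosts and is usually throttled, so treat this as a lower bound on one link, not a prediction. `transfer_hours` is just a made-up helper name.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to move data_tb terabytes (decimal TB) over a link_gbps gigabit/s link.

    efficiency is a hypothetical fudge factor (1.0 = perfect line rate).
    """
    bytes_total = data_tb * 1e12
    bytes_per_sec = link_gbps * 1e9 / 8 * efficiency
    return bytes_total / bytes_per_sec / 3600


# Assumption: OSDs are equally sized, which is not true for a mixed SSD/HDD cluster.
avg_osd_tb = 100 / 21

for gbps in (1, 10):
    hours = transfer_hours(avg_osd_tb, gbps)
    print(f"{gbps:>2} Gb/s: {hours:.1f} h to move one full average-size OSD")
```

Under those assumptions it works out to roughly 10.6 hours at 1 Gb/s versus about 1.1 hours at 10 Gb/s, and that is before any client I/O competes for the same link.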
I’ll look into this; I’ll need to anyway.
