4 hosts, 21 OSDs, a bit over 100TB total raw capacity.

OSDs are a mix of big expensive SSDs and HDDs with an SSD as a DB device (which therefore also serves as the WAL device). There are no HDDs without an SSD in front of them.
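For anyone curious, that kind of OSD gets created along these lines (device names here are placeholders, not my actual layout; with no separate --block.wal given, the WAL lands on the DB device):

  # spinning disk as the data device, a slice of the shared SSD as the DB (and WAL) device
  ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1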

Workload is about 40 VMs that vary greatly in disk activity: DNS and basic web services are super lightweight, but there are heavier hitters like the InfluxDB server, GitLab, and Mastodon.


Discussion

Sounds extremely overcomplicated for this workload, if I'm being honest :)

What network equipment do you use for Ceph? Also, did you configure RDMA?

I did it for the live migration capabilities. Probably overbuilt, but live and learn. It wasn't fast enough without the SSD DB devices. But then again, I feel that it's still not fast enough, so 🤷

Separate 1Gb network for the Ceph backend.

No RDMA. I have too many security concerns to enable that.
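The public/cluster split is just a couple of lines in ceph.conf, roughly like this (the subnets here are placeholders, not my real addressing):

  [global]
  public_network  = 192.0.2.0/24      # client/VM-facing traffic
  cluster_network = 198.51.100.0/24   # the separate 1Gb backend (replication and recovery between OSDs)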

Bump to a 10Gb backend to start. I’ve read Ceph is extremely chatty.

I’ll look into this; I’ll need to anyway.

Yeah, I've heard that too, but I've done bandwidth testing on that network and it has about 900-950Mb/s of bandwidth to spare.

I'm not trying to get 1Tb/second or anything crazy like that. I'm just frustrated at getting less than 1MB/second.
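For what it's worth, this is roughly how I'd sanity-check both numbers (iperf3 for the raw link, rados bench for Ceph itself; the pool name and hostname are placeholders):

  # raw throughput on the backend network, between two of the hosts
  iperf3 -s                    # on one node
  iperf3 -c <other-node>       # on another node

  # Ceph-level throughput against a scratch pool
  ceph osd pool create bench 32
  rados bench -p bench 30 write --no-cleanup
  rados bench -p bench 30 rand
  rados -p bench cleanup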

Hmm. Odd. This is where we rtfm lol