CheckMK HA cluster

Hello!

I want to confirm my use case for clustering Checkmk appliances. Is this a suitable method of creating redundancy across separate datacentres? Or are there limitations that make it more suitable within a single physical location?

My intention is to have a pair of clustered appliances, one at each datacentre, hosting a master site. The master site will use distributed monitoring to view sub-sites. These sub-sites would monitor hosts local to their own physical location, and may themselves be on clustered appliances.

Ideally, this would give a monitoring platform that is always available from the same IP, with the actual monitoring running as close to the monitored hosts as is feasible.

Thanks

If you use the appliance cluster functionality you will have a classic DRBD-based Linux failover cluster. I don’t know the constraints for DRBD if you want to use it in a geo-redundant cluster.
A normal failover cluster within the same datacenter is no problem.

DRBD and corosync really need low network latency. This will not work across separate datacenters.

Thanks for the information. I guess I expected such limitations.

Although DRBD expects low latency, a multi-datacenter setup might still be achievable. It depends on the physical distance and the resulting network latency. At my job we run two datacenters that are within 20 kilometers of each other, with network latency below 0.4 ms, and this allows a split-DC setup. What matters is the actual network latency, not whether the nodes sit in one datacenter or two. I’ve seen datacenters where latency was bad even within a single site (5-10 ms for a simple ping).
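
If you want to check this on your own link before deciding, here is a rough Python sketch that times TCP connects to the other node and compares the worst sample against a budget you pick. Everything specific in it is my assumption: the function names are made up, the host and port in the usage comment are placeholders, and the budget is a number you have to choose yourself, not a documented DRBD, corosync or Checkmk limit. A TCP connect also includes some handshake overhead, so treat the result as an upper bound on what ping would show.

```python
import socket
import statistics
import time


def tcp_rtt_ms(host, port, timeout=1.0):
    """Time one TCP connect to host:port and return it in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection is closed again right away
    return (time.perf_counter() - start) * 1000.0


def summarize(samples_ms):
    """Return min, median and max of a non-empty list of samples."""
    ordered = sorted(samples_ms)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }


def within_budget(samples_ms, budget_ms):
    """True if even the slowest sample stays within the chosen budget.

    budget_ms is a hypothetical threshold picked by the operator,
    not a value taken from any vendor documentation.
    """
    return summarize(samples_ms)["max"] <= budget_ms


# Example usage (placeholder host and port, adjust to your environment):
#   samples = [tcp_rtt_ms("node2.example.net", 22) for _ in range(20)]
#   print(summarize(samples), within_budget(samples, budget_ms=1.0))
```

Looking at the maximum rather than the average is deliberate: occasional spikes are what tend to hurt a synchronous setup, so a link that is fast on average but spiky may still be a poor fit.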

I have to add one remark, because the question came up elsewhere: Do not cluster virtual appliances!
For physical appliances, as discussed here, clustering is the intended use case: it avoids downtime due to hardware failure. For virtual appliances, your hypervisor should take care of this!
