I was surprised not to find any application-level Checkmk solution for setting up a cluster. IMHO this should be available for a product aimed at business-critical use. To be clear: I don’t want to monitor a cluster. I would like the monitoring to be a cluster “itself”.
We currently have the following setup:
Here we are happy:
Management server that just syncs config changes to the distributed boxes. Here we can afford downtime.
Frontend servers for the end users. These boxes just make a Livestatus connection. Apart from notifications and reports, they are redundant. Here we are also OK.
Here we are unhappy:
With our distributed monitoring. In case of downtime we lose at least performance data (when we restore an older backup). Even worse, during the rebuild of these machines, which can take up to an hour even though it is up to 95% automated, the checks are simply unavailable.
Now we would like a solution at the application layer that lets us use two Checkmk instances with different IPs and without any DNS switch.
We would simply like to sync these “backend” instances and use them in parallel so that there is no downtime for the user, as we can already do with the frontend systems.
In short: if distributed instance1 is gone, just use distributed instance2.
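To illustrate the behaviour I have in mind, here is a minimal sketch of the selection logic only. It assumes both instances expose unencrypted Livestatus over TCP (port 6557 is a common choice, but that is an assumption about the setup) and that both already hold the same data, which is exactly the part I have no solution for. The names `livestatus_alive` and `pick_instance` are hypothetical and not part of Checkmk.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

Instance = Tuple[str, int]  # (host, Livestatus TCP port)


def livestatus_alive(instance: Instance, timeout: float = 2.0) -> bool:
    """Return True if the instance answers a trivial Livestatus query.

    Assumes plain (non-TLS) Livestatus over TCP.
    """
    query = b"GET status\nColumns: program_version\n\n"
    try:
        with socket.create_connection(instance, timeout=timeout) as sock:
            sock.sendall(query)
            sock.shutdown(socket.SHUT_WR)  # signal that the query is complete
            return bool(sock.recv(4096))   # any answer counts as "alive"
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def pick_instance(
    instances: Sequence[Instance],
    probe: Callable[[Instance], bool] = livestatus_alive,
) -> Optional[Instance]:
    """Return the first instance that passes the probe, or None if all fail."""
    for instance in instances:
        if probe(instance):
            return instance
    return None
```

With two placeholder addresses, `pick_instance([("192.0.2.10", 6557), ("192.0.2.11", 6557)])` would return the first entry while instance1 answers and the second entry once instance1 is gone. The probe is passed in as a parameter so the ordering logic can be tried without a live site.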
I know about Corosync and DRBD. These were cool tools in the past, but in my case I don’t want to use them any longer.