Currently running CEE 1.6.0p9 using the virtual appliances. I have one master node and two slaves. One slave is in my secondary data center and monitors everything there, while the other slave monitors everything in my primary data center along with all of my field offices' equipment. I'm trying to figure out the best way to do either a high-availability or a disaster-recovery setup for this environment. I did look at the clustering capabilities of the appliances, but I'm not sure whether they require a layer 2 connection between the appliances, and I can't find that stated anywhere in the documentation. What I was thinking was to replicate the master server daily with Veeam to our secondary data center. In the event of a disaster, I would bring the replica online and change the monitoring site for all of my field equipment to the slave in my secondary data center. It's not elegant, but it should work.
I'm posting here to see whether someone has a solution that would require less work when disaster strikes.
Under the hood, the appliance cluster is built on Corosync/Pacemaker with DRBD, and DRBD is especially sensitive to latency.
If you go with the cold-standby solution, make sure that all of your monitored hosts accept both the primary IP and the cold-standby IP as a request source (Checkmk agent and SNMP agent configuration and/or local firewalls).
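As a rough way to audit the point above, the Python sketch below checks whether an agent's allow-list (for example the value of `only_from` in an xinetd-based Checkmk agent configuration) admits both the current master and the cold-standby replica. It is a minimal sketch under stated assumptions: the allow-list contains only literal IP addresses or CIDR networks separated by whitespace (hostnames and other xinetd notations are not handled), and the IP addresses used in the example are hypothetical. SNMP agent ACLs and local firewall rules are separate and would have to be checked on their own.

```python
import ipaddress
from typing import List


def parse_allow_list(only_from: str) -> List[ipaddress._BaseNetwork]:
    """Parse a whitespace-separated allow-list of IPs/CIDR networks.

    Assumption: entries are literal addresses or CIDR networks only;
    a bare address such as 10.0.20.1 becomes a single-host network.
    """
    return [ipaddress.ip_network(entry, strict=False) for entry in only_from.split()]


def is_allowed(source_ip: str, only_from: str) -> bool:
    """Return True if source_ip falls inside any entry of the allow-list."""
    addr = ipaddress.ip_address(source_ip)
    # Membership across IPv4/IPv6 simply evaluates to False, so mixed lists are safe.
    return any(addr in net for net in parse_allow_list(only_from))


def missing_sources(only_from: str, required_ips: List[str]) -> List[str]:
    """List the required source IPs (e.g. master and standby) not yet permitted."""
    return [ip for ip in required_ips if not is_allowed(ip, only_from)]


if __name__ == "__main__":
    # Hypothetical addresses: the current master and its cold-standby replica.
    master_ip = "10.0.20.1"
    standby_ip = "10.1.20.1"
    # Hypothetical allow-list that only knows about the current master.
    only_from = "127.0.0.1 10.0.20.1"
    print(missing_sources(only_from, [master_ip, standby_ip]))  # prints ['10.1.20.1']
```

Running this against each host's configured allow-list before a failover would flag hosts that would start rejecting queries once the replica takes over; an empty result means both sources are already permitted.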
OK, so this option is out of the question, since I only have a layer 3 connection between the data centers. Is it possible to connect two “master” servers, if you will, to the same distributed monitoring slave? Or would I be stuck with replicating the master to my secondary data center and then updating the site that does the monitoring in WATO?