What happens when one site is down?

Hi,
I have configured a site (let’s call it site2) in Checkmk to act as a worker. This site will monitor the hosts that share a particular feature, relieving the load on the central server.

But I have a question: what happens when the OMD service on site2 is stopped or its server is shut down? Are those hosts no longer monitored? Is there any way to automatically reassign them to a site that is still active?

Hi,

One possible way is to set up Checkmk on a hardware appliance:

You can then set up a virtual appliance as the secondary node, with the physical appliance as the primary, to form an HA cluster.

In this scenario the virtual appliance will take over only in case of hardware failure of the primary node. As soon as the hardware is up and running again, you should make it the primary again.

Regards,
Petra

Hi, thanks for that, @PetraH.

Is there any other way to have another worker server run some of the tasks, just like we had Gearman in Nagios…

Hi,
Just a short opinion on the subject (without wanting to start a fundamental discussion). Checkmk should offer, at least in its enterprise version, an integrated way to automatically detect the failure of a site and then migrate the agents attached to it. As an enterprise customer we actually expected this.

Many greetings
Christian

@CFriedrich do you know if a feature suggestion on the feature portal exists for that? Because I’d vote for that :slight_smile:

Hello @elias.voelker ,
actually there is no proper feature request for this yet. However, this one goes in that direction:

So far, wherever I asked for such a feature, I was told that you have to write an HA script yourself. In general, because moving an agent to another site requires a new discovery, any self-written script runs so incredibly long that such a solution makes no sense for us. An agent HA feature would have to ensure that a system does not need to be re-discovered when it moves to another site. There are other feature requests that might help, in which users ask for the agent data (RRDs, etc.) to be available centrally, so that an agent can move from site to site without losing anything.
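
To make clearer what such a self-written HA script would have to decide, here is a minimal sketch of just the planning step. It is not the script mentioned above and not a Checkmk feature. The function name `plan_failover` and its inputs are hypothetical, and how site liveness is detected is left open. It only computes which hosts would need a new site. Applying the change, and the re-discovery that follows, is the expensive part and is not covered here.

```python
from typing import Dict


def plan_failover(
    host_sites: Dict[str, str],
    site_alive: Dict[str, bool],
    fallback_site: str,
) -> Dict[str, str]:
    """Return {host: new_site} for every host whose current site is down.

    host_sites    -- hypothetical mapping of host name to its monitoring site
    site_alive    -- hypothetical mapping of site ID to "is it reachable?"
    fallback_site -- site that should take over the orphaned hosts
    """
    # If the fallback site itself is down, there is nowhere to move hosts to.
    if not site_alive.get(fallback_site, False):
        return {}

    moves: Dict[str, str] = {}
    for host, site in sorted(host_sites.items()):
        # Unknown sites are treated as down; hosts already on the fallback stay.
        if site != fallback_site and not site_alive.get(site, False):
            moves[host] = fallback_site
    return moves


if __name__ == "__main__":
    plan = plan_failover(
        {"web01": "site2", "db01": "central"},
        {"site2": False, "central": True},
        "central",
    )
    print(plan)
```

A plan like this would still have to be turned into configuration changes on the central site, which is exactly where the long run times come from.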

Nevertheless I have now created a new feature request:

regards
Christian

Do you have that self-written script, or do you know how to find out which site monitors a specific host?
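
One possible way to look up the site of a host is to ask the central site's REST API for the host's configuration. This is only a sketch, and it assumes a Checkmk version that ships the REST API. It also assumes that the `site` value appears under `extensions.effective_attributes` (or `extensions.attributes`) in the response, and that an automation user with a secret exists. The base URL, user and host names below are placeholders.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, Optional


def host_config_url(base_url: str, host_name: str) -> str:
    """Build the URL of the host_config object (endpoint path is an assumption)."""
    quoted = urllib.parse.quote(host_name)
    return f"{base_url.rstrip('/')}/objects/host_config/{quoted}?effective_attributes=true"


def site_from_host_config(payload: Dict[str, Any]) -> Optional[str]:
    """Extract the site ID from a host_config response, or None if absent."""
    extensions = payload.get("extensions") or {}
    # Prefer inherited (effective) attributes, then explicitly set ones.
    for key in ("effective_attributes", "attributes"):
        attributes = extensions.get(key) or {}
        if "site" in attributes:
            return attributes["site"]
    return None


def fetch_monitoring_site(
    base_url: str, user: str, secret: str, host_name: str
) -> Optional[str]:
    """Query the central site and return the site ID that monitors host_name."""
    request = urllib.request.Request(
        host_config_url(base_url, host_name),
        headers={
            "Authorization": f"Bearer {user} {secret}",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return site_from_host_config(json.load(response))


if __name__ == "__main__":
    # Offline demonstration with a hand-written response fragment.
    sample = {"extensions": {"attributes": {"site": "site2"}}}
    print(site_from_host_config(sample))
```

With placeholder values, a call would look like `fetch_monitoring_site("https://monitoring.example/central/check_mk/api/1.0", "automation", "<secret>", "web01")`.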
