Notifications by host criticality

CheckMK Version 2.3.0p27

After working with CheckMK on and off for a few years (and attending a training course for it), we now use it to monitor a small number of PostgreSQL database servers. The servers have different criticalities: test, integration and production. For the notifications, we want the production servers to alert 24x7 and all the other servers only during business hours.

For that, in addition to the 24x7 time period, we have created one for the business hours, covering business days (Mo-Fr) and business hours (6:30-15:30).
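To make the intended window concrete, here is a small Python sketch of the business-hours check. It is only an illustration of the time period described above, not Checkmk code; the function name and the choice to treat 15:30 as exclusive are assumptions.

```python
from datetime import datetime, time

# Assumed window from the post: Monday-Friday, 06:30-15:30.
# The end of the window is treated as exclusive in this sketch.
OFFICE_DAYS = range(0, 5)      # datetime.weekday(): Monday=0 ... Friday=4
OFFICE_START = time(6, 30)
OFFICE_END = time(15, 30)


def in_office_hours(moment: datetime) -> bool:
    """Return True if `moment` falls inside the business-hours window."""
    return (
        moment.weekday() in OFFICE_DAYS
        and OFFICE_START <= moment.time() < OFFICE_END
    )
```

With this definition, a Monday at 02:00 or any time on a Saturday is outside the window, so a non-production host should stay quiet then.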

We have assigned every server in the “DB servers” folder a criticality of “Production”, “Test” or “Integration”, using the pre-configured host tag.

With all this in place, we created two host notification rules, both matching on host event types: one checks for the “Production” host tag and alerts 24x7; the other checks that the host is not a “Production” system and alerts only during business hours. A third rule was created for services; it matches the relevant service event types and relies on the time period configured on the services.

We thought that this setup was straightforward and would do what we expect. However, we somehow get host alerts during night hours for non-production systems. We assume that is because on the server the notification period of all hosts is set to 24x7. We thought that the time period condition in the notification rules would override this:

Match host tags → Criticality → Is → Production System
Match only during time period → 24x7

Match host tags → Criticality → IsNot → Production System
Match only during time period → office_hours
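Written as a plain Python sketch, this is the behaviour we expected from the two rules. It is only our mental model, not how Checkmk evaluates notification rules internally; the class, the rule names and the criticality strings are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    production: bool      # True models "Is Production", False models "IsNot"
    time_period: str      # rule only matches while this period is active


# Hypothetical stand-ins for the two host notification rules above.
RULES = [
    Rule("prod hosts", production=True, time_period="24x7"),
    Rule("non-prod hosts", production=False, time_period="office_hours"),
]


def matching_rules(criticality: str, active_periods: set[str]) -> list[str]:
    """Return the names of the rules we would expect to match.

    `active_periods` is the set of time periods that are active
    at the moment the notification is raised.
    """
    matched = []
    for rule in RULES:
        tag_ok = (criticality == "production") == rule.production
        if tag_ok and rule.time_period in active_periods:
            matched.append(rule.name)
    return matched
```

Under this model, a test host that goes down at night (only 24x7 active) matches no rule, which is why the night-time alerts surprised us.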

But that doesn’t seem to work the way we expect, which means our expectations are probably wrong. So we have to change the setup to prevent these false positives. But which way to go?

  • Find out how to change the check period on hosts and set it to office_hours for all the non-production hosts
  • Somehow fix the notification rules so that they do not alert during the night for non-production systems

We’d be grateful for tips on how to do either. It would also be interesting to learn why our setup fails.

I would recommend setting the notification period based on the criticality and then having just one notification rule.

You need to use both rulesets “Notification period for hosts” and “Notification period for services”.
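Roughly, the idea is that the decision moves from the notification rules to the host (and service) itself. A minimal Python sketch of that idea follows; it is not Checkmk configuration, and the criticality strings, the mapping and the 24x7 fallback are assumptions for illustration only.

```python
# Hypothetical mapping: which notification period each criticality gets.
NOTIFICATION_PERIOD_BY_CRITICALITY = {
    "production": "24x7",
    "test": "office_hours",
    "integration": "office_hours",
}


def notification_period(criticality: str) -> str:
    """Pick the notification period for a host from its criticality.

    Unknown criticalities fall back to 24x7 in this sketch,
    so that nothing is silenced by accident.
    """
    return NOTIFICATION_PERIOD_BY_CRITICALITY.get(criticality, "24x7")


def may_notify(criticality: str, active_periods: set[str]) -> bool:
    """A single rule then suffices: notify only inside the host's period."""
    return notification_period(criticality) in active_periods
```

The same mapping would apply to services, which is why both rulesets are needed.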

Thanks for your suggestion. I have altered the setup as you suggested. I am currently waiting for an event during off-hours to see if there will be an email.