Not alerting for reboots

We have a request to monitor a number of hosts that will reboot randomly during the day or at night and we should NOT alert on this.

What would be the best way to do this?
I know we could disable notifications for these hosts, but that would stop all notifications.
We will want to alert on the other services.

Hi Marc,

You could write a PowerShell or Bash script, triggered by the underlying OS on reboot, that sets a downtime in Checkmk through the REST API.
Or, if the reboot is orchestrated by another tool, that tool could set the downtimes in Checkmk via scripts or REST calls.
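
To make the REST idea a bit more concrete, here is a minimal Python sketch that only builds the request for a host downtime and does not send it. The endpoint path, the field names and the `Bearer user secret` header follow the Checkmk REST API as I understand it, so please check them against the API documentation of your own site. The server URL, site name, automation user, secret and host name are placeholders, and the function name is made up for this example.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_host_downtime_request(base_url, site, user, secret, host_name,
                                minutes=15, now=None):
    """Build (but do not send) a request that schedules a host downtime."""
    start = now or datetime.now(timezone.utc)
    end = start + timedelta(minutes=minutes)
    # Field names are assumed from the Checkmk REST API; verify on your site.
    payload = {
        "downtime_type": "host",
        "host_name": host_name,
        "start_time": start.isoformat(timespec="seconds"),
        "end_time": end.isoformat(timespec="seconds"),
        "comment": "Automatic downtime for reboot",
    }
    url = (f"{base_url}/{site}/check_mk/api/1.0"
           "/domain-types/downtime/collections/host")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {user} {secret}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    # All values below are placeholders, not real hosts or credentials.
    req = build_host_downtime_request(
        "https://monitoring.example.com", "mysite",
        "automation", "secret", "pc-042")
    print(req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

A shutdown hook on the machine could run something like this just before the reboot. Whether the call still reaches the Checkmk server at that point depends on your shutdown ordering.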

If that is too complex and there is no other indicator of when a reboot is triggered, you could simply raise the number of check attempts for the host checks.
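
As a rough way to pick that number, the following Python sketch estimates how many check attempts are needed to ride out a reboot of a given length. It assumes a simplified Nagios-style model in which the host is only reported as down after that many consecutive failed checks, spaced one retry interval apart. The function name and the example values are made up, and the actual behaviour depends on your monitoring core and host check settings, so treat the result as a starting point.

```python
import math


def min_check_attempts(max_reboot_seconds, retry_interval_seconds):
    """Smallest attempt count that tolerates a reboot of the given length.

    Worst case assumed: the first failed check happens right when the
    host goes down, and each further attempt follows one retry interval
    later. The final attempt must land after the host is back up.
    """
    if max_reboot_seconds < 0 or retry_interval_seconds <= 0:
        raise ValueError("need a non-negative duration and a positive interval")
    failed_checks = math.ceil(max_reboot_seconds / retry_interval_seconds)
    return failed_checks + 1


if __name__ == "__main__":
    # Example: reboots take up to 5 minutes, retries happen every 60 seconds.
    print(min_check_attempts(300, 60))  # prints 6
```

In practice you would probably add one or two attempts as a margin, since reboot times vary.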

Might be even easier:

Yes, but they can be rebooted at random times, so scheduled downtimes wouldn’t work then?

We know of Checkmk users who set up a 10-minute scheduled downtime within a window from 7:00 to 20:00 so that their admins can perform short maintenance during working hours. You could extend this to a 10-minute scheduled downtime within a window from 0:00 to 23:59; you would then only get notified if the downtime spans midnight.

Edit: This will only work out of the box for a single reboot per day.
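
To illustrate the single-reboot limitation, here is a small Python simulation. It assumes the setup described above behaves like a flexible downtime: the 10 minutes start with the first outage inside the daily window and are then used up. That reading, the function name and the sample times are assumptions for this sketch, and times are given in minutes since midnight.

```python
def covered_outages(outage_starts, window_start, window_end, duration):
    """Return, per outage (sorted by start), whether the downtime covers it.

    Simplified model: the first outage that starts inside the window
    triggers the downtime for `duration` minutes. Later outages are only
    covered if they start while that downtime is still running.
    """
    results = []
    downtime_end = None  # set once the flexible downtime has been triggered
    for start in sorted(outage_starts):
        if downtime_end is None and window_start <= start <= window_end:
            downtime_end = start + duration
            results.append(True)
        else:
            results.append(downtime_end is not None and start < downtime_end)
    return results


if __name__ == "__main__":
    # Reboots at 09:00 and 15:00, window 0:00 to 23:59, 10-minute downtime.
    print(covered_outages([9 * 60, 15 * 60], 0, 23 * 60 + 59, 10))
    # prints [True, False]
```

Under this model the second reboot of the day would still produce a notification, which matches the caveat in the edit above.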

These are personal boxes so I cannot predict when the downtime will occur. We want to essentially NOT alert for downtime but for everything else.

So from what you’ve said above, this could be done by setting a schedule from 0:00 to 23:59, but with a 24-hour scheduled downtime instead of 10 minutes?

A custom range like this? And then scheduled downtime on host?
They still want to alert on items such as CPU, disk space etc…

Hi Marc,

I think there are a few options here, depending on the desired behaviour and the specifics of your situation.

The simplest solution is probably to disable the host-level notifications for these hosts, but that assumes that you do not want to alert on host availability at all. Note that host notifications and service notifications are different, so you should be able to disable host notifications while retaining alerts for all of the host’s services.

If the requirement is to ignore downtime only if it is temporary because of a(n) (un)planned reboot, I think Andre’s suggestion of increasing the number of check attempts before Checkmk flags these hosts as down is probably the way to go.

Alternatively, if you can be sure that there is no more than one reboot per day for any one of these hosts, Mattias’ suggestion of recurring scheduled downtimes may work.

Hope this helps,
Jason

How can I disable the host-level notifications in that case? I think this might be the only solution here.

There may be better options, but, if nothing else, you should be able to create a notification rule that is set to “Cancel previous notifications” (rather than the default “Create notification with the following parameters”). Then, under Conditions, select “Match host event type” and match all host event types. Set additional conditions, as appropriate, to ensure you still get host event notifications for systems that require them.

Sorry, but how do I disable the host-level notifications in the first place?

Besides what was mentioned already, some alerting tools (e.g. SIGNL4) offer delayed notifications. So, if there was a temporary issue, like a server reboot, but the server is up again after a few minutes, you will not receive the wake-up call. Also, filtering for certain alert types can be found here.

How can I disable host level notifications please?

Hi @mgillespie1981 ,

Please refer to the rule settings below for disabling host notifications.

Regards,
DD

And to confirm, this means it ignores alerts for down hosts but would still monitor and alert on the services?

This rule only disables host notifications. Monitoring continues to work as before.

As per the rule description, service notifications are not affected.
