CMK version: Checkmk Enterprise 2.2.0p23
OS version: Debian 11
Error message:
Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)
We’re having trouble suppressing device notifications for extended periods. We use wireless links that sometimes go offline. Currently, every change between online and offline is reported, and this isn’t good for our uptime score. How can I configure Checkmk so that devices are only marked as offline, and notified about, after they have been down for 30 or even 60 minutes?
Hi Paul, thank you for your answer. The connections vary. The wireless network is used for cameras, and when a camera goes offline for, let’s say, 1 h, then I can get notified. Right now I end up acknowledging a camera or network device that is offline because it missed a couple of pings, and it shows up with a good status a couple of minutes later. Normally it takes a long time to get permission to work on the network, so I only want to do that when a device has been offline for a long time, and not spoil my beautiful map, which has no red dots... yet.
As you are using Checkmk Enterprise you can tweak Smart Ping and increase the monitoring interval using the following rules as well:
Settings for host checks via Smart PING
Normal check interval for host checks
If you use those, you might need to remove the “Maximum number of check attempts for host” rule along with the “Retry check interval for host checks” rule. The notification delay will be useful in any case.
Basically, only a hard state generates a notification. The rule that influences when a state is considered “hard” is “Max. check attempts”:
The maximum number of failed checks until a service problem state will be considered as hard. Only hard states trigger notifications.
So, if you want to be notified after 60 minutes when a host is down and your “Retry check interval for host checks” is 60 seconds, you need to set the value in the rule “Maximum number of check attempts for host” to 3600 s / 60 s = 60.
The same applies to services, but for hosts also consider the “Retry check interval for host checks” rule:
This setting is relevant if you have set the maximum number of check attempts to a number greater than one. In case a host check is not UP and the maximum number of check attempts is not yet reached, it will be rescheduled with this interval. The retry interval is usually set to a smaller value than the normal interval. The default is 6 seconds for smart ping and 60 seconds for all other.
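As a rough sanity check of the arithmetic, here is a minimal Python sketch that turns a desired delay into a value for “Maximum number of check attempts for host”. The function name `attempts_for_delay` is hypothetical and not part of Checkmk. It assumes every attempt after the first failure is spaced by the retry interval, and it ignores the time until the first failed check, so treat the result as an approximation.

```python
import math


def attempts_for_delay(delay_minutes: float, retry_interval_seconds: float) -> int:
    """Approximate number of check attempts needed before a state turns hard.

    Hypothetical helper (not a Checkmk API). Assumes all attempts are
    spaced by the retry interval and rounds up to a whole attempt.
    """
    if delay_minutes <= 0 or retry_interval_seconds <= 0:
        raise ValueError("delay and retry interval must be positive")
    return math.ceil(delay_minutes * 60 / retry_interval_seconds)


# Default retry intervals quoted above: 6 s for Smart Ping, 60 s otherwise.
for label, retry_seconds in (("Smart Ping", 6), ("other host checks", 60)):
    for minutes in (30, 60):
        attempts = attempts_for_delay(minutes, retry_seconds)
        print(f"{label}: {minutes} min at {retry_seconds} s retry = {attempts} attempts")
```

With the default Smart Ping retry interval of 6 seconds, the same 60-minute delay needs a much higher attempt count than with a 60-second retry interval, so check which retry interval is actually in effect before setting the rule.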
To calculate your uptime, please use ‘Only Hard States’.