Minimizing false positive monitoring alerts

Dhananjay · January 7, 2024, 7:24am

Dear All,

I am experiencing a high number of false positive alerts, even though the system being monitored is functioning properly. When the monitored system is up and working correctly, we receive notifications that it is down and then back up again. After checking the network interface, we have found no indication of any network interruptions during the specified time frame, and the system has not been rebooted. We would appreciate guidance on how to adjust the monitoring system’s notifications to avoid being overwhelmed with irrelevant alerts and instead receive only the truly pertinent ones.

I really appreciate any help you can provide.

Regards.
Dhananjay

andreas-doehler · January 7, 2024, 9:23am

Then you have some kind of network problem. The DOWN and UP notifications depend only on ICMP echo requests (ping).

For notifications, decide how long a host should be down before the first notification is sent. If you want the first notification only after the host has been down for 3 minutes, you can configure a rule such as “Delay host notifications”.
Another option to get the same result is the rule “Maximum number of check attempts for host”.
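
To illustrate why raising the number of check attempts filters out short blips, here is a minimal sketch in Python. It is not Checkmk code or its API; the function name `hard_state_changes` and the sample data are made up for illustration, and it assumes a DOWN state is only confirmed after `max_attempts` consecutive failed checks while a single successful check confirms recovery.

```python
def hard_state_changes(results, max_attempts):
    """Return (index, state) pairs for every confirmed state change.

    results: check results ("UP" or "DOWN"), oldest first.
    max_attempts: consecutive DOWN results needed to confirm DOWN
                  (assumed model, not the monitoring core itself).
    """
    confirmed = "UP"
    consecutive_down = 0
    changes = []
    for index, result in enumerate(results):
        if result == "DOWN":
            consecutive_down += 1
            # Only confirm DOWN once enough checks in a row have failed.
            if confirmed == "UP" and consecutive_down >= max_attempts:
                confirmed = "DOWN"
                changes.append((index, "DOWN"))
        else:
            consecutive_down = 0
            # A single good check confirms recovery.
            if confirmed == "DOWN":
                confirmed = "UP"
                changes.append((index, "UP"))
    return changes


# Hypothetical check history: one lost ping, then a real outage.
checks = ["UP", "DOWN", "UP", "UP", "DOWN", "DOWN", "DOWN", "UP"]

print(hard_state_changes(checks, 1))  # every blip is reported
print(hard_state_changes(checks, 3))  # only the sustained outage is reported
```

With one attempt, the single lost ping at position 1 already produces a DOWN/UP pair; with three attempts, only the outage that spans three consecutive checks is reported.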

rons4 · January 8, 2024, 7:29am

If an incident lasts only a short time, you can also handle this on the alerting side. For example, SIGNL4 supports delayed notifications: when an error event is submitted, it can wait a couple of minutes to see whether an OK event follows. Only if no OK / UP is received do you get the alert notification, e.g. a 24x7 wake-up call.
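
The idea of delaying on the alerting side can be sketched roughly like this in Python. This is not the SIGNL4 API; `alerts_to_send`, the event format, and the 180-second delay are assumptions chosen only to show the logic: a DOWN event starts a timer, and an UP event that arrives before the timer expires cancels the alert.

```python
def alerts_to_send(events, delay_seconds):
    """Return the times (in seconds) at which an alert would fire.

    events: (timestamp_seconds, state) tuples sorted by time,
            with state "DOWN" or "UP" (hypothetical event format).
    delay_seconds: how long to wait for a recovery before alerting.
    """
    alerts = []
    pending_down = None  # timestamp of a DOWN still waiting for recovery
    for timestamp, state in events:
        if state == "DOWN":
            if pending_down is None:
                pending_down = timestamp
        else:
            # Recovery arrived: alert only if it came after the delay expired.
            if pending_down is not None and timestamp - pending_down >= delay_seconds:
                alerts.append(pending_down + delay_seconds)
            pending_down = None
    # A DOWN with no recovery in the data is assumed to stay down.
    if pending_down is not None:
        alerts.append(pending_down + delay_seconds)
    return alerts


# Hypothetical event stream: a 40-second blip, a 5-minute outage,
# and an outage that has not recovered yet.
events = [(0, "DOWN"), (40, "UP"), (600, "DOWN"), (900, "UP"), (2000, "DOWN")]
print(alerts_to_send(events, 180))
```

In this sample the 40-second blip produces no alert, while the two longer outages each trigger one alert after the delay has passed.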

elias.voelker · January 8, 2024, 10:43am

Hi @Dhananjay

Here are a few general strategies for dealing with false positive alerts in Checkmk: Minimizing false positive monitoring alerts with Checkmk | Checkmk

Best
Elias
