False Positives

Hello,

I have a problem with active checks producing so many false positives that a real alert would no longer be noticed.

This is part of the “HTTPS Webserver OK > CRIT” message:


|Date / Time|Tue Aug 30 00:09:07 CEST 2022|
|Summary|CRITICAL - Socket timeout after 10 seconds|
|Details||
|Host Metrics|rta=0.137ms;200.000;500.000;0; pl=0%;80;100;; rtmax=0.284ms;;;; rtmin=0.063ms;;;;|

And this is the OK message I get 50 seconds later:


|Date / Time|Tue Aug 30 00:09:57 CEST 2022|
|Summary|HTTP OK: HTTP/1.1 302 Found - 784 bytes in 0.733 second response time|
|Details||
|Host Metrics|rta=0.086ms;200.000;500.000;0; pl=0%;80;100;; rtmax=0.209ms;;;; rtmin=0.050ms;;;;|
|Service Metrics|time=0.733495s;;;0.000000;10.000000 size=784B;;;0|
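
For anyone comparing the two results: the metric lines above appear to follow the usual Nagios plugin performance-data layout, `label=value[unit];warn;crit;min;max`. Here is a minimal Python sketch that splits such a line into fields. `Metric` and `parse_perfdata` are names made up for this example and are not part of Checkmk, and threshold ranges such as `10:20` are simply returned as `None`.

```python
import re
from typing import List, NamedTuple, Optional


class Metric(NamedTuple):
    label: str
    value: float
    unit: str
    warn: Optional[float]
    crit: Optional[float]
    minimum: Optional[float]
    maximum: Optional[float]


# A number followed by an optional unit such as "ms", "s", "B" or "%".
_VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)([A-Za-z%]*)$")


def _to_float(field: str) -> Optional[float]:
    """Convert a threshold field; empty or non-numeric fields become None."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_perfdata(text: str) -> List[Metric]:
    """Parse space-separated 'label=value[unit];warn;crit;min;max' tokens."""
    metrics = []
    for token in text.split():
        label, _, rest = token.partition("=")
        fields = rest.split(";")
        fields += [""] * (5 - len(fields))  # pad omitted trailing fields
        match = _VALUE_RE.match(fields[0])
        if not label or match is None:
            continue  # skip anything that does not look like a metric
        thresholds = [_to_float(f) for f in fields[1:5]]
        metrics.append(
            Metric(label, float(match.group(1)), match.group(2), *thresholds)
        )
    return metrics
```

Applied to the service metrics above, this gives a `time` value of about 0.73 s with a maximum of 10 s, which seems to match the 10-second socket timeout reported in the CRITICAL result.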

I get too many of these messages to believe that they indicate a real problem on my site. Besides, the server itself (as opposed to the website on it) is alive and reachable via IP 100% of the time.

This leaves me with two questions:
1) Why do I get so many false positives with the default settings?
2) Is there anything I can change globally to make all self-defined active checks a useful source of information again?

I am using Checkmk 2.0.0p23 (CRE). Server and client run on Proxmox VMs with Debian 11.

Yours sincerely
Stefan Schumacher

Hello Stefan,

The HTTPS webserver check uses the Nagios plugin check_http. You can run it on the command line for debugging.
The socket timeout indicates a problem in the communication between your web server and the monitoring server. In your situation I would capture packets on the monitoring server, wait until the issue happens again, and then analyze the packet capture.
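
To have exact timestamps to line up with the packet capture, a small probe loop run from the monitoring server can help. This is only a rough Python sketch and not a replacement for check_http: `probe` and `watch` are made-up names, and the timeout here applies to each socket operation rather than to the whole check. Like the check output above, it does not follow redirects, so a 302 is reported as is.

```python
import http.client
import socket
import time
from datetime import datetime
from urllib.parse import urlsplit


def probe(url, timeout=10.0):
    """Send one GET request; return (outcome, elapsed_seconds)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    else:
        conn_cls = http.client.HTTPConnection
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    start = time.monotonic()
    try:
        conn.request("GET", parts.path or "/")
        response = conn.getresponse()
        response.read()
        outcome = f"HTTP {response.status}"
    except socket.timeout:
        # Connect, send or receive took longer than `timeout` seconds.
        outcome = "timeout"
    except (OSError, http.client.HTTPException) as exc:
        outcome = f"error: {exc!r}"
    finally:
        conn.close()
    return outcome, time.monotonic() - start


def watch(url, interval=60.0, rounds=5, timeout=10.0):
    """Probe repeatedly and print one timestamped line per attempt."""
    for _ in range(rounds):
        outcome, elapsed = probe(url, timeout)
        stamp = datetime.now().isoformat(timespec="seconds")
        print(f"{stamp} {outcome} {elapsed:.3f}s")
        time.sleep(interval)
```

Calling something like `watch("https://www.example.com/", interval=60, rounds=1440)` (the URL is a placeholder) while the capture is running would give a list of timestamps at which a timeout occurred, which narrows down where to look in the capture.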

regards

Michael

Hi Stefan,

Here are a few pointers that might help against the symptoms (too many alerts), but maybe not the illness: Minimizing false positive monitoring alerts with Checkmk | Checkmk
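
One common way to damp one-off failures in Nagios-style monitoring is to alert only after several consecutive failed check attempts. As a rough illustration of that idea only (the function name and the threshold of 3 are assumptions for this example, not Checkmk defaults or Checkmk code), in Python:

```python
def alert_points(results, max_attempts=3):
    """Return one bool per check result: True where an alert would fire.

    An alert fires on the check at which the number of consecutive
    non-OK results first reaches `max_attempts`. Any OK result resets
    the counter. This is a simplified model of retrying before alerting.
    """
    alerts = []
    failures = 0
    for result in results:
        if result == "OK":
            failures = 0
            alerts.append(False)
        else:
            failures += 1
            alerts.append(failures == max_attempts)
    return alerts
```

With a pattern like the one in this thread, a single CRIT followed by an OK on the next check, `alert_points(["OK", "CRIT", "OK"])` contains no `True`, so nothing would be sent. Three CRIT results in a row would trigger an alert on the third.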

Cheers
Elias
