CMK version:
2.2.0p21.cre default version (RAW)
OS version:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
Error message:
None
Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)
Hello there,
We have a recurring issue with one particular host. Let me explain.
We’re running Checkmk on a VPS, connected to the monitored hosts through a peer-to-peer WireGuard VPN. The monitoring server is also the VPN gateway; it does both.
Clients are connected as VPN peers and are registered with their private IP, not their public one (this might be important for what follows).
This morning we got the following host alert:
2024-03-20 06:11:48 - 7 h
HOST ALERT web_services
SOFT (DOWN)
CRITICAL - 10.0.1.6: rta nan, lost 100%
The host has 3 check attempts before entering HARD (CRIT), and this part works as expected.
But directly after this we got:
2024-03-20 06:11:59 - 7 h
SERVICE ALERT web_services [...] HTTPS -
HARD (CRITICAL) CRITICAL - Socket timeout after 10 seconds
The first thing:
This HTTPS active check is configured with 3 check attempts before going critical (see pictures above), so why does it trigger instantly as HARD (CRIT)?
The second thing:
This time it’s an active check against the public IP (more precisely, against the domain name), even though the host only has its private IP in its configuration.
So a host DOWN alert and a failure of this active check shouldn’t happen at the same time, should they? We checked the logs on web_services (the monitored server), and it appears that the packet never arrived. Even so, it shouldn’t be HARD (CRIT), since we have 3 check attempts and the server did receive the next one (1 minute later) and all the following ones.
We don’t understand why we got a socket timeout here, and why the active check failed at the same moment the host check failed.
We checked the logs on the failed host, and it was up … So it looks like a weird half false positive/negative.
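To make our expectation concrete, here is a minimal Python sketch of how we understand the SOFT/HARD logic with 3 check attempts. This is only our mental model, not the actual Checkmk or Nagios core code; the class and method names (`CheckState`, `process`) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckState:
    """Hypothetical model of one check with a 'max check attempts' setting."""

    max_attempts: int = 3      # assumption: matches our "3 check attempts" rule
    attempt: int = 0           # consecutive failed attempts so far
    state_type: str = "HARD"   # "SOFT" or "HARD"
    status: str = "OK"         # "OK" or "CRITICAL"

    def process(self, result_ok: bool) -> str:
        """Feed one check result and return the resulting state as text."""
        if result_ok:
            # A successful check resets the failure counter.
            self.attempt = 0
            self.status = "OK"
            self.state_type = "HARD"
        else:
            # Count the failure; only the last allowed attempt makes it HARD.
            self.attempt = min(self.attempt + 1, self.max_attempts)
            self.status = "CRITICAL"
            self.state_type = (
                "HARD" if self.attempt >= self.max_attempts else "SOFT"
            )
        return f"{self.state_type} ({self.status})"


if __name__ == "__main__":
    check = CheckState()
    # Three failures in a row, then a recovery.
    for ok in (False, False, False, True):
        print(check.process(ok))
```

With this model, a single socket timeout followed by a successful check one minute later would only ever produce SOFT (CRITICAL), which is why the instant HARD (CRITICAL) on the HTTPS service surprises us.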
I don’t know if this is clear, as English is not my native language. If you need more information, I’ll be happy to provide it.
Thanks in advance,
Louis
