Hi everybody,
I have a bunch of hosts that show a flapping host state.
Those hosts do not answer ICMP echo requests, but apparently the Checkmk server can still communicate with the agent via TCP.
So am I to understand that successful communication with the agent is enough for a host to be considered UP, even though my effective parameters for Host Check Command show Smart PING (only with Checkmk Micro Core) as the value?
Or is there some other setting that I’m not aware of?
This is a typical symptom of a local firewall that blocks ICMP but allows TCP on port 6556.
The Smart Ping feature will look for any IP packets coming from the host’s IP address. This includes the packets that the Checkmk agent sends to the monitoring server when queried.
This is why, right after an agent query, Smart Ping sees the host as UP; after 15 seconds without further packets it goes DOWN and stays DOWN for another 45 seconds until the next agent query.
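The timing above can be illustrated with a small simulation. This is only a sketch of the idea, not Checkmk code: the function name `host_state` is made up, and the 15-second timeout and one-minute agent interval are simply the figures mentioned in this post (your actual values depend on your configuration).

```python
from typing import Sequence


def host_state(now: float, packet_times: Sequence[float], timeout: float = 15.0) -> str:
    """Return "UP" if any packet from the host was seen less than `timeout`
    seconds before `now`, otherwise "DOWN" (assumed, simplified model)."""
    seen = [t for t in packet_times if t <= now]
    if not seen:
        return "DOWN"
    return "UP" if now - max(seen) < timeout else "DOWN"


# Assumption: the agent is queried once a minute, and each query makes the
# host send TCP packets back to the monitoring server. ICMP is blocked, so
# these are the only packets the server ever sees from the host.
agent_replies = [0.0, 60.0, 120.0]

# Sample the state every 5 seconds over two minutes.
for t in range(0, 121, 5):
    print(f"t={t:3d}s  {host_state(t, agent_replies)}")
```

In this model the host is UP for the 15 seconds following each agent reply and DOWN for the remaining 45 seconds of the minute, which is exactly the flapping pattern you are seeing.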
You need to open ICMP on the network path from monitoring server to host (including its local firewall) or change the setting in the “Host Check Command” ruleset to use the state of the agent query for the host state.
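If you want to confirm the diagnosis first, you can check from the monitoring server whether the agent port accepts TCP connections while a plain `ping` to the same host gets no reply. A minimal sketch, assuming the agent listens on the default port 6556; the helper name `tcp_port_open` is just for illustration:

```python
import socket


def tcp_port_open(host: str, port: int = 6556, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

Calling `tcp_port_open("<your host>")` on the monitoring server should return `True` for the affected hosts if TCP to the agent is allowed. If it does and `ping` still fails, the firewall explanation fits.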
After your input I reread the section about Smart Ping in the documentation, and with the benefit of hindsight I discovered that there is indeed a sentence buried inside a paragraph which suggests that hosts are marked as UP after a TCP SYN or RST packet is received from them.
This came as a surprise to me, because the behaviour is not mentioned in the main paragraph about how Smart Ping works, which reads as if only ICMP responses are used to consider a host UP.
Since this TCP SYN/RST mechanism is a very important detail, I would expect it to be highlighted in the first paragraphs.
Oh well, if contributions to the documentation are accepted, I will try to submit a PR here.