All hosts are showing as DOWN in WATO

**CMK version:** 2.1.0p18

Hi All,
It appears I had a network glitch last night. Now all hosts except the ones in the same subnet as the Checkmk server are showing as DOWN in WATO. I have run the “connection tests” for several of the hosts, and both the ping and agent tests are successful (green). I have even restarted Checkmk (on the server and on a few of the monitored hosts), but nothing has made a difference. Note that the services on all the hosts are fine/OK. The monitored hosts are in about 5 different subnets. Strangely enough, a couple of hosts in each of the other subnets appear to have either survived the issue or recovered, and are showing UP as expected.
I found this thread, but it wasn’t really helpful: “Server showing as down in Host groups but the Check_mk service is green”

If you use Smart Ping, you need to check whether the processes icmpsender and icmpreceiver are running. I recently had a situation where the site user’s membership in the omd group had vanished. This is what it should look like:

OMD[beta]:~$ id
uid=982(beta) gid=1005(beta) groups=1005(beta),979(omd)
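
If you want to script this membership check across several sites, here is a minimal sketch. It assumes the `id` output has the format shown above; `groups_from_id_output` is a hypothetical helper written for illustration, not something shipped with Checkmk or OMD.

```python
import re


def groups_from_id_output(id_output: str) -> set:
    """Return the group names listed in the groups=... field of `id` output."""
    match = re.search(r"groups=(\S+)", id_output)
    if match is None:
        return set()
    # Each entry looks like 979(omd); keep only the name inside the parentheses.
    return set(re.findall(r"\(([^)]+)\)", match.group(1)))


if __name__ == "__main__":
    # Sample taken from the output above.
    sample = "uid=982(beta) gid=1005(beta) groups=1005(beta),979(omd)"
    print("omd" in groups_from_id_output(sample))
```

If `omd` is missing from the returned set, the site user has lost the group membership described above.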

Hi Mike,
My site user is still a member of the omd group. I am not clear on how to check for the two processes you mentioned. If I run a “ps -ef | grep icmp” on the server, I can see nagios check_icmp commands running against different IPs.
I noticed today that the number of hosts reported as down has gone up from 50 (yesterday) to 101, yet the services on all the systems are still OK.

Also, I don’t have the files icmpsender and icmpreceiver anywhere on my system, so I suspect they only apply to the Checkmk Micro Core (CMC) and not to the free Raw edition.
But I am encountering exactly the same issue as the OP in the forum post below: I can ping the target host(s), but check_icmp returns CRITICAL for the same host(s) when I execute it from the shell.
So what I have done is change the “Host Monitoring Rules/Host Check Command” rule to “TCP Connect” (port 6556), and now all the hosts are back UP. I would still like to fix the underlying check_icmp issue, though.
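
For anyone wondering what a TCP-based host check amounts to, here is a rough sketch of the idea: the host counts as reachable if a TCP connection to the agent port can be opened. This is only an illustration and not how Checkmk implements its “TCP Connect” check; `tcp_connect_ok` and the 5-second timeout are my own assumptions.

```python
import socket


def tcp_connect_ok(host: str, port: int = 6556, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hostname taken from the sample run in this thread.
    host = "myserver.mydomain.edu"
    print(host, "UP" if tcp_connect_ok(host) else "DOWN")
```

Because this needs no raw socket, it avoids whatever is breaking ICMP for check_icmp, but it also means a host with a stopped agent will show as DOWN.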

Sample run:
OMD[mysite]:~$ /opt/omd/versions/2.1.0p18.cre/lib/nagios/plugins/check_icmp myserver.mydomain.edu
CRITICAL - myserver.mydomain.edu: rta nan, lost 100%|rta=0.000ms;200.000;500.000;0; pl=100%;40;80;; rtmax=0.000ms;;;; rtmin=0.000ms;;;;
OMD[mysite]:~$

OMD[mysite]:~$ ping -c 2 myserver.mydomain.edu
PING myserver.mydomain.edu (192.168.122.5) 56(84) bytes of data.
64 bytes from 192.168.122.5 (192.168.122.5): icmp_seq=1 ttl=63 time=4.12 ms
64 bytes from 192.168.122.5 (192.168.122.5): icmp_seq=2 ttl=63 time=6.40 ms

--- myserver.mydomain.edu ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 4.123/5.264/6.406/1.143 ms
OMD[mysite]:~$
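
To spot this mismatch on more than one host, the two outputs can be compared programmatically. The sketch below only parses text in the formats shown in the sample runs above; the function names are hypothetical, and it assumes check_icmp reports loss as `pl=NN%` in its performance data and ping prints `NN% packet loss`.

```python
import re


def check_icmp_loss(output: str) -> int:
    """Packet loss in percent, read from the pl=NN% perfdata field of check_icmp."""
    match = re.search(r"\bpl=(\d+)%", output)
    if match is None:
        raise ValueError("no pl= field found in check_icmp output")
    return int(match.group(1))


def ping_loss(output: str) -> float:
    """Packet loss in percent, read from the ping statistics summary line."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", output)
    if match is None:
        raise ValueError("no packet loss summary found in ping output")
    return float(match.group(1))


def results_disagree(check_icmp_output: str, ping_output: str) -> bool:
    """True if check_icmp lost every packet while ping got at least one reply."""
    return check_icmp_loss(check_icmp_output) == 100 and ping_loss(ping_output) < 100


if __name__ == "__main__":
    icmp_out = ("CRITICAL - myserver.mydomain.edu: rta nan, lost 100%|"
                "rta=0.000ms;200.000;500.000;0; pl=100%;40;80;;")
    ping_out = "2 packets transmitted, 2 received, 0% packet loss, time 1001ms"
    print(results_disagree(icmp_out, ping_out))
```

A host where `results_disagree` returns True is reachable by ping but not by check_icmp, which is the situation described in this thread.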

I’m running 2.0.0p6 Raw on RHEL 8, and all my hosts are reporting “DOWN” even though all service checks are OK.

Solution: Open Setup, search for “Host Check Command”, create a rule that sets the host check command to “TCP Connect”, and apply it to the affected folder or hosts.

It worked for me, thank you @mike1098!
But I am still confused about why this issue only occurred when I was adding new hosts, and not with the existing hosts.
