All hosts are showing as DOWN in WATO

itababa · March 10, 2023, 3:58am

**CMK version:** 2.1.0p18
OS version:

Error message:

Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)

Hi All,
It appears I had a network glitch last night. Now all hosts except the ones in the same subnet as the CheckMK server are showing as DOWN in WATO. I have run the “connection tests” for several of the hosts, and both the ping and agent tests are successful (green). I have even restarted CheckMK (on the server and on a few of the monitored hosts), but nothing has made a difference. Note that the services on all the hosts are OK. The monitored hosts are spread across about 5 different subnets. Strangely enough, a couple of hosts in each of the other subnets appear to have either survived the issue or recovered, and are showing UP as expected.
I found this, but it wasn’t really helpful: Server showing as down in Host groups but the Check_mk service is green

mike1098 · March 10, 2023, 7:12am

If you use Smart PING, you need to check whether the processes icmpsender and icmpreceiver are running. I recently had a situation where the site user's membership in the omd group had vanished. Check it like this:

```
OMD[beta]:~$ id
uid=982(beta) gid=1005(beta) groups=1005(beta),979(omd)
```
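To make that membership check scriptable, here is a minimal Python sketch that reads the output of `id` (as in the sample above) and reports whether the site user still belongs to `omd`. The function name `has_group` is hypothetical, and the parsing assumes the usual `groups=GID(name),...` format printed by GNU coreutils `id`.

```python
import re


def has_group(id_output: str, group: str) -> bool:
    """Return True if `group` appears in the groups=... list of `id` output."""
    # `id` prints e.g. "uid=982(beta) gid=1005(beta) groups=1005(beta),979(omd)"
    match = re.search(r"groups=(\S+)", id_output)
    if match is None:
        return False
    # Each entry looks like "979(omd)"; keep only the name inside the parentheses.
    names = re.findall(r"\d+\(([^)]+)\)", match.group(1))
    return group in names


sample_id = "uid=982(beta) gid=1005(beta) groups=1005(beta),979(omd)"
print(has_group(sample_id, "omd"))  # True
```

If this prints False for your site user, the membership has gone missing in the way Mike describes.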

itababa · March 10, 2023, 6:23pm

Hi Mike,
My site user is still a member of the omd group. I am not clear on how to check for the two processes you mentioned. If I run `ps -ef | grep icmp` on the server, I can see Nagios check_icmp commands running against different IPs.
I noticed today that the number of hosts reported as down has gone up from 50 (yesterday) to 101, yet the services on all the systems are still OK.
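One way to answer the “how do I check for those processes” question is to scan `ps -ef` output for the program names. The Python sketch below is illustrative only: the function `find_icmp_programs` and the sample `ps` lines are hypothetical (modeled on the check_icmp path used later in this thread), and it simply reports whether the Smart PING helpers (icmpsender, icmpreceiver) or the Nagios-core plugin check_icmp show up.

```python
import os


def find_icmp_programs(ps_lines):
    """Report which ICMP-related programs appear in `ps -ef` output lines."""
    # Smart PING helpers named in Mike's post, plus the Nagios check_icmp plugin.
    found = {"icmpsender": False, "icmpreceiver": False, "check_icmp": False}
    for line in ps_lines:
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        # First word of CMD is the executable path; compare only its basename.
        program = os.path.basename(fields[7].split()[0])
        if program in found:
            found[program] = True
    return found


# Hypothetical `ps -ef` lines for illustration only.
sample_ps = [
    "UID PID PPID C STIME TTY TIME CMD",
    "mysite 4711 4700 0 18:20 ? 00:00:00 "
    "/opt/omd/versions/2.1.0p18.cre/lib/nagios/plugins/check_icmp 192.168.122.5",
    "mysite 4712 4650 0 18:20 pts/0 00:00:00 grep icmp",
]
print(find_icmp_programs(sample_ps))
```

Seeing only check_icmp and never the two helpers is consistent with a site that is not using Smart PING at all.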

itababa · March 10, 2023, 8:18pm

Also, I don’t have the files icmpsender and icmpreceiver anywhere on my system, so I suspect they only apply to the Checkmk Micro Core (Smart PING) and not to the free Raw Edition.
But I am encountering exactly the same issue as the OP of the forum post below: I can ping the target host(s), but check_icmp returns CRITICAL for the same host(s) when I execute it from the shell.
As a workaround, I have changed the “Host Monitoring Rules/Host Check Command” to “TCP Connect” (port 6556), and now all the hosts are back UP. But I would definitely like to fix the check_icmp issue.
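The “TCP Connect” workaround succeeds because it only needs a TCP handshake to the agent port, so ICMP is never involved. Below is a minimal Python sketch of that idea; the function `tcp_connect_ok` is hypothetical and is not how Checkmk implements the rule, and the 6556 default comes from the post above.

```python
import socket


def tcp_connect_ok(host: str, port: int = 6556, timeout: float = 5.0) -> bool:
    """Return True if a TCP handshake to host:port completes within `timeout`."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable or unresolvable all count as DOWN.
        return False
```

For example, `tcp_connect_ok("myserver.mydomain.edu")` returning True would mark the host UP even while check_icmp reports 100% loss, which is why switching the rule hides the ICMP problem rather than fixing it.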

itababa · March 10, 2023, 8:20pm

Sample run:
```
OMD[mysite]:~$ /opt/omd/versions/2.1.0p18.cre/lib/nagios/plugins/check_icmp myserver.mydomain.edu
CRITICAL - myserver.mydomain.edu: rta nan, lost 100%|rta=0.000ms;200.000;500.000;0; pl=100%;40;80;; rtmax=0.000ms;;;; rtmin=0.000ms;;;;
OMD[mysite]:~$

OMD[mysite]:~$ ping -c 2 myserver.mydomain.edu
PING myserver.mydomain.edu (192.168.122.5) 56(84) bytes of data.
64 bytes from 192.168.122.5 (192.168.122.5): icmp_seq=1 ttl=63 time=4.12 ms
64 bytes from 192.168.122.5 (192.168.122.5): icmp_seq=2 ttl=63 time=6.40 ms

--- myserver.mydomain.edu ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 4.123/5.264/6.406/1.143 ms
OMD[mysite]:~$
```
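The two outputs disagree about the same target. The check_icmp perfdata uses the `label=value;warn;crit;min;max` layout, so it shows the thresholds applied here (rta warn 200 ms / crit 500 ms, pl warn 40% / crit 80%) next to pl=100%, while ping reports 0% loss. A minimal Python sketch that pulls the loss figure out of each output for comparison; the helper names `check_icmp_loss` and `ping_loss` are hypothetical, and the ping parsing assumes the iputils summary line shown above.

```python
import re


def check_icmp_loss(plugin_output: str) -> float:
    """Extract the pl (packet loss %) value from check_icmp's perfdata."""
    # Perfdata follows the '|' separator, e.g. "pl=100%;40;80;;"
    _, _, perfdata = plugin_output.partition("|")
    match = re.search(r"\bpl=([\d.]+)%", perfdata)
    if match is None:
        raise ValueError("no pl= metric in check_icmp output")
    return float(match.group(1))


def ping_loss(ping_output: str) -> float:
    """Extract the packet-loss percentage from ping's statistics line."""
    match = re.search(r"([\d.]+)% packet loss", ping_output)
    if match is None:
        raise ValueError("no packet-loss summary in ping output")
    return float(match.group(1))


icmp_out = ("CRITICAL - myserver.mydomain.edu: rta nan, lost 100%|rta=0.000ms;"
            "200.000;500.000;0; pl=100%;40;80;; rtmax=0.000ms;;;; rtmin=0.000ms;;;;")
ping_out = "2 packets transmitted, 2 received, 0% packet loss, time 1001ms"

icmp_pl, ping_pl = check_icmp_loss(icmp_out), ping_loss(ping_out)
if icmp_pl != ping_pl:
    # Same target, different verdicts from the two tools.
    print(f"mismatch: check_icmp {icmp_pl}% loss vs ping {ping_pl}% loss")
# prints "mismatch: check_icmp 100.0% loss vs ping 0.0% loss"
```

A mismatch like this only shows that the two tools disagree about the same host from the same server; it does not by itself explain why.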

Ashishkt · May 18, 2023, 9:01am

I’m running 2.0.0p6 Raw Edition on RHEL 8, and all my hosts are reporting “DOWN” even though all service checks are OK.

Solution: Click Setup, search for “Host Check Command”, create a rule that sets “Host Check Command” to “TCP Connect”, and apply it to the folder/host.

It worked for me, thank you! @mike1098
But I am still confused about why this issue only occurred when I was trying to add new hosts and not with the existing hosts.

system · May 17, 2024, 9:01am

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed. Contact an admin if you think this should be re-opened.