Windows host sporadically not reachable

Hi

We have about 100 Windows servers which we monitor via the agent.
Everything is running smoothly.
But about once a month, one host is no longer reachable from checkmk.
Sometimes it is still reachable on port 6556 from another host, but not from checkmk.

We had that in 1.6 and also in 2.0. We tested several versions, including updating the agent on the host.

The only way to get this host back into an active state is to uninstall the agent, delete C:\Programdata\checkmk, and install the agent again.

I haven’t found anything that could lead to this timeout from checkmk.

Any ideas?

Hi @righter,

This sounds like a problem related to firewall settings or antivirus programs blocking the request to the agent, especially because you say it happens with different versions of your monitoring server and agent. Have you also tried restarting the service on Windows when the problem occurs?

Hi

A reboot of the server or a restart of the service has no effect.
I have to reinstall the agent and delete the programdata folder to get it back online.

I’ll check next time whether the firewall is blocking it, but normally we do not have it activated.

Hi, now I have a host which is unreachable from checkmk:

I was able to connect twice, but nothing was returned and the connection was closed. After that, further connection attempts were refused:

root@checkmk:~# telnet 192.168.11.228 6556
Trying 192.168.11.228...
Connected to 192.168.11.228.
Escape character is '^]'.
Connection closed by foreign host.
root@checkmk:~# telnet 192.168.11.228 6556
Trying 192.168.11.228...
Connected to 192.168.11.228.
Escape character is '^]'.
Connection closed by foreign host.
root@checkmk:~# telnet 192.168.11.228 6556
Trying 192.168.11.228...
telnet: Unable to connect to remote host: Connection refused
root@checkmk:~# telnet 192.168.11.228 6556
Trying 192.168.11.228...
telnet: Unable to connect to remote host: Connection refused
root@checkmk:~# 

Windows Firewall is disabled
Windows virus & threat protection is disabled
Windows app & browser control is disabled
no additional Anti Virus is installed

No idea what is going wrong here.

Can you also check whether there is a setting on the Windows service that only allows certain IPs to contact it? There is an option in the agent config with this function, too. But this wouldn’t explain why it works at the beginning and stops working after some time.

Can you also check, when this situation occurs, whether port 6556 is open and listening on the Windows machine? If it’s not, there is maybe something wrong with the agent. If it’s open and listening, the network traffic is being blocked somewhere between the monitoring server and the agent.

Hi @tosch

I haven’t found an IP restriction on the CheckMK service.
The agent is also not restricted to any specific IPs.

The server is still listening on the port:
TCP 0.0.0.0:6556 0.0.0.0:0 LISTENING

I can connect successfully from the server on which the agent is installed.
I can also connect to the agent from another host.
But not from checkmk anymore. I’ve also checked the firewall; there is no block, otherwise I would get a timeout.

Maybe a problem on the checkMK server host itself?

What does your monitoring server say about connecting to this port (not via the telnet client)?
nc -nuv <ip> 6556 (maybe on your server the command is called netcat)

A totally different question, have you enabled encrypted agent communication?

Netcat output:

root@checkmk:~# netcat -nuv 192.168.11.228 6556
(UNKNOWN) [192.168.11.228] 6556 (?) open

root@checkmk:~#
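
One caveat about the netcat call above: `-u` selects UDP, and because UDP is connectionless, netcat can print "open" without anything answering on the other side. The agent port is listed as TCP in the netstat line, so a TCP probe is more telling. The following is only a minimal Python sketch of such a probe; the function name `probe_agent` and the outcome labels are made up for illustration and are not part of checkmk. It separates the outcomes seen in the telnet session: refused, accepted but closed without data, and data returned.

```python
import socket


def probe_agent(host: str, port: int = 6556, timeout: float = 5.0) -> str:
    """Classify how a TCP endpoint answers a plain connect-and-read."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        # Nothing is listening, e.g. the service is down between restarts.
        return "refused"
    except socket.timeout:
        # No answer at all, typical for packets dropped by a firewall.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"
    with sock:
        try:
            first = sock.recv(4096)
        except ConnectionResetError:
            # The peer sent a TCP RST after the handshake.
            return "reset"
        except socket.timeout:
            return "timeout"
    # An empty read means the peer closed the connection without sending data.
    return "data" if first else "empty"
```

Calling `probe_agent("192.168.11.228")` from the monitoring server and again from a second host gives results that can be compared directly. A mix of "empty", "reset" and "refused" over a short period would point at the agent process itself rather than at the network.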

We have no encrypted agents; we use the default settings.

I also captured the traffic on the server running the agent.
It completes the TCP 3-way handshake and then sends a reset (RST) to the checkmk server.

Hmm, the service crashes. I just found this in the event logs:

Faulting application name: check_mk_agent.exe, version: 2.0.0.0, time stamp: 0x61432dbd
Faulting module name: ntdll.dll, version: 10.0.17763.2145, time stamp: 0xa211e4d0
Exception code: 0xc0000005
Fault offset: 0x000000000003a252
Faulting process id: 0x1184
Faulting application start time: 0x01d7d7a418039bed
Faulting application path: C:\Program Files (x86)\checkmk\service\check_mk_agent.exe
Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
Report Id: 2da34f70-e6f0-4f7a-a1b0-b577836b1c89
Faulting package full name: 
Faulting package-relative application ID: 
The Check MK Service service terminated unexpectedly.  It has done this 1336 time(s).  The following corrective action will be taken in 2000 milliseconds: Restart the service.
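
For reading this event: exception code 0xc0000005 is the Windows status code for an access violation, and a restart counter of 1336 suggests the service is stuck in a crash-and-restart loop, which would fit the alternation between "connected, then closed" and "connection refused" seen earlier. Below is a small Python sketch that pulls those values out of the event text. The helper names are hypothetical, the status table is deliberately incomplete, and the start time is assumed to be a Windows FILETIME value.

```python
import re
from datetime import datetime, timedelta, timezone

# A few well-known NTSTATUS values; this table is not exhaustive.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(value: int) -> datetime:
    """Convert a FILETIME (100 ns ticks since 1601-01-01 UTC) to a datetime."""
    return FILETIME_EPOCH + timedelta(microseconds=value // 10)


def summarize_crash(event_text: str) -> dict:
    """Extract exception name, process start time and restart count."""

    def find(pattern: str):
        match = re.search(pattern, event_text)
        return match.group(1) if match else None

    code = find(r"Exception code: (0x[0-9a-fA-F]+)")
    # Assumption: this field is a FILETIME written as a hex number.
    start = find(r"Faulting application start time: (0x[0-9a-fA-F]+)")
    count = find(r"It has done this (\d+) time\(s\)")
    return {
        "exception": NTSTATUS_NAMES.get(int(code, 16), code) if code else None,
        "started": filetime_to_utc(int(start, 16)) if start else None,
        "restarts": int(count) if count else None,
    }
```

Feeding the event text above into `summarize_crash` gives the exception name, the process start time in UTC, and the restart count in one place, which makes it easier to compare events across several affected hosts.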