I have a recurring issue with CheckMK losing connection to one particular server (the rest work fine).
When I reinstall the agent on the target server, it works for a period of time (not always the same duration) and then it stops responding.
Any idea what could create such behaviour?
The target server runs Ubuntu Linux 18.04.6 (fully updated).
No firewall is active.
Ping works fine in both directions.
The Linux server's purpose is to act as a VPN endpoint (if that is relevant).
How are you connecting to the server: via xinetd, the systemd socket, or SSH? If it’s one of the first two, you may want to try restarting that service. Reinstalling the agent seems a rather drastic solution.
Also, see if you can get a connection from the Check_MK server to the client on port 6556:
nc -v <your server IP> 6556
If that fails, try it from the client, using localhost for the server IP.
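If you want to repeat that check without an interactive nc session, something like the following could work. It is only a rough sketch, assuming bash (with its /dev/tcp feature) and the coreutils `timeout` command are available; `check_agent_port` is a made-up helper name, not part of Check_MK.

```shell
# Hypothetical helper (not part of Check_MK): probe a TCP port and
# print "open" or "closed". Assumes bash and coreutils timeout(1).
# Usage: check_agent_port HOST [PORT] [TIMEOUT_SECONDS]
check_agent_port() {
    local host="$1" port="${2:-6556}" wait="${3:-5}"
    # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
    # timeout(1) bounds the attempt so a silently dropped connection
    # does not hang the check.
    if timeout "$wait" bash -c 'exec 3<>/dev/tcp/$1/$2' _ "$host" "$port" 2>/dev/null; then
        echo "open"
    else
        echo "closed"
        return 1
    fi
}
```

Run `check_agent_port <your server IP>` from the Check_MK server, and `check_agent_port localhost` on the client itself. If it reports "open" locally but "closed" from the server, the problem is more likely somewhere on the path between the two than in the agent.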
Louis.
Also, you may want to check if there are hanging agent processes:
ps auxwww | grep check_mk
If that command returns several check_mk_agent processes, check whether some of them took a very long time to complete and are using up the maximum number of available sockets for the check_mk agent (I believe the default is 3).
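To spot long-running agent processes more easily, you could filter on elapsed time. This is a rough sketch, assuming a procps `ps` that supports the `etimes` (elapsed seconds) column; `long_running_agents` is a made-up helper name and the 60-second threshold is arbitrary.

```shell
# Hypothetical helper: reads "PID ELAPSED_SECONDS COMMAND..." lines on
# stdin and prints the PID and age of every check_mk_agent process that
# has been running longer than the given number of seconds (default 60).
long_running_agents() {
    local max_age="${1:-60}"
    awk -v max="$max_age" '$2 + 0 > max && /check_mk_agent/ { print $1, $2 }'
}
```

On the client you would feed it with `ps -eo pid=,etimes=,args= | long_running_agents 60`. An agent run normally finishes within seconds, so anything this prints is a candidate for a hung process.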
I restarted the service, which worked fine.
I don’t see any hanging processes with the command you provided
and for the last command, I get this output:
Well, it hasn’t started failing again.
As I stated in the first post, sometimes it takes 8 days before the communication fails and other times it takes 4 hours.
I will test this again when it fails again and I will report back then.