Agent Problems after update to 2.1.0

Hello,

Yesterday I updated my CheckMK firmware to 1.5.1 and the site version to 2.1.0. Since then I’ve been fixing errors so that everything doesn’t just show up in red :roll_eyes:

I now have the following error with various Linux servers (not with all).


At first I suspected the firewall and the DMZ, but I have servers in the DMZ that work and others that don’t, and the same is true in the normal server network.
On Windows I have a single server (same agent as the others, not DMZ) that is causing problems. The message is here

I have already completely uninstalled and reinstalled the agents, and also deleted the hosts from CheckMK and created them again. That didn’t help.
The servers are all equipped with agents that update automatically and have been activated with TLS since the latest agent.
Unfortunately I can’t find the error. Can someone please help me here?

Thanks and bye, Sascha

And also this Warning is new
image

Did you update the agents to 2.1.0? In case you did: Did you already run “cmk-agent-ctl register” on the clients to be monitored? In case you speak German, take the de version since the English version is partially machine translated:

Yes, I’m German :slight_smile:

I have already switched all agents to TLS with this command. And apart from these few servers, all the others work as well.

EDIT Windows: You might have encountered a potential bug currently under investigation. Could you provide us with some details on the Windows version used? As a workaround you might switch back to unencrypted communication for this host (two steps required, registration has to be revoked for both sides):

Linux: There might be cases when hosts were initially added to monitoring by using xinetd as means of accessing the agents. Please check whether ss -tulpn | grep 6556 shows xinetd claiming the port. If this is the case, remove the xinetd config for the CMK agent, restart xinetd and reinstall the agent package. Afterwards, 6556 should be claimed by cmk-agent-ctl and encrypted communication should be possible as intended.
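To repeat that check on several hosts, the ss output can be classified with a small helper. This is only a sketch: the function name classify_6556 is made up for illustration and is not part of Checkmk, and it assumes the users:(("name",...)) format that ss prints with the -p option.

```shell
#!/bin/sh
# Hypothetical helper (not part of Checkmk): reads `ss -tulpn` output on stdin
# and reports which process is listening on the agent port 6556.
classify_6556() {
    # Keep only the first line whose local address ends in :6556.
    line=$(grep -E ':6556[[:space:]]' | head -n 1)
    if [ -z "$line" ]; then
        echo "none"
        return
    fi
    case "$line" in
        *'"cmk-agent-ctl"'*) echo "cmk-agent-ctl" ;;
        *'"xinetd"'*)        echo "xinetd" ;;
        *)                   echo "other" ;;
    esac
}

# Usage on a monitored host (root is needed for ss to show process names):
#   ss -tulpn | classify_6556
```

A result of "xinetd" would point to the leftover xinetd configuration described above, while "none" means nothing is listening on the port at all.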

Windows:
I just reversed the TLS connection, but I still get the same message in CheckMK. So that didn’t help.
Almost all servers (including the one with the error) are Windows Server 2022 Version 21H2, Build 20348.707. If you need more information, just ask :wink:

Linux:
On servers where I’m not getting any information, the command ss -tulpn | grep 6556 produces no output. As a test, I entered it into one that worked and then got this

tcp    LISTEN  0       1024                      *:6556                 *:*      users:(("cmk-agent-ctl",pid=1818639,fd=9))

I restarted one of the servers, but there was no improvement. The server runs Ubuntu 20.04.4 LTS (GNU/Linux 5.4.0-113-generic x86_64).

OK, I’ll pass on the information on the Windows server.

For the Linux hosts: Is the agent controller running? Check with ps waux | grep cmk-agent-ctl. Might the hosts be restricted (by boot parameter) to a strict IPv4-only setup? Or might the TLS registration have failed? You can check with cmk-agent-ctl status.
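These checks could be sketched roughly as follows. The helper ipv6_disabled_by_cmdline is hypothetical, and it assumes that the boot parameter in question is ipv6.disable=1; that is the usual kernel switch, but it is an assumption here and other ways of disabling IPv6 (for example via sysctl) would not be detected.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given kernel command line contains
# ipv6.disable=1 (ASSUMED to be the boot parameter meant in this thread).
ipv6_disabled_by_cmdline() {
    # Pad with spaces so the parameter is matched as a whole word.
    case " $1 " in
        *" ipv6.disable=1 "*) return 0 ;;
        *)                    return 1 ;;
    esac
}

# Usage on the affected host:
#   ipv6_disabled_by_cmdline "$(cat /proc/cmdline)" && echo "IPv6 disabled at boot"
#   pgrep -a cmk-agent-ctl || echo "agent controller is not running"
#   cmk-agent-ctl status
```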

Just in case: We are prepared for cases where the agent controller cannot be started or crashes. You could switch to xinetd mode and disable systemd services for the agent controller while we are investigating:

Here the outputs of the two commands

I think this looks good.

Registration indeed is good. But if cmk-agent-ctl cannot be started, access fails. This might be a bug. Please provide me with the output of:

systemctl status check-mk-agent.socket
systemctl status cmk-agent-ctl-daemon.service

You might then switch to xinetd mode:

  1. Install xinetd:
apt install xinetd
  2. Disable the systemd services:
systemctl stop check-mk-agent.socket
systemctl disable check-mk-agent.socket
systemctl stop cmk-agent-ctl-daemon.service
systemctl disable cmk-agent-ctl-daemon.service
  3. Install the xinetd service:
/var/lib/cmk-agent/scripts/super-server/1_xinetd/setup deploy
/var/lib/cmk-agent/scripts/super-server/1_xinetd/setup trigger
  4. Disable TLS registration on the CMK server: Properties of host, menu entry Host > Remove TLS registration

Afterwards, ss should show xinetd claiming the port, and the connection test should work. We’ll come back to you to ask for details that we might not have considered in our test setups.
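If several hosts need the same treatment, the host-side steps above could be wrapped in a small script. This is a sketch only: the DRY_RUN variable and the run and switch_to_xinetd functions are hypothetical conveniences, while the commands themselves are the ones listed in this post. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch: wraps the host-side steps from this post.
# DRY_RUN, run() and switch_to_xinetd() are hypothetical names.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

switch_to_xinetd() {
    run apt install xinetd
    # Stop and disable both systemd units so nothing conflicts with xinetd.
    for unit in check-mk-agent.socket cmk-agent-ctl-daemon.service; do
        run systemctl stop "$unit"
        run systemctl disable "$unit"
    done
    run /var/lib/cmk-agent/scripts/super-server/1_xinetd/setup deploy
    run /var/lib/cmk-agent/scripts/super-server/1_xinetd/setup trigger
}

# Usage: call switch_to_xinetd to preview, then set DRY_RUN=0 and run as root.
```

Removing the TLS registration on the CMK server still has to be done in the web interface; the script does not cover that part.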

Edit: Added missing service to be able to re-use this answer.

Here the Output


OK, thanks, this works for me until there is a solution.

I need another one:

systemctl status cmk-agent-ctl-daemon.service

But I just switched to xinetd mode. I hope this is no problem for this output.


Thanks, that’s useful. Just to leave everything clean and have no conflicting services left, please also do:

systemctl stop cmk-agent-ctl-daemon.service
systemctl disable cmk-agent-ctl-daemon.service

After this command I always get the info

xinetd is already the newest version (1:2.3.15.3-1).

Then xinetd is already the latest version… It’s not installed by default, so I included it.

Done, now I only have this error

1 static service failed (fwupd-refresh)

Do you also have a solution for this?

image

The command

cmk -IIv hostname

should force service re-discovery of affected services.

To clean up this thread a bit:

  1. Do the Linux systems on which cmk-agent-ctl does not start use an IPv4-only setup?
  2. For the affected Windows Server 2022 host, please look into %programdata%\checkmk\agent\log

In case you do not find any useful hints, please start a new thread just covering the Windows issue. I (or a colleague who actually works on Windows) will then contact you and might ask for a complete archive of %programdata%\checkmk\agent\