Linux Agent installation fails silently - service/socket unable to start

CMK version: 2.1.0p11
OS version: Debian 11, Ubuntu 20.04.4, Ubuntu 21.10

Error message: None; however, the agent service never starts

Output of “cmk --debug -vvn hostname”: cmk not found

I’ve installed Checkmk in a virtual machine in my Proxmox environment (7.2-7) on Debian 11.4, which seems to work great. I then downloaded the agent for Linux (Setup > Agents > Linux) onto fresh installations of the above-mentioned versions of Debian/Ubuntu, each in a VM. On the freshly installed server I become root with sudo su - and then run dpkg -i check-mk-agent_2.1.0p11-1_all.deb.

I should add that these OS images haven’t been touched by my usual default setup (cloud-init/terraform/ansible etc.); they are vanilla images. The same issue occurs on cloud images from both Ubuntu and Debian.

This is the output I get:

Selecting previously unselected package check-mk-agent.
(Reading database ... 71816 files and directories currently installed.)
Preparing to unpack check-mk-agent_2.1.0p11-1_all.deb ...
Unpacking check-mk-agent (2.1.0p11-1) ...
Setting up check-mk-agent (2.1.0p11-1) ...
Deploying systemd units: cmk-agent-ctl-daemon.service check-mk-agent.socket check-mk-agent-async.service check-mk-agent@.service

Deployed systemd

Upon checking if check-mk-agent.socket is running I can see that it’s loaded but not enabled:

● check-mk-agent.socket - Local Checkmk agent socket
     Loaded: loaded (/lib/systemd/system/check-mk-agent.socket; disabled; vendor preset: enabled)
     Active: inactive (dead)
     Listen: /run/check-mk-agent.socket (Stream)
   Accepted: 0; Connected: 0;

I manually enable check-mk-agent.socket and try to start it but I end up with:

Sep 06 04:48:17 check-ubuntoo systemd[1]: Starting Local Checkmk agent socket.
Sep 06 04:48:17 check-ubuntoo systemd[2196]: check-mk-agent.socket: Failed to resolve user cmk-agent: No such process
Sep 06 04:48:17 check-ubuntoo systemd[1]: check-mk-agent.socket: Control process exited, code=exited, status=217/USER
Sep 06 04:48:17 check-ubuntoo systemd[1]: check-mk-agent.socket: Failed with result 'exit-code'.
Sep 06 04:48:17 check-ubuntoo systemd[1]: Failed to listen on Local Checkmk agent socket.
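
The status=217/USER in the log above is what systemd reports when an account referenced by a unit cannot be resolved, here cmk-agent. A minimal sketch of a check for that account follows; the helper name user_exists is made up for illustration.

```shell
#!/bin/sh
# Succeeds if the named local account can be resolved, fails otherwise.
# user_exists is a hypothetical helper, not part of the agent package.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

if user_exists cmk-agent; then
    echo "cmk-agent exists"
else
    echo "cmk-agent is missing"
fi
```

If the account is reported missing, starting the socket unit would be expected to fail in the same way as shown in the log.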

I’ve tested this on three different freshly installed VMs using the above-mentioned OS versions and end up with exactly the same issue and error on the service. I haven’t found anything by googling other than:
xinetd might be blocking systemd: I don’t have xinetd
reboot the server: Does not make a difference. I’ve uninstalled the package, rebooted the server, installed the package, rebooted the server etc.

Any ideas?

What is the output of this command?

ss -tulpn | grep 6556

No output at all when looking for something listening on 6556. Pretty sure the socket is what should be listening on that port and it’s unable to run.
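
For scripting the same check, here is a rough sketch that reads ss -tln output and reports whether anything is in the LISTEN state on a given port. The helper name port_listening is an assumption for illustration, and the usual ss field layout is assumed (the local address is the third field after the state).

```shell
#!/bin/sh
# Reads `ss -tln` style output on stdin and succeeds if a LISTEN entry
# has a local address ending in the given port.
# port_listening is a hypothetical helper name.
port_listening() {
    awk -v port="$1" '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "LISTEN") {
                    n = split($(i + 3), parts, ":")
                    if (parts[n] == port) found = 1
                }
            }
        }
        END { exit !found }
    '
}

if ss -tln 2>/dev/null | port_listening 6556; then
    echo "something is listening on 6556"
else
    echo "nothing is listening on 6556"
fi
```

Splitting the address on ":" and comparing the last piece avoids the false match that a plain grep 6556 would give on a port such as 16556.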

Could you check whether the cmk-agent user was created?

grep cmk /etc/passwd

And the directory /usr/lib/check_mk_agent? This looks like a bit of a regression in the post-install snippets to me. (I have not yet done a clean install of p11 on Debian 11.)

Nope, it hasn’t added the cmk user either. The /usr/lib/check_mk_agent directory exists, though.

Please run:

/var/lib/cmk-agent/scripts/cmk-agent-useradd.sh

Then check whether the service can be started. (Easiest to reboot.)

This did work, in that it created the user and allowed the socket service to start; however, Checkmk is unable to communicate with the host. It fails on every connection test except ping and traceroute (I have no security measures at all between Checkmk and this host).

Does ss -tulpn | grep 6556 now indicate a listening Checkmk agent controller? If this is the case you should now proceed to the agent registration without caring whether unencrypted communication is possible at the moment.

Unfortunately no…
I apologize for not coming back with more things I’ve tried, but I’m sort of at my wits’ end on this one. I don’t know how Checkmk works on the inside yet, so it’s quite hard to troubleshoot what was supposed to be a carefree trial of the application.

I totally understand. Getting the first host into monitoring is crucial, and you have quite obviously hit a bug in our software. I am trying to sort out what it is, first to help you, second to fix the bug in the next patch release.

So, one more test. What is the output of:

/usr/bin/cmk-agent-ctl --version

Indeed. No worries, I appreciate your efforts a lot. Feel free to ask for as much as you need and I’ll try to provide.

OK, getting closer.

/usr/bin/cmk-agent-ctl dump | head -n 20

(redacted for future users, in this case only the first 15-20 lines are important)

Currently working while doing this so haven’t checked the dump for stuff I wouldn’t want exposed but it seems fine during a quick look. I’m a new user so I couldn’t upload unfortunately…

So at this point we know that the local socket is working. What about the exposed daemon?

systemctl status cmk-agent-ctl-daemon.service

If this is missing you might want to run the systemd deployment script again:

/var/lib/cmk-agent/scripts/super-server/setup trigger

It was loaded but disabled. I ran the setup trigger as you suggested and now it’s running. I can now communicate with the host.
It seems the issue is that the setup trigger isn’t run during post-install. I can verify this on another host (Debian, same setup) in a bit.

Yup, works on Debian too. For future users, before a patch has been deployed, this solved it:

Run as root:
/var/lib/cmk-agent/scripts/cmk-agent-useradd.sh
/var/lib/cmk-agent/scripts/super-server/setup trigger

You could then reboot the server or manually restart the services as root:
systemctl restart cmk-agent-ctl-daemon.service
systemctl restart check-mk-agent.socket
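
The steps above can be collected into one small script. This is only a sketch: the paths and unit names are copied from this thread, while the run helper and the DRY_RUN switch are hypothetical additions so the commands can be previewed before running them as root.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
# run and DRY_RUN are hypothetical names added for this sketch.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Workaround steps as posted in this thread, stopping at the first failure.
apply_workaround() {
    run /var/lib/cmk-agent/scripts/cmk-agent-useradd.sh &&
    run /var/lib/cmk-agent/scripts/super-server/setup trigger &&
    run systemctl restart cmk-agent-ctl-daemon.service &&
    run systemctl restart check-mk-agent.socket
}

# Preview only. To really apply the workaround, call apply_workaround
# as root with DRY_RUN unset.
DRY_RUN=1
apply_workaround
```

Chaining the steps with && means the services are only restarted if the user creation and the setup trigger both succeeded.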

This looks like an occasional problem where either /usr/bin is not in the PATH, or the path of a freshly installed program is not added to the executing shell’s path cache immediately. Although this does not affect many users, our developers are working on making the post-install scripts more robust by replacing the command calls with full paths to the executables.
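
To illustrate the difference between a PATH lookup and a full path, here is a minimal sketch. resolve_cmd is a hypothetical helper, not the actual change made to the package scripts; it prefers a known absolute path and only falls back to a PATH lookup when that file is not executable.

```shell
#!/bin/sh
# Print the absolute path in $1 if it is an executable file; otherwise
# fall back to looking up the bare command name in $2 via PATH.
# resolve_cmd is a hypothetical helper name.
resolve_cmd() {
    if [ -x "$1" ]; then
        printf '%s\n' "$1"
    else
        command -v "$2"
    fi
}

if ctl=$(resolve_cmd /usr/bin/cmk-agent-ctl cmk-agent-ctl); then
    echo "using $ctl"
else
    echo "cmk-agent-ctl not found" >&2
fi
```

Because the absolute path is tested first, the result does not depend on what PATH contains at the time the script runs.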

The workarounds suggested here helped a lot.

However, I am still seeing some errors while registering the host. Every time I try to register (without sudo), I get the following error.

ERROR [cmk_agent_ctl] Failed to run as user 'cmk-agent'. Please execute with sufficient permissions (maybe try 'sudo').

Caused by:
    0: Failed to set group id 998 corresponding to user cmk-agent
    1: EPERM: Operation not permitted

However, when I run the command using sudo, I end up with the following output.

ERROR [cmk_agent_ctl] Error pairing with checkmk.example.com:443/site

Caused by:
    Request failed with code 404 Not Found: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>404 Not Found</title>
    </head><body>
    <h1>Not Found</h1>
    <p>The requested URL was not found on this server.</p>
    </body></html>

On the server side, however, I can see the host connected. Should I ignore these errors?
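
The first error above (EPERM while setting the group id) is the controller saying it was started without enough privileges, as its own hint about sudo suggests. A small guard like the following could be put in front of a registration call in a script; is_root_uid is a hypothetical helper, and the uid is passed in as an argument only to keep the check easy to test.

```shell
#!/bin/sh
# Succeeds only when the given numeric uid is 0 (root).
# is_root_uid is a hypothetical helper name.
is_root_uid() {
    [ "$1" -eq 0 ]
}

if is_root_uid "$(id -u)"; then
    echo "running as root"
else
    echo "not root: re-run with sudo" >&2
fi
```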

Same here on Debian 11.5 using CMK 2.1.0p11

Installing the CMK agent using dpkg returns:

# dpkg -i check-mk-agent_*_all.deb
(Reading database ... 67622 files and directories currently installed.)
Preparing to unpack check-mk-agent_2.1.0p11-1_all.deb ...
Removing deployed systemd units: check-mk-agent@.service, check-mk-agent-async.service, check-mk-agent.socket, cmk-agent-ctl-daemon.service
Unpacking check-mk-agent (2.1.0p11-1) over (2.1.0p11-1) ...
Setting up check-mk-agent (2.1.0p11-1) ...

Deploying systemd units: check-mk-agent@.service check-mk-agent-async.service check-mk-agent.socket cmk-agent-ctl-daemon.service
Deployed systemd

The local user is missing and the systemd services were not enabled. To solve this, I had to download cmk-agent-useradd.sh from the monitoring server and run it. Additionally, I had to enable the services by running systemctl enable check-mk-agent.socket check-mk-agent-async.service cmk-agent-ctl-daemon.service and reboot afterwards.
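
To see which of those units still need enabling, something like the sketch below could be used. The unit names are the ones from this post; the SYSTEMCTL variable is a hypothetical override added so the loop can be tried without systemd. Any unit whose state is not exactly "enabled" (including, for example, "static") is listed.

```shell
#!/bin/sh
# Unit names as given in the post above.
UNITS="check-mk-agent.socket check-mk-agent-async.service cmk-agent-ctl-daemon.service"

# SYSTEMCTL is a hypothetical override, defaulting to the real systemctl.
: "${SYSTEMCTL:=systemctl}"

# Print every unit whose `is-enabled` state is not "enabled".
units_not_enabled() {
    for unit in "$@"; do
        state=$("$SYSTEMCTL" is-enabled "$unit" 2>/dev/null) || true
        [ "$state" = "enabled" ] || echo "$unit"
    done
}

# UNITS is left unquoted on purpose so it splits into separate unit names.
units_not_enabled $UNITS
```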

Sorry, we have now identified a bug that explains this behavior.
As you may know, we recently made a couple of fixes and changes to the Linux agent packages, and in doing so we accidentally introduced a bug that affects the .deb package in Checkmk 2.1.0p11.

It’s fixed with 2.1.0p12: Regression: Activate agent controller daemon on Debian based systems (despite the Werk showing p13)
