Many systemd service checks are now Critical after update to agent 2.0.0p28 on CentOS 7

CMK version: 2.1.0p12
OS version: CentOS 7

After updating the CentOS 7 agents from 2.0.0p9 to 2.0.0p28 (latest), many systemd service checks went into Crit state.

I’m aware of the old systemd v219 on these systems, but CentOS 7 doesn’t provide an update for it out of the box, and we must stick with v2.0.0p28 at the moment. There’s a Facebook hack to compile systemd, but I don’t want to go that way.
(That is assuming the old systemd version is the root cause of this, which I don’t know.)

All of these services that are now Crit were OK with the older agent v2.0.0p9.

Any idea how to deal with that?

Thanks!

What is the systemctl status of these service units?

Let’s take a look at one example node.

root@sallapprfs03:[~]> systemctl --all |grep -i fail
● ipmievd.service                                                                                                loaded    failed   failed    Ipmievd Daemon

root@sallapprfs03:[~]> systemctl status ipmievd
● ipmievd.service - Ipmievd Daemon
   Loaded: loaded (/usr/lib/systemd/system/ipmievd.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2022-09-28 20:58:56 CEST; 12h ago

Sep 28 20:58:56 sallapprfs03.research.silicon-austria.com systemd[1]: Starting Ipmievd Daemon...
Sep 28 20:58:56 sallapprfs03.research.silicon-austria.com ipmievd[1356]: Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory
Sep 28 20:58:56 sallapprfs03.research.silicon-austria.com systemd[1]: ipmievd.service: control process exited, code=exited status=1
Sep 28 20:58:56 sallapprfs03.research.silicon-austria.com systemd[1]: Failed to start Ipmievd Daemon.
Sep 28 20:58:56 sallapprfs03.research.silicon-austria.com systemd[1]: Unit ipmievd.service entered failed state.
Sep 28 20:58:56 sallapprfs03.research.silicon-austria.com systemd[1]: ipmievd.service failed.

I mean, the service fails for a reason, and Checkmk is doing its job. I’m scratching my head over why Checkmk didn’t spot this with the older agent version. So I had the idea to reinstall v2.0.0p9, but surprisingly this service stays in Crit state now. I expected it to return to OK state.

At the moment I would really have to SSH into each node to investigate and repair the failed services.
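
For the investigation part, a loop over the nodes might save some of the manual logins. This is only a rough sketch: the function name list_failed_units and the REMOTE_CMD variable are made up for illustration, and it assumes key-based SSH access to every node. It prints each failed unit line prefixed with the host it came from.

```shell
#!/bin/sh
# Sketch: list failed systemd units on several hosts in one go.
# REMOTE_CMD defaults to ssh; it is a hypothetical override so the loop
# can be tried out without real hosts.
REMOTE_CMD=${REMOTE_CMD:-ssh}

list_failed_units() {
    for host in "$@"; do
        # --failed limits the listing to failed units,
        # --no-legend drops the header and footer lines.
        # stdin is redirected so ssh cannot swallow input meant for the caller.
        $REMOTE_CMD "$host" systemctl --failed --no-legend </dev/null 2>/dev/null |
            while IFS= read -r line; do
                printf '%s: %s\n' "$host" "$line"
            done
    done
}
```

It would be called with the node names as arguments, for example `list_failed_units sallapprfs03`. Repairing the units still has to happen per node.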

Older versions of Checkmk had issues with the "● " in systemctl’s output. Maybe that is why the failed service unit was not shown before.
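
To illustrate the kind of parsing problem meant here (this is not Checkmk's actual parser, just a minimal sketch with a made-up function name): the marker shifts every column by one, so a parser that always takes the first field as the unit name ends up reading the marker instead. The sketch relies on the assumption that unit names always contain a dot (".service", ".mount", and so on) while the marker does not.

```shell
#!/bin/sh
# Sketch: print the names of failed units from `systemctl --all` style
# output on stdin, whether or not a line starts with the "●" marker.
failed_unit_names() {
    awk '{
        # If the first field has no dot, treat it as the marker and
        # take the unit name from the second field instead.
        i = ($1 ~ /\./) ? 1 : 2
        # Columns after the unit name are LOAD, ACTIVE, SUB.
        if ($(i + 2) == "failed") print $i
    }'
}
```

On a node this could be fed with `systemctl --all --no-legend | failed_unit_names`.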
