[BUG] 2.1 agent can cause significant disk read for systemd logs

CMK version: 2.1.0 Linux agent
OS version: Debian 11

I just upgraded from 2.0.0p18 to 2.1.0.
The upgrade included the following change: "Add explicit status parsing to systemd checks" (tribe29/checkmk@e5d0b1c).

By default, systemctl status --all --type service --no-pager includes the most recent log lines (10 unless --lines says otherwise) for each service.
Fetching those log lines can make the command much slower.
Some of my hosts cope with this quite well, but on others the command takes a long time.
In one extreme case it takes more than 3 minutes to fetch the logs for all service units:

real    3m14.519s
user    0m0.348s
sys     1m30.642s

I didn’t read the entire diff, but based on the change notes it appears that this change was only intended to collect additional metadata from systemd, not logs.
Running the same command without fetching logs, using --lines=0, finishes almost immediately on the same host:

real    0m0.143s
user    0m0.051s
sys     0m0.014s
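
For anyone who wants to check their own hosts, here is a rough sketch of the comparison above. The function name compare_status_runtime is made up for this example, and it only measures whole seconds with date, which is enough to spot a slowdown of the size reported here.

```shell
# Compare how long the status call takes with and without log excerpts.
# compare_status_runtime is a hypothetical helper name, not part of the agent.
compare_status_runtime() {
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "systemctl not found, skipping"
        return 0
    fi
    for extra in "" "--lines=0"; do
        start=$(date +%s)
        # $extra is left unquoted on purpose so the empty case adds no argument.
        # systemctl status exits non-zero when some units are inactive; ignore that.
        systemctl status --all --type service --no-pager $extra >/dev/null 2>&1 || true
        end=$(date +%s)
        echo "systemctl status --all --type service --no-pager $extra: $((end - start))s"
    done
}

compare_status_runtime
```

If the first number is much larger than the second, the host is probably affected.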

I saw the same problem on some hosts. As a first quick "solution" I removed the mentioned line from the agent.
I would classify it as a severe bug. If you are unlucky, it results in an unusable agent, and on one host I saw a very high CPU load caused by this agent line.
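
A less drastic stopgap than removing the line might be to add --lines=0 to it, as suggested in the first post. The sketch below assumes the agent script contains the command string exactly as quoted in the report; I have not verified that against the 2.1.0 agent source. The function name add_lines_zero is made up for this example.

```shell
# Add --lines=0 to the systemctl status call in an agent script.
# Assumption: the file contains the command string exactly as quoted in the report.
add_lines_zero() {
    agent_file=$1
    old='systemctl status --all --type service --no-pager'
    # Do nothing if the flag is already present, so repeated runs are harmless.
    if grep -qF -- "$old --lines=0" "$agent_file"; then
        return 0
    fi
    # Keep a backup, then rewrite the original in place (this preserves its permissions).
    cp "$agent_file" "$agent_file.bak"
    sed "s/$old/$old --lines=0/" "$agent_file.bak" > "$agent_file"
}
```

It would be called as add_lines_zero /usr/bin/check_mk_agent, assuming that is where the agent is installed on your host. Reinstalling or updating the agent package will most likely overwrite the change.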

We are looking into this. :eyes:
Keep an eye on the next patch release.

Meanwhile: You could also disable the “Systemd services” section entirely, if you prefer a “clickable” solution: Setup → Agents → Windows, Linux, Solaris, AIX → Agent rules → Disabled sections (Linux agent)

I included the suggested fix in the next patch release. Thanks for reporting the issue.

Any ETA for the first patch release?
We also noticed this issue when testing the new agent on a select few hosts and have to halt all agent updates until this is solved.

Current plan is tomorrow, if internal testing goes well.

Hi @nir, the patch has been released: "[Release] Checkmk stable release 2.1.0p1".

We have updated to 2.1.0p1 and can confirm that the issue is now fixed.
Thanks to the team for the quick response!
