Checkmk agent high cpu usage with systemd

Well, it is still better than manually modifying the agent. Also, you always want to keep your agent and server version aligned.

But after reading this post one more time, I have the following hunch:
Something is causing a systemctl process to become a zombie. There are very few calls to systemctl in the agent, so it might be some external factor that is causing the systemctl process to misbehave, so that the agent script cannot pick up its return value. Maybe investigate in that direction to see what might be interfering with systemctl.

Hi all
Just chiming in here to confirm I am facing the same issues on Debian 11 systems.
I opened a separate (now closed) topic thinking this was checkmk 2.1 related, though :wink:

@moritz asked me to comment out the set_up_path line (around line 1930) in /usr/bin/check_mk_agent to test the issue; they’re currently working on it (FEED-6874 in their Jira). I opened an issue through the feedback-2.1-beta@checkmk.com email as I never encountered this issue on 2.0.

I’m currently waiting for the issue to arise again on all my VMs to see if the one with the modified agent still has the issue or not. It happens about once or twice a week in my case.

Hi all

We recently found a problem in our async Linux agent that caused 100% CPU load after about 1-2 weeks of runtime. You can find a fix in the current master and 2.1 branches on GitHub. The problem was not with systemd; one of our refactorings caused the PATH variable to grow indefinitely. This bug does not exist in the 2.0 release.

Best Max

Hi @MaxL

You wrote that this bug doesn’t exist in Version 2.0, but if you have a look at this thread in detail you will see that users complain about this issue when using version 2.0 (including myself).
The fix for 2.1 won’t help for 2.0 because in my check_mk_agent binary I cannot find the set_up_path function, but maybe the underlying problem is more or less identical? Could this be verified, please?

The PATH variable is only updated once at the start of the agent in 2.0. There must be a different bug in 2.0. Unfortunately there is little I can do at the moment about this. For 2.1 we were able to reproduce this issue on our own installations and debug it with strace. Without a strace log of a process that has gone bad I have no idea where to start looking. If you experience the issue again you can capture a strace log using

strace -p <PID> -f -t 2> strace.log

We set -f to also follow any forks started. Replace <PID> with the main PID of the agent. You can find the main PID using systemctl status:

❯ sudo systemctl status check-mk-agent-async.service | grep PID
   Main PID: 476659 (check_mk_agent)

I do not have a 2.0 agent running locally with systemd so your service name might differ.
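If you want to script the two steps above, here is a minimal sketch. The helper name extract_main_pid is made up for illustration, and it assumes the status output contains a "Main PID:" line formatted like the one shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the agent): read `systemctl status`
# output on stdin and print the number that follows "Main PID:".
extract_main_pid() {
    # Matches a line such as "   Main PID: 476659 (check_mk_agent)".
    awk '/Main PID:/ { print $3; exit }'
}

# Usage (unit name taken from the post above; yours may differ):
#   pid=$(systemctl status check-mk-agent-async.service | extract_main_pid)
#   sudo strace -p "$pid" -f -t 2> strace.log
```

The commented usage lines only combine the two commands already given above; nothing else is assumed about the agent.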

We are using check-mk-agent version 2.1.0p1 and after some time (~5-6 days) it happens that the PATH variable in check-mk-agent-async seems to be completely mangled…

We’ve installed the Checkmk agent in a non-default location (/opt/check_mk/bin), and it looks like /opt/check_mk/bin ends up being the only place where the agent searches for programs.

Running strace shows that every command is looked up in /opt/check_mk/bin:

.....
stat("/opt/check_mk/bin/sleep", 0x7ffc48c731e0) = -1 ENOENT (No such file or directory)
stat("/opt/check_mk/bin/sleep", 0x7ffc48c731e0) = -1 ENOENT (No such file or directory)
stat("/opt/check_mk/bin/sleep", 0x7ffc48c731e0) = -1 ENOENT (No such file or directory)
stat("/opt/check_mk/bin/sleep", 0x7ffc48c731e0) = -1 ENOENT (No such file or directory)
stat("/opt/check_mk/bin/sleep", 0x7ffc48c731e0) = -1 ENOENT (No such file or directory)
stat("/opt/check_mk/bin/sleep", 0x7ffc48c731e0) = -1 ENOENT (No such file or directory)
...
stat("/opt/check_mk/bin/grep", 0x7ffc48c72400) = -1 ENOENT (No such file or directory)
stat("/opt/check_mk/bin/grep", 0x7ffc48c72400) = -1 ENOENT (No such file or directory)
stat("/opt/check_mk/bin/grep", 0x7ffc48c72400) = -1 ENOENT (No such file or directory)
stat("/opt/check_mk/bin/grep", 0x7ffc48c72400) = -1 ENOENT (No such file or directory)
stat("/opt/check_mk/bin/grep", 0x7ffc48c72400) = -1 ENOENT (No such file or directory)
...
(and of course the wrong PATH is then inherited by the children):

[pid 1642946] stat("/usr/local/bin/ipmi-sensors", 0x7ffc48c70a20) = -1 ENOENT (No such file or directory)
[pid 1642946] stat("/usr/local/bin/ipmi-sensors", 0x7ffc48c70a20) = -1 ENOENT (No such file or directory)
[pid 1642946] stat("/usr/local/bin/ipmi-sensors", 0x7ffc48c70a20) = -1 ENOENT (No such file or directory)
[pid 1642946] stat("/usr/local/bin/ipmi-sensors", 0x7ffc48c70a20) = -1 ENOENT (No such file or directory)
[pid 1642946] stat("/usr/local/bin/ipmi-sensors", 0x7ffc48c70a20) = -1 ENOENT (No such file or directory)

The only cure then is to restart the check-mk-agent-async service.

I did some debugging of check_mk_agent (async) by simply adding some echo statements to the main loop; here is how the PATH variable changes during a run:

2022-06-15 09:08:54 before main_setup: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
2022-06-15 09:08:54 after main_setup: /opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin
2022-06-15 09:08:54 before sleep: /opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin

2022-06-15 09:09:01 before main_setup: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
2022-06-15 09:09:02 after main_setup: /opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin
2022-06-15 09:09:54 before main_setup: /opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin
2022-06-15 09:09:54 after main_setup: /opt/check_mk/bin:/opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/bin
2022-06-15 09:09:55 before sleep: /opt/check_mk/bin:/opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/bin

2022-06-15 09:10:05 before main_setup: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
2022-06-15 09:10:05 after main_setup: /opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin
2022-06-15 09:10:55 before main_setup: /opt/check_mk/bin:/opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/bin
2022-06-15 09:10:55 after main_setup: /opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin
2022-06-15 09:10:55 before sleep: /opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin

2022-06-15 09:11:09 before main_setup: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
2022-06-15 09:11:09 after main_setup: /opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin
2022-06-15 09:11:55 before main_setup: /opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin
2022-06-15 09:11:55 after main_setup: /opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin
2022-06-15 09:11:55 before sleep: /opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin

2022-06-15 09:12:13 before main_setup: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
2022-06-15 09:12:13 after main_setup: /opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin
2022-06-15 09:12:55 before main_setup: /opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin
2022-06-15 09:12:55 after main_setup: /opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin
2022-06-15 09:12:55 before sleep: /opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin

2022-06-15 09:13:16 before main_setup: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
2022-06-15 09:13:16 after main_setup: /opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin
2022-06-15 09:13:55 before main_setup: /opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin
2022-06-15 09:13:56 after main_setup: /opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin
2022-06-15 09:13:56 before sleep: /opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin

Why is the path to my checkmk installation always added to the path??
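A quick way to quantify the duplication in PATH values like those logged above is to count how often each directory occurs. This is only a sketch: count_path_dupes is a hypothetical helper, not part of the agent, and it assumes that no directory name contains spaces.

```shell
#!/bin/sh
# Hypothetical helper: print "<count> <dir>" for every directory that
# appears more than once in the PATH-style string passed as $1.
count_path_dupes() {
    printf '%s\n' "$1" \
        | tr ':' '\n' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '$1 > 1 { print $1, $2 }'
}

# PATH value copied from the "09:09:54 after main_setup" log line above.
count_path_dupes "/opt/check_mk/bin:/opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bin:/usr/local/bin"
# prints:
# 2 /opt/check_mk/bin
# 3 /usr/local/bin
```

Running it against "$PATH" inside the agent's main loop would show the counts climbing with every pass.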

@MasopustC, can you post the code of the agent you are running? This is the error we fixed during the beta. The function set_up_path has guards against adding the same path to the PATH variable over and over again.

set_up_path is not even called in my check_mk_agent; the agent bakery replaces it with the line:

PATH='/opt/check_mk/bin':$PATH:/usr/local/bin

in function main_setup.
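Because that assignment runs on every pass through main_setup, PATH gains one more /opt/check_mk/bin and one more /usr/local/bin each time. A guarded version could look roughly like the sketch below. This is not the actual Checkmk fix; path_prepend_once and path_append_once are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Hypothetical helpers: add a directory to a PATH-style string only if
# it is not already present. $1 = directory, $2 = current PATH value.
path_prepend_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;             # already there, keep as is
        *)        printf '%s\n' "$1${2:+:$2}" ;;    # put it in front
    esac
}

path_append_once() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;             # already there, keep as is
        *)        printf '%s\n' "${2:+$2:}$1" ;;    # put it at the end
    esac
}

# Simulate three passes through main_setup, starting from the PATH
# shown in the "before main_setup" log lines above.
p=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
for i in 1 2 3; do
    p=$(path_prepend_once /opt/check_mk/bin "$p")
    p=$(path_append_once /usr/local/bin "$p")
done
echo "$p"
# prints: /opt/check_mk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
```

Wrapping the value in colons before matching means only whole entries match, so /usr/bin is not mistaken for /usr/local/bin.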

Thanks for the hint. I found the code that is responsible for this. I wasn’t aware of the agent bakery patching when fixing the original issue. Regarding this issue: if you install the agent in a non-standard path, the bakery wants to ensure the installation’s “bin” path is added to PATH so that all binaries used by the agent can be found. I’ll let you know once we have a werk with a fix.

So might it already be in 2.1.0p3?

No, the fix is not in p3.

Here is the werk: PATH update of linux agent when deployed via bakery. This will also land in 2.1.0p4.

Updated today and so far it looks good :slight_smile:

Thanks a lot!
