CMK version: 2.0.0p9
OS version: CentOS 7
Error message: A few times each day, mostly on our busier servers, the checkmk service goes stale and begins to return (null) across the board. This often causes almost all of the hosts on that server to go stale.
Someone on our team recently looked at what happens to the server load at the same time. When all of the hosts go stale, it’s not because the load hits a level where the server can’t keep up and stays there. The load actually drops to next to nothing, stays there for an indeterminate amount of time, and then spikes as everything comes out of being stale and catches up. This is the opposite of what we expected; we expected the load to stay high the whole time.
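In case it helps anyone compare notes, here is a rough sketch of how the load dips could be logged and lined up against the times the hosts go stale. It uses only the Python standard library and is not part of Checkmk. The function names (`find_load_dips`, `sample_load`) and the 20% threshold are assumptions made up for illustration.

```python
import os
import statistics
import time


def find_load_dips(samples, fraction=0.2):
    """Return (start, end) timestamp pairs where the load stays below
    fraction * median load. `samples` is a list of (timestamp, load) tuples.
    The 0.2 default is an arbitrary assumption, not a recommended value."""
    if not samples:
        return []
    threshold = fraction * statistics.median(load for _, load in samples)
    dips = []
    start = None
    last = None
    for ts, load in samples:
        if load < threshold:
            # Inside a dip: remember where it started and the latest point.
            if start is None:
                start = ts
            last = ts
        elif start is not None:
            # Load recovered: close the current dip.
            dips.append((start, last))
            start = None
    if start is not None:
        # The samples ended while the load was still low.
        dips.append((start, last))
    return dips


def sample_load(count, interval=10.0):
    """Collect `count` 1-minute load averages, one every `interval` seconds."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), os.getloadavg()[0]))
        time.sleep(interval)
    return samples
```

Running `find_load_dips(sample_load(360))` on the monitoring server would cover about an hour at the default interval. The returned time ranges can then be compared by hand against the times the hosts were reported stale.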
We’re on 2.0.0p9 right now and plan to upgrade to 2.1.0p24 in the very near future. I’m curious whether anyone has had any new insight on this topic since the last forum post was closed in November 2022 due to inactivity.
Thanks! I will post additional info here as needed or requested.
Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)