Good morning everybody,
I am seeing the following situation on some Windows servers: we get an error message because the check itself takes too long.
Version: 2.0.0p19, Edition: cme
Ubuntu 20.04.4 LTS
Output of “cmk --debug -vvn hostname”:
[cpu_tracking] Stop [7fb1f84d5f10 - Snapshot(process=posix.times_result(user=0.15000000000000013, system=0.010000000000000009, children_user=0.0, children_system=0.0, elapsed=0.1600000001490116))]
[agent] Version: 2.0.0p17, OS: windows, execution time 99.2 sec | execution_time=99.210 user_time=0.150 system_time=0.010 children_user_time=0.000 children_system_time=0.000 cmk_time_agent=99.050
I do not really understand why it suddenly starts doing this, as regular traffic and workload show no issues, nor does the server misbehave in any other way.
Sometimes a restart of the affected server helps, but in this case it’s a production SQL server, which I’m very unlikely to reboot in the coming weeks.
Many thanks in advance,
For troubleshooting I would first have a look at the response time of your Check_MK service.
I see that you have 418 services, which is quite a lot of data to collect. I can imagine that the agent ran into a timeout after 60 sec. To avoid this, you could change the scheduling of the Check_MK check and increase the timeout.
We frequently have issues with WMI responding slowly, and it tends to work better after a server reboot. A repair of the WMI database might help. I have never done that myself, but there are how-to procedures described on the internet.
If a single plugin has a long run time, you may have the option to run it asynchronously.
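To see where the time actually goes, the performance data in the posted output can be split up. The following is only a small sketch in Python: the summary line is copied from the question, while the function name and the 60 sec threshold (taken from my guess above) are assumptions, not anything Checkmk ships.

```python
import re

# Summary line copied from the "cmk --debug -vvn" output in the question.
SUMMARY = (
    "[agent] Version: 2.0.0p17, OS: windows, execution time 99.2 sec | "
    "execution_time=99.210 user_time=0.150 system_time=0.010 "
    "children_user_time=0.000 children_system_time=0.000 cmk_time_agent=99.050"
)

# Assumption: the 60 sec timeout guessed above, not a value read from the site.
ASSUMED_TIMEOUT_SEC = 60.0


def parse_perfdata(line):
    """Return the key=value metrics after the '|' as a dict of floats."""
    _, _, perf = line.partition("|")
    return {key: float(value) for key, value in re.findall(r"(\w+)=([\d.]+)", perf)}


metrics = parse_perfdata(SUMMARY)
agent_share = metrics["cmk_time_agent"] / metrics["execution_time"]

print(f"total execution time: {metrics['execution_time']:.2f} s")
print(f"waiting for the agent: {metrics['cmk_time_agent']:.2f} s ({agent_share:.1%})")
print("over assumed timeout:", metrics["execution_time"] > ASSUMED_TIMEOUT_SEC)
```

With the figures from the question, about 99.05 of the 99.21 seconds are spent waiting for the agent, so the time is lost on the Windows host and not on the monitoring server.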
I hope that helps
Many thanks for your response, I will look into the suggested solutions.
Besides the problems @mike1098 mentioned, you can also look at the configuration of plugin execution on this host. Best practice is to execute all plugins and local scripts asynchronously.
This will reduce the runtime of the agent significantly.
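To check whether a change such as asynchronous execution really shortens the agent runtime, the raw fetch can be timed from the monitoring server before and after. This is only a rough sketch in Python: it assumes the agent answers in plain text on the default port 6556 and that the monitoring server is allowed to connect; the function name is made up for illustration.

```python
import socket
import sys
import time


def time_agent_fetch(host, port=6556, timeout=120.0):
    """Connect, read until the agent closes the connection.

    Returns a tuple (elapsed seconds, number of bytes received).
    """
    start = time.monotonic()
    received = 0
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:  # empty read: the agent has closed the connection
                break
            received += len(chunk)
    return time.monotonic() - start, received


# Usage: python3 time_agent.py <hostname>
if __name__ == "__main__" and len(sys.argv) > 1:
    seconds, size = time_agent_fetch(sys.argv[1])
    print(f"{sys.argv[1]}: {size} bytes in {seconds:.1f} s")
```

If the number stays high after the change, the delay is more likely in a built-in section (for example WMI) than in a plugin.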