WIN - random agents not getting data / taking very long

vtthomas · May 5, 2022, 10:19am

Good morning everybody,

I am seeing the following situation on some Windows servers: we get an error message because the actual check takes too long.

CMK version:
Version: 2.0.0p19, Edition: cme
OS version:
ubuntu 20.04.4 LTS
Error message:

Output of “cmk --debug -vvn hostname”:
[cpu_tracking] Stop [7fb1f84d5f10 - Snapshot(process=posix.times_result(user=0.15000000000000013, system=0.010000000000000009, children_user=0.0, children_system=0.0, elapsed=0.1600000001490116))]
[agent] Version: 2.0.0p17, OS: windows, execution time 99.2 sec | execution_time=99.210 user_time=0.150 system_time=0.010 children_user_time=0.000 children_system_time=0.000 cmk_time_agent=99.050

I do not really understand why this suddenly starts happening: regular traffic and workload show no issues, and the servers do not misbehave in any other way.

Sometimes a restart of the affected server helps, but in this case it’s a production SQL server, which I’m very unlikely to be able to reboot in the coming weeks.

Many thanks in advance,

Thomas

mike1098 · May 5, 2022, 2:33pm

Hello,

For troubleshooting I would first have a look at the response time of your Check_MK service.
I see that you have 418 services, which is quite a lot of data to collect. I can imagine that the agent ran into a timeout after 60 sec. To avoid this, you can change the scheduling of the Check_MK check and increase the timeout.
We frequently have issues with WMI responding slowly, and it works better after a server reboot. A repair of the WMI database might help. I have never done that myself, but there are how-to procedures described on the internet.
If a plugin has a long run time, you may have the option to run it asynchronously.
https://kb.checkmk.com/display/KB/Asynchronous+execution+of+Windows+plugins
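
To see where the time actually goes, the performance data in the Check_MK service output can be split apart. The following is only a minimal sketch in Python: the input is the output line quoted in the first post, the function names are made up for illustration, and the 60 second limit is just the timeout value guessed above, not a confirmed setting.

```python
import re

def parse_perfdata(line):
    """Return the key=value pairs after the '|' as a dict of floats."""
    _, _, perf = line.partition("|")
    return {key: float(value) for key, value in re.findall(r"(\w+)=([0-9.]+)", perf)}

def agent_share(perf):
    """Fraction of the total execution time spent waiting for the agent."""
    return perf["cmk_time_agent"] / perf["execution_time"]

# Output line copied from the first post of this thread.
line = (
    "[agent] Version: 2.0.0p17, OS: windows, execution time 99.2 sec | "
    "execution_time=99.210 user_time=0.150 system_time=0.010 "
    "children_user_time=0.000 children_system_time=0.000 cmk_time_agent=99.050"
)

ASSUMED_TIMEOUT_SEC = 60.0  # assumption taken from the guess above

perf = parse_perfdata(line)
share = agent_share(perf)
over_limit = perf["execution_time"] > ASSUMED_TIMEOUT_SEC

print(f"agent wait: {perf['cmk_time_agent']:.2f} s ({share:.1%} of total)")
print(f"above assumed timeout of {ASSUMED_TIMEOUT_SEC:.0f} s: {over_limit}")
```

Here almost the whole run time is cmk_time_agent, so the delay is on the agent side (plugins, WMI) and not in the check processing on the Checkmk server.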

I hope that helps

Michael

vtthomas · May 19, 2022, 6:55am

Hi Mike,

Many thanks for your response, I will look into the suggested solutions.

Kind regards,
Thomas

andreas-doehler · May 19, 2022, 7:52am

Besides the problems @mike1098 mentioned, you can also look at the configuration of the plugin execution on this host. Best practice is to execute all plugins and local scripts asynchronously.
This will reduce the runtime of the agent significantly.
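
To check how long the agent itself needs and which sections already come from the cache, one could time a raw fetch from the monitoring server. This is a rough Python sketch under a few assumptions: the agent answers in plain text on the default TCP port 6556, section headers have the form `<<<name>>>`, and asynchronously executed plugins are marked with a `cached(...)` option in the header. The function names are hypothetical.

```python
import re
import socket
import time

# Matches section headers such as <<<check_mk>>> or <<<name:cached(1651738740,300)>>>
SECTION_HEADER = re.compile(rb"^<<<([^>]+)>>>\s*$")

def fetch_agent_output(host, port=6556, timeout=180.0):
    """Read the raw agent output over TCP and return (bytes, seconds).

    The timeout applies to connecting and to each read, so it should be
    larger than the run time you expect from the agent.
    """
    start = time.monotonic()
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # agent closed the connection, output is complete
                break
            chunks.append(data)
    return b"".join(chunks), time.monotonic() - start

def summarize_sections(raw):
    """Map section name -> {"lines": count, "cached": bool}."""
    summary = {}
    current = None
    for line in raw.splitlines():
        match = SECTION_HEADER.match(line)
        if match:
            header = match.group(1).decode("utf-8", errors="replace")
            name = header.split(":", 1)[0]
            current = summary.setdefault(name, {"lines": 0, "cached": False})
            if "cached(" in header:
                current["cached"] = True
        elif current is not None:
            current["lines"] += 1
    return summary

# Example usage (the hostname is a placeholder):
# raw, seconds = fetch_agent_output("my-sql-server")
# print(f"agent answered in {seconds:.1f} s")
# for name, info in summarize_sections(raw).items():
#     print(name, info)
```

Sections that are not marked as cached are produced while the server waits, so plugins in that group are the first candidates for asynchronous execution.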

system · May 19, 2023, 7:52am

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed. Contact an admin if you think this should be re-opened.