CMK version: Raw Edition 2.1.0p24
OS version: RockyOS 91
Error message: Connecting via TCP to XXX.XX.XXX.XX:6556 (5.0s timeout)
Output of “cmk --debug -vvn hostname”: debug only shows that the agent is timing out
Hi everyone,
when I try to monitor our Citrix farm using the citrix_farm.ps1 plugin, it breaks the agent on our Delivery Controllers (Server 2019): they start timing out or reporting that the output is empty.
This happens as soon as I place the plugin in the plugins folder and restart the Check_MK service.
The plugin is located inside the correct folder (C:\ProgramData\checkmk\agent\plugins) and the agent is running with a Citrix Admin account.
If I remove the plugin the agent comes back to life instantly.
What I tried so far:
reinstalling the agent from scratch
restarting the Delivery Controller(s)
restarting the OMD site as well as the Check_MK server
modifying the script to make it lighter, to no avail
The same exact installation, with the same versions, is working correctly in another site.
The only difference is the number of Citrix VDAs and Delivery Groups monitored: there are many more here (150 VDAs and around 30 Delivery Groups).
Try to measure the runtime of the script when you run it manually in the context of the specified Citrix account.
If the runtime is around 60s or above you will run into problems…
If you are not able to lower the execution time of the script, you can raise the timeout for the script and cache its output for a defined timeframe, so that the plugin is executed less often.
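As a rough sketch of what raising the timeout and caching could look like in the Windows agent's user configuration (usually C:\ProgramData\checkmk\agent\check_mk.user.yml): the values of 120 and 300 seconds below are only illustrative assumptions, so adjust them to the runtime you actually measure, and check the key names against the agent documentation for your version.

```yaml
plugins:
  enabled: yes
  execution:
    # Illustrative entry for the Citrix plugin; the numbers are assumptions, not recommendations.
    - pattern: '$CUSTOM_PLUGINS_PATH$\citrix_farm.ps1'
      run: yes
      async: yes        # run in the background so the agent output is not blocked
      timeout: 120      # seconds the plugin may run before it is terminated
      cache_age: 300    # seconds the cached plugin output is reused
```

The agent service typically needs a restart after changing this file. Caching only hides a slow plugin, so it is still worth finding out why the script takes so long.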
Thank you for the suggestion. I always ran it in PowerShell ISE, where it finishes fairly quickly (around 25 seconds).
Based on your comment I ran the script from a “normal” PowerShell console and noticed that it takes 1 minute and 36 seconds…
Now I need to understand what is causing this. I can clearly see the console writing its output very slowly, which is really strange.
I will check if the AV or something else is degrading the performance.
I will update the topic shortly with my findings. If you have any other suggestions, I’m all ears!
I can confirm that the problem was caused by the antivirus. As soon as it was deactivated, the script started running very quickly and the agent stopped timing out on the Check_MK side.
We will now work with the AV team to create all appropriate exclusions.
Thank you again for the valuable advice, it was crucial to understand what the problem was!