CMK version: 2.1.0p11 CEE (currently on Trial)
OS version: Ubuntu 22.04
We have a problem with random stale local checks. It shows up like this in the GUI:
and appears 99% of the time on disk IO, dotnet, or webservice checks. It never happens to any active SNMP checks.
The active check to the agent seems ok:
and the runtime of the service seems fine too:
All the agents are baked clients with the following sections disabled by default:
- Execute legacy monitoring plugins
- MS Exchange counters (various)
- Web Services
- .Net/CLR Memory
- Hardware Sensors via OpenHardwareMonitor
- Skype for Business
Also no additional plugins except the agent updater.
Core statistics are looking fine:
It is a local site with one remote site set up. It was even worse when dotnet and Web Services were not disabled in the agent setup.
Can anyone help?
Here is the reason
These two sections are responsible for the webservice and dotnet checks.
Disk IO Summary is a little bit different. Is the stale status there only for one or two check intervals?
Yes, sorry, my description was misleading.
At first we had all sections enabled on all servers. During that time the stale states for disk IO, dotnet, or webservice appeared randomly.
After some further investigation we disabled “Execute legacy monitoring plugins, MS Exchange counters (various), Web Services, .Net/CLR Memory, Hardware Sensors via OpenHardwareMonitor, Skype for Business” globally and enabled them only where they are needed.
After this change the stale dotnet and webservice checks only appear on hosts where those sections are configured. Stale Disk IO can appear on any machine, because it is enabled everywhere.
Yes, the stale status only appears for one check interval. On the next check interval it disappears, but then a check is stale on another machine. We have ~50 VMs in the current config but are about to go up to ~300, so I am afraid it will get worse.
These are all checks performed by the Agent and are all some kind of local checks.
I suggest two things:
1.) Run the agent locally on your Windows system. I don’t remember the parameter right now, but you need to pass one to get output.
2.) Run telnet or some other tool to connect to the server on port 6556.
In both cases, use a stopwatch to measure how long the execution takes.
If this is fast, do the same test from the site itself with the omd command.
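If you would rather not sit there with a stopwatch, the port test can be timed with a small script. This is only a rough sketch: it assumes the agent still answers in plain text on TCP port 6556 (not the case if the connection is TLS-registered or encrypted), and the function name `time_agent_dump` and the host name `my-windows-host` are made up for illustration.

```python
import socket
import sys
import time


def time_agent_dump(host, port=6556, timeout=60.0):
    """Connect, read until the peer closes the connection, return (seconds, bytes)."""
    start = time.perf_counter()
    received = 0
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            chunk = sock.recv(65536)
            if not chunk:  # peer closed the connection: output is complete
                break
            received += len(chunk)
    return time.perf_counter() - start, received


if __name__ == "__main__":
    # "my-windows-host" is a placeholder, pass the real host as first argument
    host = sys.argv[1] if len(sys.argv) > 1 else "my-windows-host"
    for attempt in range(5):
        seconds, size = time_agent_dump(host)
        print(f"run {attempt + 1}: {size} bytes in {seconds:.2f} s")
```

Running it several times in a row from the monitoring server should show whether the response time jumps around. A run that takes far longer than the others would point at the agent side rather than at the site.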