Update from 2.0.0P22 to 2.1 | Missing monitoring data for plugins: wmi_cpuload

Lasse · July 29, 2022, 9:27am

Okay, will do. I’ll report back on whether that resolved the problem.
Thanks for the help so far.

Lasse · July 29, 2022, 10:22am

Looking good, no more errors so far! It seems to be resolved for me, thank you very much!
@all so maybe try updating the agent to the latest version.

Lasse

MaPa · July 29, 2022, 12:45pm

I would suggest fixing the error that occurs with agents older than 2.1.
CMK 2.1 should be compatible with the 2.0 agent and be able to handle WMI.

Best regards,
Mario

albzundy · July 29, 2022, 12:55pm

I also think the problem is solved when using agent 2.1.
But most of our hosts still have Agent 2.0 or 1.6 … We use CRE and have to update all of them manually…

robin.gierse · July 29, 2022, 1:34pm

May I introduce you to my good friend, our Ansible Collection?

bitwiz · August 19, 2022, 8:37pm

I’m a facts / numbers guy myself, so after being annoyed with the constant yellow messages I’ve checked my logs because I wanted to find out how often it happens and whether there is a pattern:

OMD[mysite]:~$ grep "wmi_cpuload" var/check_mk/core/history | grep -o "SERVICE ALERT:.*Check_MK;WARN" |  sort | uniq -c | sort -n
      1 SERVICE ALERT: server022;Check_MK;WARN
      1 SERVICE ALERT: server009;Check_MK;WARN
      1 SERVICE ALERT: server036;Check_MK;WARN
      1 SERVICE ALERT: server065;Check_MK;WARN
      1 SERVICE ALERT: server01;Check_MK;WARN
      1 SERVICE ALERT: server04;Check_MK;WARN
      2 SERVICE ALERT: server067;Check_MK;WARN
      3 SERVICE ALERT: server072;Check_MK;WARN
      3 SERVICE ALERT: server03;Check_MK;WARN
    408 SERVICE ALERT: server049;Check_MK;WARN
    461 SERVICE ALERT: server073;Check_MK;WARN
    477 SERVICE ALERT: servers01;Check_MK;WARN
    482 SERVICE ALERT: server039;Check_MK;WARN
    492 SERVICE ALERT: server011;Check_MK;WARN
    493 SERVICE ALERT: server018;Check_MK;WARN
    497 SERVICE ALERT: server013;Check_MK;WARN
    497 SERVICE ALERT: server062;Check_MK;WARN
    498 SERVICE ALERT: server041;Check_MK;WARN
    500 SERVICE ALERT: server034;Check_MK;WARN
    500 SERVICE ALERT: server070;Check_MK;WARN
    519 SERVICE ALERT: server06;Check_MK;WARN

This particular site contains about 50 Windows hosts and one can clearly see that 12 of them are (badly) affected (the outlier with “only” 408 events was patched to 2.1 yesterday already for different reasons, therefore no longer any WARNINGs after that).
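For anyone who would rather do the same count in a script, here is a minimal Python sketch of the pipeline above. It assumes the history lines contain the same `SERVICE ALERT: <host>;Check_MK;WARN` text that the grep matches. The sample lines, the function names and the threshold of 100 are made up for illustration and are not taken from a real site.

```python
import re
from collections import Counter

# Same fragment the grep -o in the post extracts; the host name is captured.
ALERT_RE = re.compile(r"SERVICE ALERT: ([^;]+);Check_MK;WARN")

# Made-up lines that only imitate the shape of var/check_mk/core/history.
SAMPLE_LINES = [
    "[1] SERVICE ALERT: server049;Check_MK;WARN;HARD;1;Missing monitoring data for plugins: wmi_cpuload",
    "[2] SERVICE ALERT: server049;Check_MK;OK;HARD;1;Success",
    "[3] SERVICE ALERT: server049;Check_MK;WARN;HARD;1;Missing monitoring data for plugins: wmi_cpuload",
    "[4] SERVICE ALERT: server022;Check_MK;WARN;HARD;1;Missing monitoring data for plugins: wmi_cpuload",
    "[5] SERVICE ALERT: server001;Check_MK;WARN;HARD;1;Missing monitoring data for plugins: df",
]


def count_warn_alerts(lines):
    """Count Check_MK WARN alerts that mention wmi_cpuload, per host."""
    counts = Counter()
    for line in lines:
        if "wmi_cpuload" not in line:
            continue  # equivalent of the first grep
        match = ALERT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def split_by_threshold(counts, threshold=100):
    """Split hosts into (frequent, sporadic) lists by alert count."""
    frequent = sorted(h for h, n in counts.items() if n >= threshold)
    sporadic = sorted(h for h, n in counts.items() if n < threshold)
    return frequent, sporadic


if __name__ == "__main__":
    counts = count_warn_alerts(SAMPLE_LINES)
    # Print ascending by count, like "sort | uniq -c | sort -n".
    for host, n in sorted(counts.items(), key=lambda item: item[1]):
        print(f"{n:7d} {host}")
```

To run it against a real site, pass the open history file instead of `SAMPLE_LINES`, for example `count_warn_alerts(open("var/check_mk/core/history"))`.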

Unfortunately I don’t see any pattern in which servers are affected and which are not, other than that it only happens with 1.6.0 and 2.0 agents, not earlier ones (did the old agent even implement such a check?). Who knows which spurious Windows Registry bit bothers the WMI service on those hosts. The WARNING in Checkmk likely only highlights an issue that has always been there.

Good to hear that “wmi_cpuload” no longer uses WMI in 2.1 agents after all.
Updating the agents to 2.1 fixes the issue, but the root cause is obviously a server-side change in 2.1: nobody changed anything on the agents to make the check break, and CMC 2.0 was content with the results of the same agents, while 2.1 sometimes isn’t.

I’ve downloaded the new agents from the bakery and updated the 12 servers manually. I now expect only sporadic occurrences of this in my logs (as can be seen from the number of agents where the error happened exactly once), and I can totally live with that. The sporadic entries should cease once I’ve updated all of the remaining Windows servers to 2.1.

keren · August 28, 2022, 9:18am

I’ve been getting the same WMI messages since I upgraded OMD and the agents to 2.1.0p3. I then updated to 2.1.0p8 and still get too many warnings from Windows servers every day.

Lasse · August 30, 2022, 8:19am

@keren Try the latest release, my agent and omd version is 2.1.0p9 and the issue is resolved for me

keren · August 30, 2022, 12:02pm

Thanks Lasse, I will do that.

keren · September 11, 2022, 1:27pm

Hi guys, the problem still exists even after upgrading to 2.1.0p11 and also applying the rule [Disabled sections (Windows agent)].
Any suggestions?

andreas-doehler · September 11, 2022, 3:49pm

If you disable the section inside the agent but don’t run a discovery on the affected hosts, the message will stay the same as before. On the affected hosts, the wmi_cpuload service should now be shown as vanished at discovery time.

team-it · October 3, 2022, 11:14am

Same issue with 2.1.0p13. Agent and server are both on p13, and I get missing wmi_cpuload in piggyback data for some of the Hyper-V clients. There must be an issue within Checkmk itself.

robin.gierse · October 4, 2022, 6:43am

I have to emphasize this: The issue lies within Windows or WMI more specifically.
We are doing our very best to properly monitor metrics that we can only get through WMI, but it is a pain.
There might always be room for improvement on our end, but again: We are working around issues in WMI and we can only do so much.

team-it · October 4, 2022, 7:10pm

The problem is that it was not there before 2.1. We have now changed the dashboard from “service states” to “service hard states” and set the hard state limit to 3, so a check can fail twice before we get this warning, because the section is always missing just once. On the next agent call wmi_cpuload (and it is ALWAYS only the cpuload) is back, regardless of whether it comes directly from the server or as piggyback data. The problem now is that warnings for more critical services are delayed by 3 minutes.
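To make the trade-off concrete, here is a small Python sketch of the soft/hard state idea described above. It is a deliberately simplified model and not Checkmk's actual state handling. The function name `hard_states` and the parameter `max_attempts` are made up for this example.

```python
def hard_states(results, max_attempts=3):
    """Return the hard state after each check result.

    results: one state string per check interval, e.g. "OK", "WARN", "CRIT".
    A non-OK result only becomes the hard state once it has been seen
    max_attempts times in a row; an OK result resets the counter.
    """
    hard = "OK"
    attempt = 0
    history = []
    for state in results:
        if state == "OK":
            hard = "OK"
            attempt = 0
        else:
            attempt += 1
            if attempt >= max_attempts:
                hard = state
        history.append(hard)
    return history


if __name__ == "__main__":
    # One missing section followed by good data never reaches a hard state.
    print(hard_states(["OK", "WARN", "OK", "OK"]))
    # A real outage only shows up with the third consecutive bad result.
    print(hard_states(["OK", "CRIT", "CRIT", "CRIT", "CRIT"]))
```

In this model the one-off wmi_cpuload warning is filtered out, but a genuine failure of any other service is also held back until its third consecutive bad result, which is the delay mentioned above.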

andreas-doehler · October 4, 2022, 8:54pm

It really looks like a problem with how the data for the WMI checks is processed in CMK 2.1.
This is not an agent problem, as the data for the check is reported.
But under unknown circumstances, CMK fails to process the received data for some WMI checks correctly.

robin.gierse · October 20, 2022, 12:10pm

@SergejKipnis maybe you can add something of substance here? I got nothing to be honest.

SergejKipnis · November 1, 2022, 6:49pm

I can’t imagine how you could get missing cpu_load in 2.1p13.
This version uses performance counters, and performance counters (if found) are quite stable.
Maybe Windows Agent can’t find required performance counters? This is possible, at least theoretically.

I need the log from the Windows agent. If possible, MSI + zipped Programdata/checkmk to see what happened.

Just FYI, piggyback and the dashboard hard state limit have no bearing on Windows agent functionality.
You can easily validate the output of wmi_cpuload by running
check_mk_service section wmi_cpuload
Expected output with performance counters:

<<<wmi_cpuload:sep(124)>>>
[system_perf]
Name|ProcessorQueueLength|Timestamp_PerfTime|Frequency_PerfTime|WMIStatus
|0|1725542868066|10000000|OK
[computer_system]
Name|NumberOfLogicalProcessors|NumberOfProcessors|WMIStatus
KLAPP-0336|20|1|OK
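If you want to check such output automatically instead of by eye, the following Python sketch parses the section text shown above and tests whether every row reports `WMIStatus` as `OK`. The function names are hypothetical and this is not the parser Checkmk itself uses. It only assumes the layout visible in the sample: bracketed subsection names, a header row, and `|` as separator.

```python
SAMPLE_OUTPUT = """\
<<<wmi_cpuload:sep(124)>>>
[system_perf]
Name|ProcessorQueueLength|Timestamp_PerfTime|Frequency_PerfTime|WMIStatus
|0|1725542868066|10000000|OK
[computer_system]
Name|NumberOfLogicalProcessors|NumberOfProcessors|WMIStatus
KLAPP-0336|20|1|OK
"""


def parse_wmi_cpuload(text):
    """Parse the section text into {subsection: [row_dict, ...]}."""
    tables = {}
    current = None
    header = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("<<<"):
            continue  # skip blank lines and the section header
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            tables[current] = []
            header = None  # the next line is this subsection's header row
            continue
        if current is None:
            continue  # data before any subsection, ignore
        fields = line.split("|")  # sep(124) means "|" is the separator
        if header is None:
            header = fields
        else:
            tables[current].append(dict(zip(header, fields)))
    return tables


def section_is_healthy(tables, required=("system_perf", "computer_system")):
    """True if every required subsection has rows and all report WMIStatus OK."""
    for name in required:
        rows = tables.get(name)
        if not rows:
            return False
        if any(row.get("WMIStatus") != "OK" for row in rows):
            return False
    return True


if __name__ == "__main__":
    tables = parse_wmi_cpuload(SAMPLE_OUTPUT)
    print(section_is_healthy(tables))
```

Running the agent command repeatedly on an affected host and feeding each result through `section_is_healthy` would show whether the agent itself ever delivers an incomplete section.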

SergejKipnis · November 7, 2022, 4:16pm

You mean that although the agent did deliver the data, the check may, for some unknown reason, not process it correctly?

andreas-doehler · November 7, 2022, 4:30pm

Exactly, this happens very often for older agents if you upgrade to 2.1.

bkuhn · February 2, 2023, 4:40pm

I can confirm the problem here too. The 1.4.0 agent works fine in a 1.5.0 installation, but in the new 2.1 site, where we have not updated the agents yet (the old site is also still running fine), we get the missing agent section for wmi_cpuload over and over.