Update from 2.0.0P22 to 2.1 | Missing monitoring data for plugins: wmi_cpuload

I’m a facts-and-numbers guy myself, so after being annoyed by the constant yellow messages I checked my logs to find out how often it happens and whether there is a pattern:

OMD[mysite]:~$ grep "wmi_cpuload" var/check_mk/core/history | grep -o "SERVICE ALERT:.*Check_MK;WARN" |  sort | uniq -c | sort -n
      1 SERVICE ALERT: server022;Check_MK;WARN
      1 SERVICE ALERT: server009;Check_MK;WARN
      1 SERVICE ALERT: server036;Check_MK;WARN
      1 SERVICE ALERT: server065;Check_MK;WARN
      1 SERVICE ALERT: server01;Check_MK;WARN
      1 SERVICE ALERT: server04;Check_MK;WARN
      2 SERVICE ALERT: server067;Check_MK;WARN
      3 SERVICE ALERT: server072;Check_MK;WARN
      3 SERVICE ALERT: server03;Check_MK;WARN
    408 SERVICE ALERT: server049;Check_MK;WARN
    461 SERVICE ALERT: server073;Check_MK;WARN
    477 SERVICE ALERT: servers01;Check_MK;WARN
    482 SERVICE ALERT: server039;Check_MK;WARN
    492 SERVICE ALERT: server011;Check_MK;WARN
    493 SERVICE ALERT: server018;Check_MK;WARN
    497 SERVICE ALERT: server013;Check_MK;WARN
    497 SERVICE ALERT: server062;Check_MK;WARN
    498 SERVICE ALERT: server041;Check_MK;WARN
    500 SERVICE ALERT: server034;Check_MK;WARN
    500 SERVICE ALERT: server070;Check_MK;WARN
    519 SERVICE ALERT: server06;Check_MK;WARN

This particular site contains about 50 Windows hosts, and one can clearly see that 12 of them are (badly) affected. The outlier with “only” 408 events had already been patched to 2.1 yesterday for different reasons, so it has produced no further WARNINGs since.
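
For anyone who wants the same split between “badly affected” and “sporadic” hosts without eyeballing the list, here is a minimal Python sketch of the grep pipeline above. The function names and the threshold of 100 are my own assumptions for illustration, and the pattern only relies on the `SERVICE ALERT: <host>;Check_MK;WARN` part visible in the output above, not on the rest of the history line format.

```python
import re
from collections import Counter
from pathlib import Path

# Same match as the grep above: the host name sits between
# "SERVICE ALERT: " and ";Check_MK;WARN".
ALERT_RE = re.compile(r"SERVICE ALERT: ([^;]+);Check_MK;WARN")


def count_alerts(lines, plugin="wmi_cpuload"):
    """Count Check_MK WARN alerts per host on lines that mention the plugin."""
    counts = Counter()
    for line in lines:
        if plugin not in line:
            continue
        match = ALERT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def split_by_threshold(counts, threshold=100):
    """Split hosts into (badly affected, sporadic).

    The threshold is an arbitrary assumption, not a Checkmk setting.
    """
    bad = sorted(host for host, n in counts.items() if n >= threshold)
    sporadic = sorted(host for host, n in counts.items() if n < threshold)
    return bad, sporadic


if __name__ == "__main__":
    # Path relative to the site user's home, as in the grep command above.
    history = Path("var/check_mk/core/history")
    if history.is_file():
        with history.open(errors="replace") as fh:
            bad, sporadic = split_by_threshold(count_alerts(fh))
        print(f"{len(bad)} badly affected: {', '.join(bad)}")
        print(f"{len(sporadic)} sporadic: {', '.join(sporadic)}")
```

Run as the site user from the site's home directory, this prints the two groups. Any threshold between 3 and 408 would give the same split for the counts listed above.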

Unfortunately I don’t see any pattern in which servers are affected and which are not, other than that it only happens with 1.6.0 and 2.0 agents, not earlier ones (did the old agent even implement such a check?). Who knows which spurious Windows Registry bit bothers the WMI service on those hosts. The WARNING in Checkmk likely only highlights an issue that has always been there.

Good to hear that “wmi_cpuload” no longer uses WMI in 2.1 agents after all.
Updating the agents to 2.1 fixes the issue, but the root cause is obviously a server-side change in 2.1: nobody changed anything on the agents to make the check break, and CMC 2.0 was content with the results of the same agents, while 2.1 sometimes isn’t.

I’ve downloaded the new agents from the bakery and updated the 12 servers manually. I now expect only sporadic occurrences of this in my logs (as can be seen from the number of agents where the error happened exactly once), and I can totally live with that. The sporadic entries should cease once I’ve managed to update all of the remaining Windows servers to 2.1.
