Update from 2.0.0P22 to 2.1 | Missing monitoring data for plugins: wmi_cpuload

Kami0815 · June 3, 2022, 11:51am

Since the update from 2.0.0P22 to 2.1.0p1, we get the following warning message every few minutes, sporadically on different hosts:

–SNIP
[agent] Success, [piggyback] Successfully processed from source ‘veeam.domain.local’, Successfully processed from source ‘vsphere.domain.local’, Missing monitoring data for plugins: wmi_cpuload WARN, execution time 5.8 sec
–SNAP

Nothing else has changed.

Does anyone have an idea how I can stop these warning messages (they usually appear 2-5 at a time, for a short while)?

CMK version: 2.1.0p1
OS version: Ubuntu 22.04

Error message:
[agent] Success, [piggyback] Successfully processed from source ‘hostname.domain.local’, Successfully processed from source ‘vcsa.domain.local’, Missing monitoring data for plugins: wmi_cpuload WARN, execution time 5.8 sec

Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)
<<<wmi_cpuload:sep(124)>>> / Transition HostSectionParser → HostSectionParser
→ Add sections: [‘check_mk’, ‘df’, ‘dotnet_clrmemory’, ‘fileinfo’, ‘logwatch’, ‘mem’, ‘ps’, ‘services’, ‘systemtime’, ‘uptime’, ‘veeam_jobs’, ‘veeam_tapejobs’, ‘win_license’, ‘windows_updates’, ‘winperf_if’, ‘winperf_phydisk’, ‘winperf_processor’, ‘wmi_cpuload’, ‘wmi_webservices’]

chauhan_sudhir · June 3, 2022, 7:13pm

Has the agent also been updated to 2.1?

Kami0815 · June 7, 2022, 4:33am

Not all of them. I still have to reconnect all agents using the 2.1 agent version.

Bert · June 8, 2022, 9:56am

Since upgrading to 2.1 I have the same problem with the Windows hosts in my installation. Only hosts with upgraded clients are affected, though not all of them. And it's flapping: whenever it appears, it disappears again with the next check interval.
I tried extending the WMI timeout, but with no luck. For now I have moved the Processor Queue to Disabled Services on the affected hosts.

CMK Version 2.1.0p1 EE

robin.gierse · July 15, 2022, 7:31am

Well, WMI is not known for its blazing performance, so this message came up in the past already.
That it now correlates with your upgrades might be a coincidence after all.
However, as already suggested, you can fine-tune the timing of WMI queries. Find below some resources on that:
Without agent bakery: Increase WMI Timeout - Checkmk Knowledge Base
With agent bakery: Add possibility to change Windows Agent WMI timeout in WATO

albzundy · July 19, 2022, 7:14am

I think the increased WMI timeout is just a (semi-functional) workaround.

With CMK server versions 1.6 and 2.0 I never had this problem, and now almost every server produces this error.

Even with a 6-second timeout (that's +100%), the WMI timeout error still occurs sometimes.

And it looks like the agent version doesn’t matter: 1.6, 2.0 or 2.1 … I know this sounds a bit strange, but I think the problem isn’t on the agent side but on the CMK server. I only updated the CMK server version and suddenly this problem appeared O.o

(Running on debian 11)

mdz0r · July 22, 2022, 2:19pm

Same here since 2.1.0p8 upgrade from 2.0.0p23.

Agents are 2.1 or 2.0.

athomaidis · July 22, 2022, 7:54pm

Hi there,

as Robin already mentioned: if Checkmk complains about a missing agent section, the problem will be on the agent side. If the agent is not able to collect data for the section, or if the section takes too much time, the section will be missing from the agent output.

We should verify that the section is available in the agent output and that the data is collected properly.

To verify this, could you check or provide the check_mk.log from the Windows server, stored in C:\ProgramData\checkmk\agent\log?

Br
Thanos

andreas-doehler · July 22, 2022, 8:20pm

Not always.
If the parsing of the section fails, you also get this message.
I think there is a real problem in the parsing of all WMI data sections. The cpuload section is the victim of some programming errors in the parsing.

mdz0r · July 25, 2022, 9:33am

Here are my logs:

2022-07-25 11:19:20.611 [srv 2644] [Trace] Provider 'wmi_cpuload' is direct called, id '1965870890507600' port [mail:\\.\mailslot\Global\WinAgent_0]
2022-07-25 11:19:22.488 [srv 2644] Object 'Win32_PerfRawData_PerfOS_System' in 1876ms sends [775] bytes
2022-07-25 11:19:22.504 [srv 2644] Object 'Win32_ComputerSystem' in 14ms sends [1336] bytes
2022-07-25 11:19:22.505 [srv 2644] [Trace] Sending data 'wmi_cpuload' id is [1965870890507600] length [2195]
2022-07-25 11:19:22.506 [srv 2644] perf: Section 'wmi_cpuload' took [1894] milliseconds

2022-07-25 11:29:20.646 [srv 2644] [Trace] Provider 'wmi_cpuload' is direct called, id '1966470921583200' port [mail:\\.\mailslot\Global\WinAgent_0]
2022-07-25 11:29:25.656 [srv 2644] [Err  ] Timeout [5] seconds broken  when query WMI
2022-07-25 11:29:25.657 [srv 2644] [Warn ] Object 'Win32_PerfRawData_PerfOS_System' in 5009ms sends NO DATA
2022-07-25 11:29:25.658 [srv 2644] [Warn ] On timeout in sub section 'system_perf' try reuse cache
2022-07-25 11:29:25.670 [srv 2644] Object 'Win32_ComputerSystem' in 11ms sends [1336] bytes
2022-07-25 11:29:25.672 [srv 2644] [Trace] Sending data 'wmi_cpuload' id is [1966470921583200] length [2200]
2022-07-25 11:29:25.673 [srv 2644] perf: Section 'wmi_cpuload' took [5025] milliseconds
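
To compare runs like the two above across a whole log, the per-section timing lines can be extracted with a few lines of Python. This is only a rough sketch: the regular expression is inferred from the log lines quoted in this thread, not from any documented log format, and the function name `section_durations` is made up for illustration.

```python
import re

# Pattern inferred from the log lines quoted in this thread, e.g.:
#   2022-07-25 11:19:22.506 [srv 2644] perf: Section 'wmi_cpuload' took [1894] milliseconds
# It is an assumption that other agent versions log the same way.
PERF_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*"
    r"perf: Section '(?P<section>[^']+)' took \[(?P<ms>\d+)\] milliseconds"
)


def section_durations(lines, section="wmi_cpuload"):
    """Return a list of (timestamp, milliseconds) for one section's perf lines."""
    durations = []
    for line in lines:
        match = PERF_LINE.search(line)
        if match and match.group("section") == section:
            durations.append((match.group("ts"), int(match.group("ms"))))
    return durations


# The two perf lines from the log excerpt above.
sample = [
    "2022-07-25 11:19:22.506 [srv 2644] perf: Section 'wmi_cpuload' took [1894] milliseconds",
    "2022-07-25 11:29:25.673 [srv 2644] perf: Section 'wmi_cpuload' took [5025] milliseconds",
]

for timestamp, ms in section_durations(sample):
    print(timestamp, ms)
```

Runs whose duration sits right at the configured WMI timeout (5 seconds here) are the ones where the query was cut off.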

albzundy · July 26, 2022, 1:44pm

Another symptom is that the Check_MK Discovery service sometimes also produces a timeout error on some SNMP devices (e.g. switches).

Running on 2.1.0p7 or 2.1.0p8 (same behavior) and Debian 11

With 2.0.0p23 there were no problems.

KarlKlammer · July 26, 2022, 3:47pm

Same problem with upgrade from 2.0.0p22 to 2.1.0p8.

robin.gierse · July 28, 2022, 12:37pm

Gentlemen, please do not mix different issues in one thread.
SNMP is a completely different story than WMI (although about as painful).

Lasse · July 29, 2022, 6:04am

Hey there,

Same problem here: sporadically missing wmi_cpuload data on some specific hosts after updating the Checkmk server to the newest version, 2.1.0p9.
I never had the problem before. Like some others, I think there is a problem in the newest release.

SergejKipnis · July 29, 2022, 6:51am

Do you have a log from those sporadically failing hosts?
Since 2.1.0p1, we no longer use the WMI API to access the wmi_cpuload data.

SergejKipnis · July 29, 2022, 6:59am

WMI can be quite unstable, especially when querying the wmi_cpuload counters.
Since 2.1, wmi_cpuload data is obtained using perf counters. According to our testing, this is quite stable.
Still, errors are possible.

Background:
The above-mentioned timeout (5 sec) comes directly from the MS WMI subsystem. You can try the query in PowerShell, or even in hand-written C# code, to read the wmi_cpuload data. In any case, you will sporadically get timeouts from WMI.
The real reason for these sporadic timeouts is not known. We can’t solve problems in MS Windows, but for 2.1+ we have a workaround.

Lasse · July 29, 2022, 7:30am

I’m not quite sure, but this should be the relevant part of the log from a failing host:

2022-07-29 06:30:04.260 [srv 5400] [Trace] Provider 'wmi_cpuload' is direct called, id '30282300867653549' port [mail:\\.\mailslot\Global\WinAgent_0]
2022-07-29 06:30:07.251 [srv 5400] [Err  ] Timeout [3] seconds broken  when query WMI
2022-07-29 06:30:07.251 [srv 5400] [Warn ] Object 'Win32_PerfRawData_PerfOS_System' in 2991ms sends NO DATA
2022-07-29 06:30:07.252 [srv 5400] [Warn ] On timeout in sub section 'system_perf' try reuse cache
2022-07-29 06:30:07.259 [srv 5400] Object 'Win32_ComputerSystem' in 7ms sends [1349] bytes
2022-07-29 06:30:07.259 [srv 5400] [Trace] Sending data 'wmi_cpuload' id is [30282300867653549] length [2210]
2022-07-29 06:30:07.260 [srv 5400] perf: Section 'wmi_cpuload' took [2999] milliseconds

Looks like it’s the same output as @mdz0r’s.

SergejKipnis · July 29, 2022, 7:53am

This log looks like 2.0 agent output. Correct?

What is interesting is the pattern: please search the whole log for ‘Timeout [3] seconds broken when query WMI’ and check how often it occurs.
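
One way to do that search without scrolling through the file is a short Python script that counts the timeout lines per hour. Treat it as a sketch under assumptions: the pattern is taken from the log excerpts posted in this thread (which contain two spaces after "broken", so whitespace is matched loosely), and the helper name `timeouts_per_hour` is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from log lines posted in this thread, e.g.:
#   2022-07-29 06:30:07.251 [srv 5400] [Err  ] Timeout [3] seconds broken  when query WMI
# The timeout value in brackets varies with the configured WMI timeout.
TIMEOUT_LINE = re.compile(
    r"^(?P<hour>\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}\.\d{3} .*"
    r"Timeout \[\d+\] seconds broken\s+when query WMI"
)


def timeouts_per_hour(lines):
    """Count WMI timeout lines, grouped by 'YYYY-MM-DD HH'."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_LINE.search(line)
        if match:
            counts[match.group("hour")] += 1
    return counts


# Lines copied from the excerpts in this thread; only two are timeouts.
sample = [
    "2022-07-25 11:29:25.656 [srv 2644] [Err  ] Timeout [5] seconds broken  when query WMI",
    "2022-07-25 11:29:25.670 [srv 2644] Object 'Win32_ComputerSystem' in 11ms sends [1336] bytes",
    "2022-07-29 06:30:07.251 [srv 5400] [Err  ] Timeout [3] seconds broken  when query WMI",
]

counts = timeouts_per_hour(sample)
for hour in sorted(counts):
    print(hour, counts[hour])
print("total:", sum(counts.values()))
```

To run it on a real host, pass the lines of check_mk.log from C:\ProgramData\checkmk\agent\log instead of `sample`. Note that this counts timeouts for all WMI sections, not just wmi_cpuload.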

Lasse · July 29, 2022, 8:38am

Yeah, the agent is 2.0. Would updating it to 2.1 be the fix?
In the Raw Edition I have to do that manually, correct?

Regarding wmi_cpuload: it happened 3 times in about 3 hours.
In total, the timeout error occurred 35 times in those 3 hours.

SergejKipnis · July 29, 2022, 8:54am

I would suggest updating to 2.1.
The defaults in 2.1 (Raw too) are configured to avoid WMI querying.