Monitor used vCPUs per ESXi host

I’ve reapplied the changes for werk #10627 to my Checkmk 1.6 instance and have been able to get the desired values just fine.

HOWEVER, I think I know why the werk was reverted: the reported values are far too high because they are summation values. A value like 85321.13 (milliseconds) means that within the last 20-second interval, the “CPU ready time” of all VMs running on this ESXi host at that point in time (!) adds up to about 85 seconds, cumulatively.

The calculation in the Checkmk Python check correctly accounts for the 20-second timeframe (hence the division by 200), but it ALSO needs to divide by the number of VMs running on the host at that time!
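For reference, here is where the 200 comes from, written out as a formula. R is the CPU ready summation in milliseconds for the 20-second (20 000 ms) sample interval and N is the number of VMs running on the host; N is exactly the value I’m missing in the check:

$$\text{ready}_{\%,\text{host}} = \frac{R}{20\,000\ \text{ms}} \times 100 = \frac{R}{200}, \qquad \text{ready}_{\%,\text{per VM}} = \frac{R}{200 \cdot N}$$

With the value from above, the current calculation gives 85321.13 / 200 ≈ 426.6%, which only becomes a plausible percentage once it is divided by N.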

In other words: if you’ve got 20 VMs running on an ESXi host and Checkmk reports (with the original werk calculation) a CPU ready value of 80%, then the real per-VM value is 4% (80% / 20).
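A minimal sketch of the corrected calculation, assuming the running VM count were somehow available to the check. `cpu_ready_percent` and its `running_vms` parameter are hypothetical names for illustration only; they are not part of Checkmk’s check API:

```python
def cpu_ready_percent(ready_summation_ms: float, running_vms: int,
                      interval_s: float = 20.0) -> float:
    """Average CPU ready percentage per VM (hypothetical helper).

    ready_summation_ms: summed CPU ready time of all VMs on the host
                        for one sample interval, in milliseconds.
    running_vms:        number of VMs running on the host during that
                        interval (assumed to be obtainable somehow).
    interval_s:         length of the sample interval in seconds.
    """
    if running_vms < 1:
        raise ValueError("running_vms must be at least 1")
    interval_ms = interval_s * 1000.0
    # Host-wide value as computed by the original werk (R / 200 for 20 s).
    host_total_percent = ready_summation_ms / interval_ms * 100.0
    # Normalize by the number of VMs that contributed to the sum.
    return host_total_percent / running_vms


# 16000 ms summed over a 20 s interval is 80% host-wide;
# spread over 20 running VMs this is 4% per VM.
print(cpu_ready_percent(16000.0, running_vms=20))
```

Passing `running_vms=1` reproduces the original werk’s behaviour, which makes it easy to compare the old and corrected values side by side.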

Unfortunately, I couldn’t determine whether the “number of active VMs on the host” is easily accessible from within the check’s calculation, so I haven’t been able to extend the check to take the running VM count into account.

Would it be possible to add this and thereby “fix” the check?

(EDIT: by the way, this also explains why percentages > 100% were possible as check results. WATO doesn’t allow thresholds higher than 101.0%, so I was wondering how Checkmk could report 321% for my test host.)
