Windows NVIDIA GPU monitoring

CMK version: Raw Edition 2.2.0p24
OS version: Ubuntu 20.04.4
Client OS version: Windows 10 22H2

Hi, how do I enable NVIDIA GPU monitoring on Windows machines? I’ve downloaded nvidia_smi.ps1 and put it into C:\Program Files (x86)\checkmk\service\plugins

Do I need any further steps, such as creating config files?


The Windows 10 machine is a VM with a pass-through GPU in a VMware environment running on Lenovo hardware with NVIDIA datacenter graphics.

>nvidia-smi
Thu May  2 14:51:51 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 529.19       Driver Version: 529.19       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A40-12Q     WDDM  | 00000000:02:00.0  On |                  Off |
| N/A    0C    P8    N/A /  N/A |    989MiB / 12288MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     12752    C+G   ...xxxxx    N/A      |
|    0   N/A  N/A     12904    C+G   ...xxxxx    N/A      |
|    0   N/A  N/A     15784    C+G   ...xxxxx    N/A      |
|    0   N/A  N/A     18408    C+G   ...xxxxx    N/A      |
|    0   N/A  N/A     18876    C+G   ...xxxxx    N/A      |
+-----------------------------------------------------------------------------+

I see that nvidia_smi.ps1 looks for a config file:

$MK_CONFDIR = $env:MK_CONFDIR
if (!$MK_CONFDIR) {
    $MK_CONFDIR = "%PROGRAMDATA%\checkmk\agent\config"
}
$CONFIG_FILE = "${MK_CONFDIR}\nvidia_smi_cfg.ps1"
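
To make the lookup in the excerpt above easier to follow, here is a minimal Python sketch of the same logic: use MK_CONFDIR if it is set, otherwise fall back to a directory under ProgramData, then append nvidia_smi_cfg.ps1. The function name resolve_config_file and the C:\ProgramData default are assumptions for illustration and are not part of the plugin. Note that PowerShell does not expand a cmd-style %PROGRAMDATA% inside a string, so the sketch reads the variable from the environment instead.

```python
import ntpath  # Windows path rules, importable on any platform


def resolve_config_file(env):
    """Return the config file path the plugin excerpt would look for.

    `env` is any mapping of environment variables (for example os.environ).
    """
    conf_dir = env.get("MK_CONFDIR")
    if not conf_dir:
        # Assumption: PROGRAMDATA defaults to the usual C:\ProgramData location.
        program_data = env.get("PROGRAMDATA", r"C:\ProgramData")
        conf_dir = ntpath.join(program_data, "checkmk", "agent", "config")
    return ntpath.join(conf_dir, "nvidia_smi_cfg.ps1")
```

Calling resolve_config_file(os.environ) on the monitored host shows where the file would be expected. What the file should contain is not answered in this thread.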

The documentation is rather poor. Where can I find out what this config file should look like?

There was a parsing error, which was fixed as part of the werk "NVIDIA Graphics Card: Fix parsing error on new data format". The version containing it, 2.2.0p26, is not out yet, so you have to wait.
However, the fix has already been mentioned by Stefan in
Monitoring NVIDIA SMI From linux host - #15 by mayrstefan


In 2.3.0 RAW, published on Monday, it’s already working.

IMO nvidia_smi.ps1 is not loaded by the agent at all. If I run check_mk_agent.exe test, I don’t see the section "<<<nvidia_smi:sep(9)>>>" in the output.
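
Scanning the agent output by eye is error-prone, so here is a small Python sketch that lists the section headers found in captured output and checks for a given one. It assumes the output of check_mk_agent.exe test has already been captured as text; the helper names list_sections and has_section are made up for this example.

```python
import re

# Matches headers such as <<<check_mk>>> or <<<nvidia_smi:sep(9)>>>.
# Group 1 is the section name without any options after the first colon.
SECTION_HEADER = re.compile(r"^<<<([^:>]+)(?::[^>]*)?>>>\s*$")


def list_sections(agent_output):
    """Return the section names in the agent output, in order of appearance."""
    names = []
    for line in agent_output.splitlines():
        match = SECTION_HEADER.match(line)
        if match:
            names.append(match.group(1))
    return names


def has_section(agent_output, name):
    """Return True if a section with the given name is present."""
    return name in list_sections(agent_output)
```

If has_section(output, "nvidia_smi") returns False, the plugin produced no section at all, which points to the plugin not being executed rather than to a parsing problem on the server.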

I have never used the Checkmk agent on Windows until now.

Ah, sorry for the confusion. It was working for me because my RAW site used an Enterprise agent … it’s all mixed up here. Now I’m using the RAW agent with the script, and it’s not working anymore.

Okay, then you’ll have to wait for the mentioned fix.
