Windows NVIDIA GPU monitoring

CMK version: Raw Edition 2.2.0p24
OS version: Ubuntu 20.04.4
Client OS version: Windows 10 22H2

Hi, how do I enable NVIDIA GPU monitoring on Windows machines? I’ve downloaded nvidia_smi.ps1 and put it into C:\Program Files (x86)\checkmk\service\plugins

Are any further steps needed, such as creating config files?

The Windows 10 machine is a VM with a pass-through GPU in a VMware environment, running on a Lenovo host with NVIDIA datacenter graphics.

>nvidia-smi
Thu May  2 14:51:51 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 529.19       Driver Version: 529.19       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A40-12Q     WDDM  | 00000000:02:00.0  On |                  Off |
| N/A    0C    P8    N/A /  N/A |    989MiB / 12288MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     12752    C+G   ...xxxxx    N/A      |
|    0   N/A  N/A     12904    C+G   ...xxxxx    N/A      |
|    0   N/A  N/A     15784    C+G   ...xxxxx    N/A      |
|    0   N/A  N/A     18408    C+G   ...xxxxx    N/A      |
|    0   N/A  N/A     18876    C+G   ...xxxxx    N/A      |
+-----------------------------------------------------------------------------+

I see in nvidia_smi.ps1 that it looks for a config file:

$MK_CONFDIR = $env:MK_CONFDIR
if (!$MK_CONFDIR) {
    $MK_CONFDIR = "%PROGRAMDATA%\checkmk\agent\config"
}
$CONFIG_FILE = "${MK_CONFDIR}\nvidia_smi_cfg.ps1"
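
For anyone tracing where the plugin expects its config file, the lookup quoted above can be mirrored in a few lines of Python. This is only a sketch of the path logic: the function name `resolve_config_file` is hypothetical, the fallback string is copied verbatim from the snippet (left unexpanded, since the thread does not show whether anything expands %PROGRAMDATA%), and it says nothing about what the file must contain.

```python
# Fallback directory copied verbatim from the quoted plugin snippet.
DEFAULT_CONFDIR = r"%PROGRAMDATA%\checkmk\agent\config"


def resolve_config_file(env):
    """Return the config file path the quoted snippet would look for.

    `env` is a mapping of environment variables, e.g. os.environ.
    (Hypothetical helper, not part of Checkmk.)
    """
    # An unset or empty MK_CONFDIR falls back to the default,
    # mirroring `if (!$MK_CONFDIR)` in the PowerShell code.
    confdir = env.get("MK_CONFDIR") or DEFAULT_CONFDIR
    # Plain string join, as in "${MK_CONFDIR}\nvidia_smi_cfg.ps1".
    return confdir + "\\nvidia_smi_cfg.ps1"


if __name__ == "__main__":
    import os

    print(resolve_config_file(os.environ))
```

Run on the monitored host, this prints the path where nvidia_smi_cfg.ps1 would be expected for the current environment.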

The documentation is rather sparse. Where can I find out what this config file should look like?

There was a parsing error, which was fixed as part of the werk "NVIDIA Graphics Card: Fix parsing error on new data format". The fix ships in version 2.2.0p26, which is not out yet, so you will have to wait.
However, Stefan has already described the fix here:
Monitoring NVIDIA SMI From linux host - #15 by mayrstefan

In 2.3.0 Raw, published on Monday, it’s already working.

IMO nvidia_smi.ps1 is not loaded by the agent at all. If I run check_mk_agent.exe test, I don’t see the section "<<<nvidia_smi:sep(9)>>>" in the output.

I had never used the Checkmk agent on Windows until now.
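
To make that check less error-prone than scrolling through the output by eye, a small script can list the section headers found in saved agent output. This is a minimal sketch, assuming the output of check_mk_agent.exe test has been saved to a text file; the helper names `list_sections` and `has_section` are hypothetical and not part of Checkmk.

```python
import re

# Section headers look like <<<name>>> or <<<name:option(...)>>>.
SECTION_RE = re.compile(r"^<<<([^:>]+)(?::[^>]*)?>>>\s*$")


def list_sections(agent_output):
    """Return section names in order of first appearance."""
    names = []
    for line in agent_output.splitlines():
        match = SECTION_RE.match(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names


def has_section(agent_output, name):
    """True if a section with the given name appears in the output."""
    return name in list_sections(agent_output)


if __name__ == "__main__":
    import sys

    # Usage: python check_sections.py saved_agent_output.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        output = fh.read()
    print("nvidia_smi present:", has_section(output, "nvidia_smi"))
```

If the script reports that nvidia_smi is absent, the plugin produced no section at all, which points at the plugin not running (or exiting early) rather than at a parsing problem on the server.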

Ah, sorry for the confusion. It was working for me because my Raw site used an Enterprise agent … it’s all mixed up here. Now I’m using the Raw agent and the script, and it’s not working anymore.

Okay, then you’ll have to wait for the mentioned fix.

I installed the Enterprise Edition. What do I need to configure?

Created rule:

Copied nvidia_smi.ps1 into C:\Program Files (x86)\checkmk\service\plugins

Do I need to create an additional config file for the Checkmk agent?
Do I need to bake a new agent now?

The agent plugin needs the nvidia-smi.exe binary. You can put the path to the nvidia-smi.exe executable here, e.g. C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe

There were some parsing issues with the NVIDIA agent plugin, which were recently fixed in 2.2.0p26.
Are you using this patch release?

I had to change the nvidia-smi path, since it is in a different location in our environment. However, after upgrading from Raw to Enterprise I forgot to update the agent. :man_facepalming:

After updating the agent from Enterprise 2.3.0p2, it works now.

I am new to Checkmk. I cannot get the NVIDIA services to appear on my host (2 NVIDIA cards with nvidia-smi installed). Could someone list the steps, please?
