Please suggest a plugin for monitoring NVIDIA GPUs on a Linux server

Hi,

I’m new here.
I need to monitor a GPU server (multiple GPUs per host).
I was looking for a service that monitors via nvidia-smi.
I found nvidia-gpu-2.0.mkp and installed it through the command line.

However, I don’t see it in the GUI.

Please let me know what I’m missing.

Hi, which edition are you using? Raw or Enterprise? If unsure, please post the output of omd sites.

Currently, I’m on the Raw edition.

@mschlenker I’m on the Raw edition.

For my use case, monitoring the GPUs is critical.
Does Checkmk support this?

The plugin you mentioned consists of two components: the Checkmk server side and the agent side. You have to take the file local/share/check_mk/agents/plugins/nvidia_smi from your Checkmk site and deploy it to the directory /usr/lib/check_mk_agent/plugins/ on the host to be monitored, making sure it is executable. To test it, just run it from there.
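To illustrate what such an agent-side plugin does in general: it is an executable that the agent runs, and whatever it prints (a section header in <<<...>>> followed by data lines) is sent to the Checkmk server, where the matching server-side check parses it. The sketch below is hypothetical and is not the nvidia_smi script from nvidia-gpu-2.0.mkp; the section name nvidia_gpu_sketch is made up, so its output would only be useful together with a server-side check written for it. It uses the standard nvidia-smi query options and the :sep(59) header option (semicolon-separated fields).

```python
#!/usr/bin/env python3
"""Hypothetical Checkmk agent plugin sketch: per-GPU metrics via nvidia-smi.

NOT the plugin shipped in nvidia-gpu-2.0.mkp; the section name is made up.
"""
import csv
import io
import shutil
import subprocess

# Hypothetical section name; a real plugin must emit the name its server-side check expects.
SECTION = "nvidia_gpu_sketch"

# Standard nvidia-smi --query-gpu field names, one row per GPU.
FIELDS = ["index", "name", "temperature.gpu", "utilization.gpu",
          "memory.used", "memory.total"]


def parse_csv(text):
    """Turn nvidia-smi CSV output (noheader, nounits) into one dict per GPU."""
    rows = []
    for values in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not values:
            continue  # skip blank lines
        rows.append(dict(zip(FIELDS, (v.strip() for v in values))))
    return rows


def format_section(rows):
    """Build the agent output: a section header, then one ';'-separated line per GPU."""
    lines = [f"<<<{SECTION}:sep(59)>>>"]
    for row in rows:
        lines.append(";".join(row.get(field, "") for field in FIELDS))
    return lines


def main():
    # Print nothing if the host has no nvidia-smi, so the agent output stays clean.
    if shutil.which("nvidia-smi") is None:
        return
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, timeout=30, check=False,
    )
    if result.returncode != 0:
        return
    for line in format_section(parse_csv(result.stdout)):
        print(line)


if __name__ == "__main__":
    main()
```

Saved under /usr/lib/check_mk_agent/plugins/ and made executable, running it by hand prints one header line followed by one line per GPU, which is the same quick test suggested above for the real plugin.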

Next, run the service discovery for that host in Checkmk.

Hi,

Thanks for helping.
I will try it and report back.

Also, is there a way to deploy it to multiple hosts at once?

The Enterprise editions have a mechanism called “Agent Bakery” that builds tailored agent packages, plus an automatic agent updater that distributes them.

You can use the “Enterprise Free Edition” free of charge, even in commercial setups, as long as you stay below 25 hosts (from 2.2 on, the limit changes to 750 services).
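On the Raw edition, which has no Agent Bakery, one manual alternative is to copy the plugin to each host over SSH. The following is a minimal sketch, not a Checkmk feature: it assumes key-based SSH login as root on every target, that it is run from the Checkmk site directory (so the relative plugin path resolves), and the host names in HOSTS are hypothetical placeholders. By default it only prints the scp/ssh commands (dry run).

```python
#!/usr/bin/env python3
"""Sketch: copy an agent plugin to several hosts with scp, then make it executable via ssh.

Assumptions (hypothetical): key-based SSH login as root, host names listed in HOSTS.
"""
import shlex
import subprocess

# Path inside the Checkmk site directory, as given in the thread.
PLUGIN_SRC = "local/share/check_mk/agents/plugins/nvidia_smi"
# Agent plugin directory on the monitored hosts.
PLUGIN_DIR = "/usr/lib/check_mk_agent/plugins/"
# Hypothetical host names; replace with your own.
HOSTS = ["gpu-node-01", "gpu-node-02"]


def build_commands(host, src=PLUGIN_SRC, dest_dir=PLUGIN_DIR, user="root"):
    """Return the scp and ssh argv lists that deploy one plugin file to one host."""
    target = f"{user}@{host}"
    dest = dest_dir.rstrip("/") + "/" + src.rsplit("/", 1)[-1]
    return [
        ["scp", src, f"{target}:{dest}"],
        ["ssh", target, "chmod 755 " + shlex.quote(dest)],
    ]


def deploy(hosts, dry_run=True):
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for host in hosts:
        for cmd in build_commands(host):
            if dry_run:
                print(shlex.join(cmd))
            else:
                subprocess.run(cmd, check=True)


if __name__ == "__main__":
    deploy(HOSTS)  # dry run: only prints what would be executed
```

Calling deploy(HOSTS, dry_run=False) would actually run the commands; afterwards the service discovery still has to be run for each host. For more than a handful of hosts, a configuration management tool or the Agent Bakery mentioned above scales better than a script like this.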
