Monitoring NVIDIA SMI from a Linux host

We already use a plugin similar to the one recommended in this thread. It also includes a little hack, because NVIDIA changed some of the XML elements in the nvidia-smi output:

#!/bin/bash
#
# Make Werk 14723 available to Linux
 
inpath() {
     # replace "if type [somecmd]" idiom
     # 'command -v' tends to be more robust vs 'which' and 'type' based tests
     command -v "${1:?No command to test}" >/dev/null 2>&1
}
 
section_nvidia_smi() {
     if inpath nvidia-smi; then
         echo '<<<nvidia_smi:sep(9)>>>'
         nvidia-smi -q -x
     fi
}
 
# Line for the agent
#[ -z "${MK_SKIP_NVIDIA}" ] && _log_section_time section_nvidia_smi
 
section_nvidia_smi \
        | sed \
        -e 's/power_readings>/power_readings>\n<power_management>Supported<\/power_management>/' \
        -e 's/gpu_power_readings/power_readings/'

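Piping the output through sed makes the XML parsable by the current Python code: the first expression inserts a <power_management>Supported</power_management> element after each tag ending in power_readings>, and the second renames gpu_power_readings to power_readings.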
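To see what the two sed expressions actually do, here is a minimal sketch that runs them against a hand-written fragment. The sample XML and the function name fix_power_readings are assumptions for illustration only (the fragment merely resembles newer nvidia-smi output and was not captured from a real GPU), and GNU sed is assumed because of the \n in the replacement.

```shell
#!/bin/bash
# Hypothetical fragment resembling newer `nvidia-smi -q -x` output;
# the power value is made up for illustration.
sample='<gpu_power_readings>
<power_draw>25.00 W</power_draw>
</gpu_power_readings>'

# Same two expressions as in the plugin above.
# Assumes GNU sed, which interprets "\n" in the replacement as a newline.
fix_power_readings() {
    sed \
        -e 's/power_readings>/power_readings>\n<power_management>Supported<\/power_management>/' \
        -e 's/gpu_power_readings/power_readings/'
}

# Print the rewritten fragment.
printf '%s\n' "$sample" | fix_power_readings
```

With this input, the opening tag becomes <power_readings> followed by the injected <power_management> line. The closing tag also matches the first expression, so a second <power_management> line ends up after </power_readings>. Since the hack is reported to work, the parser apparently tolerates that extra element, but it is worth knowing about if you adapt the expressions.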

There are two pull requests:

  1. Add this code (without the sed hack) to the Linux agent: "Add feature: auto-discovery for nvidia-smi on linux" by mayrstefan (Pull Request #680, Checkmk/checkmk on GitHub)
  2. Address the changed XML structure of current nvidia-smi versions: "Fix parsing of current nvidia_smi section" by mayrstefan (Pull Request #681, Checkmk/checkmk on GitHub)