Get Proxmox host data including SMART drive check results

I am running Proxmox on a Dell PowerEdge server.
So far I have had two drives fail and want to be able to get the results of the SMART drive check.

At the moment I have Checkmk up and it is monitoring a few VMs.
Checkmk Raw edition 2.2.0p24 is installed.
These are Ubuntu Server VMs with the agent running on them.

The Proxmox host does not have an FQDN.
The only way I got it to work was to create a host with the IP address as its name.
(I tried naming it PVE and setting the IP address but could not get it to work.)

My questions are:

  1. Is it possible to get the SMART drive information from the physical machine hosting Proxmox?
  2. Do I need to install the agent?

…it would also be good to know whether my workaround for naming a host in Checkmk is OK or if it is going to lead to trouble in the long run.

  1. Yes, if 2. you install the agent.

Topics like mapping host hard drive status to VM guests should be discussed separately.

The workaround you mentioned is OK, everybody does it, but of course it is going to lead to trouble in the long run. Just do not think about it today.

OK… so the agent is running on Proxmox and I am getting data.

Is there a way of getting the JSON string or something… the reason I ask is that I cannot find the SMART drive data result. (I was hoping there would be a line stating the SMART results, but that would be too easy.)

Try running cmk-agent-ctl dump on the Proxmox server.

You have to install the agent plugin for smart checks on the Proxmox host.
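
As a rough sketch of what installing the plugin involves on a Raw edition setup, where there is no agent bakery: the plugin script is copied by hand into the agent's plugin directory on the Proxmox host and made executable. The helper name `install_plugin` is made up for illustration, and the default path `/usr/lib/check_mk_agent/plugins` is the usual Linux agent location but should be verified on your system. The smartmontools package also needs to be present on the host, since the plugin relies on smartctl.

```shell
#!/bin/bash
# Sketch: install an agent plugin by copying it into the agent's plugin directory.
# install_plugin is a hypothetical helper; the default directory is an assumption
# (the usual location for the Linux agent) and may differ on your host.
install_plugin() {
    local src="$1"
    local plugin_dir="${2:-/usr/lib/check_mk_agent/plugins}"

    # Refuse to continue if the plugin file is missing
    [ -f "$src" ] || { echo "plugin file not found: $src" >&2; return 1; }

    # Create the plugin directory if needed, then copy the file as executable
    mkdir -p "$plugin_dir" || return 1
    install -m 0755 "$src" "$plugin_dir/"
}

# Example use on the Proxmox host, as root (not run here):
#   install_plugin ./smart
```

The `smart` plugin file itself ships with the Checkmk site, typically under `share/check_mk/agents/plugins` in the site directory, so it can be copied from the Checkmk server to the Proxmox host first. After that, a service discovery on the host should pick up the new data.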

Hi

It was not clear, but I am running the Raw edition in a home lab.
(That may turn out to be a mistake and I should use the free version or something.)

I have done that and I am getting data coming in to Checkmk.

In addition I also dumped out the info using check_mk_agent.
Currently I cannot spot the SMART data, but there is a reference to the check running in 5 mins.
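
One way to make the dump easier to scan is to list only the section headers, which in agent output are lines of the form `<<<name>>>`. The sketch below does that and checks whether a given section is present; the function names `list_sections` and `has_section` are made up for illustration.

```shell
#!/bin/bash
# Sketch: inspect agent output read from stdin.

# Print each distinct section header, e.g. <<<df>>> or <<<name:sep(0)>>>
list_sections() {
    grep -o '^<<<[^>]*>>>' | sort -u
}

# Succeed if a section with the given name is present in the output
has_section() {
    grep -q "^<<<$1[:>]"
}

# Example use on the monitored host (not run here):
#   check_mk_agent | list_sections
#   check_mk_agent | has_section smart && echo "smart section present"
```

If no `smart` section shows up in the list, the agent is not producing SMART data at all, which points at the plugin on the host rather than at rules on the Checkmk server.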

Can’t help thinking I am missing something obvious, as loads of people have done this.

I have added a service monitoring rule and an enforced service for SMART… In both cases I set it for the specific host.

Not sure if I need to “remake and install the agent”, as I am running the Raw edition, which I believe does not support the bake/update function.

Lots of research seems to indicate that I need to install some kind of add-on, make a new agent and deploy it, but I am totally lost.

I would monitor the server hardware through the iDRAC management board.

Interesting input.

I could use iDRAC or Proxmox and have those send emails out.
My preference is to have Checkmk send notifications out via Telegram based on the rules I set.

In addition, my issue is that I have an NVMe drive on a PCIe card.
It has died twice and I want to monitor it.

I tried this and got a dump of information.

There is a reference to the SMART check and when it will run next, but not the actual SMART check data.

What I meant was that I would use Checkmk to monitor the hardware status of the server through the iDRAC.

https://exchange.checkmk.com/p/redfish

Thanks I will take a look at this

OK…after some sleep and re-reading the documentation I have made progress and have the agent dumping out SMART data on the machine I want to monitor.

In addition I now understand a little more about what the Raw edition of Checkmk is…
(I am stupid, but for anyone else reading this: you need to make the changes on the machine you want to monitor.)

The problem is that it does not include the drive I am interested in.

This command gets the results I want:
smartctl -a /dev/nvme0n1p1

This is my code for the custom plugin, which is not getting the data:

#!/bin/bash

# Get a list of SATA disks
sata_disks=$(ls /dev/sd[a-z]*)

# Loop through each SATA disk and get SMART data
for disk in $sata_disks; do
    echo "<<<smart:$disk>>>"
    smartctl -a $disk
done

# Get a list of NVMe disks
nvme_disks=$(ls /dev/nvme* | grep -v '[0-9]$') # Exclude partitions

# Loop through each NVMe disk and get SMART data
for disk in $nvme_disks; do
    echo "<<<smart:$disk>>>"
    smartctl -a $disk
done
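
A likely reason this version returns nothing for NVMe: every NVMe device node name ends in a digit (/dev/nvme0, /dev/nvme0n1, /dev/nvme0n1p1), so `grep -v '[0-9]$'` throws away all of them, not just the partitions. A minimal sketch of a filter that keeps only namespace devices such as /dev/nvme0n1 is shown below; the function name `nvme_namespaces` is made up for illustration, and the pattern assumes the usual Linux naming scheme.

```shell
#!/bin/bash
# Sketch: keep only NVMe namespace devices (e.g. /dev/nvme0n1) from a list of
# device paths read on stdin. Controller nodes (/dev/nvme0) and partitions
# (/dev/nvme0n1p1) are dropped. Assumes the usual Linux naming scheme.
nvme_namespaces() {
    grep -E '^/dev/nvme[0-9]+n[0-9]+$'
}

# Example use on the host (not run here):
#   printf '%s\n' /dev/nvme* | nvme_namespaces
```

Feeding the shell glob through `printf` avoids parsing `ls` output and gives the filter one path per line.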

Managed to solve this…

If there is better code please let me know, but for reference this is what worked for me:

#!/bin/bash

# Loop through each SATA device node (disks and partitions) and get SMART data
for disk in /dev/sd[a-z]*; do
    [ -e "$disk" ] || continue  # skip the literal pattern when nothing matches
    echo "<<<smart:$disk>>>"
    smartctl -a "$disk"
done

# Loop through each NVMe device node (controllers, namespaces and partitions)
# and get SMART data
for disk in /dev/nvme*; do
    [ -e "$disk" ] || continue  # skip the literal pattern when nothing matches
    echo "<<<smart:$disk>>>"
    smartctl -a "$disk"
done
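
The thread does not confirm whether Checkmk turns the custom `<<<smart:/dev/...>>>` sections into services. One alternative, sketched here under that caveat, is a local check: a script in the agent's local directory (usually /usr/lib/check_mk_agent/local, to be verified on your host) that prints one line per service in the form status, quoted service name, metrics or `-`, and a text. The function name `smart_local_check` is made up for illustration, and it only looks at the overall health line of the smartctl output.

```shell
#!/bin/bash
# Sketch: turn smartctl health output (read from stdin) into one Checkmk
# local check line. Status 0 = OK, 2 = CRIT, 3 = UNKNOWN.
# smart_local_check is a hypothetical helper name.
smart_local_check() {
    local disk="$1"
    local report
    report=$(cat)

    if printf '%s\n' "$report" | grep -Eq 'test result: PASSED|SMART Health Status: OK'; then
        printf '0 "SMART %s" - health check passed\n' "$disk"
    elif printf '%s\n' "$report" | grep -q 'test result: FAILED'; then
        printf '2 "SMART %s" - health check FAILED\n' "$disk"
    else
        printf '3 "SMART %s" - no health result found\n' "$disk"
    fi
}

# Example use on the host (not run here):
#   smartctl -H /dev/nvme0n1 | smart_local_check /dev/nvme0n1
```

Each line printed this way should appear as its own service after a service discovery, so notification rules can then be attached to it like any other service.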