Smart_stats plugin reporting wrong values

smartctl output:

root@server02:~# smartctl -A /dev/disk/by-id/ata-SDLF1DAM-800G-1HA1_A02A4508 | grep Power_On_Hours
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3556 (10 65 0)

Checkmk output:

SDLF1DAM-800G-1HA1_A02A4508 ATA SDLF1DAM-800G-1H 9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 10449655434724

The actual raw hexadecimal value:

1300000DE4

0xDE4 = 3556 hours

I asked on Reddit in r/datahoarder, and people said it is completely normal for the leading bits of the raw value to represent something else. smartctl -A accounts for this correctly, but the command the Checkmk plugin uses doesn't, most likely because it reads the full raw value directly. For drives from some manufacturers, this is a bug in the plugin.
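To illustrate the difference, here is a minimal bash sketch that splits the number Checkmk reported into the low 24 bits and the three upper bytes. It assumes smartctl displays this attribute in its raw24(raw8) layout, which I have not confirmed for this drive; decode_raw48 is just a name made up for the example.

```shell
#!/usr/bin/env bash
# Sketch only: split a 48-bit SMART raw value into the part that looks like
# the hour counter (low 24 bits) and the three upper bytes.
# Assumption: the drive packs the value the way raw24(raw8) expects.
decode_raw48() {
    local raw=$1
    local hours=$(( raw & 0xFFFFFF ))      # low 24 bits
    local b5=$(( (raw >> 40) & 0xFF ))     # most significant byte
    local b4=$(( (raw >> 32) & 0xFF ))
    local b3=$(( (raw >> 24) & 0xFF ))
    printf '%d (%d %d %d)\n' "$hours" "$b5" "$b4" "$b3"
}

# Value taken from the Checkmk output above
decode_raw48 10449655434724
# prints: 3556 (9 129 0)
```

The low 24 bits give the same 3556 hours that smartctl -A shows. The upper bytes are not the same as in the smartctl line above, presumably because the two readings were taken at different times.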

I don't think there is a real, easy solution for this problem.
On one of my test machines the output of smartctl -A looks like this for the power-on hours: 59675h+10m+30.640s.
In your case, you also have some extra output besides the real hours.
I would guess this is why the agent plugin fetches the raw value for the power-on hours.
This text output is not necessarily correct either. There seem to be Intel SSDs out there that report something like 931146h+18m+05.460s, which is over 100 years :smiley:
On my systems the power-on hours are not relevant, only the error and wear counters.
https://utcc.utoronto.ca/~cks/space/blog/tech/SMARTWeirdPowerOnHours
In the end, don't trust SMART values too much.
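If someone wants to parse the formatted smartctl -A value instead of the raw one, both variants mentioned in this thread can be reduced to the leading hour count with plain bash parameter expansion. This is only a sketch covering those two formats; normalize_hours is a hypothetical helper and not part of the agent plugin, and other drives may print yet another layout.

```shell
#!/usr/bin/env bash
# Sketch only: reduce the displayed Power_On_Hours value to the leading number.
# Handles the two formats seen in this thread, nothing else.
normalize_hours() {
    local shown=$1
    shown=${shown%% *}    # drop a trailing " (10 65 0)" style suffix
    shown=${shown%%h*}    # drop a trailing "h+10m+30.640s" style suffix
    printf '%s\n' "$shown"
}

normalize_hours "3556 (10 65 0)"
normalize_hours "59675h+10m+30.640s"
```

This does nothing about implausible values like the 931146h example, so a sanity check on the result would still be needed.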

Haven’t had much time to put into this but the easy workaround is to edit the smart agent plugin. There are already exceptions in the agent plugin for specific vendors with different parameters for smartctl. Adding an exception for my drives and doing a pull request would be the best outcome overall, but I suspect there are too many specific cases for Checkmk to start merging pull requests like this.

The ideal would be to add an option in WATO to use either the output of smartctl -a as-is or the raw values. There would be a default option, and if it doesn't work with your drives, you could try the other one. I haven't (yet) learned how to code WATO settings, so the quick fix for me now is to remove the option that requests raw values from the plugin, only on the specific server I'm deploying it to:

sed -i -e 's/-v 9,raw48 //g' /usr/lib/check_mk_agent/plugins/smart

Since the plugin deployment is all scripted on my side, this should work for the foreseeable future.
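For a scripted deployment, a slightly more defensive variant of the same sed call might look like the sketch below. It only edits the file if the option is actually present and keeps a .bak copy next to it. patch_smart_plugin is a made-up function name, and the -i.bak suffix form assumes GNU sed.

```shell
#!/usr/bin/env bash
# Sketch only: remove "-v 9,raw48 " from the smart agent plugin, but only
# when it is present, and keep a backup of the original file.
patch_smart_plugin() {
    local plugin=$1
    if grep -q -- '-v 9,raw48 ' "$plugin"; then
        sed -i.bak -e 's/-v 9,raw48 //g' "$plugin"   # backup written to "$plugin.bak"
        echo "patched"
    else
        echo "unchanged"
    fi
}

# Usage on the server in question:
# patch_smart_plugin /usr/lib/check_mk_agent/plugins/smart
```

The "unchanged" case is useful as a signal that a plugin update has changed the smartctl call, so the workaround can be reviewed instead of failing silently.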