Including other counters from smart plugin

I’m familiar with customizing counters and thresholds from other plugins, but the smart plugin seems to be hard coded. Is that correct? I don’t see how I can customize it in WATO.

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       108
  3 Spin_Up_Time            0x0007   177   177   024    Pre-fail  Always       -       345 (Average 397)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       88
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       165
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       88
 22 Helium_Level            0x0023   091   091   025    Pre-fail  Always       -       91
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       107
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       107
194 Temperature_Celsius     0x0002   176   176   000    Old_age   Always       -       34 (Min/Max 25/42)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

I can see the following in check mk:

Pending sectors
Power cycles
Reallocated events
Reallocated sectors
Spin retries
UDMA CRC errors
Uptime

Some of these are totally uninteresting to me and I would like to remove them. The following I would like to include:

Raw_Read_Error_Rate
Seek_Error_Rate
Helium_Level

Especially the Helium_Level is of incredible importance. This number tends to fluctuate during use, but when it drops below 25 it will change to “FAILING NOW”. If this happens within the warranty period, the drive can be sent back and will get replaced under warranty. So I would really like to monitor this value and keep the graphs to monitor fluctuations in helium levels.
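To make the VALUE/THRESH relationship concrete, here is a minimal sketch that parses one row of the attribute table above and reports how far the normalized value is from the vendor threshold. The names `SmartAttribute`, `parse_attribute_line` and `margin_to_failure` are made up for this illustration and are not part of the smart check.

```python
from typing import NamedTuple, Optional


class SmartAttribute(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized VALUE column
    worst: int   # WORST column
    thresh: int  # vendor THRESH column
    raw: str     # RAW_VALUE column, kept as text because it may carry annotations


def parse_attribute_line(line: str) -> Optional[SmartAttribute]:
    """Parse one row of the attribute table; return None for header or other lines."""
    # Split into at most 10 fields so that RAW_VALUE keeps its inner spaces.
    parts = line.strip().split(None, 9)
    if len(parts) < 10 or not parts[0].isdigit():
        return None
    return SmartAttribute(
        attr_id=int(parts[0]),
        name=parts[1],
        value=int(parts[3]),
        worst=int(parts[4]),
        thresh=int(parts[5]),
        raw=parts[9],
    )


def margin_to_failure(attr: SmartAttribute) -> int:
    """Distance between the normalized value and the vendor threshold."""
    return attr.value - attr.thresh


line = " 22 Helium_Level            0x0023   091   091   025    Pre-fail  Always       -       91"
helium = parse_attribute_line(line)
print(helium.name, helium.value, helium.thresh, margin_to_failure(helium))
```

For the row shown in the post this gives a value of 91 against a threshold of 25, so a margin of 66.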

Regarding the other two, I am aware that some vendors (Seagate!) use some fields for their own purposes. HGST doesn’t, and raw read errors are actually real and important values. I recently replaced a disk because read performance was extremely slow, while writing was still fast. I noticed that this counter was climbing to large numbers, while a second disk of exactly the same model had fast read performance with this number still at 0.

Thanks for the help!

You are right. The only things configurable via WATO are the temperature levels (if any) from the S.M.A.R.T. check. The rest is hard coded (both the “interesting” variables and their WARN/CRIT levels).

If you want to change this, you will need to copy the check plugin from ~/share/check_mk/checks/smart to ~/local/share/check_mk/checks/smart and then modify that copy.
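The copy step can be sketched in Python as follows. The helper name `copy_check_to_local` is hypothetical; the two paths are the ones given above, relative to the site's home directory. It deliberately refuses to overwrite an existing local copy.

```python
import shutil
from pathlib import Path


def copy_check_to_local(site_root: Path, check_name: str = "smart") -> Path:
    """Copy a shipped check into the site's local hierarchy and return the copy's path."""
    shipped = site_root / "share" / "check_mk" / "checks" / check_name
    local = site_root / "local" / "share" / "check_mk" / "checks" / check_name
    if local.exists():
        # Never overwrite an existing local copy; it may already hold edits.
        return local
    local.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(shipped, local)
    return local
```

Run as the site user, the call would be `copy_check_to_local(Path.home())`, since `~` is the site's home directory there.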

Yeah, that’s Python. I can edit a bash script, but not Python. I glanced over it and I broke it just by looking at it.
Edit:
Actually, it’s not that complicated. I’m gonna give it a shot :P.

So, I copied and edited it, but Checkmk is not picking up the new values. I doubt it’s picking up the new ‘smart’ check.

How do I check which one it’s using?

That’s tricky. I usually add some (nonsense) text to the regular output. Sometimes it helps to run cmk -R because checkmk precompiles the checks.
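Before suspecting caching, it can help to confirm that the local copy exists and really differs from the shipped file. A small sketch follows; the function name `local_override_status` is made up for this example, and it only compares files on disk. It does not ask Checkmk which file it actually loaded.

```python
import filecmp
from pathlib import Path


def local_override_status(site_root: Path, check_name: str = "smart") -> str:
    """Return 'missing', 'identical' or 'modified' for the local copy of a check."""
    shipped = site_root / "share" / "check_mk" / "checks" / check_name
    local = site_root / "local" / "share" / "check_mk" / "checks" / check_name
    if not local.is_file():
        return "missing"
    # shallow=False compares file contents, not just size and timestamps.
    if shipped.is_file() and filecmp.cmp(shipped, local, shallow=False):
        return "identical"
    return "modified"
```

A result of "modified" only shows that the edit was saved in the right place; the marker-text trick and `cmk -R` are still needed to see whether the running check reflects it.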

I haz python skillz :stuck_out_tongue_closed_eyes:

Honestly that was really easy. The only thing that bothers me is the underscore in the name, the other graphs have pretty names. I figured this line was responsible for it:

('', 'Helium_Level', 'Helium level'),

But as you can see that is without an underscore.

Any ideas?

Thanks!

Congratulations! :clap: :slight_smile:

What you need is a metric plugin or an extension to the existing metric plugin.

Add the following code in the new file ~/local/share/check_mk/web/plugins/metrics/smart.py:

# define a new metric:
metric_info["harddrive_helium_level"] = {
    "title": _("Helium Level"),
    "unit": "count",
    "color": "24/a",
}

# map the performance counter 'Helium_Level' to the new metric:
check_metrics["check_mk-smart.stats"].update({
    "Helium_Level": {
        "name": "harddrive_helium_level"
    },
})

This usually requires a cmk -R. Occasionally the graph is shown twice because checkmk caches some things. If that is the case, try omd restart.

The original metric definition can be found in ~/lib/python/cmk/gui/plugins/metrics/check_mk.py. See there for more examples. Search for check_mk-smart.stats.

Thanks for the help Dirk!

Unfortunately, the new code throws an error:
Error in plugin file /omd/sites/abyss/local/share/check_mk/checks/smart: name '_' is not defined

This had me chasing a rabbit because I first read it as “a name is not defined”, but after rereading I think it refers to the underscore in this line:
"title": _("Helium Level"),
However, when I remove that underscore, it comes back with the following:
Error in plugin file /omd/sites/abyss/local/share/check_mk/checks/smart: name 'metric_info' is not defined

I tried some guesses and repositioning the code within the file but I’m stuck :frowning_face:.

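Notice the different directory for the metric plugin: the metric code goes into the new file ~/local/share/check_mk/web/plugins/metrics/smart.py, not into the check file under ~/local/share/check_mk/checks/smart.
Also, it is (for now) OK to drop the underscore.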
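The two error messages fit the picture of a snippet being executed in a context that does not provide `_` and `metric_info`. The sketch below only simulates that with plain `exec` and two hand-made namespaces; it is an assumption about the mechanism, not Checkmk's actual plugin loader, and `load_snippet` is a made-up helper.

```python
SNIPPET = '''
metric_info["harddrive_helium_level"] = {
    "title": _("Helium Level"),
    "unit": "count",
    "color": "24/a",
}
'''


def load_snippet(namespace: dict) -> dict:
    """Execute SNIPPET with only the names in `namespace` (plus builtins) available."""
    exec(SNIPPET, namespace)
    return namespace


# Hypothetical stand-in for the check file's context: neither name is provided.
try:
    load_snippet({})
    check_context_error = None
except NameError as exc:
    check_context_error = str(exc)

# Hypothetical stand-in for the metric plugin's context: both names are provided.
gui_context = load_snippet({"metric_info": {}, "_": lambda text: text})

print(check_context_error)
print(gui_context["metric_info"]["harddrive_helium_level"]["title"])
```

Python evaluates the right-hand side of the assignment first, which is why the missing `_` is reported before the missing `metric_info`, matching the order of the two errors above.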

Sorry, selective reading.

One more if you don’t mind, you’ve been a great help already.

How can I set thresholds for this new metric? Temperature can be overridden with the defaults in the local smart script. Below the factory default setting for temperature, there is a code block with the other counters and their warning and critical levels, but that doesn’t seem to do anything for helium. Besides, the helium counter goes down from 100 to 0, with a SMART failure trigger at 25. I would want a warning at 50 and a critical at 30.

Can you help me set that up please?

Thanks again!

I’m sorry. I don’t get this check plugin. It’s full of weird code and pretends to define thresholds (even though they aren’t configurable via WATO) but that’s fake. Although the variable

smart_stats_default_levels = {
    'realloc_events': (1, 1),
    ...
}

in the upper part of the check suggests there are thresholds for some values, that variable is never used.

A hint is even given in the check itself as a comment:

# TODO: Need to completely rework smart check. ...

So true. As it currently is, only two variables can trigger a CRIT: Available_Spare and Reallocated_Event_Count.

You may try the following: after this block:

if field == "Available_Spare":
    state = 2 if value < ref_value else 0
    hints = ["during discovery: %d (!!)" % ref_value] if value < ref_value else []
else:
    state = 2 if value > ref_value else 0
    hints = ["during discovery: %d (!!)" % ref_value] if value > ref_value else []

add the following code:

if field == "Helium_Level":
    if value <= 30:
        state = 2
    elif value <= 50:
        state = 1
    else:
        state = 0

I have no idea if that works and I don’t intend to re-write the check plugin to make it work and be configurable.
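The threshold logic itself can be tried outside Checkmk first. Below is a minimal standalone sketch of the same lower-level comparison (warning at 50, critical at 30, as requested above). The function `helium_state` and the constants are names invented for this example and do not exist in the smart check.

```python
OK, WARN, CRIT = 0, 1, 2


def helium_state(value: int, warn: int = 50, crit: int = 30) -> int:
    """Map a normalized Helium_Level value to a monitoring state using lower levels."""
    # The counter falls from 100 towards 0, so smaller values are worse.
    if value <= crit:
        return CRIT
    if value <= warn:
        return WARN
    return OK


for sample in (91, 50, 45, 30, 10):
    print(sample, helium_state(sample))
```

This only verifies the comparison logic; whether the state is picked up correctly inside the check still depends on where the snippet is placed in the plugin.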
