CMK version: Checkmk Raw Edition 2.1.0p16
Plugin version: Plugin “smart” 2.1.0p16
OS version: Debian 11 “Bullseye” 5.10.0-23-amd64 SMP Debian 5.10.179-1 (2023-05-12) x86_64
This post was originally published in the German Checkmk forum: https://forum.checkmk.com/t/plugin-smart-service-state-flappt-aufgrund-pending-reallocated-sectors-bei-ssd
Dear Checkmk Community,
To monitor the SSDs of our Debian servers, we use the Checkmk plugin “SMART”. Generally speaking, this works well and the SMART data of the SSDs is displayed in Checkmk monitoring.
Nevertheless, the service state sometimes flaps from OK to CRIT because of “Pending Sectors” or “Reallocated Sectors”. This happens when, for example, “Pending Sectors: 1” is greater than the “Pending Sectors: 0” recorded during discovery of the service, so the service is shown as CRIT in monitoring.
Occasionally more than one sector is pending, but the SSD is not damaged. In the next monitoring cycle, the pending or reallocated sectors go back to 0 and the service state is OK again.
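To make the behaviour we observe concrete, here is a minimal sketch of the comparison as we understand it. This is only our model of what we see, not the plugin's actual code; the function name `state_from_discovery` and the sample readings are made up for illustration.

```python
def state_from_discovery(current: int, discovered: int) -> str:
    """Return CRIT as soon as the counter exceeds its discovery-time value."""
    return "CRIT" if current > discovered else "OK"


# Hypothetical "Pending Sectors" readings over five check cycles,
# with a value of 0 recorded at discovery time.
readings = [0, 1, 0, 3, 0]
states = [state_from_discovery(value, discovered=0) for value in readings]
print(states)
```

With a baseline of 0, every reading above 0 turns the service CRIT and every return to 0 turns it OK again, which is the flapping we see.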
What we have tried without any success:
- It is not possible to set different WARN/CRIT thresholds for the sector counts via Checkmk rules. The plugin would probably have to be extended with such a feature.
- We can bring the service into a soft CRIT state, but this only affects notifications and not the event history dashboard, so the flapping service still spams our event history.
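For illustration, this is roughly the threshold behaviour we would like to have. It is a sketch only, written as a function that builds one line in the Checkmk local check output format (status, quoted service name, metric, text). The function name `local_check_line`, the service name and the threshold values 5 and 10 are our own assumptions, and reading the real value from the disk is left out.

```python
def local_check_line(name: str, value: int, warn: int, crit: int) -> str:
    """Build one local check line using fixed WARN/CRIT thresholds."""
    if value >= crit:
        status = 2  # CRIT
    elif value >= warn:
        status = 1  # WARN
    else:
        status = 0  # OK
    # Metric syntax: name=value;warn;crit
    return (
        f'{status} "{name}" count={value};{warn};{crit} '
        f"{value} sectors (warn/crit at {warn}/{crit})"
    )


# Hypothetical thresholds: a single pending sector stays OK.
print(local_check_line("SMART pending sda", 1, warn=5, crit=10))
```

With thresholds like these, a counter that briefly shows 1 and then returns to 0 would not change the service state at all.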
Questions:
- Do we have a wrong understanding of the SMART plugin? Is it intended for our use case?
- What else can we do to quiet the flapping service state?
- Is there an alternative way to get the SMART parameters? (SNMP polling is not an option for us due to design restrictions.)
- How do you use the SMART plugin?
- How do you monitor SSD wear?
Thanks for your help!