CMK version: 2.2.0p18.cre
OS version: Rocky Linux release 9.3 (Blue Onyx)
Error message:
We have recently started evaluating Checkmk. We have configured Checkmk to discover our Cisco switches and generally it’s working really well. We do however have some monitoring issues regarding the temperature sensors in the switches.
For example, one switch reports this from its CLI:
#sh env temperature
Switch 1: SYSTEM TEMPERATURE is OK
Inlet Temperature Value: 28 Degree Celsius
Temperature State: GREEN
Yellow Threshold : 46 Degree Celsius
Red Threshold : 56 Degree Celsius
Hotspot Temperature Value: 46 Degree Celsius
Temperature State: GREEN
Yellow Threshold : 105 Degree Celsius
Red Threshold : 125 Degree Celsius
But Checkmk reports warnings against both of these sensors. Taking the first one as an example, the summary listed against the service in Checkmk states:
Temperature: 28 °C (warn/crit below 31.0 °C/-10.0 °C) WARN
It appears to be detecting the critical threshold incorrectly. However, the service graph shows the correct values for warning and critical thresholds (46°C and 56°C)!
Has anyone else seen this, and does anyone know how to resolve it? Many thanks.
The WARN state means that the lower-level check has detected a temperature below the lower warning threshold of 31 °C. That is correct, since your temperature is 28 °C. You need to correct the lower level values in the rule.
Rg, Christian
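To illustrate the point about lower levels, here is a minimal Python sketch of how a reading can be compared against both upper and lower thresholds. It is not the code of the actual check plugin; the function name `check_temperature` and its parameters are made up for illustration. The values are taken from the post above: upper levels 46/56 °C from the CLI, lower levels 31/-10 °C from the service summary.

```python
def check_temperature(value, upper=None, lower=None):
    """Return "OK", "WARN" or "CRIT" for a reading.

    upper and lower are optional (warn, crit) pairs. This is an
    illustrative sketch, not the logic of cisco_temperature.py.
    """
    state = "OK"
    if upper is not None:
        warn, crit = upper
        if value >= crit:
            return "CRIT"
        if value >= warn:
            state = "WARN"
    if lower is not None:
        warn, crit = lower
        if value < crit:
            return "CRIT"
        if value < warn:
            state = "WARN"
    return state


# 28 °C is far below the upper levels, but also below the lower
# warning level of 31 °C, so the overall result is WARN.
print(check_temperature(28.0, upper=(46.0, 56.0), lower=(31.0, -10.0)))
# With the upper levels alone, the same reading would be OK.
print(check_temperature(28.0, upper=(46.0, 56.0)))
```

This is why the graph can show 46 °C and 56 °C while the summary complains about 31 °C and -10 °C: they are two different pairs of thresholds.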
Many thanks for your reply. It appears that the check is detecting the warning threshold as 31°C and the critical threshold as -10°C, therefore the value (28°C) is between warning and critical. However, the switch knows that the warning threshold is 46°C and the critical threshold is 56°C (from the CLI output)! These values are correctly reflected in the graph but not the check result!
I’m happy to help locate the issue with the check command, but I’m a complete novice when it comes to Checkmk, although I have years of experience monitoring with a different product. I’ve found the check command here, cisco_temperature.py, but sadly my Python skills are lacking!
Unfortunately I have no Cisco device to reproduce the behaviour.
But maybe you can create a rule for the temperature monitoring and play a bit with the setting “Interpretation of the device’s own temperature status”. There are options to use either “your” thresholds or the device’s.
Many thanks for the pointers. I hadn’t appreciated that the low thresholds were actually coming from the switch as they’re not exposed through the CLI.
It would appear that Cisco have got the low warning threshold wrong on this particular switch (3850). I shall endeavour to find out whether this has been fixed in a later version of IOS, and upgrade if it has.
In the meantime, I’ve added a rule to set the low thresholds to a value that will never be reached!
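For anyone finding this later, the effect of the workaround can be sketched in Python as follows. This is only an illustration of the idea, not how Checkmk implements the rule: the mode names "device" and "user", the helper functions, and the replacement lower levels of -50/-60 °C are all assumptions chosen for the example. The device lower levels of 31/-10 °C come from the service summary above.

```python
def pick_levels(mode, user_levels, device_levels):
    """Choose which (warn, crit) pair applies. Mode names are hypothetical."""
    if mode == "user":
        return user_levels
    if mode == "device":
        return device_levels
    raise ValueError(f"unknown mode: {mode}")


def lower_state(value, lower):
    """Evaluate a reading against lower (warn, crit) levels only."""
    warn, crit = lower
    if value < crit:
        return "CRIT"
    if value < warn:
        return "WARN"
    return "OK"


device_lower = (31.0, -10.0)  # as reported in the service summary
user_lower = (-50.0, -60.0)   # illustrative values that are never reached

for mode in ("device", "user"):
    levels = pick_levels(mode, user_lower, device_lower)
    print(mode, lower_state(28.0, levels))
```

With the device's own lower levels the 28 °C reading is WARN; with the rule's lower levels it is OK.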