HPE Integrated Lights-Out (ILO) 5 3.0 breaks storage monitoring on SNMP Management Board

In my opinion, if they still support SNMP, an SNMP walk should not return “other” where the status is “OK”, not for any device.

Still not fixed in iLO5 3.0.4

It now numbers the physical drives from 0 to 2 instead of 1 to 3, but this is not an issue for me.

I have the same issue here, and after the upgrade the issue still exists. Any news here, or should I switch to other checks/plugins?

@andreas-doehler
You wrote that it looks OK with the latest firmware and the Redfish plugin?

It’s still an issue on HPE’s end… We really need more customers raising cases and putting pressure on them to correct the firmware :frowning:

With 2.3.0 it is now very easy to try whether it works with the Redfish API, as the required Python modules and the Redfish MKP are included in CMK.

The only problem I saw was very inconsistent behavior between different iLO5 systems. All had the same 3.04 firmware but showed different behavior.
It looks like the storage controller in use has the most impact on whether it runs smoothly or not.

In the end I would say: if you see your storage controller in the iLO web interface, then you can also retrieve the data over the REST API. I also had 1 or 2 systems where the storage controller data was missing in the web interface, and then there was also no data over the API.
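To illustrate the "missing in the web interface, missing in the API" case, here is a minimal sketch of how one could read the health of a storage resource that has already been fetched over the Redfish API. It is not code from the Redfish MKP. The function name `storage_health` and the sample dictionaries are hypothetical; the only thing taken from the Redfish schema is that a resource usually carries a `Status` object with a `Health` field.

```python
from typing import Any, Dict


def storage_health(resource: Dict[str, Any]) -> str:
    """Return the Redfish health string of a resource, or "missing".

    `resource` is assumed to be the already decoded JSON body of a
    Redfish storage resource. If the controller data is absent (as on
    the systems described above), there is no Status to read.
    """
    status = resource.get("Status") or {}
    return status.get("Health") or "missing"


# Hand-written sample payloads, for illustration only.
healthy = {"Id": "example", "Status": {"Health": "OK", "State": "Enabled"}}
empty = {}

print(storage_health(healthy))
print(storage_health(empty))
```

Returning an explicit "missing" instead of raising makes it easy to tell "controller reports a problem" apart from "controller data is not there at all".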

More than 3 months after HPE introduced this problem, and still no final fix. Amazing.
If there is one hardware piece you would like to know a correct status of, it’s your storage controller…

HPE iLO 5 Version: 3.05 (21 June 2024)

  • Support for Fan speed in terms of percentage from SNMP OID Get and Walk.
  • Fixes for specific SNMP OIDs for RDE capable storage controllers for controller, physical drive, and logical drive properties.

I can confirm that some drives are back. But there are a lot of changes in the discovery, and the following services changed to crashed (I think it’s related to the “Support for Fan speed” change):

I can confirm this, iLO5 firmware version 3.05 still has issues.
The drives (logical & physical) are now detected correctly.
But the hardware controller & fan checks now crash.
Amazing that this takes so long to fix. The quality of the releases is below par.

Same error in iLO6 version 1.60

Facing the same issue with CMK 2.3.0p9 and iLO5 3.05

Fan check crashing with KeyError (30) (that’s the percentage)

Sensor {index} "{label}", Speed is {hp_proliant_speed_map[int(speed)]}

EDIT: Got curious what this map is…

Yes, that makes no sense if the speed is set manually? :smiley:

Hi folks,
I am currently looking into this. The ‘fans’ map is not the only one that needs updating; I know I have to extend this one with ~15 more entries. But I doubt that we have 30+ distinct named descriptions of how fast the fan is going :slight_smile: .
@KluthR , can you elaborate? Or, even better, point me to resources? It makes a lot of sense, but how do you know it’s a percentage?

Because I set it:

It seems the “speed” is either the mode or the manually set percentage.

I set the value to 35 and the number reported in the crash changed as well. So it must be my manually set percentage.
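A minimal sketch of how the lookup could avoid the KeyError, assuming the value really is either a state or a percentage. This is not the actual Checkmk plugin code: `SPEED_MAP` and `describe_fan_speed` are hypothetical names, the three states are the ones HPE's MIB lists for this OID (other(1), normal(2), high(3)), and treating any other value from 0 to 100 as a percentage is only a guess based on the observation above.

```python
# Hypothetical stand-in for the plugin's speed map; states as named in HPE's MIB.
SPEED_MAP = {1: "other", 2: "normal", 3: "high"}


def describe_fan_speed(raw: str) -> str:
    """Translate the raw SNMP value without crashing on unknown keys."""
    value = int(raw)
    if value in SPEED_MAP:
        # Documented enumeration value.
        return SPEED_MAP[value]
    if 0 <= value <= 100:
        # Assumption: newer firmware reports the fan percentage here.
        return f"{value}% (assumed percentage)"
    return f"unknown ({value})"


print(describe_fan_speed("2"))
print(describe_fan_speed("30"))
```

Note the ambiguity this cannot resolve: a reported 1, 2 or 3 could be a state or a very low percentage, so the firmware behavior has to be clarified before a real fix.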

Hi, thank you for the explanation.

According to HPE’s latest MIBs, the OID in question should still report the states other(1), normal(2) and high(3), rather than percentage values. It seems that their latest firmware release introduced some bugs though:

REMOVED The Release version of iLO 5 v3.05 has exposed a bug, when the host server is installed with NS204i boot controllers. A new iLO 5 version fixing the issue will be released in the next couple of days.

Therefore, the situation is rather unstable at the moment. We’ve already contacted HPE. Until we get further information from them, we’ll stick to the information from the official MIB.

Hi, we have received a first response from HPE. They’re currently investigating if this is an error in the firmware.

I realized that from time to time I had ‘not monitored drives’ in CheckMK from my ILOs. They probably appeared magically, because nobody changed the drives in the servers.

Has anyone tried 3.06, which was released recently?

I would say - don’t ask :wink: HPE iLO5 v3.06 breaks HW check

Hi, we are still waiting for a definite statement from HPE.

Unfortunately, they introduced breaking changes with their latest firmware updates which aren’t documented in their latest MIB. At the moment, they are investigating if these changes have been applied intentionally or not.

Therefore, the ball is in their court at the moment. We’ll keep you posted – as soon as we receive any further feedback from them, we’ll act on it.

In the meantime, we would recommend trying out Andreas Döhler’s fantastic Redfish integration.

It gets shipped with Checkmk 2.3. All you have to do is enable it via Setup > Maintenance > Extension packages.

For Checkmk 2.2, you can find it on GitHub.

Don’t forget to update to the latest version :wink:
There were some nice changes since the shipped version.

The 2.2 version is also available alongside the 2.3 version on the Exchange. It can just take 1-2 days to show up there after a GitHub release.