HPE Integrated Lights-Out (ILO) 5 3.0 breaks storage monitoring on SNMP Management Board

I upgraded to CheckMK 2.3 and migrated all HPE SNMP queries to the Redfish special agent. Works better and faster.

I agree, but my HPE support ticket ended somewhere in the space between level 2 and engineering…

I can only say that all the big manufacturers have reduced SNMP support to roughly this: if it works, fine; if it does not work, that is also fine.

ILO5 firmware 3.10 is out now and mentions a fix for storage MIB… not tried yet :smiley:

*Fixed the issue with missing storage controller condition values with the IDA condition for the MIB status array for the SR Controller Gen 10 Smart Array and MR Controller Gen 10 Smart Array.*

EDIT: now tested, drives are detected, fans are detected, controller is detected, but “board condition” still gets status ‘other’ so orange… hopeless :-o

Same boat, getting orange “unknown” on controllers. The OID (cpqDaCntlrBoardStatus / .1.3.6.1.4.1.232.3.2.2.1.1.10) now responds with value 8, which is unknown to the version of cpqida.mib I had already downloaded (it only went up to 7). The latest MIB goes all the way up to 19, and 8 translates to “Enabled” in this new MIB.
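To make the mismatch concrete, here is a minimal Python sketch of a lookup that keeps unknown values visible instead of failing on them. Only the mapping of 8 to “Enabled” comes from this thread; the entry for 2 and the names `BOARD_STATUS` and `describe_board_status` are assumptions for illustration and are not taken from cpqida.mib or from CheckMK.

```python
# Hypothetical lookup for cpqDaCntlrBoardStatus (.1.3.6.1.4.1.232.3.2.2.1.1.10).
# Only 8 -> "enabled" is reported in this thread; 2 -> "ok" is an assumption.
# Check the current MIB file before relying on any of these labels.
BOARD_STATUS = {
    2: "ok",       # assumption, not confirmed in this thread
    8: "enabled",  # reported for the newer MIB in this thread
}


def describe_board_status(raw: str) -> str:
    """Return a label for the raw SNMP value, keeping unmapped values visible."""
    value = int(raw)
    return BOARD_STATUS.get(value, f"unmapped({value})")


print(describe_board_status("8"))
print(describe_board_status("19"))
```

A fallback label like this makes it obvious when a firmware update starts returning values that the local MIB copy does not cover.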

@pirx if you’re getting a yellow exclamation on overall status but ticks on controller/drives etc., it may be your cache module battery that’s failed. IIRC it’s tricky to find the actual warning about that within iLO.
Edit: Power & Thermal > Power I think will show you the cache module battery status, though seems to be called “Smart Storage Energy Pack” now

The cache module is fine. HPE had to admit a few months ago that this is a bug that was introduced with ILO 5 v3.01. Before that I had to do a dozen firmware updates for this and that. It is really nice of them to use customer environments as their lab.

Advisory: HPE Integrated Lights-Out 5 - In iLO 5 Version 3.01 or Later, the iLO GUI Health Summary Displays an Error as “Storage Degraded” When No Storage Issue Has Actually Occurred

Their advice to do a clean shutdown of the host is nonsense. At least on the HPE MR416i-a controller, it makes no difference at all to the SNMP value of the controller status.

As info, I just tried iLO5 firmware version 3.11 on a new HP DL380 Gen10+ server with an MR416i-a array controller, and the same “other” condition still comes back from SNMP in the CheckMK monitoring. Even worse, I just pulled one drive: the iLO5 web interface clearly says the disk is now absent, but SNMP in CheckMK still reports that all is OK and the drive is present.
The drive details also don’t match the real values. Unbelievable how buggy this is for such a long time :frowning:

I can only say it again: don’t use SNMP. Switch to IPMI or Redfish; both use the same data from the management controller, at different levels of detail.

Yes I finally gave up and switched to Redfish :smiley:

With Redfish and iLO5 I have only one problem: the identifier for controllers or drives can change after a reboot or firmware upgrade. I don’t know why, and I found no persistent ID inside the data that I can use.
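One possible workaround, sketched here in Python under stated assumptions, is to key monitored items on a hardware property rather than on the Redfish `Id`. The Redfish Drive and StorageController schemas define a `SerialNumber` property, but whether iLO 5 fills it for every controller and drive is not confirmed in this thread. The function name `stable_key` and the sample values are hypothetical.

```python
def stable_key(resource: dict) -> str:
    """Prefer SerialNumber as the item key; fall back to the Redfish Id.

    Assumes the firmware populates SerialNumber, which this thread
    does not confirm for iLO 5.
    """
    serial = (resource.get("SerialNumber") or "").strip()
    if serial:
        return f"sn:{serial}"
    return f"id:{resource.get('Id', 'unknown')}"


# Made-up sample resources: the same drive seen before and after a reboot.
before_reboot = {"Id": "0", "SerialNumber": "EXAMPLE123"}
after_reboot = {"Id": "3", "SerialNumber": "EXAMPLE123"}

print(stable_key(before_reboot) == stable_key(after_reboot))
```

If the serial number is empty or missing, the key degrades to the `Id`, so the behaviour is no worse than today.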

Hello,
I have upgraded one iLO to release 3.15, and after a re-scan of the host services the only remaining issue is the RAID controller board-condition status, which is “other”.
For iLO 2.99, the SNMP translate output is:

.1.3.6.1.4.1.232.3.2.2.1.1.10.0 2 --> CPQIDA-MIB::cpqDaCntlrBoardStatus.0
.1.3.6.1.4.1.232.3.2.2.1.1.11.0 1 --> CPQIDA-MIB::cpqDaCntlrPartnerBoardStatus.0
.1.3.6.1.4.1.232.3.2.2.1.1.12.0 **2 --> CPQIDA-MIB::cpqDaCntlrBoardCondition.0**
.1.3.6.1.4.1.232.3.2.2.1.1.13.0 1 --> CPQIDA-MIB::cpqDaCntlrPartnerBoardCondition.0
.1.3.6.1.4.1.232.3.2.2.1.1.14.0 1 --> CPQIDA-MIB::cpqDaCntlrDriveOwnership.0
.1.3.6.1.4.1.232.3.2.2.1.1.15.0 PEYHC0DRHBZ2E0 --> CPQIDA-MIB::cpqDaCntlrSerialNumber.0

For iLO 3.16, the SNMP translate output is:

.1.3.6.1.4.1.232.3.2.2.1.1.10.0 8 --> CPQIDA-MIB::cpqDaCntlrBoardStatus.0
.1.3.6.1.4.1.232.3.2.2.1.1.11.0 1 --> CPQIDA-MIB::cpqDaCntlrPartnerBoardStatus.0
.1.3.6.1.4.1.232.3.2.2.1.1.12.0 1 --> CPQIDA-MIB::cpqDaCntlrBoardCondition.0
.1.3.6.1.4.1.232.3.2.2.1.1.13.0 **1 --> CPQIDA-MIB::cpqDaCntlrPartnerBoardCondition.0**
.1.3.6.1.4.1.232.3.2.2.1.1.14.0 1 --> CPQIDA-MIB::cpqDaCntlrDriveOwnership.0
.1.3.6.1.4.1.232.3.2.2.1.1.15.0 PEYHC0DRHBZ2E0 --> CPQIDA-MIB::cpqDaCntlrSerialNumber.0
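
The two dumps can also be compared mechanically. The following Python sketch parses lines in the `OID VALUE --> NAME` layout used in this post and lists the values that differ between firmware versions. The function names `parse_dump` and `changed_values` are made up for illustration; the data is copied from the output above, with the serial number lines left out.

```python
ILO_299 = """\
.1.3.6.1.4.1.232.3.2.2.1.1.10.0 2 --> CPQIDA-MIB::cpqDaCntlrBoardStatus.0
.1.3.6.1.4.1.232.3.2.2.1.1.11.0 1 --> CPQIDA-MIB::cpqDaCntlrPartnerBoardStatus.0
.1.3.6.1.4.1.232.3.2.2.1.1.12.0 **2 --> CPQIDA-MIB::cpqDaCntlrBoardCondition.0**
.1.3.6.1.4.1.232.3.2.2.1.1.13.0 1 --> CPQIDA-MIB::cpqDaCntlrPartnerBoardCondition.0
.1.3.6.1.4.1.232.3.2.2.1.1.14.0 1 --> CPQIDA-MIB::cpqDaCntlrDriveOwnership.0
"""

ILO_316 = """\
.1.3.6.1.4.1.232.3.2.2.1.1.10.0 8 --> CPQIDA-MIB::cpqDaCntlrBoardStatus.0
.1.3.6.1.4.1.232.3.2.2.1.1.11.0 1 --> CPQIDA-MIB::cpqDaCntlrPartnerBoardStatus.0
.1.3.6.1.4.1.232.3.2.2.1.1.12.0 1 --> CPQIDA-MIB::cpqDaCntlrBoardCondition.0
.1.3.6.1.4.1.232.3.2.2.1.1.13.0 **1 --> CPQIDA-MIB::cpqDaCntlrPartnerBoardCondition.0**
.1.3.6.1.4.1.232.3.2.2.1.1.14.0 1 --> CPQIDA-MIB::cpqDaCntlrDriveOwnership.0
"""


def parse_dump(text: str) -> dict:
    """Map symbolic name -> (numeric OID, value) for each 'OID VALUE --> NAME' line."""
    result = {}
    for line in text.splitlines():
        # Drop the bold markers that were used for emphasis in the post.
        parts = line.replace("*", "").split()
        if len(parts) != 4 or parts[2] != "-->":
            continue  # skip blank or unrelated lines
        oid, value, _, name = parts
        result[name] = (oid, value)
    return result


def changed_values(old: dict, new: dict) -> dict:
    """Return name -> (old value, new value) for entries whose value differs."""
    return {
        name: (old[name][1], new[name][1])
        for name in old
        if name in new and old[name][1] != new[name][1]
    }


changes = changed_values(parse_dump(ILO_299), parse_dump(ILO_316))
for name, (old_value, new_value) in changes.items():
    print(f"{name}: {old_value} -> {new_value}")
```

For this data the sketch reports two changes: cpqDaCntlrBoardStatus goes from 2 to 8, and cpqDaCntlrBoardCondition goes from 2 to 1.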

Is there any chance that CheckMK could treat this condition as “normal”?