HPE iLO6 disk items end up as "UNKNOWN" in cmk

CMK version:
Checkmk Raw Edition 2.2.0p12

OS version:
Debian 12.4

Error message:
We are running iLO 1.52 on ProLiant DL360 Gen11 (with MR408i-o Gen11).
Monitoring the management board over SNMPv3.

Some items end up as “unknown” in cmk, while in iLO itself everything is OK.

Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)

I am posting the relevant part of it:

Management Interface: HW Controller 14 Condition: ok, Board-Condition: other, Board-Status: other, (Role: other, Model: 1, Slot: 14, Serial: XXX)
Management Interface: HW Phydrv 14/1 Bay: 1, Bus number: -1, Status: ok, Smart status: ok, Ref hours: 0, Size: 3200631439360MB, Condition: other
Management Interface: HW Phydrv 14/2 Bay: 4, Bus number: -1, Status: ok, Smart status: ok, Ref hours: 0, Size: 3200631439360MB, Condition: other
Management Interface: HW Phydrv 14/3 Bay: 2, Bus number: -1, Status: ok, Smart status: ok, Ref hours: 0, Size: 3200631439360MB, Condition: other
Management Interface: HW Phydrv 14/4 Bay: 3, Bus number: -1, Status: ok, Smart status: ok, Ref hours: 0, Size: 3200631439360MB, Condition: other
Management Interface: Logical Device  239 Status: other(?), Logical volume size: 8.73 TiB

The important information here is the direct output of an snmpwalk against your iLO interface. It is possible that these devices return a status code the check does not know.
To check the data, you can also query the iLO REST API and look at the problematic controller and drives.
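
As a rough illustration of the REST-side check, here is a minimal sketch that filters already-fetched resources for anything not reporting a healthy status. It assumes each resource carries a Redfish-style `Status` object with `Health` and `State` fields. The function names and the sample data are hypothetical, and fetching the JSON from the iLO is left out on purpose.

```python
from typing import Any, Dict, List, Tuple


def summarize_health(resource: Dict[str, Any]) -> Tuple[str, str, str]:
    """Return (name, health, state) for one resource dict."""
    # Assumption: the resource has a Redfish-style "Status" object.
    status = resource.get("Status") or {}
    return (
        str(resource.get("Name", "?")),
        str(status.get("Health", "unknown")),
        str(status.get("State", "unknown")),
    )


def find_not_ok(resources: List[Dict[str, Any]]) -> List[Tuple[str, str, str]]:
    """Return summaries of all resources whose Health is not 'OK'."""
    return [
        summarize_health(res)
        for res in resources
        if (res.get("Status") or {}).get("Health") != "OK"
    ]


# Hypothetical sample data, not taken from a real iLO.
sample_resources = [
    {"Name": "Controller in slot 14", "Status": {"Health": "OK", "State": "Enabled"}},
    {"Name": "Drive in bay 1", "Status": {"Health": "Warning", "State": "Enabled"}},
    {"Name": "Drive in bay 2"},  # no Status object at all
]

for name, health, state in find_not_ok(sample_resources):
    print(f"{name}: health={health}, state={state}")
```

If the list comes back empty on the REST side while SNMP still reports "other", that points at the SNMP agent in the firmware rather than at the hardware.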

Looks like an HPE firmware problem. Have a look at this thread: https://forum.checkmk.com/t/hpe-integrated-lights-out-ilo-5-3-0-breaks-storage-monitoring-on-snmp-management-board/44182/19

Looks like this new “feature” was backported to iLO5, too. :frowning:

The difference is that there was no problem with iLO6 and this data over the REST API.
For SNMP I cannot say anything.

Yesterday HPE released iLO6 1.57, which fixes HW Phydrv and Logical Device.
Software Details - Online ROM Flash Firmware Package - HPE Integrated Lights-Out 6 | HPE Support

Board condition is still reported as "other":
HW Controller 14 Condition: ok, Board-Condition: other, Board-Status: other, (Role: other, Model: 1, Slot: 14, Serial: xxxx)


Is this problem also present if you query the data via the Redfish REST interface?

Sorry Andreas. I missed your reply :frowning:

I haven’t tried it, because we are only using SNMPv3.

HPE released 1.59 and it is still not fixed.
Online ROM Flash Firmware Package - HPE Integrated Lights-Out 6 1.59 | HPE Support
On iLO6 the check now crashes instead of getting the status “other” :worried:

Dear all,

As far as I can see, iLO6 1.59 and 1.60 both work for us. I am using Checkmk Raw 2.2.0p17 with SNMP v2.

I can confirm it. With iLO6 1.60 and SNMPv3, queried by CheckMK 2.2p32, it works.

Currently on Checkmk 2.3p21, iLO 1.65, and the latest firmware on both RAID controllers.
I am getting the UNKNOWN state because the board condition is reported as “other”:

HW Controller 16     Condition: ok, Board-Condition: other, Board-Status: enabled, (Role: other, Model: 1, Slot: 16, Serial: PXTYH0ARHXXXX)
HW Controller 2      Condition: ok, Board-Condition: other, Board-Status: enabled, (Role: other, Model: 1, Slot: 2, Serial: PZDLA0B52JXXXX)

With snmpbulkwalk I receive the value “1” (corresponding to the condition “other”), so it should not be a problem with the check script:
.1.3.6.1.4.1.232.3.2.2.1.1.12.2 = 1
.1.3.6.1.4.1.232.3.2.2.1.1.12.16 = 1
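
To make the mapping explicit, here is a minimal sketch that decodes walk lines like the two above into condition names. Only 1 = "other" is confirmed by this thread. The other enum values are an assumption based on the usual HPE condition enumeration, and the function name is made up for illustration. It expects numeric output, as shown above.

```python
import re
from typing import Dict

# 1 = "other" is confirmed above; 2..4 are ASSUMED from the usual
# HPE/Compaq condition enumeration and should be checked against the MIB.
BOARD_CONDITION = {1: "other", 2: "ok", 3: "degraded", 4: "failed"}

# Column OID as seen in the walk output; the last number is the controller index.
BOARD_CONDITION_OID = ".1.3.6.1.4.1.232.3.2.2.1.1.12"

# Matches ".1.2.3 = 1" as well as ".1.2.3 = INTEGER: 1".
LINE_RE = re.compile(r"^(?P<oid>\.[0-9.]+)\s*=\s*(?:INTEGER:\s*)?(?P<value>\d+)$")


def parse_board_conditions(walk_output: str) -> Dict[int, str]:
    """Return {controller_index: condition_name} for board-condition lines."""
    result: Dict[int, str] = {}
    prefix = BOARD_CONDITION_OID + "."
    for line in walk_output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or not match.group("oid").startswith(prefix):
            continue  # not a board-condition line
        index_part = match.group("oid")[len(prefix):]
        if not index_part.isdigit():
            continue  # unexpected multi-part index, skip it
        value = int(match.group("value"))
        result[int(index_part)] = BOARD_CONDITION.get(value, f"unknown({value})")
    return result


sample = """\
.1.3.6.1.4.1.232.3.2.2.1.1.12.2 = 1
.1.3.6.1.4.1.232.3.2.2.1.1.12.16 = 1
"""
print(parse_board_conditions(sample))
```

For the two lines above, both controllers (index 2 and 16) decode to "other", which matches what the check displays.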

However, I do not see any strange status related to the controllers in the iLO GUI.
Is anyone else experiencing this?

Thanks.
Regards.

In case anyone is having this trouble, you can switch to the Redfish plugin. The problem is not present with it; the RAID controllers are shown with status OK.
The Redfish plugin seems to work fine, except for the timeout issues mentioned in other threads (I had to increase the timeouts).

Same problem:
Integrated Lights-Out 5 3.10 Dec 12 2024

“Condition: ok, Board-Condition: other, Board-Status: enabled, (Role: other, Model: 1, Slot: 0, Serial: XXX”

HPE Gen10 servers

We tried the Redfish implementation, but it is missing things we would like to see.

Josef

What is missing there, compared to the SNMP output?

For example, the “Serial Number” of the HDDs (shown in the inventory with the iLO check),
or the performance data of the network interfaces. The “Service description” of the drives is also a bit confusing. The inventory is not that great overall.
If I should give some feedback, I can do that, but I think this thread is not the right place.

Josef

The beta thread for the Redfish plugin is now closed, but you can create a new topic with the points you would like to see in the Redfish integration.

Serial numbers for devices (HDD/memory/PSU) should not be the biggest problem, as the data is available.

