Output of “cmk --debug -vvn hostname”: (If it is a problem with checks or plugins)
Network adapter DE080000 Model: P42046-001 / Mellanox MCX631102AS-ADAT Ethernet 10/25Gb 2-port SFP28 Adapter, SeNr: *******, PartNr: P42046-001
Network adapter DE082000 Item not found in monitoring data
I have this problem on one iLO host in Checkmk, which I used to test the Redfish plugin before rolling it out to all iLO hosts. Occasionally, the network adapters vanish. I think it is always just one, then on the next check the other, and then both are back for some time. I couldn’t find any more logs so far; cmk --debug -vvn hostname doesn’t show any error, just the message above. Any idea how I can provide more information?
I only know this to be a problem after reboots of the iLO interface or the server OS.
It only happens for interfaces as it looks like HPE is not using “stable” names for the devices.
The “DE080000” is the id of this device and it changes sometimes after reboot. I don’t know why.
And it only happens on some iLO versions, not all.
OK, but when this happens there is no reboot, neither of the iLO nor of the OS. The ID also doesn’t change. After the next check the service is back to OK with the same ID. The interval between occurrences is less than 1h, more like a couple of minutes, but I haven’t found a pattern yet.
It’s an iLO 6 with firmware version 1.67 and an iLO Advanced license. It might be another problem I just noticed: there is a reboot pending for a firmware update. I’ll get back to you tomorrow after the restart if the problem is still present.
Please define a cache time of 600 seconds or so for the “NetworkAdapters” and “NetworkInterfaces” sections in the special agent config.
We will see if it also goes missing for larger query intervals.
It can also be a timeout issue for this section.
You can also try with the “Timeout for connection” setting. This timeout applies to every section for fetching the data.
Well, it looked to be working, but the problem now occurs less often and lasts longer. I tried changing it to 300s and 1200s, but the higher you go, the longer and rarer the problem gets.
That’s expected, as it then uses the cached section without the data for the time being.
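To illustrate the "rarer but longer" effect, here is a minimal sketch in Python. It is a simplified model and not the special agent's actual code: it assumes that whatever a fetch returns, good data or an error, is stored and reused until the cache expires. The function name and the idea of measuring the cache lifetime in check cycles are made up for the illustration.

```python
def simulate(fetch_ok, cache_checks):
    """Return what each check sees (True = adapter data, False = error).

    fetch_ok     -- outcome a fresh fetch would have at each check cycle
    cache_checks -- hypothetical cache lifetime, counted in check cycles
    """
    statuses = []
    cached = None
    age = cache_checks  # forces a fresh fetch on the first check
    for ok in fetch_ok:
        if age >= cache_checks:
            cached = ok  # the result is cached whether it is good or bad
            age = 0
        statuses.append(cached)
        age += 1
    return statuses


# A single bad fetch is visible for one check with a short cache ...
print(simulate([False, True, True, True], 1))
# ... but for three checks once the cache spans three check cycles.
print(simulate([False, True, True, True], 3))
```

With a longer cache there are fewer fetches, so fewer chances to hit a bad response, but each bad response stays visible until the cache expires.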
Can you please have a look at the data (cached agent output) at the time of such a problem? It would be helpful to see if the data for this one object is completely missing or not.
So the agent output says: {"error": "NetworkAdapters data had a JSON decoding problem\n"} instead of {"@Redfish.Settings": {"SettingsObject": {"@odata.id": "/redfish/v1/Chassis/1/NetworkAdapters/DE082000/Settings"}}[...] when the error occurs.
After updating our iLO from 2.9 to 3.13, we have the same problem. Message in the events:
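For anyone who wants to check a saved copy of the agent output for this kind of error marker, a rough Python sketch follows. It only assumes the usual <<<section>>> header lines of Checkmk agent output and one JSON object per line; the function name, the section name in the sample and the file name in the usage comment are hypothetical.

```python
import json


def find_error_sections(agent_output):
    """Return (section_header, error_message) for every JSON line
    that is an object containing an "error" key."""
    findings = []
    section = None
    for line in agent_output.splitlines():
        line = line.strip()
        if line.startswith("<<<") and line.endswith(">>>"):
            section = line[3:-3]  # remember which section we are in
            continue
        if not line.startswith("{"):
            continue
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a complete JSON object on this line
        if isinstance(data, dict) and "error" in data:
            findings.append((section, data["error"]))
    return findings


# Made-up sample; the section name is a placeholder, not the real one.
sample = (
    "<<<demo_section>>>\n"
    '{"error": "NetworkAdapters data had a JSON decoding problem\\n"}\n'
    '{"Id": "DE080000"}\n'
)
print(find_error_sections(sample))

# Usage with a saved file (path is whatever you stored the output under):
# print(find_error_sections(open("agent_output.txt").read()))
```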
[special_redfish] Agent exited with code 1: ERROR 2025-06-18 07:49:39 redfish.rest.v1: Service responded with invalid JSON at URI /redfish/v1/Chassis/1/NetworkAdapters/DE07A000
One minute later it goes back to green. In our case the error occurs approx. every 60 minutes; I think this is our cache time.
If the firmware is buggy and doesn’t output valid JSON, the only option is to disable “NetworkAdapters” in the agent sections.
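To find out how often the iLO itself returns invalid JSON for that URI, independent of Checkmk, one could poll it directly. The following is only a sketch using the Python standard library: it assumes HTTP Basic authentication is enabled on the iLO and skips certificate verification, and the host name, credentials and function names are placeholders.

```python
import base64
import json
import ssl
import time
import urllib.request


def body_is_valid_json(body):
    """True if the raw response body parses as JSON."""
    try:
        json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return True


def fetch(url, user, password):
    """GET a Redfish URI with Basic auth (assumption: the iLO allows it)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # self-signed iLO certificates are common
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx, timeout=30) as resp:
        return resp.read()


def poll(get_body, rounds, pause=0.0):
    """Call get_body() repeatedly and count responses that are not valid JSON."""
    bad = 0
    for i in range(rounds):
        body = get_body()
        if not body_is_valid_json(body):
            bad += 1
            print(f"round {i}: invalid JSON, first bytes: {body[:80]!r}")
        time.sleep(pause)
    return bad


# Example call (placeholder host and credentials):
# url = "https://ilo.example/redfish/v1/Chassis/1/NetworkAdapters/DE07A000"
# poll(lambda: fetch(url, "monitor", "secret"), rounds=60, pause=60)
```

If the count stays at zero over a long run while the service still flaps, the problem is more likely a timeout than broken JSON from the firmware.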
Already done. It really is due to the newer iLO version, since the message only appeared afterward. Interestingly, the network adapters exist twice: once as Network Adapter X (single-digit) and once as Network Adapter with postfix DEXXXX.
I’ve now disabled the DE ones; the single-digit adapters are present and monitored.