iLO6 Services UNKNOWN after firmware update to 1.76

CMK version: 2.4.0p28
OS version: Debian 12

Error message: all services unkn

Output of “cmk --debug -vvn hostname”: all services are displayed correctly

After updating the firmware to 1.76, iLO6 shows all services as “unkn” in CMK. However, when I enter ‘snmpwalk’ or “cmk --debug -vvn” in the terminal, all services are displayed correctly.
The credentials are correct. Even when creating a completely new host, all services still show as “unkn.”
Does anyone have any idea what might be causing this?
Thank you for your support.

Is your iLO monitored with SNMP or with Redfish integration?

Our iLO is monitored via SNMP.

Then it is strange that it works on the command line with “cmk --debug -vvn iLO” but not in the web GUI.

I’ve already disabled the active checks and the bulkwalk.

The two services, check_mk and check_mk discovery, also update occasionally. However, the other services are now stuck at “Warn.”

You cannot disable Check_MK and Check_MK Discovery, because the “Check_MK” service produces the data that all the other checks use.
All the other checks are “Yellow” because you tried an active check on them. I would first reset all changed attributes of these services.
It is very important not to play around with enabling and disabling active/passive checks for the services on such a host.

After I undid all the changes, Check_MK remains in the “Service Check Timed Out” state.

Should I increase the timeout?
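
Before changing the timeout, it may help to measure how long a full fetch actually takes on the command line, where no service check timeout applies. The following is only a minimal sketch: the helper name `time_command` is hypothetical, and `hostname` is the same placeholder used earlier in this thread.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    # Output is discarded; only the duration and the exit code are of interest.
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode, time.monotonic() - start


# Example call as the site user ("hostname" is a placeholder):
# code, elapsed = time_command(["cmk", "--debug", "-vvn", "hostname"])
# print(f"exit code {code}, took {elapsed:.1f} s")
```

If the measured duration is already close to or above the configured service check timeout, the check would be expected to time out in the GUI even though the command line run finishes.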

Unfortunately, we haven’t been able to resolve the issue yet.

What’s strange is that it used to work without having to increase the timeout or make any special settings in general. On one server, it currently works without any additional adjustments, such as changing the timeout.

That is the server that is currently not working:

and this is the server that works, with the same settings:

We changed the timeout to 60 seconds as a test, since the service check keeps ending with a timeout, but that did not solve the problem. We received the following error message during the connection test:

“API Error: Error running automation call diag-host: Your request timed out after 110 seconds. This error may be related to your local configuration or a request that is processing too many objects at once. If you believe this is a software bug, please send us a crash report.”

We are using the latest 2.4.0p30 version of Checkmk Community.

Looking at the graphs for the working and the non-working iLO interfaces, I would say both take far too long to answer. Do you have interface checks on these interfaces? If yes, please remove these interface checks from the monitoring. The data for these interface checks is pulled from the OS and not from the iLO, which takes a very long time.
Do you have a complete list of services you monitor on such an iLO interface?
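
To illustrate why slow answer times add up to a timeout: an SNMP walk is sequential, so the total time is roughly the number of requests multiplied by the time per request. With bulkwalk disabled, each request returns one OID; with bulkwalk, one request can return several. This is a rough sketch only; the function name `estimate_walk_seconds` is hypothetical and the numbers are illustrative, not measured on any iLO.

```python
import math


def estimate_walk_seconds(oid_count, seconds_per_request, oids_per_request=1):
    """Rough estimate for the duration of a sequential SNMP walk.

    oids_per_request=1 models a walk without bulkwalk (one OID per request);
    larger values model bulk requests that return several OIDs at once.
    """
    requests = math.ceil(oid_count / oids_per_request)
    return requests * seconds_per_request


# Illustrative numbers only (assumed, not taken from this thread):
print(estimate_walk_seconds(2000, 0.05))      # one OID per request
print(estimate_walk_seconds(2000, 0.05, 10))  # ten OIDs per request
```

The point of the estimate is that a small increase in the time per request, for example because the data has to be fetched from the OS first, is multiplied by every OID that is walked.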

Also, please check whether using the Redfish integration would decrease the time needed.

Removing the interfaces did help initially.

However, we would like to monitor the most important interfaces, just like on the other server where the service check works, even with more interfaces. Why do the checks work on one server but not on the other? Is there anything else we can configure, or would Redfish be the only solution?

We are currently still monitoring everything that is provided to us, including some hardware fans, hardware memory, and many temperature readings. In total, including the interfaces, there are 83 services. We still need to determine which of these services are truly important to us.

Thank you for your support.

Then you should monitor the interfaces from the OS and not over the management interface, as the management interface also pulls this data from the OS. That applies only if you do it with SNMP. The interface checks inside Redfish get the status directly from the management interface.

This depends more on the host OS and on whether the management agents are installed.

Here you will “only” get the status data of the components, no traffic, as the management interface has no performance data of its own.

If you only want a high-level view of your management interface, you can do this with Redfish and fetch only the system state data. You then have only a handful of services, but these few services give you the roll-up state of the whole system.
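
As a rough illustration of what such a roll-up state looks like: in the Redfish data model, a system resource carries a `Status` object with properties such as `State`, `Health` and `HealthRollup`. The sketch below only parses such a resource; the helper name `rollup_state` is hypothetical, the sample payload is made up and heavily trimmed, and where your iLO exposes the resource (often something like `/redfish/v1/Systems/1`) is an assumption you would need to verify. The Checkmk Redfish integration does this work for you; this is not how it is implemented.

```python
import json


def rollup_state(system_resource):
    """Return (health_rollup, state) from a Redfish system resource dict.

    Missing properties are returned as None instead of raising an error.
    """
    status = system_resource.get("Status", {})
    return status.get("HealthRollup"), status.get("State")


# Made-up, trimmed-down payload for illustration; a real response
# contains many more properties.
sample = json.loads(
    '{"Id": "1", "Status": {"State": "Enabled", "Health": "OK", "HealthRollup": "OK"}}'
)
print(rollup_state(sample))
```

A single value like this is what makes the high-level view cheap: one request returns the overall health instead of walking every fan, memory module and temperature sensor.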