HPE iLO Restful Redfish API Agent

natrix · August 22, 2022, 4:15pm

It looks like degraded fans are resulting in a crashed check:

Exception: TypeError (value for metric must be float or int, got 'undef')
Traceback: File "/omd/sites/intel/lib/python3/cmk/base/checking.py", line 581, in get_aggregated_result
    result = _aggregate_results(check_function(**kwargs))
  File "/omd/sites/intel/lib/python3/cmk/base/checking.py", line 812, in _aggregate_results
    perfdata, results = _consume_and_dispatch_result_types(subresults)
  File "/omd/sites/intel/lib/python3/cmk/base/checking.py", line 856, in _consume_and_dispatch_result_types
    for subr in subresults:
  File "/omd/sites/intel/lib/python3/cmk/base/api/agent_based/register/check_plugins.py", line 89, in filtered_generator
    for element in generator(*args, **kwargs):
  File "/omd/sites/intel/local/lib/python3/cmk/base/plugins/agent_based/ilo_api_fans.py", line 65, in check_ilo_api_fans
    yield Metric("perc", perc, boundaries=(0, 100))
  File "/omd/sites/intel/lib/python3/cmk/base/api/agent_based/checking_classes.py", line 219, in __new__
    raise TypeError("value for metric must be float or int, got %r" % (value,))

Local variables:

{'__class__': <class 'cmk.base.api.agent_based.checking_classes.Metric'>,
 'boundaries': (0, 100),
 'cls': <class 'cmk.base.api.agent_based.checking_classes.Metric'>,
 'levels': None,
 'name': 'perc',
 'value': 'undef'}

Could you please take a look at this?

andreas-doehler · August 22, 2022, 7:28pm

Can you please test version 3.5 here?

There was a little bit of obsolete code in the function.
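For readers hitting the same TypeError: the crash happens because the string 'undef' reaches Metric(), which only accepts int or float. The following is a minimal sketch of a guard, not the actual change made in version 3.5. The helper name `to_number` is hypothetical and not part of the plugin.

```python
from typing import Optional


def to_number(raw: object) -> Optional[float]:
    """Return raw as a float, or None if it is not numeric (e.g. 'undef')."""
    # bool is a subclass of int, but it is not a meaningful fan reading
    if isinstance(raw, bool):
        return None
    if isinstance(raw, (int, float)):
        return float(raw)
    try:
        return float(str(raw))
    except ValueError:
        return None


# Assumed usage inside a check function: only yield the metric
# when the reading is numeric.
#
#     perc = to_number(raw_reading)
#     if perc is not None:
#         yield Metric("perc", perc, boundaries=(0, 100))
```

With a guard like this, a degraded fan that reports no reading would simply produce no metric instead of crashing the check.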

natrix · August 23, 2022, 6:58am

This looks pretty good, thanks for the quick support.

MightyDinosaurus · September 30, 2022, 5:31am

Hi Andreas,
I tested your extension on some HPE servers. It works very well; only one system (ProLiant DL380 Gen10) does not work.

local/share/check_mk/agents/special/agent_ilo -u user -p pass 1.2.3.4
Traceback (most recent call last):
  File "local/share/check_mk/agents/special/agent_ilo", line 347, in <module>
    get_information(REDFISHOBJ)
  File "local/share/check_mk/agents/special/agent_ilo", line 94, in get_information
    ilogen, iloversion, prefix, res_dir = get_gen(redfishobj)
  File "local/share/check_mk/agents/special/agent_ilo", line 83, in get_gen
    if ilogen.split(' ')[-1] == "CM":
AttributeError: 'NoneType' object has no attribute 'split'

It seems to have a problem with the version number string. Another server with the same hardware and the same iLO settings works fine. I also rebooted the iLO.

Do you have any idea what I could do?

version numbers:

cmk 2.0.0p28
hpe_ilo 3.4
iLO fw 2.72
redfish 3.1.7

Update: fixed by resetting the iLO to factory defaults and configuring it again with the same settings.

andreas-doehler · September 30, 2022, 5:48am

What you can do is insert the following line before line 76.

    print(response_data)

You can send me the output as a PM. Then I can see what the raw response from the iLO is and what’s wrong with the data.
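Independent of the debug output, the AttributeError in the traceback comes from calling `.split` on a value that is None. A minimal sketch of a defensive variant follows; the helper name `manager_suffix` is hypothetical, and this is not the code that later shipped in the package.

```python
from typing import Optional


def manager_suffix(ilogen: Optional[str]) -> Optional[str]:
    """Return the last space-separated token of the manager type string.

    Returns None instead of raising when the iLO did not deliver a string.
    """
    if not isinstance(ilogen, str) or not ilogen.strip():
        return None
    return ilogen.split()[-1]


# Assumed usage in place of `ilogen.split(' ')[-1] == "CM"`:
#
#     if manager_suffix(ilogen) == "CM":
#         ...
```

This only avoids the crash; the agent would still need another source for the generation and version when the field is missing.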

eric1 · October 10, 2022, 1:53pm

Hello,
I’m getting exactly the same error as MightyDinosaurus when I try to monitor an HPE Synergy blade with the following setup:
cmk: 2.1.0p9.cee
hpe_ilo-3.6.mkp
iLO 5 2.60
redfish-3.1.7

andreas-doehler · October 10, 2022, 1:55pm

What happens with the extra debug output I mentioned in the last post?
I saw that there is a change in the API output between v2.71 and v2.72.
They changed the API version and with it the output schema.

With 2.72 I need another way to extract the version number, as it is not available in the direct response.

eric1 · October 10, 2022, 2:19pm

I’d like to share the output of print(response_data) with you but I can’t find a button to send a PM.

andreas-doehler · October 10, 2022, 7:47pm

@eric1 and @MightyDinosaurus, I uploaded a slightly modified version of the special agent as package version 3.7.
Can you please test whether this works? I will only be able to test it on one of my “own” systems in a few days.

If it works as expected, I will then also upload it to the Exchange.

eric1 · October 11, 2022, 6:44am

Hello Andreas,

With version 3.7 the command gives a lot of good-looking output with temperature, memory, CPU, drives, RAID and so on. Only the fans section is empty. But in Checkmk I see no such services.

I set up a folder with “Agent HPE iLO Configuration” rule where username and password are entered. In this folder I added the host and entered its management board IP. Service “Check_MK” is in state OK and service summary says “[special_ilo] Success, execution time 0.3 sec” but no other services are discovered. Did I miss something?

andreas-doehler · October 11, 2022, 6:53am

What do you see on the command line with a “cmk --debug -vvI hostname”?

eric1 · October 11, 2022, 7:59am

From “cmk --debug -vvI hostname” I could see that the HPE iLO special agent tried to query the host’s IP, not the management board IP. After changing the server’s IP to its management board IP, the services were discovered successfully.
I’m used to having the server and its management board as one single host in monitoring, which shows the output of the Checkmk agent installed in the server’s OS together with the iLO data. Isn’t this possible or intended with the iLO special agent?

MightyDinosaurus · October 11, 2022, 12:30pm

Updated from 3.4 to 3.7.
It looks good. To be honest, I do not see any change besides the version number.

andreas-doehler · October 11, 2022, 12:33pm

Only your “Check_MK Agent” should show no complete version number of the iLO firmware anymore.
That is the only change.

MightyDinosaurus · October 11, 2022, 12:35pm

Yes, it changed from “iLO 4.28” to “iLO 2.80”, which is the correct version number.

andreas-doehler · October 11, 2022, 12:35pm

This is not possible with the broken management board function of CMK. I would also not recommend using the management board function.
It is also better from a management standpoint to separate hardware (iLO) from software (OS).

MightyDinosaurus · October 11, 2022, 12:40pm

Yes, this would also be my recommendation. Create the host as one host and another host for the management board.

I lowered the CPU load of two cmk systems by 20% just by moving the management board to a separate host. Same hosts, same data, same polling interval, same credentials, same SNMP version and settings. The CPU load had increased with the update from cmk 1.6 to 2.0 and the switch from net-snmp to their own SNMP implementation. So something seems messed up in the internal fetcher, maybe due to different timings when combining cmk agent and SNMP data.

foo · November 2, 2022, 5:29pm

Andreas,

My environment is 90% IPv6 only and I was unable to get the “agent_ilo” to work correctly with IPv6 addresses because it looks like it was missing the [] around the address. I made some mods to the “agent_ilo” to make it work. Am I missing something and that’s why I can’t get IPv6 addresses to work properly?

andreas-doehler · November 7, 2022, 7:26am

I tested with one server with IPv6, and the interface needs to be included in the address.
This looks like a problem of the Python requests library.
The following IP format was working on the command line

[fe80::9640:c9ff:fe41:1234%eth0]

eth0 is the interface name on my monitoring server
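As an illustration of the bracket handling discussed above, here is a minimal sketch that wraps a bare IPv6 literal (with or without an interface suffix) in brackets and leaves IPv4 addresses and hostnames alone. The function name `url_host` is hypothetical and not taken from agent_ilo; whether the agent should do this itself is an assumption.

```python
import ipaddress


def url_host(host: str) -> str:
    """Return host in a form usable inside a URL.

    Bare IPv6 literals get square brackets; IPv4 addresses, hostnames
    and already bracketed values are returned unchanged.
    """
    if host.startswith("["):
        return host
    # Split off an interface suffix such as "%eth0" before validating,
    # so the check does not depend on scoped-address support in ipaddress.
    address, _, _zone = host.partition("%")
    try:
        ipaddress.IPv6Address(address)
    except ValueError:
        return host
    return f"[{host}]"


print(url_host("fe80::9640:c9ff:fe41:1234%eth0"))
print(url_host("1.2.3.4"))
```

The interface suffix still has to be supplied by the user for link-local addresses; the sketch only adds the brackets.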

r.sander · November 7, 2022, 8:55am

IPv6 link-local addresses always need the interface name attached, as otherwise the kernel cannot know which link to use to send out the packets. This is independent of the operating system.