It looks like degraded fans are resulting in a crashed check:
Exception: TypeError (value for metric must be float or int, got 'undef')
Traceback: File "/omd/sites/intel/lib/python3/cmk/base/checking.py", line 581, in get_aggregated_result
result = _aggregate_results(check_function(**kwargs))
File "/omd/sites/intel/lib/python3/cmk/base/checking.py", line 812, in _aggregate_results
perfdata, results = _consume_and_dispatch_result_types(subresults)
File "/omd/sites/intel/lib/python3/cmk/base/checking.py", line 856, in _consume_and_dispatch_result_types
for subr in subresults:
File "/omd/sites/intel/lib/python3/cmk/base/api/agent_based/register/check_plugins.py", line 89, in filtered_generator
for element in generator(*args, **kwargs):
File "/omd/sites/intel/local/lib/python3/cmk/base/plugins/agent_based/ilo_api_fans.py", line 65, in check_ilo_api_fans
yield Metric("perc", perc, boundaries=(0, 100))
File "/omd/sites/intel/lib/python3/cmk/base/api/agent_based/checking_classes.py", line 219, in __new__
raise TypeError("value for metric must be float or int, got %r" % (value,))
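For anyone hitting the same crash: the traceback shows the string 'undef' reaching Metric("perc", ...), which only accepts float or int. A minimal sketch of one way to guard against that follows, assuming the fan reading can arrive as a number, a numeric string, or a placeholder such as 'undef'. The helper name to_number is hypothetical and not part of the plugin or of the Checkmk API; in the real check the Metric would simply not be yielded when the helper returns None.

```python
import math
from typing import Optional


def to_number(raw: object) -> Optional[float]:
    """Return raw as a finite float, or None if it is not numeric (e.g. 'undef')."""
    if isinstance(raw, bool):
        # bool is a subclass of int in Python, but it is not a fan reading
        return None
    try:
        value = float(raw) if isinstance(raw, (int, float)) else float(str(raw).strip())
    except ValueError:
        return None
    return value if math.isfinite(value) else None


# Stand-in for the check function: only emit the metric when the value is usable.
for reading in (42, "37.5", "undef", None):
    value = to_number(reading)
    if value is None:
        print(f"{reading!r}: no usable reading, metric skipped")
    else:
        print(f"{reading!r}: would yield Metric('perc', {value})")
```

Skipping the metric keeps the service result (for example the degraded state) while avoiding the crash; whether the plugin should additionally report the missing value is a design decision for the author.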
Hi Andreas,
I tested your extension on some HPE servers. It works very well; only one system (ProLiant DL380 Gen10) does not work.
local/share/check_mk/agents/special/agent_ilo -u user -p pass 1.2.3.4
Traceback (most recent call last):
File "local/share/check_mk/agents/special/agent_ilo", line 347, in <module>
get_information(REDFISHOBJ)
File "local/share/check_mk/agents/special/agent_ilo", line 94, in get_information
ilogen, iloversion, prefix, res_dir = get_gen(redfishobj)
File "local/share/check_mk/agents/special/agent_ilo", line 83, in get_gen
if ilogen.split(' ')[-1] == "CM":
AttributeError: 'NoneType' object has no attribute 'split'
It seems to have a problem with the version number string. Another server with the same hardware and the same iLO settings works fine. I also rebooted the iLO.
Do you have any idea what I could do?
version numbers:
cmk 2.0.0p28
hpe_ilo 3.4
iLO fw 2.72
redfish 3.1.7
Update: fixed by resetting the iLO to factory defaults and configuring it again with the same settings.
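The AttributeError in the traceback above comes from calling .split() on a value that is None. A minimal sketch of a defensive version of that comparison follows, assuming ilogen is either a string or None when the iLO does not return the expected field. The function name is_chassis_manager and the example strings are hypothetical and not taken from agent_ilo.

```python
from typing import Optional


def is_chassis_manager(ilogen: Optional[str]) -> bool:
    """Mirror the check `ilogen.split(' ')[-1] == "CM"` without crashing on None."""
    if not ilogen:
        # Nothing usable was returned; treat it as "not a chassis manager".
        return False
    return ilogen.split(" ")[-1] == "CM"


print(is_chassis_manager(None))          # missing value no longer raises
print(is_chassis_manager("Example CM"))  # hypothetical string ending in "CM"
```

Returning False is only one option. Exiting with a clear error message that the generation string is missing may be more helpful for debugging a misbehaving iLO.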
Hello,
I’m getting exactly the same error as MightyDinosaurus when I try to monitor an HPE Synergy blade with the following setup:
cmk: 2.1.0p9.cee
hpe_ilo-3.6.mkp
iLO 5 2.60
redfish-3.1.7
What happens with the extra debug output I mentioned in the last post?
I saw that the API output changed between v2.71 and v2.72.
They changed the API version and, with it, the output schema.
With 2.72 I need another way to extract the version number, as it is not available directly in the response.
@eric1 and @MightyDinosaurus, I uploaded a slightly modified version of the special agent as package version 3.7.
Can you please test whether this works? It will be a few days before I can test it on one of my “own” systems.
If it works as expected, I will also upload it to the Exchange.
With version 3.7 the command gives a lot of good-looking output with temperature, memory, CPU, drives, RAID and so on. Only the fans section is empty. But in Checkmk I see no such services.
I set up a folder with “Agent HPE iLO Configuration” rule where username and password are entered. In this folder I added the host and entered its management board IP. Service “Check_MK” is in state OK and service summary says “[special_ilo] Success, execution time 0.3 sec” but no other services are discovered. Did I miss something?
From “cmk --debug -vvI hostname” I could see that the HPE iLO special agent tried to query the host’s IP, not the management board IP. After changing the server’s IP to its management board IP, the services were discovered successfully.
I’m used to having the server and its management board as one single host in monitoring, which shows the output of the Checkmk agent installed in the server’s OS together with the iLO data. Isn’t this possible/intended with the iLO special agent?
This is not possible with the broken management board function of CMK. I would also not recommend using the management board function.
It is also better from a management standpoint to separate hardware (iLO) from software (OS).
Yes, this would also be my recommendation. Create the server as one host and the management board as another host.
I lowered the CPU load of two cmk systems by 20% just by moving the management board to a separate host. Same hosts, same data, same polling interval, same credentials, same SNMP version and settings. The CPU load had increased with the update from cmk 1.6 to 2.0 and the switch from net-snmp to their own SNMP implementation. So something seems messed up in the internal fetcher, maybe different timings when putting cmk agent and SNMP data together.
My environment is 90% IPv6 only and I was unable to get the “agent_ilo” to work correctly with IPv6 addresses because it looks like it was missing the [] around the address. I made some mods to the “agent_ilo” to make it work. Am I missing something and that’s why I can’t get IPv6 addresses to work properly?
I tested with one server over IPv6, and the interface has to be included in the address.
This looks like a problem of the Python requests library.
The following IP format worked on the command line:
[fe80::9640:c9ff:fe41:1234%eth0]
eth0 is the interface name on my monitoring server
IPv6 link-local addresses always need the interface name attached, as otherwise the kernel cannot know which link to use to send out the packets. This is independent of the operating system.
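A minimal sketch of how the bracket handling discussed above could be done before the address is put into a URL follows. It assumes the address is passed to the agent as a plain string; url_host is a hypothetical helper, not a function of agent_ilo or of the requests library. It keeps the zone in the raw form that was reported as working ("%eth0") and rejects link-local addresses that have no interface attached.

```python
import ipaddress


def url_host(address: str) -> str:
    """Return address in a form usable as the host part of a URL."""
    bare = address.strip("[]")              # accept input with or without brackets
    literal, _, zone = bare.partition("%")  # split off a zone such as "eth0"
    try:
        parsed = ipaddress.ip_address(literal)
    except ValueError:
        return address                      # host name: leave untouched
    if parsed.version != 6:
        return bare                         # IPv4 needs no brackets
    if parsed.is_link_local and not zone:
        raise ValueError("link-local IPv6 address needs an interface, e.g. %eth0")
    return f"[{bare}]"


print(url_host("1.2.3.4"))
print(url_host("2001:db8::1"))
print(url_host("fe80::9640:c9ff:fe41:1234%eth0"))
```

The zone is split off before validation so the sketch does not depend on the scoped-address support of newer Python versions of the ipaddress module.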