HPE iLO Restful Redfish API Agent

If someone wants to test the current version of my iLO agent, I have built a package with a single special agent for iLO 4 and 5. In my environment it runs with the latest patched iLO 4 and 5 without problems.

This topic was already discussed here: Check_MK over iLO 5 by using RESTful API

You can find the current MKP file on my GitHub.

If an error occurs in the special agent, it would be good to get the agent output directly; if the error occurs in the check, the output of "cmk --debug -vvn hostname" is best for troubleshooting.


Ah, you already did something. I had bookmarked GitHub - bb-Ricardo/check_redfish: A monitoring/inventory plugin to check components and health status of systems which support Redfish. It will also create a inventory of all components of a system. and wanted to implement a special agent based on that code when there is some free time (i.e. never :wink: ).

A small news update :slight_smile:
An agent for Dell Redfish looks possible the same way. The first tests are successful; I only have to select what is important to monitor, as there are, like on HP devices, so many single values to gather.

Hi guys,

I can confirm that this solution https://github.com/bb-Ricardo/check_redfish works with HP servers perfectly.

Why use a classic check if you can use a special agent?
A classic check can also cause problems inside CMK if it really needs Python 3.

Hi Andreas,

I'm using your Redfish plugin on over 60 HPE servers of different models, Gen9 and Gen10, and it runs perfectly fine!
In my opinion it should be the default in Checkmk, as HPE seems to have a lot of problems in either their iLO or SNMP code, which leads to more than 60-70 seconds of runtime for snmpwalks on some servers.
The Redfish agent runs in 1.8 seconds on average :slight_smile:


The time for the data collection depends mostly on the number of memory modules and hard drives.
With big servers it can take up to 30 seconds.

Hi Andreas

I am trying out your special agent and getting this error on a BL460 Gen10 and a Synergy 480 Gen10:

[special_ilo] ERROR: Agent exited with code 1: Traceback (most recent call last):
  File "/omd/sites/site/local/share/check_mk/agents/special/agent_ilo", line 284, in <module>
    get_information(REDFISHOBJ)
  File "/omd/sites/site/local/share/check_mk/agents/special/agent_ilo", line 166, in get_information
    psus = response.dict["PowerSupplies"]
KeyError: 'PowerSupplies'

I am told they don't have separate power supplies, so I don't know if that has any relevance, or whether this should even work on Synergy kit.

I think this is possible to fix. I will look later today to see if I can find a quick solution.
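A likely direction for such a fix (a sketch only — `extract_psus` is a hypothetical helper, and the dictionaries below stand in for `response.dict` of the Redfish Power resource) is to fall back to an empty list when the key is missing:

```python
def extract_psus(power_section):
    """Return the PowerSupplies list, or an empty list when a blade
    chassis exposes a Power section without its own PSUs."""
    # dict.get() avoids the KeyError seen on BL460/Synergy blades
    return power_section.get("PowerSupplies", [])

# A rack server returns PSU entries ...
rack = {"PowerSupplies": [{"Name": "PS 1", "Status": {"Health": "OK"}}]}
# ... while a blade's Power section may lack the key entirely.
blade = {"Fans": [{"Name": "Fan 1"}]}

print(len(extract_psus(rack)))   # 1
print(len(extract_psus(blade)))  # 0
```

The check side then simply sees no PSU lines for such hosts instead of a crashed agent.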

Hi Andreas,

it looks like I can't get the special agent to work at all. All I get is this:
OMD[moni]:~$ /opt/omd/sites/moni/local/share/check_mk/agents/special/agent_ilo -u rhev -p password 10.10.10.130
Traceback (most recent call last):
  File "/opt/omd/sites/moni/local/share/check_mk/agents/special/agent_ilo", line 276, in <module>
    REDFISHOBJ = RedfishClient(base_url=iLO_host, username=iLO_account,
  File "/omd/sites/moni/local/lib/python3/redfish/rest/v1.py", line 481, in __init__
    super(RedfishClient, self).__init__(default_prefix='/redfish/v1/', is_redfish=True,
  File "/omd/sites/moni/local/lib/python3/redfish/rest/v1.py", line 211, in __init__
    self.auth_type = self._get_auth_type(auth, ca_cert_data=ca_cert_data, **client_kwargs)
  File "/omd/sites/moni/local/lib/python3/redfish/rest/v1.py", line 232, in _get_auth_type
    if 'cert_file' in ca_cert_data and ca_cert_data['cert_file']:
TypeError: argument of type 'NoneType' is not iterable

Any idea what I'm doing wrong?

The system is a fully patched CentOS 7 and the ilorest-library is installed
OMD[moni]:~$ pip3 list|grep ilo
python-ilorest-library 3.2.2

The icinga check_redfish plugin works out of the box on the same system, but I would prefer to use the special agent.

Thanks
Frank

There are some changes in the current ilorest-library.

Can you modify the special agent like this and test again?
Line 278 - add the ca_cert_data argument:

    # Create a Redfish client object
    REDFISHOBJ = RedfishClient(base_url=iLO_host, username=iLO_account, \
                               password=iLO_password, ca_cert_data={})
    # Login with the Redfish client
    REDFISHOBJ.login()
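If the agent has to run against both old and new versions of the ilorest-library, the call could also be wrapped defensively (a sketch of the idea; `make_client` and the `TypeError` fallback are my assumption, not part of the published agent):

```python
def make_client(client_cls, base_url, username, password):
    """Create a Redfish client, passing ca_cert_data only when the
    installed ilorest-library version accepts that keyword."""
    try:
        # Newer library versions require ca_cert_data to be iterable.
        return client_cls(base_url=base_url, username=username,
                          password=password, ca_cert_data={})
    except TypeError:
        # Older versions do not know the ca_cert_data keyword at all.
        return client_cls(base_url=base_url, username=username,
                          password=password)
```

This keeps one agent script working regardless of which python-ilorest-library release is installed in the site.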

Looks way better, but still not perfect.

OMD[moni]:~$ /opt/omd/sites/moni/local/share/check_mk/agents/special/agent_ilo -u rhev -p password 10.10.10.130
<<<check_mk>>>
Version: 4
AgentOS: iLO 4.272
<<<ilo_api_power:sep(124)>>>
Traceback (most recent call last):
  File "/opt/omd/sites/moni/local/share/check_mk/agents/special/agent_ilo", line 289, in <module>
    get_information(REDFISHOBJ)
  File "/opt/omd/sites/moni/local/share/check_mk/agents/special/agent_ilo", line 166, in get_information
    psus = response.dict["PowerSupplies"]
KeyError: 'PowerSupplies'

Maybe this is because it is a server blade; I don't know right now whether the PSU is perhaps only visible through the chassis.

Thanks
Frank

Yup, that is the case:
[admin2654@lxomd2 check_redfish]$ ./check_redfish.py -H 10.10.10.130 -f ~/auth/rhev.cred --power
[UNKNOWN]: Request error: No power supply data returned for API URL '/redfish/v1/Chassis/1//Power'

Yes, I know - for blades some extra handling needs to be implemented.
As I have no blades to test with, it is a little bit complicated.
It is the same error as in June this year :wink:
To fix this I would need the content of the power section.
The quickest way to get the power section is to insert a line after line 169

sys.stdout.write("<<<ilo_api_power:sep(124)>>>\n")

With

sys.stdout.write(response)

Pay attention to use the same indentation as the line before.
If you now execute the agent on the command line, you will get the power section before it throws the error.
The problem is that the power section exists on the blade, but there is no PSU inside.
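One caveat: `sys.stdout.write()` only accepts strings, so if `response` is the ilorest response object rather than plain text, explicitly dumping its dictionary form is safer (a sketch, assuming `response.dict` holds the parsed JSON of the Power resource; the dictionary below is a stand-in for a blade without PSUs):

```python
import json
import sys

# Stand-in for response.dict of the Power resource on a blade:
# the section exists, but contains no "PowerSupplies" key.
power_dict = {
    "@odata.type": "#Power.v1_3_0.Power",
    "Fans": [{"Name": "Fan 1", "Status": {"Health": "OK"}}],
}

# Pretty-print the section so it can be pasted into a bug report.
sys.stdout.write(json.dumps(power_dict, indent=2) + "\n")
```

In the agent this would be `sys.stdout.write(json.dumps(response.dict, indent=2))` at the spot described above.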

Hi Andreas, thanks a lot for your help! :slight_smile:
For now I will just modify the code as you said.
If you need access to a blade system for testing, we can do a TeamViewer session and I can give you access to one of our blades.

Cheers
Frank

It would be fine if you could send me the output of that one-line addition as a private message.
I need to know what the power section looks like when no PSU is present.

Hi Andreas,

It seems that I have an issue with your iLO plugin. We recently got our first Gen10 server, and when we use the iLO special agent it shows a crash in the GUI.

The output of the special agent looks fine in general and the metrics are available in the GUI.

If the special agent is executed directly as the site user, there are no errors whatsoever.

OMD[moni]:~$ /opt/omd/sites/moni/local/share/check_mk/agents/special/agent_ilo -u checkmk -p Password 10.10.10.99
<<<check_mk>>>
Version: 5
AgentOS: iLO 5.242
<<<ilo_api_cpu:sep(124)>>>
2|Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz|OK
<<<ilo_api_general:sep(124)>>>
ProLiant DL380 Gen10|U30 v2.42 (01/23/2021)|CZ21100NJ1|OK
<<<ilo_api_fans:sep(124)>>>
Fan 1|13|Percent|Enabled|OK
Fan 2|13|Percent|Enabled|OK
Fan 3|13|Percent|Enabled|OK
Fan 4|13|Percent|Enabled|OK
Fan 5|13|Percent|Enabled|OK
Fan 6|13|Percent|Enabled|OK
<<<ilo_api_temp:sep(124)>>>
01-Inlet Ambient|23|Celsius|42|47|Enabled|OK
02-CPU 1|40|Celsius|70|70|Enabled|OK
03-CPU 2|40|Celsius|70|70|Enabled|OK
04-P1 DIMM 1-6|41|Celsius|90|90|Enabled|OK
05-PMM 1-6|0|Celsius|0|0|Absent|NP
06-P1 DIMM 7-12|42|Celsius|90|90|Enabled|OK
07-PMM 7-12|0|Celsius|0|0|Absent|NP
08-P2 DIMM 1-6|43|Celsius|90|90|Enabled|OK
09-PMM 1-6|0|Celsius|0|0|Absent|NP
10-P2 DIMM 7-12|40|Celsius|90|90|Enabled|OK
11-PMM 7-12|0|Celsius|0|0|Absent|NP
12-HD Max|35|Celsius|60|60|Enabled|OK
13-Exp Bay Drive|0|Celsius|0|0|Absent|NP
14-Stor Batt 1|0|Celsius|0|0|Absent|NP
15-Front Ambient|33|Celsius|70|70|Enabled|OK
16-VR P1|45|Celsius|115|120|Enabled|OK
17-VR P2|47|Celsius|115|120|Enabled|OK
18-VR P1 Mem 1|35|Celsius|115|120|Enabled|OK
19-VR P1 Mem 2|38|Celsius|115|120|Enabled|OK
20-VR P2 Mem 1|38|Celsius|115|120|Enabled|OK
21-VR P2 Mem 2|38|Celsius|115|120|Enabled|OK
22-Chipset|54|Celsius|100|100|Enabled|OK
23-BMC|71|Celsius|110|115|Enabled|OK
24-BMC Zone|41|Celsius|90|95|Enabled|OK
25-HD Controller|46|Celsius|100|100|Enabled|OK
26-HD Cntlr Zone|41|Celsius|85|90|Enabled|OK
28-LOM Card|81|Celsius|100|100|Enabled|OK
29-LOM Card Zone|40|Celsius|75|80|Enabled|OK
30-PCI 1|55|Celsius|100|100|Enabled|OK
31-PCI 1 Zone|39|Celsius|75|80|Enabled|OK
32-PCI 2|0|Celsius|0|0|Absent|NP
33-PCI 2 Zone|39|Celsius|75|80|Enabled|OK
34-PCI 3|0|Celsius|0|0|Absent|NP
35-PCI 3 Zone|39|Celsius|75|80|Enabled|OK
36-PCI 4|74|Celsius|100|100|Enabled|OK
37-PCI 4 Zone|34|Celsius|75|80|Enabled|OK
38-PCI 5|0|Celsius|0|0|Absent|NP
39-PCI 5 Zone|35|Celsius|75|80|Enabled|OK
42-PCI 7|0|Celsius|0|0|Absent|NP
43-PCI 7 Zone|38|Celsius|75|80|Enabled|OK
48-Mid HD Max|0|Celsius|0|0|Absent|NP
49-Rear HD 4 Max|0|Celsius|0|0|Absent|NP
50-Rear HD 5 Max|0|Celsius|0|0|Absent|NP
51-Rear HD 6 Max|0|Celsius|0|0|Absent|NP
52-Rear 3LFF Max|0|Celsius|0|0|Absent|NP
53-Battery Zone|39|Celsius|75|80|Enabled|OK
54-P/S 1 Inlet|27|Celsius|0|0|Enabled|OK
55-P/S 2 Inlet|38|Celsius|0|0|Enabled|OK
56-P/S 1|40|Celsius|0|0|Enabled|OK
57-P/S 2|42|Celsius|0|0|Enabled|OK
58-P/S 2 Zone|46|Celsius|75|80|Enabled|OK
59-E-Fuse|39|Celsius|100|100|Enabled|OK
76-AHCI HD Max|0|Celsius|0|0|Absent|NP
80-PCI 1 M2|0|Celsius|0|0|Absent|NP
81-PCI 1 M2 Zn|0|Celsius|0|0|Absent|NP
82-PCI 2 M2|0|Celsius|0|0|Absent|NP
83-PCI 2 M2 Zn|0|Celsius|0|0|Absent|NP
84-PCI 3 M2|0|Celsius|0|0|Absent|NP
85-PCI 3 M2 Zn|0|Celsius|0|0|Absent|NP
86-PCI 4 M2|0|Celsius|0|0|Absent|NP
87-PCI 4 M2 Zn|0|Celsius|0|0|Absent|NP
88-PCI 5 M2|0|Celsius|0|0|Absent|NP
89-PCI 5 M2 Zn|0|Celsius|0|0|Absent|NP
90-PCI 6 M2|0|Celsius|0|0|Absent|NP
91-PCI 6 M2 Zn|0|Celsius|0|0|Absent|NP
92-PCI 7 M2|0|Celsius|0|0|Absent|NP
93-PCI 7 M2 Zn|0|Celsius|0|0|Absent|NP
94-PCI 8 M2|0|Celsius|0|0|Absent|NP
95-PCI 8 M2 Zn|0|Celsius|0|0|Absent|NP
<<<ilo_api_power:sep(124)>>>
1|126|1600|Enabled|OK
2|156|1600|Enabled|OK
<<<ilo_api_power_metrics:sep(124)
0|0|3200|282
<<<ilo_api_mem:sep(124)>>>
proc1dimm3|DDR4|32768|OK
proc1dimm4|DDR4|32768|OK
proc1dimm5|DDR4|32768|OK
proc1dimm6|DDR4|32768|OK
proc1dimm7|DDR4|32768|OK
proc1dimm8|DDR4|32768|OK
proc1dimm9|DDR4|32768|OK
proc1dimm10|DDR4|32768|OK
proc2dimm3|DDR4|32768|OK
proc2dimm4|DDR4|32768|OK
proc2dimm5|DDR4|32768|OK
proc2dimm6|DDR4|32768|OK
proc2dimm7|DDR4|32768|OK
proc2dimm8|DDR4|32768|OK
proc2dimm9|DDR4|32768|OK
proc2dimm10|DDR4|32768|OK
<<<ilo_api_cntrl:sep(124)>>>
0|HPE Smart Array E208i-a SR Gen10|PEYHB0FRHF70WQ |3.53|OK
<<<ilo_api_phydrv:sep(124)>>>
1I:3:2|18|228936|OK
1I:3:1|18|228936|OK
<<<ilo_api_raid:sep(124)>>>
0-1|1|228902|262144|OK
<<<ilo_firmware:sep(124)>>>
2.42 Apr 05 2021|System Board iLO 5
U30 v2.42 (01/23/2021)|System Board System ROM
14.3.0 Build 50|System Board Intelligent Platform Abstraction Data
0x31|System Board System Programmable Logic Device
1.0.7|System Board Power Management Controller Firmware
2.00|Bay 1 Power Supply Firmware
2.00|Bay 2 Power Supply Firmware
0.2.2.0|System Board Innovation Engine (IE) Firmware
4.1.4.423|System Board Server Platform Services (SPS) Firmware
1.2 0|System Board Server Platform Services (SPS) Descriptor
U30 v2.40 (10/26/2020)|System Board Redundant System ROM
3.50.100|System Board Intelligent Provisioning
1.1|System Board Power Management Controller FW Bootloader
10.54.7|Embedded ALOM HPE Ethernet 10Gb 2-port 562FLR-SFP+ Adpt
3.53|Embedded RAID HPE Smart Array E208i-a SR Gen10
12.8.352.12|PCI-E Slot 1 HPE SN1200E 16Gb 2p FC HBA
10.54.7|PCI-E Slot 4 HPE Ethernet 10Gb 2-port 562SFP+ Adapter
2.5|Embedded Device Embedded Video Controller
HPG4|Port=1I:Box=3:Bay=2 Drive
HPG4|Port=1I:Box=3:Bay=1 Drive

This is the Crash that shows up:

Traceback:

  File "/omd/sites/moni/lib/python3/cmk/base/decorator.py", line 37, in wrapped_check_func
    status, infotexts, long_infotexts, perfdata = check_func(hostname, *args, **kwargs)
  File "/omd/sites/moni/lib/python3/cmk/base/discovery.py", line 788, in check_discovery
    services, host_label_discovery_result = _get_host_services(
  File "/omd/sites/moni/lib/python3/cmk/base/discovery.py", line 1600, in _get_host_services
    services, host_label_discovery_result = _get_node_services(
  File "/omd/sites/moni/lib/python3/cmk/base/discovery.py", line 1621, in _get_node_services
    services, host_label_discovery_result = _get_discovered_services(
  File "/omd/sites/moni/lib/python3/cmk/base/discovery.py", line 1649, in _get_discovered_services
    discovered_services, host_label_discovery_result = _discover_host_labels_and_services(
  File "/omd/sites/moni/lib/python3/cmk/base/discovery.py", line 1421, in _discover_host_labels_and_services
    discovered_services = [] if discovery_parameters.only_host_labels else _discover_services(
  File "/omd/sites/moni/lib/python3/cmk/base/discovery.py", line 1470, in _discover_services
    service_table.update({
  File "/omd/sites/moni/lib/python3/cmk/base/discovery.py", line 1470, in
    service_table.update({
  File "/omd/sites/moni/lib/python3/cmk/base/discovery.py", line 1538, in _execute_discovery
    yield from _enriched_discovered_services(hostname, check_plugin.name, plugins_services)
  File "/omd/sites/moni/lib/python3/cmk/base/discovery.py", line 1552, in _enriched_discovered_services
    for service in plugins_services:
  File "/omd/sites/moni/lib/python3/cmk/base/api/agent_based/register/check_plugins.py", line 72, in filtered_generator
    for element in generator(*args, **kwargs):
  File "/omd/sites/moni/lib/python3/cmk/base/api/agent_based/register/check_plugins_legacy.py", line 88, in discovery_migration_wrapper
    for element in original_discovery_result:
  File "/omd/sites/moni/local/share/check_mk/checks/ilo_api_power", line 25, in inventory_ilo_api_power
    if line[3] != u"Absent":

Any idea what causes this? It might be something on your end.

Thanks a million

Wazgen

Is this already version 2.4 of my package? I had a bug in the special agent that caused two sections not to be separated.
In 2.4 this is fixed.
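Until the upgrade, the discovery function could also guard against short lines (a sketch of the pattern only, not the shipped check code), which protects it from section-separation bugs like the one in 2.3:

```python
def inventory_ilo_api_power(info):
    """Discover one service per PSU line; skip malformed lines."""
    for line in info:
        # A well-formed PSU line has the fields
        # index|watts|capacity|state|health, so at least 4 entries.
        if len(line) < 4:
            continue
        if line[3] != "Absent":
            yield line[0], None

# A leaked section header (as produced by the 2.3 bug) is ignored:
info = [
    ["1", "126", "1600", "Enabled", "OK"],
    ["<<<ilo_api_power_metrics:sep(124)"],  # malformed, skipped
    ["2", "156", "1600", "Enabled", "OK"],
]
print(list(inventory_ilo_api_power(info)))  # [('1', None), ('2', None)]
```

This way a broken agent output degrades to a missing service instead of a discovery crash.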

Right, we still have version 2.3. I’m going to have to upgrade. Thanks a lot.