Redfish problems (trying to monitor iLO 5)

That’s why I said to first install a clean 2.3 and check whether this works as expected.
I don’t know what the original situation on your system was.
Is it a distributed system, how many MKPs are installed, and so on?
That is not an easy task that can be done on the fly.

For the password problem - what CMK version is this site running?

I will create a new test site tomorrow then! Originally it was working using your older plugin for the HPE iLO as well as an older Redfish plugin, which was specific to CMK 2.1. We then upgraded to 2.3 along the following path, as per support’s advice: 2.1.0p32 > 2.1.0p44 > 2.2.0p27 > 2.3.0p6.

Then I tried to use the built-in plugin, which had these issues straight out of the box.

As I said though, I will see if these issues occur on a clean site as per your recommendations, since the upgrade path we took may have introduced a lot of deeper issues!

We are currently on 2.3.0p6 and running a single instance/site on a self-managed Ubuntu LTS 22.04 VM.

Thanks again for all your help so far.

OK, so this is strange. I have removed all but one of the iLO hosts from monitoring, to reduce noise and focus my troubleshooting.

The issue no longer occurs!

Could this be related to high load on the plugin???

High load on the plugin should not normally exist.
The biggest difference between the old iLO Redfish and the generic Redfish is the session cache. This cache file is written with the host IP as the unique identifier.
If the same user and the same IP are used by another monitoring object (which should not be the case), this can cause problems. But that should also only affect a single object and not all of them.
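
To illustrate why a cache keyed only by the host IP could collide, here is a minimal Python sketch. It is not the plugin's actual code: the file name pattern, the helper names and the example addresses are all assumptions made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def cache_path(cache_dir, host_ip):
    # Hypothetical naming scheme: one cache file per host IP.
    return Path(cache_dir) / f"redfish_session_{host_ip}.json"


def store_session(cache_dir, host_ip, owner, token):
    # Writing always replaces whatever was stored for this IP before.
    payload = {"owner": owner, "token": token}
    cache_path(cache_dir, host_ip).write_text(json.dumps(payload))


def load_session(cache_dir, host_ip):
    return json.loads(cache_path(cache_dir, host_ip).read_text())


def demo():
    # Two monitoring objects share 192.0.2.10, a third has its own IP.
    with tempfile.TemporaryDirectory() as cache_dir:
        store_session(cache_dir, "192.0.2.10", "ilo-a", "token-a")
        store_session(cache_dir, "192.0.2.10", "ilo-b", "token-b")
        store_session(cache_dir, "192.0.2.11", "ilo-c", "token-c")
        shared = load_session(cache_dir, "192.0.2.10")
        separate = load_session(cache_dir, "192.0.2.11")
    return shared, separate
```

In this sketch the second object on the shared IP silently replaces the first object's session, while the object with its own IP is untouched. That matches the expectation that such a collision would hit single objects only.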

So I added one more iLO, and it immediately started crashing. It didn’t crash at all overnight with just a single host, but as soon as I added a second one it started crashing.

I am going to try removing the host that had no issues, and seeing what happens to the new one I added today.

I removed all hosts, and re-added the original host that seemed fine overnight, and during initial service discovery this happened:

A rescan worked fine though, but it now keeps crashing as it did before. I am so lost!

Working with support to see if there is anything wrong with our site configuration, but the config checker shows all ok, and diag logs seem good too.

Hello, I’m facing the same problem as mentioned above. Updated from 2.2 to 2.3 and suddenly all redfish checks went UNKNOWN.
Installed the newest release of the plugin and I receive the same error message.


Any chance to get this fixed any time soon?

Thank you very much!

Is there any chance that you did not remove the previously installed Python packages?
With 2.1/2.2 you had to install extra Python packages with "pip install redfish 'urllib3<2'". These packages should be removed before or after the upgrade to 2.3, as they are now included.

Also, please don’t use SNMP at the same time on this management interface, only the special agent.

Hi Andreas - how would we uninstall that package? It might be what is causing our issues…

It is not so easy.
As a first step I would inspect the folder “~/local/lib/python3/”.
What is inside it?
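
The inspection suggested above could be sketched in Python roughly like this. The regular expression and the helper names are my own assumptions; the sketch only lists the package metadata folders in the site-local library folder and removes nothing.

```python
import re
from pathlib import Path

# Matches metadata folders such as "redfish-3.1.9.dist-info" or
# "jsonpath_rw-1.4.0-py3.9.egg-info" (pattern is an assumption).
INFO_DIR = re.compile(
    r"^(?P<name>.+?)-(?P<version>\d[^-]*)(?:-py[\d.]+)?\.(?:dist|egg)-info$"
)


def parse_info_dir(dirname):
    """Return (package, version) for a metadata folder name, else None."""
    match = INFO_DIR.match(dirname)
    if match is None:
        return None
    return match.group("name"), match.group("version")


def local_packages(lib_dir):
    """List (package, version) pairs found directly below lib_dir."""
    found = []
    for entry in sorted(Path(lib_dir).iterdir()):
        parsed = parse_info_dir(entry.name)
        if entry.is_dir() and parsed is not None:
            found.append(parsed)
    return found


if __name__ == "__main__":
    # Run as the site user, so that ~ resolves to the site directory.
    site_lib = Path.home() / "local" / "lib" / "python3"
    if site_lib.is_dir():
        for name, version in local_packages(site_lib):
            print(f"{name}=={version}")
```

Anything this prints lives in the site-local folder and would need to be compared by hand against what the running version already ships.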

Folder contents:

This is not an upgraded site from 2.2 where the Redfish plugin was working before, is it?
Normally there are way more libs installed.
If it is a clean installation of 2.3, then I don’t know where the “normalizer” comes from. It looks strange.

I did go in and remove a LOT of python packages, since the upgrade failed multiple times until I did this. I did not know where most of these came from either.

I did take a dump of the original contents of the python plugins before I eviscerated them:

OMD[ctshirts]:~$ find ~/local/lib/python3/ -type d -name '*.*-info'
/omd/sites/ctshirts/local/lib/python3/decorator-5.1.1.dist-info
/omd/sites/ctshirts/local/lib/python3/ply-3.11.dist-info
/omd/sites/ctshirts/local/lib/python3/idna-3.4.dist-info
/omd/sites/ctshirts/local/lib/python3/charset_normalizer-3.1.0.dist-info
/omd/sites/ctshirts/local/lib/python3/redfish-3.1.9.dist-info
/omd/sites/ctshirts/local/lib/python3/requests_toolbelt-1.0.0.dist-info
/omd/sites/ctshirts/local/lib/python3/requests-2.31.0.dist-info
/omd/sites/ctshirts/local/lib/python3/certifi-2023.5.7.dist-info
/omd/sites/ctshirts/local/lib/python3/jsonpatch-1.33.dist-info
/omd/sites/ctshirts/local/lib/python3/jsonpointer-2.4.dist-info
/omd/sites/ctshirts/local/lib/python3/requests_unixsocket-0.3.0.dist-info
/omd/sites/ctshirts/local/lib/python3/urllib3-2.0.3.dist-info
/omd/sites/ctshirts/local/lib/python3/jsonpath_rw-1.4.0-py3.9.egg-info
/omd/sites/ctshirts/local/lib/python3/six-1.16.0.dist-info

Should I reinstall any of these?

No - all these packages were installed before as dependencies of redfish.
Now everything is already included.

That makes sense then.

The normalizer plugin got uninstalled, but left some weird folder names there, as you can see: “~harset_normalizer”. I tried removing it but it wouldn’t! It wouldn’t be causing a problem, would it?
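
As far as I know, pip temporarily renames a package folder by replacing its first character with "~" while uninstalling, so a name like the one above usually points to an uninstall that did not finish cleaning up. A small Python sketch to spot such leftovers, with helper names that are my own and purely illustrative:

```python
from pathlib import Path


def leftover_entries(names):
    """Return the names that look like pip's temporary '~' renames."""
    return sorted(name for name in names if name.startswith("~"))


if __name__ == "__main__":
    # Run as the site user, so that ~ resolves to the site directory.
    site_lib = Path.home() / "local" / "lib" / "python3"
    if site_lib.is_dir():
        names = [entry.name for entry in site_lib.iterdir()]
        for name in leftover_entries(names):
            print(f"leftover from an unfinished uninstall: {name}")
```

The sketch only reports the entries; whether they can be deleted safely is a separate question.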

I just tried to check installed python packages, but apparently the upgrade to 2.3 removed pip!

OMD[ctshirts]:~$ pip list
Command 'pip' not found, but can be installed with:
apt install python3-pip
Please ask your administrator.

Should I reinstall this or is it supposed to be missing?

For Python 3 it’s pip3.

Man I feel dumb lol! :sweat_smile:

Just adding more info here as I’m still troubleshooting.

We have added a few more iLO hosts. What I am seeing I can only explain as weird…

One of the iLOs we added is not having any issues whatsoever with the Redfish agent crashing. As soon as I added another one, that one started having issues straight away. The weird thing is that the original iLO is still fine. I am now wondering if anything is wrong with the iLOs themselves, but I’ve rebooted the iLO and even gone as far as rebooting the host, and nothing improved.

The alert history for the problematic iLO reveals some interesting timings - it seems to be triggering alerts almost exactly every 5 minutes:
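
One way to check the "every 5 minutes" impression against exported alert times is a small Python sketch like the one below. The timestamp format, the tolerance and the function names are assumptions, and any timestamps fed into it would have to come from the real alert history.

```python
from datetime import datetime


def alert_intervals(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the gaps in seconds between consecutive alert times."""
    parsed = sorted(datetime.strptime(stamp, fmt) for stamp in timestamps)
    return [(later - earlier).total_seconds()
            for earlier, later in zip(parsed, parsed[1:])]


def looks_periodic(timestamps, period_s=300, tolerance_s=15):
    """True if every gap is within tolerance_s of period_s."""
    gaps = alert_intervals(timestamps)
    return bool(gaps) and all(abs(gap - period_s) <= tolerance_s for gap in gaps)
```

If the gaps really cluster tightly around 300 seconds, that would point to something running on a fixed schedule rather than random failures.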

I’ve added 2 more iLOs from a completely different site, and these don’t seem to be having any issues either.

It should be noted that ALL of these iLOs have been very recently upgraded to the latest iLO firmware from HPE (v3.04) - but they are all set up the same.

The iLOs having issues are all on the same network as the CheckMK server (even on the same subnet), so I don’t see this being a network issue. The working ones are on a different subnet, at a different site entirely, and are reachable through an SDWAN VPN.

Could there be an issue with these iLOs all along rather than the CheckMK server or Redfish?

If so, I don’t know what, since I’ve tried rebooting them and they are all set up correctly.

iLO version 3.05 was released recently, so you could try whether that makes a difference.