Call for Redfish beta testers

Hi Andreas,

I just installed the new version 2.3.52 and now I get an error when I try to retrieve the ILOs of my HPE G10+. But I also get the message on my two Cisco servers.


Did I configure something wrong? I updated from 2.3.45 to 2.3.52.

Thanks and regards, Sascha

Can you please run the agent on the command line with the "--debug" and "-vv" switches?
The code that imports from "redfish.messages" is not active at the moment; it is only preparation for the next versions.
I checked the import on both 2.2 and 2.3, and it worked without problems.
2.3

OMD[cmk]:~$ python3
Python 3.12.3 (main, May  7 2024, 15:13:53) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from redfish.messages import (
...     get_messages_detail,
...     get_error_messages,
...     search_message,
...     RedfishPasswordChangeRequiredError,
...     RedfishOperationFailedError,
... )
>>>

2.2

OMD[cmk]:~$ python3
Python 3.11.5 (main, Nov 30 2023, 14:57:54) [GCC 13.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from redfish.messages import (
...     get_messages_detail,
...     get_error_messages,
...     search_message,
...     RedfishPasswordChangeRequiredError,
...     RedfishOperationFailedError,
... )
>>>

Please check your "~/local/lib/python3/" directory. With CMK 2.3 there should no longer be a Redfish folder in it.
If there is one, it needs to be removed (it is the old Redfish library from 2.2 that was installed manually).
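A small sketch of that check, in case it helps others. It only lists candidates and deletes nothing. The assumption that leftover folders start with "redfish" is mine, so adjust the match if your manually installed library used other names.

```python
from pathlib import Path


def find_stale_redfish_dirs(lib_dir):
    """Return sub-directories of lib_dir whose name starts with 'redfish'.

    The name prefix is an assumption for illustration, not something the
    plugin defines.
    """
    base = Path(lib_dir).expanduser()
    if not base.is_dir():
        return []
    return sorted(
        entry
        for entry in base.iterdir()
        if entry.is_dir() and entry.name.lower().startswith("redfish")
    )


# Run as site user; prints nothing if the directory is clean.
for path in find_stale_redfish_dirs("~/local/lib/python3"):
    print(f"possible leftover Redfish library: {path}")
```

Review the printed paths before removing anything by hand.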


Hi Andreas,

thank you for the quick reply. I had two subfolders in the python3 folder. After deleting them, everything is working well.

Bye, Sascha

Some information for users of CMK 2.2
If you update to version 2.2.52 of the Redfish plugin, some services will get different names.
The service names are now the same for 2.2 and 2.3.
Sorry, this is an incompatible change, but it needed to be done at some point :wink:


Hi, firstly thank you very much for spending time on this plugin. Really appreciated! I was wondering how I can work around very slow BMCs. I have a few Gigabyte boards that I would like to monitor with the plugin but I am having trouble doing the inventory and I think it’s because it just takes too long, e.g.:

$ agent_redfish -u xxx -s xxx -v --debug --timeout 30 n12345
INFO 2024-07-25 11:08:26 root: running file /omd/sites/hpcwatch1/lib/python3/cmk/special_agents/utils/agent_common.py
INFO 2024-07-25 11:08:26 root: using Python interpreter v3.11.5.final.0 at /omd/sites/hpcwatch1/bin/python3
INFO 2024-07-25 11:08:26 redfish: Redfish API
INFO 2024-07-25 11:08:26 redfish.rest.v1: Attempt 1 of /redfish/v1
INFO 2024-07-25 11:08:38 redfish.rest.v1: Response Time for GET to /redfish/v1: 11.636435125023127 seconds.

So that initial step already takes more than 10 seconds. Scanning the whole tree takes just over 2 minutes :slight_smile: - is this a lost cause or can I do something?
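To see which requests are slow across a full run, the "Response Time" lines from the debug output can be pulled out with a short script. This is a minimal sketch that assumes the log format shown above; the 1 second threshold is an arbitrary choice of mine.

```python
import re

# Matches lines such as:
# "... Response Time for GET to /redfish/v1: 11.636435125023127 seconds."
RESPONSE_TIME = re.compile(
    r"Response Time for (?P<method>\w+) to (?P<path>\S+): "
    r"(?P<seconds>\d+(?:\.\d+)?) seconds"
)


def slow_requests(log_lines, threshold=1.0):
    """Return (method, path, seconds) for every request slower than threshold."""
    hits = []
    for line in log_lines:
        match = RESPONSE_TIME.search(line)
        if match and float(match["seconds"]) > threshold:
            hits.append((match["method"], match["path"], float(match["seconds"])))
    return hits
```

Feeding it the saved agent output (one list entry per line) shows which endpoints of the tree cost the most time.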

That looks very bad. The initial fetch should be nearly immediate.
How long does the system need if you enable only one section, such as fan and temperature, as shown here?


If that time is acceptable, I would do it this way.
You get the system roll-up state plus the individual fan and temperature services.
If a hardware failure occurs in anything other than the fans and temperatures, you will get a message in the "System state" service.

Hi @andreas-doehler well my first problem is that I am unable to inventory a node with the plugin enabled. Is there an internal timeout in checkmk for that process that will not wait for the plugin to finish and is there a way to increase that timeout?

If you get a timeout with only "Fan and Temperatures" active, then it takes longer than 60 seconds.
That is normally the check timeout inside CMK.
With the Enterprise edition you can define your own timeouts for individual hosts.
With the RAW edition you need to change the Nagios core config manually.

With your system I would take the following steps:

  • define the rule as shown in my screenshot
  • on the command line, as site user, run cmk --debug -vvI hostname
  • look at the time needed, which is shown in the output
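If you would rather have a single number than read it out of the debug output, the command can also be wrapped in a small timer. This is only a sketch: the helper names are made up for illustration, and the 60 second limit is the usual check timeout mentioned above.

```python
import subprocess
import time

CHECK_TIMEOUT = 60.0  # seconds; the usual check timeout mentioned in this thread


def timed_run(command):
    """Run command (a list of arguments) and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.returncode, time.monotonic() - start


def fits_check_timeout(elapsed, limit=CHECK_TIMEOUT):
    """True if the measured run time stays below the check timeout."""
    return elapsed < limit
```

As site user this could be called as timed_run(["cmk", "--debug", "-vvI", "hostname"]), passing the elapsed time to fits_check_timeout afterwards.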

Thanks Andreas, I did what you suggested but no luck so far:

  Source: SourceInfo(hostname='n12345', ipaddress='...', ident='special_redfish', fetcher_type=<FetcherType.SPECIAL_AGENT: 6>, source_type=<SourceType.HOST: 1>)
[cpu_tracking] Start [7fa5b83d3110]
Read from cache: AgentFileCache(n12345, path_template=/omd/sites/hpcwatch1/tmp/check_mk/data_source_cache/special_redfish/{hostname}, max_age=MaxAge(checking=0, discovery=900.0, inventory=900.0), simulation=False, use_only_cache=False, file_cache_mode=1)
Calling: /omd/sites/hpcwatch1/local/share/check_mk/agents/special/agent_redfish -u ... -s ... -m Thermal --timeout 40 n12345
[cpu_tracking] Stop [7fa5b83d3110 - Snapshot(process=posix.times_result(user=0.0, system=0.0, children_user=0.17, children_system=0.03, elapsed=3.2200000025331974))]
+ PARSE FETCHER RESULTS
...
  HostKey(hostname='n12345', source_type=<SourceType.HOST: 1>)  -> Not adding sections: Agent exited with code 1: 

So the agent exits with code 1 - that call clearly does not wait long enough. By the way this is CEE so based on what you said I should be able to increase the timeout. I’ll take a look.

I got a little bit further, I ended up with a stack:

...
  File "/omd/sites/hpcwatch1/local/lib/python3/cmk/base/plugins/agent_based/redfish_fans.py", line 36, in discovery_redfish_fans
    for fan in fans:
TypeError: 'NoneType' object is not iterable

I fixed this in the code and the discovery is now finishing :slight_smile: Thanks a lot for the help!
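For anyone hitting the same TypeError, the guard can be as small as the sketch below. It is not the plugin's actual code: the function name and the "Fans" / "Name" keys are assumptions about how the parsed section looks, and a real discovery function would yield Service objects instead of plain names.

```python
def discover_fan_names(section):
    """Yield fan names, tolerating a missing section or a missing fan list.

    'section' stands for the parsed thermal data (assumed to be a dict).
    """
    fans = (section or {}).get("Fans") or []  # None becomes an empty list
    for fan in fans:
        name = fan.get("Name")
        if name:
            yield name
```

With this, a BMC that reports no fans simply discovers no fan services instead of crashing.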

I fixed the discovery problem that occurred when no fans exist. The new version is pushed to GitHub and uploaded to the Exchange.

2.x.53 - standby Firmware is shown as ok
2.x.54 - fixed crash if no fans or temperatures exist in the thermal section



There was another case I had to fix up in code. For PSUs it looks like the health is not always reported so I did:

    if "Status" in psu:
        dev_state, dev_msg = redfish_health_state(psu["Status"])
    else:
        dev_state, dev_msg = redfish_health_state({})
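The same fallback can be written in one line with dict.get. In the sketch below, redfish_health_state is only a stand-in so the example runs on its own; it is not the plugin's real helper, and the state mapping in it is my assumption.

```python
def redfish_health_state(status):
    """Stand-in for the plugin's helper; NOT its real implementation."""
    health = status.get("Health")
    if health is None:
        return 3, "No health information reported"
    states = {"OK": 0, "Warning": 1, "Critical": 2}
    return states.get(health, 3), f"Health: {health}"


def psu_health(psu):
    """Evaluate PSU health, falling back to an empty status dict."""
    # 'or {}' also covers a "Status" key that is present but set to None.
    return redfish_health_state(psu.get("Status") or {})
```

Compared with the if/else version, this additionally handles a "Status" entry that exists but is empty or null.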

It would be nice if you could send me a dump of your Redfish interface created with the Redfish Mockup Creator.
Which version of the plugin do you use?
The line you mention has already been fixed here with this commit.

suggested a fix for that already in Feb… Call for Redfish beta testers - #67 by MasopustC, thought that it is already implemented since then…

That is my bad for not checking! I installed version 2.2.52 as we are using checkmk 2.2.0p27 and I wasn’t sure if the latest plugin is compatible.

Ok I will do that.

Hi Andreas, I don’t think I am allowed to send a PM, or am I missing something?

As a new user (new not only in time since signing up, but also in forum interaction), you may not be able to initiate a PM; this is to avoid spam. Andreas should be able to start the conversation with you, though.


We use Fujitsu Primergy servers and the Redfish plugin is working very well.
But the power supply values seem to be incorrect.
This is the output from a running server.

Power supply 0-PSU1 0.0 Watts input, 0.0 Watts output, 0.0 V input, Capacity 2600.0 Watts, Typ CDR26214M3
Power supply 1-PSU2 0.0 Watts input, 0.0 Watts output, 0.0 V input, Capacity 2600.0 Watts, Typ CDR26214M3

Version: redfish 2.3.52, iRMC S6

Can you post or send me the raw agent output of the power section?
If it contains no usable values, then the check cannot show any information besides the status data.
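To check this quickly on your side, something like the sketch below separates readings that are really reported from ones that are missing or null. The key names follow the Redfish Power schema, but whether the iRMC fills them is exactly the open question, and the helper name is made up for illustration.

```python
# Property names from the Redfish Power schema (PowerSupplies entries).
# Which of them a given BMC actually fills is an assumption to verify.
READING_KEYS = (
    "PowerInputWatts",
    "PowerOutputWatts",
    "LineInputVoltage",
    "PowerCapacityWatts",
)


def usable_readings(psu):
    """Return only the readings that are present and numeric."""
    readings = {}
    for key in READING_KEYS:
        value = psu.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            readings[key] = float(value)
    return readings
```

If only the capacity comes back for a PSU entry, the 0.0 values in the service output are most likely defaults rather than measurements.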

I’m redoing some of my redfish hosts in a new site and noticed something that I’d like to fix. Most of my redfish compatible hosts have two 10G SFP+ ports and two 1G ethernet ports. We’re currently only using the first 10G port and the other three all show up in a warning state. How can I tell the system that the warnings are kind of a false positive, they aren’t connected and that’s the normal state at the moment? It’s not a big deal but it does add info that isn’t needed into the list of services that aren’t OK, messes up the signal-to-noise ratio.