Call for Redfish beta testers

To protect Andreas a bit, as he is developing a rather large extension for free:
If I were you, Andreas, I would focus on servers.
When we mainline it, we will likely compile a list of supported systems that this is known to work against, and that list will contain only servers anyway.

Will the next appliance version 1.7 hopefully include the necessary package?

Hello, today I installed both the HPE-specific Redfish plugin and the generic plugin, to get rid of the oh-so-often crappy disadvantages of SNMP and to compare the two plugins. Here is what happened:

You can clearly see the switch from SNMP to the HPE-specific plugin, so that's very, very good (>80s vs. <7s)!
The upper dent at 11 is the short switch to the universal Redfish plugin, which took about 16-18s.
Addendum: although raw data for them is found in the HPE plugin output, disks are not added to the monitoring.


The universal plugin does not add them either, although the rest of the services that are added look basically identical at first and second sight.

Testing the Redfish agent and reporting a problem.

If there is a connection problem:

  • On the command line, execute the agent with the switches “--debug” and “-vv” to get the maximum output. The real problem may already be shown here (credential problem, slow connection or generic problem).

If only some expected data is missing, like hard drives, memory modules and so on:

  • On the command line, execute the agent with the switches “--debug” and “-vv” to get the maximum output.
  • Inspect the sections in the output to see whether the missing section is there or not.
  • If the section exists and is just not shown as checks, I need the output of this section to take a look.
  • If the section is missing, I need the complete agent output. The sections “redfish_system” and “redfish_chassis” are the minimum I need.
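The "is the section there or not" step above can be sketched in a few lines of Python. This is only an illustration, not part of the plugin: it relies on the usual Checkmk convention that every section starts with a `<<<name>>>` header (optionally with options such as `:sep(0)`). The sample output and the section name `redfish_memory` are made up for the example; the two required names are the ones mentioned above.

```python
import re

# Checkmk agent output marks each section with a header line such as
# <<<name>>> or <<<name:sep(0)>>>.
SECTION_HEADER = re.compile(r"^<<<([^:>]+)(?::[^>]*)?>>>\s*$")


def list_sections(agent_output: str) -> list[str]:
    """Return the section names in order of first appearance."""
    seen: list[str] = []
    for line in agent_output.splitlines():
        match = SECTION_HEADER.match(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


def missing_sections(agent_output: str, required: tuple[str, ...]) -> list[str]:
    """Return the required section names that do not occur in the output."""
    present = set(list_sections(agent_output))
    return [name for name in required if name not in present]


# Made-up sample; real agent output will look different.
SAMPLE_OUTPUT = """<<<redfish_system:sep(0)>>>
{"Id": "1"}
<<<redfish_memory:sep(0)>>>
{"Id": "DIMM1"}
"""

# The two sections named above as the minimum.
REQUIRED = ("redfish_system", "redfish_chassis")

if __name__ == "__main__":
    print("sections found:", list_sections(SAMPLE_OUTPUT))
    print("sections missing:", missing_sections(SAMPLE_OUTPUT, REQUIRED))
```

To use it on real data, save the output of the manual agent run to a file and pass the file content to `missing_sections`.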

If no really useful output is generated, it is possible to create a dump of the complete Redfish interface.
A small tool exists for this.

The output can be compressed into one archive, and I can then check, if needed, what is provided by the interface.

Thanks Andreas. I put this into the initial post as well as information on supported systems.

The necessary dependencies will become part of the Checkmk installation package, once this endeavor concludes. But right now, this is in a testing stage.

What is the reason to support this only for servers?
From an enterprise monitoring perspective I would suggest not limiting this to servers.
If the Redfish standard is implemented in PDUs and other devices that are used in enterprise data centers, why is there no interest in supporting them?

Mostly because this was the initial intention of Andreas, who is doing this for free and specifically called it monitoring for “Redfish Compatible Management Controller”, and it is based on the assumption that most servers will look alike. Any generic monitoring, which includes not only data ingestion but also data processing and data visualization, still needs a scope. Otherwise it leads to wrong user expectations, as in your case, where you thought that just because a PDU implements Redfish, it can be monitored in the same fashion. It is similar to the expectation that, just because a device can do SNMP, there is one generic mechanism for monitoring all SNMP-capable devices, which generates the same output across all these devices, from PDU to server to switch. It would be nice, but unfortunately reality differs.

That’s the reason why this project has a scope: Redfish Compatible Management Controllers.

That’s a wrong assumption. Have you ever looked into the Check Plug-Ins Catalog?
Checkmk covers clock devices, cameras, door locks, so quite a lot of edge-case stuff in the enterprise data center. Your statement is therefore incorrect. The evidence shows that we continuously add more monitoring for anything in the enterprise data center that is not directly a compute or networking device, e.g. we are currently adding support for Vertiv racks.
We always expand the monitoring to the areas where there is demand. In the future this could also include monitoring PDUs via Redfish. However, I would first check whether one of the existing SNMP plug-ins is already sufficient: Raritan plug-ins

Indeed, thanks a lot to Andreas and the Checkmk team for this effort! :slight_smile:

The term “Redfish standard” might suggest that there is one identical way to query all devices that support it. That is not the case, just as “REST API” still means you have to use specific, individual API calls for different applications.

Many years ago a consultant actually told me that it shouldn’t be a problem to monitor a new type of device, because it provided a REST-API and “almost everything speaks REST these days”. :rofl:

That is almost a bit like saying, if you know the alphabet you surely can speak a language like Indonesian, which uses the same alphabet as English. :wink:

Thanks for all your effort!
Kind regards, Dirk.

Redfish is, I think, a relatively well-defined standard.
I would recommend that everyone who works in infrastructure read the official documentation.
https://www.dmtf.org/standards/redfish
With the tool I mentioned before, “Redfish-Mockup-Creator”, you have a generic tool that fetches data from every conforming implementation of the Redfish interface.
If the standard were not so well defined, this would not work.

A well-defined standard, but far from trivial.

What I intended to express with regard to the questions about support for PDUs and similar devices was this: if you implemented checks to monitor Management Board information under /redfish/v1/Chassis/{ChassisId}, then nobody should expect that this magically also works for PDUs, which (just a first guess) might provide relevant information under /redfish/v1/PowerEquipment/RackPDUs/{PowerDistributionId}/Mains/{CircuitId} or something like this, which would need to be implemented as additional checks.

So, yes, anyone who wonders why this does not support all devices right out of the box should really have a look at the specifications :slight_smile:

Kind regards, Dirk.

The implementation of the checks is not the problem, as all the data is available in JSON format.
There is no complex parsing of strangely formatted tables and so on.

The general problem is to find a good way to retrieve the data from the different devices.
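To illustrate why the check side is the easy part, here is a minimal sketch of turning one Redfish-style JSON resource into a monitoring state. It is not the code of the plugin. The sample payload is invented, and the mapping from the Redfish `Health` values (`OK`, `Warning`, `Critical`) to the numeric states 0/1/2, with 3 for anything else, is an assumption made for this example.

```python
import json

# Invented sample resource with a Redfish-style "Status" object.
SAMPLE_DRIVE = """
{
  "Id": "0",
  "Name": "Example Drive",
  "CapacityBytes": 480103981056,
  "Status": {"Health": "OK", "State": "Enabled"}
}
"""

# Assumed mapping: Redfish Health value -> monitoring state
# (0 = OK, 1 = WARN, 2 = CRIT); anything else becomes 3 (UNKNOWN).
HEALTH_TO_STATE = {"OK": 0, "Warning": 1, "Critical": 2}


def summarize(resource: dict) -> tuple[int, str]:
    """Return (state, summary text) for one already parsed resource."""
    status = resource.get("Status") or {}
    health = status.get("Health")
    state = HEALTH_TO_STATE.get(health, 3)
    name = resource.get("Name", "unknown")
    return state, f"{name}: health {health or 'not reported'}"


if __name__ == "__main__":
    print(summarize(json.loads(SAMPLE_DRIVE)))
```

Once the data has been fetched, the JSON is already structured, so a check is mostly dictionary lookups like these. Finding and fetching the right resources on each device is where the real work is.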

Update to Redfish Agent.

v 2.2.19 / 2.1.19

At the moment it is available only from the GitHub page.
The Exchange packages are in review.

Changelog

2.2.18 - rework special agent to use CMK included functions - like Nutanix agent
2.2.19 - agent can handle a device without manager

Keep in mind: if there are real problems with the data from the agent, I need a dump made with the “Mockup-Creator” tool linked in the first post. You don’t need to post such a dump here; a PM or mail is good for this :slight_smile:

PS: In the next few days there will also be a test version for the 2.3 daily builds.

An MKP for the daily builds of the CMK 2.3 master tree is now available as well.
Attention: it was built with the daily from 24.12., and there were changes over the last two weeks that broke some things :smiley:
v.2.3.24
Also note that the new API for rules has no password input field, and the password field currently uses the normal text input.
→ password is visible everywhere
@moritz is this password input only missing in the new rule API, or did I do something wrong? If I import the “old” IndividualOrStoredPassword, I get an error message when I try to add a rule or show the rules.

I have finally had time to try this out in my homelab… :smiley:

So 1 of 3 ESXi hosts works (they are SuperMicro-based). The one with the oldest firmware (from 2017) works, but the others (firmware from 2019 and 2022) do not.

Running cmk --debug -vvn does not show anything meaningful (just exit = 1), but it does show the password in clear text. Not sure if the “calling” line could be excluded from the debug output?

I also noticed that if you delete a host, the cache (~/tmp/check_mk/data_source_cache/special_redfish) for that host is still there, but perhaps that is mostly something for a Checkmk cleanup job? Also, not sure when exactly, but if the agent was unsuccessful, a cache file of 0 bytes was created. (Yes, all OOB hosts are prefixed with “ilo”, just an old habit.)

For the one host that is working, I’m missing the PSUs. Not sure if this is an issue with SuperMicro (it might be); the Redfish rule does include PSU.

-m Memory,Power,Processors,Thermal,FirmwareInventory,NetworkAdapters,NetworkInterfaces,EthernetInterfaces,Storage,ArrayControllers,SmartStorage,HostBusAdapters,PhysicalDrives,LogicalDrives

PSU and SuperMicro is a strange thing :smiley:
I have some SuperMicro BMCs inside my systems; some work flawlessly and others show strange status messages for the PSU. But all the devices are shown.

For the non-working BMCs I have only one tip.
Please execute the agent with “--debug -vv” on the command line. There you should get a somewhat more detailed error message.
For SuperMicro devices I only know that you need to enter the licence key for the BMC.

The password should only be visible if you use the normal password inside the WATO rule. With a stored password you should only see the key to the password.
The call of the agent is not needed for debugging.
More important for debugging is the real agent output you get when you execute the agent manually with “--debug -vv”, as mentioned above.

is this password input only missing in the new rule API

The API is still work in progress :slight_smile: ; a few things are still missing. Passwords have arrived now:

Hi all,
we have several new HPE Gen11 Servers with ILO6 and while we get all information regarding PSU, Memory, CPU and general health, the storage devices (controller and disks) are missing in the monitoring.

A comparison with an older Gen10/ILO5 server shows different paths for the storage components.

HPE ILO5:
/redfish/v1/Systems/1/SmartStorage/ArrayControllers/0/DiskDrives/0/

HPE ILO6:
/redfish/v1/Systems/1/Storage/DE045000/Drives/0
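To make the difference between the two layouts concrete, here is a small sketch that recognises a drive resource under either path style and extracts the IDs. The two patterns are derived only from the example paths above. The layout labels and the function name are made up for this illustration, and this is not how the plugin itself discovers drives.

```python
import re

# Patterns built from the two example paths in this post; the variable
# parts of the paths are captured as named groups.
DRIVE_LAYOUTS = {
    "iLO5 SmartStorage": re.compile(
        r"^/redfish/v1/Systems/(?P<system>[^/]+)/SmartStorage/ArrayControllers/"
        r"(?P<controller>[^/]+)/DiskDrives/(?P<drive>[^/]+)/?$"
    ),
    "iLO6 Storage": re.compile(
        r"^/redfish/v1/Systems/(?P<system>[^/]+)/Storage/"
        r"(?P<controller>[^/]+)/Drives/(?P<drive>[^/]+)/?$"
    ),
}


def classify_drive_path(path: str):
    """Return (layout label, extracted IDs), or None if no layout matches."""
    for layout, pattern in DRIVE_LAYOUTS.items():
        match = pattern.match(path)
        if match:
            return layout, match.groupdict()
    return None


if __name__ == "__main__":
    for example in (
        "/redfish/v1/Systems/1/SmartStorage/ArrayControllers/0/DiskDrives/0/",
        "/redfish/v1/Systems/1/Storage/DE045000/Drives/0",
    ):
        print(example, "->", classify_drive_path(example))
```

A collector that only knows the first layout would find no drives at all on the Gen11 systems, which would fit what we see.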

I’m happy to privately share the output of the DMTF Mockup Creator of one of our Gen11 servers.

If possible, you can send me a dump as a private message. Gen11 is currently missing from my collection :slight_smile: