Hey!
We are currently planning to support monitoring via Redfish. Redfish is an interface for the remote maintenance of servers.
Most new management boards come with Redfish support (e.g. Dell iDRAC8, HPE iLO4, Lenovo XClarity, Supermicro X10, Cisco IMC), and monitoring via Redfish has become a great alternative to monitoring via IPMI or SNMP.
Our plan is to adopt the generic Redfish Special Agent from @andreas-doehler into Checkmk (Checkmk Exchange). Andreas already uses it for the data centers that he monitors and it works very well there so far.
The goal would be for us to be able to monitor all servers that speak Redfish with one special agent. The nice thing is that you then get the same services for all servers and can use the same rules everywhere.
We are now looking for more people to test the monitoring via Redfish in the field and let us know whether everything works as expected. Any insights are helpful (e.g. sensors, disks etc. missing).
Recommendation: Do this on a test site!
To get started with the tests, all you need to do is:
- install the MKP (Checkmk Exchange)
- install the redfish python package as the SITE user:
pip3 install 'urllib3<2' redfish
- activate Redfish on the servers, if not already active
- create a host for each management board you want to monitor via Redfish and configure the special agent rule “Redfish Compatible Management Controller” for that host
Note: Testing this on the Checkmk appliance is currently not possible due to the missing package.
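Before creating the host, it can help to confirm that Redfish is actually active on a board. The following is only a minimal sketch, not part of the special agent: it assumes the board answers on HTTPS at the standard Redfish service root path /redfish/v1/ and uses a self-signed certificate. The function names are made up for illustration.

```python
import json
import ssl
import urllib.request


def service_root_url(host: str) -> str:
    """Build the URL of the standard Redfish service root for a board."""
    return f"https://{host}/redfish/v1/"


def summarize_service_root(payload: dict) -> dict:
    """Pick out a few standard service root properties, if present."""
    return {
        "version": payload.get("RedfishVersion"),
        "systems": payload.get("Systems", {}).get("@odata.id"),
        "chassis": payload.get("Chassis", {}).get("@odata.id"),
    }


def fetch_service_root(host: str, timeout: float = 10.0) -> dict:
    """Fetch the service root as a dict.

    Assumption: certificate checks are disabled because many boards ship
    with a self-signed certificate. Do not do this on untrusted networks.
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    url = service_root_url(host)
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return json.load(response)
```

Calling summarize_service_root(fetch_service_root("bmc.example.com")) with the address of your own board (the hostname here is a placeholder) should return a Redfish version if the interface is enabled. A timeout or connection error suggests that Redfish is not active or not reachable.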
Minimum requirements:
- HPE iLO5 (iLO4 only with newest firmware due to performance issues)
- Dell iDRAC v9 (v8 works, but is too slow)
- Cisco CIMC (currently no insights on minimum version requirements)
- Supermicro BMC (currently no insights on minimum version requirements)
- Nutanix (currently no insights on minimum version requirements)
- Lenovo (currently no insights on minimum version requirements)
The monitoring is built for management boards. Any other device with a “Redfish” interface is not supported.
Testing the Redfish agent and reporting a problem
If there is a connection problem:
- On the command line, execute the agent with the switches “--debug” and “-vv” to get maximum output. The real problem may already be visible there (credential problem, slow connection or generic problem).
If only expected data is missing, such as hard drives or memory modules:
- On the command line, execute the agent with the switches “--debug” and “-vv” to get maximum output.
- Inspect the output to see whether the missing section is there.
- If the section exists but is not shown as checks, we need the output of that section to take a look.
- If the section is missing, we need the complete agent output. The sections “redfish_system” and “redfish_chassis” are the minimum we need.
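Checking which sections are present can be done by eye, but a small script makes it quicker on long outputs. This is a rough sketch that assumes the agent output was saved to a text file and uses the usual Checkmk section headers of the form <<<name>>> or <<<name:options>>>. The helper names are hypothetical.

```python
import re

# Matches "<<<name>>>" and "<<<name:options>>>", capturing only the name.
SECTION_HEADER = re.compile(r"^<<<([^:>]+)(?::[^>]*)?>>>\s*$")


def list_sections(agent_output: str) -> dict:
    """Return a mapping of section name to the number of lines in it."""
    sections = {}
    current = None
    for line in agent_output.splitlines():
        match = SECTION_HEADER.match(line)
        if match:
            current = match.group(1)
            sections.setdefault(current, 0)
        elif current is not None:
            # Lines before the first header (e.g. debug messages) are skipped.
            sections[current] += 1
    return sections


def missing_sections(agent_output: str,
                     required=("redfish_system", "redfish_chassis")) -> list:
    """Return the required section names that do not appear in the output."""
    found = list_sections(agent_output)
    return [name for name in required if name not in found]
```

For example, missing_sections(open("agent_output.txt").read()) returns an empty list when both minimum sections are present (the file name is a placeholder).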
If no useful output is generated, it is possible to create a dump of the complete Redfish interface.
A small tool exists for this purpose.
The output can be compressed into one archive, and Andreas can check whether the needed data is in what the interface provides.
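If you prefer to do the compression from Python rather than with your usual archive tool, a minimal sketch could look like this. It assumes the dump tool wrote its files into a single directory. The directory and archive names are placeholders, not something the tool prescribes.

```python
import tarfile
from pathlib import Path


def compress_dump(dump_dir, archive_path) -> Path:
    """Pack a dump directory into one gzip-compressed tar archive.

    The archive should be written outside of dump_dir, otherwise it
    would try to include itself.
    """
    dump_dir = Path(dump_dir)
    archive_path = Path(archive_path)
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps only the directory name, not the full local path.
        archive.add(dump_dir, arcname=dump_dir.name)
    return archive_path
```

A call such as compress_dump("redfish_dump", "redfish_dump.tar.gz") produces a single file that can be attached to a report. Check the dump for passwords or other sensitive data before sharing it.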
Happy monitoring and thanks to Andreas for his great contribution and free service to the community here. We would like to merge this once we have received sufficient feedback from users in the field.
Cheers, Martin