Call for Redfish beta testers

Thanks for your replies!

That’s good news, that the generic agent already includes the checks from the older specialized Redfish checks. Thanks a lot for your work!

Kind regards, Dirk.

First of all: I already used the iLO version and was very happy with it. The Redfish agent is clearly a bit more generic in its output: there is no “General Status” with the BIOS version (I really liked that, but it should be easy to add to the system state). Redfish at least shows the network interfaces, though I get the same one twice (once by number, once by name with the part number missing, but with the same serial number), and the storage controller is missing its version number.
The iLO version gives me a good inventory of all the firmware in the system, and the plugin already has a report for that! This is dearly missed in the Redfish version!

Finally, I’d also like you to keep the management board integration, maybe just change the prefix to “BMC”. I really like the BMC data to be in the same host as the OS. In my eyes this is the hardware the OS is running on: the same host, the same box. And you already have plugins to monitor CPU temps and add hardware inventory data to the host. Should they be moved to a “hardware only” host? Through the piggyback system I add data from vSphere to the ESXi hosts and VMs as well as data from my Citrix controllers, so why should BMC data be different?
I don’t mind the dozens of extra services for the host, I already have that on Microsoft SQL Servers.

That’s a point. As HPE is the only one with this information, I need to check whether it is also possible to fetch such information for the other vendors.

Until now there was no management board integration, and if I could decide, I would not implement one.

No, those are two completely different things. But that’s my opinion. :wink:

As the general status is already there, it should be no problem to integrate some more information. This week I had a good discussion with the devs in Munich, and they said that I should remove all non-status data from the summary output.
At this point I would also keep some non-status data, like the version information or the size of the hard drives, in the output, but the devs said that it should only be inside the HW/SW inventory information.

Can you provide a small screenshot of what is doubled and of the storage controller?

Here’s the requested screenshot from a DL360 Gen10 (the model name is itself a piece of information that is missing in the Redfish version).


The storage controller just doesn’t show the version number (like the BIOS version is missing from the system state).
But as long as I get a firmware table like with HPE, then it’s totally fine if that info isn’t in the service summaries. The only firmware version in the summaries right now is the iLO version in the Check_MK Agent, so the inventory way wouldn’t change that too much and would make the devs in Munich happy.

We understand the need of users in this regard and the inconsistency in which this was developed over the recent years (showing non-status relevant information in service summaries).
I think we can find a compromise, though, on what should be shown in the summary and what should not. And I believe that eventually, when the inventory information is much more easily accessible in the GUI, we can start cleaning up the service summaries throughout Checkmk.

But it is a very popular opinion. :slight_smile:
We even have a KB article on it (full disclosure: written primarily by me): https://checkmk.atlassian.net/wiki/spaces/KB/pages/24477697/Management+boards+as+dedicated+hosts

I think @carnold meant the currently existing management board section in the host configuration. And as of now there are no plans to deprecate it, AFAIK.

We are now looking for more people to test the monitoring via Redfish in the field and let us know whether everything works as expected

Hi there :slight_smile:

Running it for some time on a bunch of Proliant servers. Feels like a great step forward coming from IPMI lan or SNMP. Shout out to @andreas-doehler for his work on this and support here.

Room for improvement IMHO:
Speed

Default values:

Quite arbitrary IMHO. They resulted in warnings for me on sensors >50 deg, while the iLO / ProLiant itself doesn’t give any upper limit.

Monitoring of DIMMs is currently broken. It’s a minor adjustment:
@andreas-doehler could you merge Fix ilo_api_mem.py by systeembeheerder · Pull Request #27 · Yogibaer75/Check_MK-Things · GitHub please?

Please remove the 0.0 degree. The device level is NULL / empty / non existent / whatever, but not zero degrees.

Thanks for testing. Can you please check with the current generic Redfish agent?

or
https://exchange.checkmk.com/p/redfish

Most of your mentioned problems should be already solved there.

This agent also respects the None for the temperature thresholds.
I’m working on the runtime problem at the moment.

An example of the temperature sensor problem without thresholds:


iLO 4 reports a threshold of 0 degrees:
{"Name": "14-P/S 1", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 1, "LocationYmm": 12}}, "PhysicalContext": "PowerSupply", "ReadingCelsius": 31, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 0, "UpperThresholdFatal": 0}
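
To illustrate the point, here is a minimal sketch of how such zero thresholds could be treated as "no threshold" when parsing the sensor entry above. The function names `usable_threshold` and `upper_thresholds` are hypothetical and not taken from the Redfish plugin; the sketch also assumes that a threshold of exactly 0 never carries a real meaning on these boards.

```python
import json

# Sensor entry copied from the post above.
RAW_SENSOR = """{"Name": "14-P/S 1", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 1, "LocationYmm": 12}}, "PhysicalContext": "PowerSupply", "ReadingCelsius": 31, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 0, "UpperThresholdFatal": 0}"""


def usable_threshold(value):
    """Return the threshold as a float, or None if it is missing or 0.

    Assumption: a value of 0 means "not set", as in the iLO 4 output above.
    """
    if value is None or value == 0:
        return None
    return float(value)


def upper_thresholds(sensor):
    """Return (critical, fatal) with unusable values replaced by None."""
    return (
        usable_threshold(sensor.get("UpperThresholdCritical")),
        usable_threshold(sensor.get("UpperThresholdFatal")),
    )


sensor = json.loads(RAW_SENSOR)
print(sensor["Name"], upper_thresholds(sensor))
```

The trade-off of this approach is that a genuine threshold of 0 degrees would be discarded as well, which seems acceptable for upper temperature limits but would need a second look for other sensor types.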

The DIMM check uses the same Redfish state to monitoring state translation as every other component.
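
As a rough illustration of what such a translation can look like, the sketch below maps the standard Redfish `Health` values to the usual monitoring states (0 = OK, 1 = WARN, 2 = CRIT, 3 = UNKNOWN). The table and the function name `monitoring_state` are assumptions for this example and may differ from what the plugin actually does.

```python
# Assumed mapping from Redfish Health values to monitoring states.
HEALTH_TO_STATE = {
    "OK": 0,        # OK
    "Warning": 1,   # WARN
    "Critical": 2,  # CRIT
}
UNKNOWN = 3


def monitoring_state(status):
    """Translate a Redfish "Status" object into a monitoring state.

    A missing or unrecognised Health value is reported as UNKNOWN here;
    absent components get no special handling in this sketch.
    """
    health = (status or {}).get("Health")
    return HEALTH_TO_STATE.get(health, UNKNOWN)


print(monitoring_state({"Health": "OK", "State": "Enabled"}))
print(monitoring_state({"State": "Absent"}))
```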

Hi and thanks @andreas-doehler for doing this properly for Checkmk.
I’m really looking forward to it!

I did a quick test on a test site to see how and if it works for a Lenovo Server.

The test was done on a Lenovo SR650 (ThinkSystem) - Type 7X06.

Here are my findings and hope this can get improved:

Special Agent execution time
The execution time is quite high. I’m aware that BMCs are slow and Redfish is not the fastest, but in the beginning I immediately ran into the 60-second timeout because my check interval is one minute. I had to disable sections one by one, and now I have everything enabled except the HPE Storagesubsystem.
This is my execution time:

No Levels for FAN Speeds
It seems that I can’t put levels (warn/crit) on the FAN services. I was able to do that in the past, for example on my Fujitsu Servers.

Special Agent crashing
I let it run for a few days now and this is my event history for the Check_MK Service:


It is crashing quite often.

File "/omd/sites/redfish/lib/python3/cmk/special_agents/utils/agent_common.py", line 148, in _special_agent_main_core
    return main_fn(args)
  File "/omd/sites/redfish/local/share/check_mk/agents/special/agent_redfish", line 567, in agent_redfish_main
    get_information(REDFISHOBJ, args.sections)
  File "/omd/sites/redfish/local/share/check_mk/agents/special/agent_redfish", line 459, in get_information
    result = fetch_sections(redfishobj, resulting_sections, sections, system)
  File "/omd/sites/redfish/local/share/check_mk/agents/special/agent_redfish", line 193, in fetch_sections
    result = fetch_collection(redfishobj, section_data, section)
  File "/omd/sites/redfish/local/share/check_mk/agents/special/agent_redfish", line 147, in fetch_collection
    element_data = fetch_data(redfishobj, element.get("@odata.id"), component)
  File "/omd/sites/redfish/local/share/check_mk/agents/special/agent_redfish", line 132, in fetch_data
    response_url = redfishobj.get(url, None)
  File "/omd/sites/redfish/local/lib/python3/redfish/rest/v1.py", line 628, in get
    return self._rest_request(path, method='GET', args=args,
  File "/omd/sites/redfish/local/lib/python3/redfish/rest/v1.py", line 1110, in _rest_request
    return super(HttpClient, self)._rest_request(path=path, method=method,
  File "/omd/sites/redfish/local/lib/python3/redfish/rest/v1.py", line 954, in _rest_request
    raise RetriesExhaustedError() from cause_exception
 'cause_exception': ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='10.0.0.111', port=443): Read timed out. (read timeout=3)")),
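
One possible way to make a single slow resource less fatal is to retry the fetch and skip the element if it still fails, instead of letting the exception end the whole agent run. The following is only a sketch under assumptions: `fetch_with_retries` and its parameters are hypothetical, `fetch` stands in for whatever function performs the GET, and the built-in `TimeoutError` is used as a placeholder for the error type to retry on (judging by the traceback, the real agent would have to catch the redfish library's `RetriesExhaustedError`).

```python
import sys
import time


def fetch_with_retries(fetch, url, attempts=3, delay=1.0, retry_on=(TimeoutError,)):
    """Call fetch(url); retry on the given exceptions, return None if all fail.

    fetch    -- hypothetical callable that performs the request
    attempts -- total number of tries
    delay    -- base pause in seconds, multiplied by the attempt number
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except retry_on as exc:
            print(f"attempt {attempt}/{attempts} for {url} failed: {exc}", file=sys.stderr)
            if attempt < attempts:
                time.sleep(delay * attempt)
    # Give up on this element only; the caller can skip it and continue.
    return None


# Usage example with a stand-in fetch function that fails twice.
calls = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise TimeoutError("read timed out")
    return {"@odata.id": url}


result = fetch_with_retries(flaky_fetch, "/redfish/v1/Chassis/1/Thermal/", delay=0)
print(result)
```

Returning `None` keeps the rest of the sections intact, at the price of a missing element in the output, so the check side would need to tolerate that.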

I can provide some crash reports upon request. :slight_smile:

BMC Info Check
Currently, I’m missing a BMC Info Check. What do I mean by that? Let me explain.
The Current System State or Check_MK Agent Services are providing only some information about the system. But I’d like something like this:

This is already done on a xClarity Controller with the check_redfish Nagios Check.

MEL and SEL Log Monitoring
Another idea would be to monitor the SEL and MEL of the servers with Checkmk, and maybe even to forward them to the Event Console / Logwatch.

This is also already possible with the check_redfish Nagios Check (not the Logwatch / Event Console forwarding). Looks like this:


I’m not sure how and if the other vendors have this implemented, but Lenovo has it.

Overall health
I’m not quite sure, but is the redfish integration able to pull an overall health state of the system? To get informed in case anything is wrong with the system (e.g. driver monitor)?
Or is this already implemented in the System State service as the Component state?

Summary
Thanks so far for this, and I’m really looking forward to having this properly implemented in Checkmk. I’m looking forward to the answers and can test a few more things if needed. :slight_smile: Not perfect yet, but it’s already a very good base to work with. Just DM me in case somebody needs the crash reports.

Best Regards
Norm

@Norm - Is the xClarity controller monitored by the Redfish agent the same one that you monitor with the classic Nagios Redfish check? If yes, then this can be a problem.

The “System state” check is the overall health check of the device. If you have a single problem then you should also get a message at this check.

The runtime on your system is very long. Does this system have many HDDs/SSDs or memory modules? If yes, you can disable the individual HDD status collection and also the memory status collection. You will still get the memory summary from the base data and the HDD status indirectly via the “System state”.

Check_MK Agent
I use Setup → Services → Checkmk Agent installation auditing, which fails on the Redfish agent. Why does this check apply to a special (non-Checkmk) agent? Why does Redfish return 2.0? The MKP package has version 2.2.18. The service: Check_MK Agent

PSU 0-HpServerPowerSupply / PSU 1-HpServerPowerSupply
This agent makes my servers very efficient :slight_smile:
0 Watt input

MKP
I think it’s a mess. I guess it won’t change. Some kind of weird fear for git? I don’t know.

OMD[central]:~$ mkp list
Name    Version Title                      Author                                                                    Req. Version Until Version Files State
------- ------- -------------------------- ------------------------------------------------------------------------- ------------ ------------- ----- -----------------------------
hpe_ilo 4.0.0   HPE iLO Restful API Checks Andreas Doehler andreas.doehler@bechtle.com / andreas.doehler@gmail.com   2.2.0b1      None          18    Enabled (active on this site)
redfish 2.2.18  Redfish Restful API Checks Andreas Doehler (andreas.doehler@bechtle.com / andreas.doehler@gmail.com) 2.2.0b1      None          26    Enabled (active on this site)
OMD[central]:~$ mkp disable hpe_ilo
Package hpe_ilo is not enabled
OMD[central]:~$ mkp list
Name    Version Title                      Author                                                                    Req. Version Until Version Files State
------- ------- -------------------------- ------------------------------------------------------------------------- ------------ ------------- ----- -----------------------------
hpe_ilo 4.0.0   HPE iLO Restful API Checks Andreas Doehler andreas.doehler@bechtle.com / andreas.doehler@gmail.com   2.2.0b1      None          18    Enabled (active on this site)
redfish 2.2.18  Redfish Restful API Checks Andreas Doehler (andreas.doehler@bechtle.com / andreas.doehler@gmail.com) 2.2.0b1      None          26    Enabled (active on this site)
OMD[central]:~$

Fan X
Fan speed is reported on Proliant Gen10 but not on Gen9:

iLO server off
I think there should not be a UNKN or WARN just because the OS is not running / server powered off. Proliant Gen9.

Performance
comparable to the hpe_ilo plugin.

TLDR
Great step forward for CheckMK / Check-MK / CMK, still some minor issues.

Hi @andreas-doehler

We have now tested the following on a test instance (Ubuntu 22.04.3, CEE 2.2.0p14, Redfish 2.2.18):

First, we integrated the Redfish extension from CMK Exchange. Subsequently (as described in the forum), we installed package dependencies using pip3 install ‘urllib’ redfish.

The results were as follows: On the host with iLO4 FW v2.7.9, the Redfish checks work and provide data. However, only the “Fan Outputs” do not return values.

On another host with iLO5 FW v2.96, as well as on the Raritan device (Model PX4-559A-E8, Firmware Version 4.1.0.5-49885), the following error message occurs: [special_redfish] Agent exited with code 1: Agent failed (Crash-ID: 75a47d82-92ae-11ee-8ad9-0523ffb8dca), and no values can be retrieved through the agent.

Here is a list of the installed modules:

Here is a list of the installed MKPs:

Any assistance or proposed solutions would be appreciated. Thank you.

For iLO 4 this is possible. Here I need the raw agent output.
iLO4 only supports around 60% of the Redfish standard.

Can you please run the agent on the command line with the debug option? You can send me the crash output as a PM.

HPE Proliant 360 Gen9; ILO4

OMD[server]:~/local/share/check_mk/agents/special$ ./agent_redfish -m Thermal -u checkmk -s verysecret 10.1.2.3
<<<check_mk:sep(32)>>>
Version: 2.0
AgentOS: iLO 4 - 2.82
<<<redfish_manager:sep(0)>>>
[{"@odata.context": "/redfish/v1/$metadata#Managers/Members/$entity", "@odata.id": "/redfish/v1/Managers/1/", "@odata.type": "#Manager.1.0.0.Manager", "Actions": {"#Manager.Reset": {"target": "/redfish/v1/Managers/1/Actions/Manager.Reset/"}}, "CommandShell": {"ConnectTypesSupported": ["SSH", "Oem"], "MaxConcurrentSessions": 9}, "Description": "Manager View", "EthernetInterfaces": {"@odata.id": "/redfish/v1/Managers/1/EthernetInterfaces/"}, "FirmwareVersion": "iLO 4 v2.82", "GraphicalConsole": {"ConnectTypesSupported": ["KVMIP"], "MaxConcurrentSessions": 10}, "Id": "1", "Links": {"ManagerForChassis": [{"@odata.id": "/redfish/v1/Chassis/1/"}], "ManagerForServers": [{"@odata.id": "/redfish/v1/Systems/1/"}]}, "LogServices": {"@odata.id": "/redfish/v1/Managers/1/LogServices/"}, "ManagerType": "BMC", "Name": "Manager", "NetworkProtocol": {"@odata.id": "/redfish/v1/Managers/1/NetworkService/"}, "Oem": {"Hp": {"@odata.type": "#HpiLO.1.2.0.HpiLO", "Actions": {"#HpiLO.ClearRestApiState": {"target": "/redfish/v1/Managers/1/Actions/Oem/Hp/HpiLO.ClearRestApiState/"}, "#HpiLO.ResetToFactoryDefaults": {"ResetType@Redfish.AllowableValues": ["Default"], "target": "/redfish/v1/Managers/1/Actions/Oem/Hp/HpiLO.ResetToFactoryDefaults/"}, "#HpiLO.iLOFunctionality": {"target": "/redfish/v1/Managers/1/Actions/Oem/Hp/HpiLO.iLOFunctionality/"}}, "ClearRestApiStatus": "DataPresent", "FederationConfig": {"IPv6MulticastScope": "Site", "MulticastAnnouncementInterval": 600, "MulticastDiscovery": "Enabled", "MulticastTimeToLive": 5, "iLOFederationManagement": "Enabled"}, "Firmware": {"Current": {"Date": "Feb 06 2023", "DebugBuild": false, "MajorVersion": 2, "MinorVersion": 82, "Time": "", "VersionString": "iLO 4 v2.82"}}, "License": {"LicenseKey": "xxxxx-xxxxx-xxxxx-xxxxx-2SS4W", "LicenseString": "iLO Advanced", "LicenseType": "Perpetual"}, "Links": {"ActiveHealthSystem": {"@odata.id": "/redfish/v1/Managers/1/ActiveHealthSystem/"}, "DateTimeService": {"@odata.id": 
"/redfish/v1/Managers/1/DateTime/"}, "EmbeddedMediaService": {"@odata.id": "/redfish/v1/Managers/1/EmbeddedMedia/"}, "FederationDispatch": {"extref": "/dispatch"}, "FederationGroups": {"@odata.id": "/redfish/v1/Managers/1/FederationGroups/"}, "FederationPeers": {"@odata.id": "/redfish/v1/Managers/1/FederationPeers/"}, "LicenseService": {"@odata.id": "/redfish/v1/Managers/1/LicenseService/"}, "SecurityService": {"@odata.id": "/redfish/v1/Managers/1/SecurityService/"}, "UpdateService": {"@odata.id": "/redfish/v1/Managers/1/UpdateService/"}, "VSPLogLocation": {"extref": "/sol.log.gz"}}, "RequiredLoginForiLORBSU": false, "SerialCLISpeed": 9600, "SerialCLIStatus": "EnabledAuthReq", "VSPDlLoggingEnabled": false, "VSPLogDownloadEnabled": false, "iLOSelfTestResults": [{"Notes": "", "SelfTestName": "NVRAMData", "Status": "OK"}, {"Notes": "", "SelfTestName": "NVRAMSpace", "Status": "OK"}, {"Notes": "Controller firmware revision  2.10.00  ", "SelfTestName": "EmbeddedFlash/SDCard", "Status": "OK"}, {"Notes": "", "SelfTestName": "EEPROM", "Status": "OK"}, {"Notes": "", "SelfTestName": "HostRom", "Status": "OK"}, {"Notes": "", "SelfTestName": "SupportedHost", "Status": "OK"}, {"Notes": "Version 1.0.9", "SelfTestName": "PowerManagementController", "Status": "Informational"}, {"Notes": "ProLiant DL360 Gen9 System Programmable Logic Device version 0x34", "SelfTestName": "CPLDPAL0", "Status": "Informational"}]}}, "SerialConsole": {"ConnectTypesSupported": ["SSH", "IPMI", "Oem"], "MaxConcurrentSessions": 13}, "Status": {"State": "Enabled"}, "UUID": "b964487d-efe2-5453-85ee-06cd4ff14ef2", "VirtualMedia": {"@odata.id": "/redfish/v1/Managers/1/VirtualMedia/"}}]
<<<redfish_system:sep(0)>>>
[{"@odata.context": "/redfish/v1/$metadata#Systems/Members/$entity", "@odata.id": "/redfish/v1/Systems/1/", "@odata.type": "#ComputerSystem.1.0.1.ComputerSystem", "Actions": {"#ComputerSystem.Reset": {"ResetType@Redfish.AllowableValues": ["On", "ForceOff", "ForceRestart", "Nmi", "PushPowerButton"], "target": "/redfish/v1/Systems/1/Actions/ComputerSystem.Reset/"}}, "AssetTag": "YL1609200314", "BiosVersion": "P89 v3.30 (09/21/2023)", "Boot": {"BootSourceOverrideEnabled": "Disabled", "BootSourceOverrideTarget": "None", "UefiTargetBootSourceOverride": "None"}, "Description": "Computer System View", "EthernetInterfaces": {"@odata.id": "/redfish/v1/Systems/1/EthernetInterfaces/"}, "HostName": "SERVERNAME", "Id": "1", "IndicatorLED": "Off", "Links": {"Chassis": [{"@odata.id": "/redfish/v1/Chassis/1/"}], "ManagedBy": [{"@odata.id": "/redfish/v1/Managers/1/"}]}, "LogServices": {"@odata.id": "/redfish/v1/Systems/1/LogServices/"}, "Manufacturer": "HPE", "MemorySummary": {"Status": {"HealthRollup": "OK"}, "TotalSystemMemoryGiB": 512}, "Model": "ProLiant DL360 Gen9", "Name": "Computer System", "Oem": {"Hp": {"@odata.type": "#HpComputerSystemExt.1.2.2.HpComputerSystemExt", "Actions": {"#HpComputerSystemExt.PowerButton": {"PushType@Redfish.AllowableValues": ["Press", "PressAndHold"], "target": "/redfish/v1/Systems/1/Actions/Oem/Hp/ComputerSystemExt.PowerButton/"}, "#HpComputerSystemExt.SystemReset": {"ResetType@Redfish.AllowableValues": ["ColdBoot", "AuxCycle"], "target": "/redfish/v1/Systems/1/Actions/Oem/Hp/ComputerSystemExt.SystemReset/"}}, "Bios": {"Backup": {"Date": "07/18/2022", "Family": "P89", "VersionString": "P89 v3.02 (07/18/2022)"}, "Current": {"Date": "09/21/2023", "Family": "P89", "VersionString": "P89 v3.30 (09/21/2023)"}, "UefiClass": 2}, "DeviceDiscoveryComplete": {"AMSDeviceDiscovery": "Complete", "DeviceDiscovery": "vMainDeviceDiscoveryComplete", "SmartArrayDiscovery": "Complete"}, "HostOS": {"OsName": "Microsoft Windows Server 2019 Datacenter", 
"OsSysDescription": "Hardware: Intel64 Family 6 Model 79 Stepping 1 AT/AT COMPATIBLE - Software: Windows Version 10.0 (Build 17763 Multiprocessor Free)", "OsType": 57, "OsVersion": "10.0.17763"}, "IntelligentProvisioningIndex": 3, "IntelligentProvisioningLocation": "System Board", "IntelligentProvisioningVersion": "2.82.9", "Links": {"BIOS": {"@odata.id": "/redfish/v1/systems/1/bios/"}, "EthernetInterfaces": {"@odata.id": "/redfish/v1/Systems/1/EthernetInterfaces/"}, "FirmwareInventory": {"@odata.id": "/redfish/v1/Systems/1/FirmwareInventory/"}, "Memory": {"@odata.id": "/redfish/v1/Systems/1/Memory/"}, "NetworkAdapters": {"@odata.id": "/redfish/v1/Systems/1/NetworkAdapters/"}, "PCIDevices": {"@odata.id": "/redfish/v1/Systems/1/PCIDevices/"}, "PCISlots": {"@odata.id": "/redfish/v1/Systems/1/PCISlots/"}, "SUT": {"@odata.id": "/redfish/v1/systems/1/hpsut/"}, "SecureBoot": {"@odata.id": "/redfish/v1/Systems/1/SecureBoot/"}, "SmartStorage": {"@odata.id": "/redfish/v1/Systems/1/SmartStorage/"}, "SoftwareInventory": {"@odata.id": "/redfish/v1/Systems/1/SoftwareInventory/"}}, "PostState": "FinishedPost", "PowerAllocationLimit": 1600, "PowerAutoOn": "Restore", "PowerOnDelay": "Minimum", "PowerRegulatorMode": "Max", "PowerRegulatorModesSupported": ["OSControl", "Dynamic", "Max", "Min"], "TrustedModules": [{"Status": "NotPresent"}], "VirtualProfile": "Inactive"}}, "PowerState": "On", "ProcessorSummary": {"Count": 2, "Model": "Intel(R) Xeon(R) CPU E5-2696 v4 @ 2.20GHz", "Status": {"HealthRollup": "OK"}}, "Processors": {"@odata.id": "/redfish/v1/Systems/1/Processors/"}, "SKU": "755258-B21", "SerialNumber": "6CU64017LL", "Status": {"Health": "OK", "State": "Enabled"}, "SystemType": "Physical", "UUID": "32353537-3835-4336-5536-343031374C4C"}]
<<<redfish_chassis:sep(0)>>>
[{"@odata.context": "/redfish/v1/$metadata#Chassis/Members/$entity", "@odata.id": "/redfish/v1/Chassis/1/", "@odata.type": "#Chassis.1.0.0.Chassis", "ChassisType": "RackMount", "Id": "1", "Links": {"ComputerSystems": [{"@odata.id": "/redfish/v1/Systems/1/"}], "ManagedBy": [{"@odata.id": "/redfish/v1/Managers/1/"}]}, "Manufacturer": "HPE", "Model": "ProLiant DL360 Gen9", "Name": "Computer System Chassis", "Oem": {"Hp": {"@odata.type": "#HpServerChassis.1.1.0.HpServerChassis", "Firmware": {"PlatformDefinitionTable": {"Current": {"VersionString": "27.01"}}, "PowerManagementController": {"Current": {"VersionString": "1.0.9"}}, "PowerManagementControllerBootloader": {"Current": {"Family": "20", "VersionString": "1.0"}}, "SPSFirmwareVersionData": {"Current": {"VersionString": "3.1.3.21.0"}}, "SystemProgrammableLogicDevice": {"Current": {"VersionString": "Version 0x34"}}}, "Location": {"LocationInRack": {"RackLdsPartNumber": "0", "RackLdsProductDescription": "0", "RackUHeight": 0, "TagVersion": 0, "ULocation": "UNKNOWN"}, "LocationOfChassis": {"UUID": "32353537-3835-4336-5536-343031374C4C"}}}}, "Power": {"@odata.id": "/redfish/v1/Chassis/1/Power/"}, "SKU": "755258-B21", "SerialNumber": "6CU64017LL", "Status": {"Health": "OK", "State": "Enabled"}, "Thermal": {"@odata.id": "/redfish/v1/Chassis/1/Thermal/"}}]
<<<redfish_thermal:sep(0)>>>
{"@odata.context": "/redfish/v1/$metadata#Chassis/Members/1/Thermal$entity", "@odata.id": "/redfish/v1/Chassis/1/Thermal/", "@odata.type": "#Thermal.1.2.0.Thermal", "Fans": [{"FanName": "Fan 1", "Oem": {"Hp": {"@odata.type": "#HpServerFan.1.0.0.HpServerFan", "Location": "System"}}, "Status": {"Health": "OK", "State": "Enabled"}}, {"FanName": "Fan 2", "Oem": {"Hp": {"@odata.type": "#HpServerFan.1.0.0.HpServerFan", "Location": "System"}}, "Status": {"Health": "OK", "State": "Enabled"}}, {"FanName": "Fan 3", "Oem": {"Hp": {"@odata.type": "#HpServerFan.1.0.0.HpServerFan", "Location": "System"}}, "Status": {"Health": "OK", "State": "Enabled"}}, {"FanName": "Fan 4", "Oem": {"Hp": {"@odata.type": "#HpServerFan.1.0.0.HpServerFan", "Location": "System"}}, "Status": {"Health": "OK", "State": "Enabled"}}, {"FanName": "Fan 5", "Oem": {"Hp": {"@odata.type": "#HpServerFan.1.0.0.HpServerFan", "Location": "System"}}, "Status": {"Health": "OK", "State": "Enabled"}}, {"FanName": "Fan 6", "Oem": {"Hp": {"@odata.type": "#HpServerFan.1.0.0.HpServerFan", "Location": "System"}}, "Status": {"Health": "OK", "State": "Enabled"}}, {"FanName": "Fan 7", "Oem": {"Hp": {"@odata.type": "#HpServerFan.1.0.0.HpServerFan", "Location": "System"}}, "Status": {"Health": "OK", "State": "Enabled"}}], "Id": "Thermal", "Name": "Thermal", "Temperatures": [{"Name": "01-Inlet Ambient", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 15, "LocationYmm": 0}}, "PhysicalContext": "Intake", "ReadingCelsius": 31, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 42, "UpperThresholdFatal": 46, "UpperThresholdUser": 0}, {"Name": "02-CPU 1", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 11, "LocationYmm": 5}}, "PhysicalContext": "CPU", "ReadingCelsius": 40, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 70, "UpperThresholdFatal": 0}, {"Name": "03-CPU 2", "Oem": {"Hp": {"@odata.type": 
"#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 4, "LocationYmm": 5}}, "PhysicalContext": "CPU", "ReadingCelsius": 40, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 70, "UpperThresholdFatal": 0}, {"Name": "04-P1 DIMM 1-6", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 9, "LocationYmm": 5}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 36, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 89, "UpperThresholdFatal": 0}, {"Name": "05-P1 DIMM 7-12", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 14, "LocationYmm": 5}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 39, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 89, "UpperThresholdFatal": 0}, {"Name": "06-P2 DIMM 1-6", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 1, "LocationYmm": 5}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 38, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 89, "UpperThresholdFatal": 0}, {"Name": "07-P2 DIMM 7-12", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 6, "LocationYmm": 5}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 38, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 89, "UpperThresholdFatal": 0}, {"Name": "08-HD Max", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 10, "LocationYmm": 0}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 35, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 60, "UpperThresholdFatal": 0}, {"Name": "09-Exp Bay Drive", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 12, "LocationYmm": 0}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 0, "Status": {"State": "Absent"}, "UpperThresholdCritical": 0, "UpperThresholdFatal": 0}, {"Name": 
"10-Chipset", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 13, "LocationYmm": 10}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 39, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 105, "UpperThresholdFatal": 0}, {"Name": "11-PS 1 Inlet", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 1, "LocationYmm": 10}}, "PhysicalContext": "PowerSupply", "ReadingCelsius": 33, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 0, "UpperThresholdFatal": 0}, {"Name": "12-PS 2 Inlet", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 4, "LocationYmm": 10}}, "PhysicalContext": "PowerSupply", "ReadingCelsius": 32, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 0, "UpperThresholdFatal": 0}, {"Name": "13-VR P1", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 10, "LocationYmm": 1}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 44, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 115, "UpperThresholdFatal": 120}, {"Name": "14-VR P2", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 4, "LocationYmm": 1}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 48, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 115, "UpperThresholdFatal": 120}, {"Name": "15-VR P1 Mem", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 9, "LocationYmm": 1}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 35, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 115, "UpperThresholdFatal": 120}, {"Name": "16-VR P1 Mem", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 13, "LocationYmm": 1}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 37, "Status": {"Health": "OK", "State": "Enabled"}, 
"UpperThresholdCritical": 115, "UpperThresholdFatal": 120}, {"Name": "17-VR P2 Mem", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 2, "LocationYmm": 1}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 37, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 115, "UpperThresholdFatal": 120}, {"Name": "18-VR P2 Mem", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 6, "LocationYmm": 1}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 35, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 115, "UpperThresholdFatal": 120}, {"Name": "19-PS 1 Internal", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 1, "LocationYmm": 13}}, "PhysicalContext": "PowerSupply", "ReadingCelsius": 40, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 0, "UpperThresholdFatal": 0}, {"Name": "20-PS 2 Internal", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 4, "LocationYmm": 13}}, "PhysicalContext": "PowerSupply", "ReadingCelsius": 40, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 0, "UpperThresholdFatal": 0}, {"Name": "21-PCI 1", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 13, "LocationYmm": 13}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 0, "Status": {"State": "Absent"}, "UpperThresholdCritical": 0, "UpperThresholdFatal": 0}, {"Name": "22-PCI 2", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 13, "LocationYmm": 13}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 0, "Status": {"State": "Absent"}, "UpperThresholdCritical": 0, "UpperThresholdFatal": 0}, {"Name": "23-PCI 3", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 5, "LocationYmm": 12}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 0, "Status": 
{"State": "Absent"}, "UpperThresholdCritical": 0, "UpperThresholdFatal": 0}, {"Name": "24-HD Controller", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 8, "LocationYmm": 8}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 0, "Status": {"State": "Absent"}, "UpperThresholdCritical": 0, "UpperThresholdFatal": 0}, {"Name": "25-LOM Card", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 14, "LocationYmm": 13}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 59, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 100, "UpperThresholdFatal": 0}, {"Name": "26-LOM", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 7, "LocationYmm": 13}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 0, "Status": {"State": "Absent"}, "UpperThresholdCritical": 0, "UpperThresholdFatal": 0}, {"Name": "27-Front Ambient", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 9, "LocationYmm": 0}}, "PhysicalContext": "Intake", "ReadingCelsius": 29, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 65, "UpperThresholdFatal": 0}, {"Name": "28-P/S 2 Zone", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 3, "LocationYmm": 7}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 37, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 75, "UpperThresholdFatal": 0}, {"Name": "29-Battery Zone", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 7, "LocationYmm": 10}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 33, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 75, "UpperThresholdFatal": 80}, {"Name": "30-iLO Zone", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 9, "LocationYmm": 14}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 37, 
"Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 90, "UpperThresholdFatal": 95}, {"Name": "31-PCI 1 Zone", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 13, "LocationYmm": 13}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 37, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 70, "UpperThresholdFatal": 75}, {"Name": "32-PCI 2 Zone", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 13, "LocationYmm": 13}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 39, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 70, "UpperThresholdFatal": 75}, {"Name": "33-PCI 3 Zone", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 5, "LocationYmm": 12}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 0, "Status": {"State": "Absent"}, "UpperThresholdCritical": 0, "UpperThresholdFatal": 0}, {"Name": "34-HD Cntlr Zone", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 11, "LocationYmm": 7}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 0, "Status": {"State": "Absent"}, "UpperThresholdCritical": 0, "UpperThresholdFatal": 0}, {"Name": "35-I/O Zone", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 14, "LocationYmm": 11}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 36, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 75, "UpperThresholdFatal": 80}, {"Name": "36-Storage Batt", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 8, "LocationYmm": 0}}, "PhysicalContext": "SystemBoard", "ReadingCelsius": 0, "Status": {"State": "Absent"}, "UpperThresholdCritical": 0, "UpperThresholdFatal": 0}, {"Name": "37-Fuse", "Oem": {"Hp": {"@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors", "LocationXmm": 1, "LocationYmm": 8}}, "PhysicalContext": "PowerSupply", 
"ReadingCelsius": 38, "Status": {"Health": "OK", "State": "Enabled"}, "UpperThresholdCritical": 100, "UpperThresholdFatal": 0}]}

Looks like -m Thermal doesn’t work?

The thermal section looks perfectly fine for iLO4.

Fan

        {
            "FanName": "Fan 1",
            "Oem": {
                "Hp": {
                    "@odata.type": "#HpServerFan.1.0.0.HpServerFan",
                    "Location": "System",
                }
            },
            "Status": {"Health": "OK", "State": "Enabled"},
        },

Temperature

        {
            "Name": "01-Inlet Ambient",
            "Oem": {
                "Hp": {
                    "@odata.type": "#HpSeaOfSensors.1.0.0.HpSeaOfSensors",
                    "LocationXmm": 15,
                    "LocationYmm": 0,
                }
            },
            "PhysicalContext": "Intake",
            "ReadingCelsius": 31,
            "Status": {"Health": "OK", "State": "Enabled"},
            "UpperThresholdCritical": 42,
            "UpperThresholdFatal": 46,
            "UpperThresholdUser": 0,
        },
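
As an illustration of how a check could read one of these temperature entries, here is a minimal sketch. It is not the plugin's actual logic: the function name `temperature_state` is made up, and treating a threshold of 0 as "not set" is an assumption based on the absent and power-supply sensors in the output above.

```python
def temperature_state(sensor: dict) -> str:
    """Classify one entry of the Redfish "Temperatures" list (hypothetical helper)."""
    status = sensor.get("Status", {})
    if status.get("State") == "Absent":
        return "absent"  # absent sensors report 0 degrees and carry no Health field

    reading = sensor.get("ReadingCelsius")
    if reading is None:
        return "unknown"

    # Assumption: a threshold of 0 means the BMC did not set one.
    fatal = sensor.get("UpperThresholdFatal") or None
    critical = sensor.get("UpperThresholdCritical") or None

    if fatal is not None and reading >= fatal:
        return "fatal"
    if critical is not None and reading >= critical:
        return "critical"
    return "ok"


# Sensor taken from the iLO4 output above.
inlet = {
    "Name": "01-Inlet Ambient",
    "ReadingCelsius": 31,
    "Status": {"Health": "OK", "State": "Enabled"},
    "UpperThresholdCritical": 42,
    "UpperThresholdFatal": 46,
}
print(inlet["Name"], temperature_state(inlet))
```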

The “-m” switch limits the output to the selected sections. You selected “thermal”, and that is what your output contains.
“manager”, “chassis” and “system” are always included in the output.
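
Roughly, the selection behaves like the sketch below. This is only an illustration of the behaviour described here, not the code in agent_redfish; the names `ALWAYS_INCLUDED` and `sections_to_output` are hypothetical.

```python
# Hypothetical sketch of the "-m" behaviour, not taken from agent_redfish.
ALWAYS_INCLUDED = ("Manager", "Chassis", "System")  # assumed spelling of the fixed sections


def sections_to_output(selected: str) -> list[str]:
    """Return the fixed sections followed by the ones passed to -m."""
    requested = [name.strip() for name in selected.split(",") if name.strip()]
    result = list(ALWAYS_INCLUDED)
    for name in requested:
        if name not in result:  # avoid duplicates if a fixed section is requested
            result.append(name)
    return result


print(sections_to_output("Thermal"))
# ['Manager', 'Chassis', 'System', 'Thermal']
```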

iLO5:

OMD[TEST]:~$ /omd/sites/TEST/local/share/check_mk/agents/special/agent_redfish -u checkmk -s '********' -m Memory,Power,Processors,Thermal,FirmwareInventory,NetworkAdapters,NetworkInterfaces,EthernetInterfaces,Storage,ArrayControllers,SmartStorage,HostBusAdapters,PhysicalDrives,LogicalDrives IP-ADDRESS --debug
INFO 2023-12-06 11:49:24 redfish: Redfish API
INFO 2023-12-06 11:49:24 redfish.rest.v1: Attempt 1 of /redfish/v1
INFO 2023-12-06 11:49:25 redfish.rest.v1: Response Time for GET to /redfish/v1: 0.559999154007528 seconds.
INFO 2023-12-06 11:49:25 redfish.rest.v1: Attempt 1 of /redfish/v1/SessionService/Sessions
INFO 2023-12-06 11:49:25 redfish.rest.v1: Response Time for POST to /redfish/v1/SessionService/Sessions: 0.01032053999369964 seconds.
INFO 2023-12-06 11:49:25 redfish.rest.v1: Login returned code 400: {"error":{"code":"iLO.0.10.ExtendedInfo","message":"See @Message.ExtendedInfo for more information.","@Message.ExtendedInfo":[{"MessageId":"iLO.2.19.UnauthorizedLoginAttempt"}]}}
Traceback (most recent call last):
  File "/omd/sites/TEST/local/share/check_mk/agents/special/agent_redfish", line 615, in <module>
    sys.exit(main())
             ^^^^^^
  File "/omd/sites/TEST/local/share/check_mk/agents/special/agent_redfish", line 611, in main
    return special_agent_main(parse_arguments, agent_redfish_main)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/TEST/lib/python3/cmk/special_agents/utils/agent_common.py", line 171, in special_agent_main
    return _special_agent_main_core(parse_arguments, main_fn, argv or sys.argv[1:])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/TEST/lib/python3/cmk/special_agents/utils/agent_common.py", line 148, in _special_agent_main_core
    return main_fn(args)
           ^^^^^^^^^^^^^
  File "/omd/sites/TEST/local/share/check_mk/agents/special/agent_redfish", line 600, in agent_redfish_main
    redfishobj = get_session(args)
                 ^^^^^^^^^^^^^^^^^
  File "/omd/sites/TEST/local/share/check_mk/agents/special/agent_redfish", line 573, in get_session
    redfishobj.login(auth="session")
  File "/omd/sites/TEST/local/lib/python3/redfish/rest/v1.py", line 1017, in login
    raise SessionCreationError('HTTP {}: Failed to created the session\n{}'.format(resp.status, error_str))
redfish.rest.v1.SessionCreationError: HTTP 400: Failed to created the session
See @Message.ExtendedInfo for more information.

Raritan PX4

OMD[TEST]:~$ /omd/sites/TEST/local/share/check_mk/agents/special/agent_redfish -u cmktest -s '*******************' -m Memory,Power,Processors,Thermal,FirmwareInventory,NetworkAdapters,NetworkInterfaces,EthernetInterfaces,Storage,ArrayControllers,SmartStorage,HostBusAdapters,PhysicalDrives,LogicalDrives IP-ADDRESS --debug
INFO 2023-12-06 11:54:30 redfish: Redfish API
INFO 2023-12-06 11:54:30 redfish.rest.v1: Attempt 1 of /redfish/v1
INFO 2023-12-06 11:54:31 redfish.rest.v1: Response Time for GET to /redfish/v1: 0.4948085469659418 seconds.
INFO 2023-12-06 11:54:31 redfish.rest.v1: Attempt 1 of /redfish/v1/SessionService/Sessions
INFO 2023-12-06 11:54:31 redfish.rest.v1: Response Time for GET to /redfish/v1/SessionService/Sessions: 0.07353916298598051 seconds.
INFO 2023-12-06 11:54:31 redfish.rest.v1: Attempt 1 of /redfish/v1
INFO 2023-12-06 11:54:31 redfish.rest.v1: Response Time for GET to /redfish/v1: 0.039514622010756284 seconds.
Traceback (most recent call last):
  File "/omd/sites/TEST/local/share/check_mk/agents/special/agent_redfish", line 615, in <module>
    sys.exit(main())
             ^^^^^^
  File "/omd/sites/TEST/local/share/check_mk/agents/special/agent_redfish", line 611, in main
    return special_agent_main(parse_arguments, agent_redfish_main)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/TEST/lib/python3/cmk/special_agents/utils/agent_common.py", line 171, in special_agent_main
    return _special_agent_main_core(parse_arguments, main_fn, argv or sys.argv[1:])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/TEST/lib/python3/cmk/special_agents/utils/agent_common.py", line 148, in _special_agent_main_core
    return main_fn(args)
           ^^^^^^^^^^^^^
  File "/omd/sites/TEST/local/share/check_mk/agents/special/agent_redfish", line 601, in agent_redfish_main
    get_information(redfishobj, args.sections)
  File "/omd/sites/TEST/local/share/check_mk/agents/special/agent_redfish", line 437, in get_information
    chassis_url = base_data.get("Chassis").get("@odata.id")
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'

I would say the user is not authorized to log in.
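
The hint is in the response body of the failed login, not in the HTTP 400 itself. A small sketch of how to pull the message IDs out of that body (the helper name `extended_message_ids` is made up for this example; the JSON is copied from the debug log above):

```python
import json

# Response body copied from the "Login returned code 400" line in the log above.
body = (
    '{"error":{"code":"iLO.0.10.ExtendedInfo",'
    '"message":"See @Message.ExtendedInfo for more information.",'
    '"@Message.ExtendedInfo":[{"MessageId":"iLO.2.19.UnauthorizedLoginAttempt"}]}}'
)


def extended_message_ids(raw: str) -> list[str]:
    """Return all MessageId values from the error's extended info (hypothetical helper)."""
    error = json.loads(raw).get("error", {})
    return [entry.get("MessageId", "") for entry in error.get("@Message.ExtendedInfo", [])]


print(extended_message_ids(body))
# ['iLO.2.19.UnauthorizedLoginAttempt']
```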

This is a PDU. I can make the agent itself a little more fault-tolerant, but what data do you expect from this device?
The problem I see is that it has no “Chassis” object.
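
The traceback comes from chaining `.get("Chassis").get("@odata.id")` on a service root without that key. A more tolerant lookup could look like the sketch below. It is only an idea, not the fix that will end up in the agent; `collection_url` and the sample `pdu_root` dictionary are hypothetical.

```python
from typing import Optional


def collection_url(base_data: dict, name: str) -> Optional[str]:
    """Return the @odata.id of a top-level collection, or None if it is missing."""
    entry = base_data.get(name)
    if not isinstance(entry, dict):
        return None  # e.g. a PDU whose service root has no "Chassis" object
    return entry.get("@odata.id")


# Hypothetical service root without "Chassis", for illustration only.
pdu_root = {"Managers": {"@odata.id": "/redfish/v1/Managers"}}

chassis_url = collection_url(pdu_root, "Chassis")
if chassis_url is None:
    print("No Chassis collection, skipping chassis sections")
```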

:smiley: works

For example, the outlets of the PDU, but ideally as much as possible :slight_smile:
What would help you?

There is a Redfish extraction tool.

With it you can dump the complete Redfish data from the PDU.
Then we can see which checks are possible.