Over the last few weeks I made some bug fixes and functional extensions to the existing Redfish integration.
Attached you can find an extension package that includes all the modifications.
If you like, please test whether it works with your setup. At the moment I only have around 30 Redfish dumps available for automated testing.
Fixes:
Dell Servicetag recognition
HW/SW inventory paths for Firmware and hardware information
special agent exception if no data is present for components
special agent extra output if normal special agent is called for a PDU
special agent power - now also fetches sensors if they are inside a chassis collection
New Checks:
Dell StorageController battery state
PSU redundancy check
System power consumption
RackPDU state
HW/SW inventory data for devices with serial number (only these are filtered to be shown)
The only new function is the rollup state, which applies only to storage controllers.
This is useful if your storage consists of more than one subsystem or of subsystems with “strange” states. The rollup state should represent the overall state of all subsystems.
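To illustrate the idea, here is a minimal sketch of how a rollup state could be evaluated. It assumes the standard Redfish `Status` object with its `Health` and `HealthRollup` properties and the usual Checkmk state numbers (0 = OK, 1 = WARN, 2 = CRIT, 3 = UNKNOWN). The function name `rollup_state` and the mapping table are hypothetical and not taken from the extension package.

```python
from typing import Any, Mapping, Optional

# Assumed mapping from Redfish health values to Checkmk state numbers.
HEALTH_TO_STATE = {"OK": 0, "Warning": 1, "Critical": 2}


def rollup_state(status: Optional[Mapping[str, Any]]) -> int:
    """Return a state number, preferring HealthRollup over Health.

    Hypothetical helper: HealthRollup covers the resource and its
    subordinate resources, so it is used when the service provides it.
    """
    status = status or {}  # tolerate a missing or null Status object
    health = status.get("HealthRollup") or status.get("Health")
    return HEALTH_TO_STATE.get(health, 3)  # 3 = UNKNOWN for anything else


# A controller that is healthy itself but has a degraded subsystem.
print(rollup_state({"State": "Enabled", "Health": "OK", "HealthRollup": "Warning"}))
```

With the rollup value, a single degraded subsystem is enough to turn the controller service to WARN, even though the controller's own `Health` is still `OK`.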
Fix: device levels from some HPE power supplies (0 degrees) ignored automatically
Thanks for backporting to 2.3 (I still haven’t gotten around to updating all of our own custom extensions to the new APIs…)
No errors or crashes occurred, “System state” is now “System 1”, and Lenovo servers with older firmware display a WARN on the new power redundancy check because the component is supposedly disabled. That’s most likely a firmware issue, as the same model shows all green on a more recent XCC version.
HW inventory seems properly populated as well for our servers, so good job.
That’s the same server where seven drives have exactly the same ID and Name on two different controllers. Here the new way is needed to have all drives inside your monitoring.
In the default configuration, the classic naming is used to avoid breaking changes. But if you think not all drives are visible inside your monitoring you can switch the discovery option to check if more are found.
I am only writing here because I cannot comment on a closed PR. @smeagol91, after the merge of the huge PR (which is good in the first place), I see two problems at the moment.
The current file inventory_redfish_data.py has nothing to do with the file of the same name from the PR. The result is a completely broken HW/SW inventory.
Original PR
Second problem is the way the storage battery discovery function was changed from the PR to the now current code.
Original
def discovery_redfish_storage_battery(section: RedfishAPIData) -> DiscoveryResult:
    """Discover single controller batteries"""
    for key in section.keys():
        if section[key].get("Status", {}).get("State") == "UnavailableOffline":
            continue
        if (section[key].get("Oem") or {}).get("Dell", {}).get(
            "DellControllerBattery", None
        ) is not None:
            yield Service(item=section[key]["Id"])
Now actual code
def discovery_redfish_storage_battery(section: RedfishAPIData) -> DiscoveryResult:
    """Discover Dell storage controller batteries."""
    for key in section:
        if section[key].get("Status", {}).get("State") == "UnavailableOffline":
            continue
        battery_data = section[key].get("Oem", {}).get("Dell", {}).get("DellControllerBattery")
        if battery_data:
            yield Service(item=section[key]["Id"])
This code is producing nice crashes if there is empty data inside a key.
Like this crash here.
  File "/omd/sites/cmk/lib/python3/cmk/checkengine/plugin_backend/check_plugins.py", line 77, in filtered_generator
    for element in generator(*args, **kwargs):
                   ~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/omd/sites/cmk/lib/python3.13/site-packages/cmk/plugins/redfish/agent_based/redfish_storage.py", line 93, in discovery_redfish_storage_battery
    battery_data = section[key].get("Oem", {}).get("Dell", {}).get("DellControllerBattery")
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'
Please revert these changes back to what I had submitted. It was working on over 30 Redfish simulator dumps without any problem.
I also found some more problems, and I don’t understand why these were changed as well.
→ PowerControl is not working as before.
Thank you for the detailed report. You were right on all three points. I’ve investigated thoroughly, comparing every file from your PR against what was merged, and identified the root causes.
inventory_redfish_data.py: During the merge, this file was inadvertently rewritten to inventory only drives from the redfish_drives section. (Reverted now.)
Storage battery crash: The .get("Oem", {}).get("Dell", {}) chain crashes when the Oem key exists but has the value None. Fixed using the (x.get("Oem") or {}) pattern, as you had it in your original code. The battery name is now also included in the output, and we kept the status map (Warning/Degraded → WARN, anything else non-OK → CRIT).
PowerControl: Both redfish_power_consumption and redfish_power_redundancy were changed from single aggregated services to per-MemberId item services during the merge. This broke the output format and the service model. Both have been restored to your original single-service design. Power consumption now shows System Power Control: PowerCapacityWatts - ### W / PowerConsumedWatts - ### W again, with the “No power consumption data available.” fallback.
I’ve also verified every other file touched by the PR (drives, volumes, ethernet interfaces, physical drives, PDUs, system, lib.py, agent hardening, rulesets, graphing) against your originals: I have found no further issues. The remaining differences are really small improvements made during the merge.
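For anyone wondering why the default argument of `.get()` did not help in the battery discovery: the default is only used when the key is missing, not when the key is present with the value `None`. The following is a minimal, self-contained sketch of the difference. The helper name `dell_battery_ids` and the sample data are made up for illustration, and the sketch guards every level of the chain, which goes slightly beyond the code in the PR.

```python
from typing import Any, Iterator, Mapping


def dell_battery_ids(section: Mapping[str, Mapping[str, Any]]) -> Iterator[str]:
    """Yield the Id of every entry that carries Dell battery data.

    Hypothetical helper: `x.get(key) or {}` also covers the case where
    the key exists but holds None (JSON null).
    """
    for entry in section.values():
        if (entry.get("Status") or {}).get("State") == "UnavailableOffline":
            continue
        oem = entry.get("Oem") or {}
        dell = oem.get("Dell") or {}
        if dell.get("DellControllerBattery") is not None:
            yield entry["Id"]


# Invented sample data: one entry with "Oem": null, one with battery data.
sample = {
    "a": {"Id": "ctrl-0", "Oem": None},
    "b": {"Id": "ctrl-1", "Oem": {"Dell": {"DellControllerBattery": {"Name": "Battery"}}}},
}

# The default of dict.get is ignored when the key exists with value None.
assert sample["a"].get("Oem", {}) is None

print(list(dell_battery_ids(sample)))
```

Note that the merged version also tested `if battery_data:` for truthiness, which would skip an empty dictionary, whereas the original checked `is not None`.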
With the released versions 2.4.0p26 and 2.5.0b4, all functionality from the “redfish_extensions” mkp is now included in CMK.
A short check on my dev system showed no problems.
@andreas-doehler, is it recommended to remove your bugfix redfish_extensions plugins prior to attempting an update to 2.4.0p26, or will they be auto-disabled as part of the upgrade process?
You don’t need to remove it beforehand. At upgrade time you get some output saying that metrics are defined twice. That was my signal that the changes are integrated now.
After the upgrade you can disable and remove the mkp.
It will not be disabled automatically, as the package has no “until” version set.