Redfish extensions for existing integration in CMK 2.4

Over the last few weeks I made some bug fixes and functional extensions to the existing Redfish integration.

Attached you can find an extension package that includes all the modifications.
If you want, please test whether it works with your setup. At the moment I only have around 30 Redfish dumps available for automated testing.

Fixes:

  • Dell service tag recognition
  • HW/SW inventory paths for firmware and hardware information
  • Special agent: exception when no data is present for components
  • Special agent: extra output when the normal special agent is called for a PDU
  • Special agent power: now also fetches sensors located inside a chassis collection

New Checks:

  • Dell StorageController battery state
  • PSU redundancy check
  • System power consumption
  • RackPDU state
  • HW/SW inventory data for devices with a serial number (only these are shown)
  • Rollup-state-only check for storage controllers
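A minimal sketch of the serial-number filter from the HW/SW inventory item above, assuming each inventory candidate is a plain dict parsed from a Redfish resource with an optional `SerialNumber` property. The function name `components_with_serial` and the sample data are hypothetical, not taken from the package.

```python
from typing import Any, Iterable, Iterator


def components_with_serial(components: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield only components that report a non-empty serial number (hypothetical helper)."""
    for component in components:
        # Missing keys, None and whitespace-only strings all count as "no serial number".
        serial = (component.get("SerialNumber") or "").strip()
        if serial:
            yield component


# Made-up sample data for illustration only.
sample = [
    {"Id": "PSU1", "SerialNumber": "ABC123"},
    {"Id": "Fan1", "SerialNumber": None},
    {"Id": "Slot1"},
    {"Id": "Disk0", "SerialNumber": "   "},
]
print([c["Id"] for c in components_with_serial(sample)])  # ['PSU1']
```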

MKP file
redfish_extensions-1.3.1.mkp (14.0 KB)

Please test and give some feedback if unexpected problems occur.


Hello,

I can't install the Redfish extension package because we use Checkmk 2.3.0p40.

Michael

I'll have to see how much work backporting to 2.3 would be; I'm not sure whether it will happen.

First a small update to the package.

redfish_extensions-1.3.3.mkp (16.6 KB)

The only new function is the rollup-state-only check for storage controllers.
This is useful if your storage consists of more than one subsystem, or of subsystems with “strange” states. The rollup state should represent the combined state of all subsystems.
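As a rough illustration of the rollup idea, a rollup-only check can read the standard Redfish `Status.HealthRollup` property of the storage resource and ignore the individual subsystems. The sketch below assumes the resource is a plain dict and maps the Redfish health values to Checkmk-style state numbers (0 = OK, 1 = WARN, 2 = CRIT, 3 = UNKNOWN); the function name `rollup_state` is hypothetical and the code does not use the Checkmk API.

```python
from typing import Any

# Redfish Health values mapped to Checkmk-style state numbers.
HEALTH_TO_STATE = {"OK": 0, "Warning": 1, "Critical": 2}


def rollup_state(resource: dict[str, Any]) -> tuple[int, str]:
    """Return (state, summary) based only on Status.HealthRollup (hypothetical helper)."""
    status = resource.get("Status") or {}
    rollup = status.get("HealthRollup")
    if rollup is None:
        return 3, "No HealthRollup reported"
    # Unexpected strings become UNKNOWN instead of being guessed.
    return HEALTH_TO_STATE.get(rollup, 3), f"Rollup health: {rollup}"
```

For example, a controller whose own `Health` is `Warning` but whose `HealthRollup` is `OK` would be reported as OK by this sketch, which is exactly the "whole state of all subsystems" view described above.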

Fix: device levels of 0 degrees reported by some HPE power supplies are now ignored automatically.
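One way to express this fix, assuming the levels come from the standard Redfish `UpperThresholdNonCritical` and `UpperThresholdCritical` properties of a temperature reading, is to treat a level of exactly 0 as "not configured". This is a sketch, not the plugin's code; the function name and the all-or-nothing rule are assumptions.

```python
from typing import Any, Optional


def device_upper_levels(sensor: dict[str, Any]) -> Optional[tuple[float, float]]:
    """Return (warn, crit) upper levels from a Redfish temperature reading, or None.

    Missing levels and levels of 0 are both discarded (assumed rule for this sketch),
    so the caller can fall back to its own default levels.
    """
    warn = sensor.get("UpperThresholdNonCritical")
    crit = sensor.get("UpperThresholdCritical")
    if not warn or not crit:
        return None
    return float(warn), float(crit)
```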


Some more small fixes, mainly for Dell devices.

Tested with around 30 different mockup dumps.

redfish_extensions-1.3.6.mkp (17.4 KB)

All of this has now also been submitted as a PR; we will see how quickly the code makes it into the main branch. :slight_smile:


Next update.

redfish_extensions-1.3.7.mkp (18.8 KB)

I also backported these changes into one last package version for CMK 2.3.
It is already available on GitHub → https://github.com/Yogibaer75/Check_MK-Things/raw/refs/heads/master/check%20plugins%202.3/redfish/redfish-2.3.78.mkp

It will also be on the Exchange in the next few days.


Hello Andreas,

thanks for backporting to 2.3 (I still haven’t gotten around to updating all of our own custom extensions to the new APIs…)

No errors or crashes occurred. “System state” is now “System 1”, and Lenovo servers with older firmware show a WARN on the new power redundancy check because the component is supposedly disabled. That’s most likely a firmware issue, as the same model shows all green on a more recent XCC version.

HW inventory seems properly populated as well for our servers, so good job.

The rename from “System state” to “System 1” was necessary to support systems with more than one instance (e.g. a BladeCenter).

One more update.

redfish_extensions-1.3.8.mkp (19.5 KB) and

https://github.com/Yogibaer75/Check_MK-Things/raw/refs/heads/master/check%20plugins%202.3/redfish/redfish-2.3.79.mkp

Fixed/Added function

→ Discovery Rule → Redfish Physical Drive discovery

Here you can define how drive items should be discovered: either as it is done now, or based only on the system/controller/drive ID.

The result looks like this:

Classic

New way (example with the same drives)

This is the same server, where seven drives on two different controllers have exactly the same ID and name. Here the new way is needed to get all drives into your monitoring.
In the default configuration, the classic naming is used to avoid breaking changes. But if you suspect that not all drives are visible in your monitoring, you can switch the discovery option and check whether more are found.
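Why drives disappear with the classic naming can be reproduced in a few lines. The sketch below assumes each drive is a dict with `Id`, `Name` and two hypothetical helper keys, `system` and `controller`; it also assumes the classic item is built from drive ID and name, while the new style joins system, controller and drive ID. The real item formats in the plugin may differ.

```python
from typing import Any


def classic_item(drive: dict[str, Any]) -> str:
    # Hypothetical classic format: drive ID and name only.
    return f"{drive['Id']}-{drive['Name']}"


def id_based_item(drive: dict[str, Any]) -> str:
    # Hypothetical new format: system / controller / drive ID.
    return f"{drive['system']}-{drive['controller']}-{drive['Id']}"


def discover(drives: list[dict[str, Any]], naming: str = "classic") -> list[str]:
    """Return unique service items; drives that map to an existing item are lost."""
    make_item = id_based_item if naming == "id_based" else classic_item
    items: list[str] = []
    for drive in drives:
        item = make_item(drive)
        if item not in items:
            items.append(item)
    return items


# Same drive ID and name on two controllers (made-up data).
drives = [
    {"system": "1", "controller": ctrl, "Id": "Disk.0", "Name": "Physical Disk 0"}
    for ctrl in ("RAID1", "RAID2")
]
print(discover(drives))              # ['Disk.0-Physical Disk 0']
print(discover(drives, "id_based"))  # ['1-RAID1-Disk.0', '1-RAID2-Disk.0']
```

With classic naming only one service is created for two physical drives; keeping the classic style as the default trades this for stable service names on existing installations.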


One more update: the same fix as for the drives, now also for the volume checks.

https://github.com/Yogibaer75/Check_MK-Things/raw/refs/heads/master/check%20plugins%202.3/redfish/redfish_extensions-1.3.9.mkp

In the examples, the new naming style is on top and the old one is at the bottom.

Without creating a rule, discovery stays with the old style.
This is only done to prevent “incompatible changes” :wink:

I'm writing here because I cannot comment on a closed PR :frowning:
@smeagol91 after the merge of the huge PR (which is good in the first place), I currently see two problems.

The current file inventory_redfish_data.py has nothing in common with the file of the same name from the PR. The result is a completely broken HW/SW inventory.
Original PR

Now with the daily build

The second problem is how the storage battery discovery function was changed between the PR and the current code.
Original

def discovery_redfish_storage_battery(section: RedfishAPIData) -> DiscoveryResult:
    """Discover single controller batteries"""
    for key in section.keys():
        if section[key].get("Status", {}).get("State") == "UnavailableOffline":
            continue
        if (section[key].get("Oem") or {}).get("Dell", {}).get(
            "DellControllerBattery", None
        ) is not None:
            yield Service(item=section[key]["Id"])

Current code

def discovery_redfish_storage_battery(section: RedfishAPIData) -> DiscoveryResult:
    """Discover Dell storage controller batteries."""
    for key in section:
        if section[key].get("Status", {}).get("State") == "UnavailableOffline":
            continue
        battery_data = section[key].get("Oem", {}).get("Dell", {}).get("DellControllerBattery")
        if battery_data:
            yield Service(item=section[key]["Id"])

This code crashes as soon as the "Oem" key exists but holds no data (None), like in this crash:

  File "/omd/sites/cmk/lib/python3/cmk/checkengine/plugin_backend/check_plugins.py", line 77, in filtered_generator
    for element in generator(*args, **kwargs):
                   ~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/omd/sites/cmk/lib/python3.13/site-packages/cmk/plugins/redfish/agent_based/redfish_storage.py", line 93, in discovery_redfish_storage_battery
    battery_data = section[key].get("Oem", {}).get("Dell", {}).get("DellControllerBattery")
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'
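The traceback comes from a subtlety of `dict.get`: the default is only returned when the key is missing, not when the key is present with the value `None` (JSON `null`). A short standalone reproduction with made-up sample data, plus the `(x.get("Oem") or {})` pattern from the original code; the sketch also applies the same guard to the `Dell` level, and the helper name `dell_battery` is hypothetical.

```python
from typing import Any

# Made-up entry where "Oem" exists but is JSON null.
entry: dict[str, Any] = {"Id": "Battery.Integrated.1", "Oem": None}

# The default {} is NOT used: the key exists, so .get returns None.
print(entry.get("Oem", {}))  # None

try:
    entry.get("Oem", {}).get("Dell", {})
except AttributeError as exc:
    print(f"AttributeError: {exc}")  # AttributeError: 'NoneType' object has no attribute 'get'


def dell_battery(data: dict[str, Any]) -> Any:
    """Return the DellControllerBattery block, or None if it is absent or empty."""
    # `or {}` replaces both a missing key and an explicit None with an empty dict.
    return ((data.get("Oem") or {}).get("Dell") or {}).get("DellControllerBattery")


print(dell_battery(entry))  # None
print(dell_battery({"Oem": {"Dell": {"DellControllerBattery": {"Name": "Battery 1"}}}}))
# {'Name': 'Battery 1'}
```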

Please revert these changes to what I submitted. My version worked on over 30 Redfish simulator dumps without any problems.

I also found some more problems; I don’t understand why these parts were changed as well.
→ PowerControl no longer works as before.

Before


After

WHY???


Hello Andreas,

Thank you for the detailed report; you were right on all three points. I’ve investigated thoroughly, compared every file from your PR against what was merged, and identified the root causes.

  1. inventory_redfish_data.py: During the merge, this file was inadvertently rewritten to only inventorize drives from the redfish_drives section. (Reverted now.)

  2. Storage battery crash: The .get("Oem", {}).get("Dell", {}) chain crashes when the Oem key exists but has the value None. Fixed using the (x.get("Oem") or {}) pattern, as in your original code. The battery name is now also included in the output, and we kept the status map (Warning/Degraded → WARN, anything else non-OK → CRIT).

  3. PowerControl: Both redfish_power_consumption and redfish_power_redundancy were changed from single aggregated services to per-MemberId item services during the merge. This broke the output format and the service model. Both have been restored to your original single-service design. Power consumption now shows System Power Control: PowerCapacityWatts - ### W / PowerConsumedWatts - ### W again, with the “No power consumption data available.” fallback.
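A rough sketch of the restored single-service text from point 3, assuming the standard Redfish `Power` resource with a `PowerControl` array whose members carry `Name`, `PowerCapacityWatts` and `PowerConsumedWatts`. One line is produced per member, so a member named "System Power Control" (a common name in Redfish implementations, assumed here) yields the format quoted above. The function name and the skipping of incomplete members are assumptions, not the plugin's code.

```python
from typing import Any


def power_consumption_summary(power: dict[str, Any]) -> str:
    """Build one summary text for all PowerControl members of a Redfish Power resource."""
    lines: list[str] = []
    for member in power.get("PowerControl") or []:
        capacity = member.get("PowerCapacityWatts")
        consumed = member.get("PowerConsumedWatts")
        if capacity is None or consumed is None:
            continue  # skip members that lack either reading
        name = member.get("Name") or "Power Control"
        lines.append(
            f"{name}: PowerCapacityWatts - {capacity} W / PowerConsumedWatts - {consumed} W"
        )
    # A single service: fall back to a fixed message when no member had data.
    return "\n".join(lines) if lines else "No power consumption data available."


print(power_consumption_summary(
    {"PowerControl": [{"Name": "System Power Control",
                       "PowerCapacityWatts": 1100, "PowerConsumedWatts": 240}]}
))
# System Power Control: PowerCapacityWatts - 1100 W / PowerConsumedWatts - 240 W
print(power_consumption_summary({}))  # No power consumption data available.
```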

I’ve also verified every other file touched by the PR (drives, volumes, ethernet interfaces, physical drives, PDUs, system, lib.py, agent hardening, rulesets, graphing) against your originals: I have found no further issues. The remaining differences are really small improvements made during the merge.

The fix is being pushed to 2.4.0, 2.5.0, and master and should arrive soon. (Change-Id: I9117b744f5c2ee5e78c86b97c08435f1046c3dec)

Sorry for the trouble, and thank you again for the contribution and for catching these issues so quickly.

Best
Sebastian


With the released versions 2.4.0p26 and 2.5.0b4, all functionality from the “redfish_extensions” MKP is now included in CMK.
A short check on my dev system showed no problems.


@andreas-doehler, is it recommended to remove your redfish_extensions bugfix plugins before updating to 2.4.0p26, or will they be disabled automatically as part of the upgrade process?

Sincerely,

Scotsie

You don’t need to remove it beforehand. During the upgrade you get some output saying that metrics are defined twice; that was my signal that the changes are now integrated :smiley:
After the upgrade you can disable and remove the MKP.
It will not be disabled automatically because no “until version” is set.
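The "until version" corresponds to the `version.usable_until` field of an MKP manifest. The sketch below only illustrates the point made above: with no such value, nothing triggers automatic disabling. The version parsing is simplified (only forms like `2.4.0`, `2.4.0p26` and `2.5.0b4`), and the "strictly newer than usable_until" rule and both function names are assumptions, not Checkmk's actual implementation.

```python
import re
from typing import Any

# Simplified Checkmk version pattern, e.g. "2.4.0", "2.4.0p26", "2.5.0b4".
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:([bp])(\d+))?")


def version_key(version: str) -> tuple[int, int, int, int, int]:
    """Turn a version string into a sortable tuple (beta < release < patch)."""
    match = _VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"Unsupported version format: {version!r}")
    major, minor, sub, kind, number = match.groups()
    stage = 0 if kind == "b" else 1  # betas sort before the final release
    return int(major), int(minor), int(sub), stage, int(number or 0)


def disabled_by_update(manifest: dict[str, Any], new_version: str) -> bool:
    """Assumed rule: disable only if the new version is newer than usable_until."""
    usable_until = manifest.get("version.usable_until")
    if not usable_until:
        return False  # no "until version" set -> the package stays enabled
    return version_key(new_version) > version_key(usable_until)


manifest = {"name": "redfish_extensions", "version": "1.3.9", "version.usable_until": None}
print(disabled_by_update(manifest, "2.4.0p26"))  # False
```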
