No more IPMI checks via Management Board since 2.4.x update

CMK version: 2.4.0p8 RAW
OS version: Ubuntu 24.04.2 LTS

Hi everyone,

Since I updated my Checkmk RAW site to 2.4, all my IPMI checks (via management board) appear stale, and they still don’t work when I force the checks manually.

I tried installing ipmitool and freeipmi manually on the server, and I tried allowing the ipmi-sensors command via sudoers for my site user account, but it is still not working.

I know this feature will be deprecated, but it’s very useful. I monitor Supermicro IPMI, Dell IDRAC and HPE ILO.

Thank you

Additional debug info:

cmk --debug -vv --checks=mgmt_ipmi_sensors exxxxx01

WARNING: '--checks' is deprecated in favour of option 'detect-plugins'
Unknown check plugin 'mgmt_ipmi_sensors'
Traceback (most recent call last):
  File "/omd/sites/xxxxxxxxxx/lib/python3/cmk/base/modes/check_mk.py", line 2017, in _lookup_plugin
    return plugins[plugin_name]
           ~~~~~~~^^^^^^^^^^^^^
KeyError: CheckPluginName('mgmt_ipmi_sensors')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/omd/sites/xxxxxxxxxx/bin/cmk", line 157, in <module>
    exit_status = modes.call("--check", None, opts, args, trace_context)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/xxxxxxxxxx/lib/python3/cmk/base/modes/__init__.py", line 91, in call
    return handler(*handler_args)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/xxxxxxxxxx/lib/python3/cmk/base/modes/check_mk.py", line 2302, in mode_check
    selected_sections, run_plugin_names = _extract_plugin_selection(
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/xxxxxxxxxx/lib/python3/cmk/base/modes/check_mk.py", line 2051, in _extract_plugin_selection
    agent_based_register.filter_relevant_raw_sections(
  File "/omd/sites/xxxxxxxxxx/lib/python3/cmk/base/api/agent_based/register/utils.py", line 198, in filter_relevant_raw_sections
    section_name for plugin in consumers for section_name in plugin.sections
                               ^^^^^^^^^
  File "/omd/sites/xxxxxxxxxx/lib/python3/cmk/base/modes/check_mk.py", line 2052, in <genexpr>
    consumers=(_lookup_plugin(pn, plugins) for pn in plugin_names),
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/xxxxxxxxxx/lib/python3/cmk/base/modes/check_mk.py", line 2019, in _lookup_plugin
    raise MKBailOut(f"Unknown check plugin '{plugin_name}'") from exc
cmk.ccc.exceptions.MKBailOut: Unknown check plugin 'mgmt_ipmi_sensors'

Hello,

Should be fixed:

Anyway, migrate this to a dedicated host as recommended. As the BMC is an independent device, it makes total sense.

best regards

Michael

Hi,

As you can see, it’s not working because of Python errors. Is there a way to repair these files in /lib/python3/cmk/base/modes?

I also tried configuring it as an independent device, but nothing works (tried with IDRAC and ILO).

With FreeIPMI: Agent exited with code 1: ERROR: ‘ipmi-sensors: connection timeout’.

With IPMITool: Agent exited with code 1: ERROR: ‘Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory, Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory’.
OR
Agent exited with code 1: ERROR: 'Error: Unable to establish LAN session
Error: Unable to establish IPMI v1.5 / RMCP session, Error: Unable to establish LAN session
Error: Unable to establish IPMI v1.5 / RMCP session’.
OR
Agent exited with code 1: ERROR: ‘Error: Unable to establish IPMI v2 / RMCP+ session, Error: Unable to establish IPMI v2 / RMCP+ session’.
OR
<<<ipmi:sep(124)>>>
Error: no IMB driver found at /dev/imb!
<<<ipmi_discrete:sep(124)>>>
Error: no IMB driver found at /dev/imb!
Depending on the IPMI interface configured…
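
For reference, each of the ipmitool messages above is typically tied to one ipmitool interface (the value passed to -I), which helps to tell which setting was active when the error appeared. A minimal Python sketch of that mapping, assuming the standard ipmitool interface names open, imb, lan and lanplus; the function name and the hint texts are only illustrative:

```python
# Hypothetical helper: map an ipmitool error message to the interface
# (ipmitool -I <name>) that typically produces it.
HINTS = [
    ("Could not open device at /dev/ipmi0", "open",
     "local kernel driver on the monitoring server, not the remote BMC"),
    ("no IMB driver found", "imb",
     "local Intel IMB driver, not the remote BMC"),
    ("IPMI v1.5 / RMCP session", "lan",
     "remote session using IPMI v1.5"),
    ("IPMI v2 / RMCP+ session", "lanplus",
     "remote session using IPMI v2.0"),
]


def classify(error_text):
    """Return (interface, hint) for the first matching message, else None."""
    for needle, interface, hint in HINTS:
        if needle in error_text:
            return interface, hint
    return None
```

Only the last two entries actually try to reach the BMC over the network; the first two look for a local device on the Checkmk server itself.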

After a long time investigating and testing, here are my results:

  • With the management interface, everything is broken and nothing has worked since the update
  • With a dedicated host via SNMP, it works but a lot of sensors are missing
  • For my old Supermicro IPMI board, it works well via freeipmi
  • For my IDRAC 9, nothing works with the IPMI configuration; only SNMP works …
  • For my ILO 4s, freeipmi finally worked with “LAN_2_0” as the driver type. The problem is that the checks take a very long time, about 2 minutes, and generate a lot of errors in WATO… I don’t understand why, because when I test the connection the IPMI agent results are shown in less than a second, with good values… Is this another bug? On these dedicated hosts the “Check_MK Discovery” service always times out…

If you have any suggestion (and a way to repair management board too), I’m interested!

Hi, I still have two problems:

  • Some Python files seem corrupted in “/omd/sites/xxxxxxxxxx/lib/python3/cmk/base/modes/”, and the management board (IPMI and SNMP) is broken whatever I try.
  • For ILO, the “new” method via freeipmi on a dedicated host seems buggy: it works well when I run a connection test, where the IPMI data is shown in less than a second, but with the discovery service or normal checks it takes almost 2 minutes to get any output from the plugin, so it times out every time…

Is there a way to repair the Python modules and a way to reduce this huge latency with ILO monitoring?

Thanks again to the community!

How would files have become corrupted?
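
The traceback posted earlier does not look like file corruption: the KeyError is caught and re-raised on purpose as MKBailOut, which is what a lookup of a plugin name that is not registered would produce. A minimal Python sketch of that pattern; the exception class, the registry and the "known" plugin name are stand-ins for illustration, not Checkmk's actual code:

```python
class BailOut(Exception):
    """Stand-in for an intentional 'stop with a message' exception."""


# Hypothetical registry; only the names listed here count as known.
PLUGINS = {"some_known_plugin": object()}


def lookup_plugin(name, plugins):
    try:
        return plugins[name]
    except KeyError as exc:
        # 'from exc' is what prints "The above exception was the direct
        # cause of the following exception" between the two tracebacks.
        raise BailOut(f"Unknown check plugin '{name}'") from exc


try:
    lookup_plugin("mgmt_ipmi_sensors", PLUGINS)
except BailOut as err:
    message = str(err)
    cause = type(err.__cause__).__name__
```

Here the code itself runs as intended and only reports that the requested name is unknown, so the two stacked tracebacks are expected output rather than a sign of damaged modules.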

Generally you should do all troubleshooting for special agents or SNMP on the command line.

With all your problems, the complete output of “cmk --debug -vvI hostname” should help a lot. Or run “cmk -D hostname” to get the complete command line that CMK executes to query your device. This command line can also be executed manually.
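
Since the manual test and the scheduled check reportedly differ so much in runtime, timing the exact command line is a quick way to compare them. A rough Python sketch, assuming the command is copied from the “cmk -D hostname” output; the function name and the timeout value are arbitrary choices:

```python
import subprocess
import sys
import time


def time_command(argv, timeout=300):
    """Run argv and return (exit_code, seconds, stdout, stderr)."""
    start = time.monotonic()
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return proc.returncode, time.monotonic() - start, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Placeholder command; replace it with the command line shown by
    # "cmk -D hostname" for the affected host.
    code, seconds, out, err = time_command([sys.executable, "-c", "print('ok')"])
    print(f"exit={code} runtime={seconds:.2f}s")
```

Running this as the site user keeps the environment close to what the check itself sees.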

Besides this, why not use the Redfish special agent on the management interfaces? There you can configure what information should be fetched.

Hi,
I’m sending you the output in a private message.

For Redfish, I tried with IDRAC but the plugin returns: redfish.rest.v1.InvalidCredentialsError: HTTP 401 Unauthorized returned: Invalid credentials supplied
The user has no admin rights, since I think they are not needed.
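
A 401 like this can be reproduced outside Checkmk by requesting a protected Redfish resource with the same account. A minimal Python sketch using HTTP Basic authentication, which Redfish services generally accept; the host name, the account and the choice of /redfish/v1/Systems are placeholders, and certificate verification is switched off only on the assumption that the BMC uses a self-signed certificate:

```python
import base64
import ssl
import urllib.error
import urllib.request


def build_request(host, user, password, path="/redfish/v1/Systems"):
    """Build a GET request with an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{host}{path}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def http_status(request, timeout=10):
    """Return the HTTP status code; 401 means the credentials were rejected."""
    context = ssl.create_default_context()
    context.check_hostname = False       # assumption: self-signed BMC certificate
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(request, timeout=timeout, context=context) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code
```

If http_status(build_request(...)) also returns 401, the problem lies with the account or its role on the BMC rather than with the Checkmk plugin.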

For ILO, it’s still very slow with Redfish: 4 minutes for the last scan. Moreover, I can’t disable thresholds as I can with freeipmi, and ILO returns 4 wrong results by default: All temperatures of PS (warn/crit at 0.0 °C/0.0 °C) CRIT

I think I will disable everything IPMI-related and use only SNMP, even if I lose some sensors.

Hi all,

I solved my IDRAC problem. Redfish never worked, but IPMI did with the following settings:

  • FreeIPMI
  • LAN driver type LAN_2_0
  • Set a BMC key and add “0x…” in front of it, even though it doesn’t appear on the IDRAC side… Subtle information.
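
On the command line, these settings roughly correspond to the ipmi-sensors options -D for the driver type and -k for the K_g BMC key, where FreeIPMI reads a “0x” prefix as hexadecimal input. A small Python sketch that assembles such a command; host, user, password and key are placeholders, the helper names are made up, and the exact command Checkmk builds may differ:

```python
import string


def as_hex_kg(key):
    """Return the key in FreeIPMI's hexadecimal notation ('0x' prefix)."""
    if key.lower().startswith("0x"):
        return key
    if not key or any(c not in string.hexdigits for c in key):
        raise ValueError("expected a string of hex digits")
    return "0x" + key


def ipmi_sensors_argv(host, user, password, bmc_key):
    """Assemble an ipmi-sensors command line for a remote BMC."""
    return [
        "ipmi-sensors",
        "-h", host,
        "-u", user,
        "-p", password,
        "-D", "LAN_2_0",           # driver type from the settings above
        "-k", as_hex_kg(bmc_key),  # K_g BMC key in hexadecimal form
    ]
```

For example, ipmi_sensors_argv("idrac.example", "monitor", "secret", "0" * 40) yields a list that can be passed to subprocess.run to test the connection by hand.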

However, for ILO it’s disastrous. FreeIPMI generated a huge CPU load, Redfish returns almost no more information than SNMP, and in the end even SNMP times out on ILO. I don’t know what else to do to monitor these interfaces…
If you have any ideas for things to test, I’m all ears. Hours of searching the internet have yielded nothing. Everything worked fine via the previous “management interfaces,” what a shame to have broken that…

The FreeIPMI mailing list is quite responsive. Maybe you could ask there for a solution to your resource problems.

regards

Michael