CMK version: 2.4.0p8 RAW
OS version: Ubuntu 24.04.2 LTS
Hi everyone,
Since I updated my Checkmk RAW to 2.4, all my IPMI checks (via management board) have been stale, and they still don't work when I force the checks manually.
I tried installing ipmitool and freeipmi manually on the server, and I tried allowing the ipmi-sensors command via sudoers for my site user, but it still doesn't work.
I know this feature is going to be deprecated, but it's very useful. I monitor Supermicro IPMI, Dell iDRAC and HPE iLO.
WARNING: '--checks' is deprecated in favour of option 'detect-plugins'
Unknown check plugin 'mgmt_ipmi_sensors'
Traceback (most recent call last):
  File "/omd/sites/xxxxxxxxxx/lib/python3/cmk/base/modes/check_mk.py", line 2017, in _lookup_plugin
    return plugins[plugin_name]
           ~~~~~~~^^^^^^^^^^^^^
KeyError: CheckPluginName('mgmt_ipmi_sensors')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/omd/sites/xxxxxxxxxx/bin/cmk", line 157, in <module>
    exit_status = modes.call("--check", None, opts, args, trace_context)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/xxxxxxxxxx/lib/python3/cmk/base/modes/__init__.py", line 91, in call
    return handler(*handler_args)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/xxxxxxxxxx/lib/python3/cmk/base/modes/check_mk.py", line 2302, in mode_check
    selected_sections, run_plugin_names = _extract_plugin_selection(
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/xxxxxxxxxx/lib/python3/cmk/base/modes/check_mk.py", line 2051, in _extract_plugin_selection
    agent_based_register.filter_relevant_raw_sections(
  File "/omd/sites/xxxxxxxxxx/lib/python3/cmk/base/api/agent_based/register/utils.py", line 198, in filter_relevant_raw_sections
    section_name for plugin in consumers for section_name in plugin.sections
                               ^^^^^^^^^
  File "/omd/sites/xxxxxxxxxx/lib/python3/cmk/base/modes/check_mk.py", line 2052, in <genexpr>
    consumers=(_lookup_plugin(pn, plugins) for pn in plugin_names),
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/xxxxxxxxxx/lib/python3/cmk/base/modes/check_mk.py", line 2019, in _lookup_plugin
    raise MKBailOut(f"Unknown check plugin '{plugin_name}'") from exc
cmk.ccc.exceptions.MKBailOut: Unknown check plugin 'mgmt_ipmi_sensors'
As you can see, it fails with a Python exception. Is there a way to repair these files in /lib/python3/cmk/base/modes?
I also tried configuring them as independent hosts, but nothing works (tried with iDRAC and iLO).
With FreeIPMI: Agent exited with code 1: ERROR: 'ipmi-sensors: connection timeout'.
With ipmitool, depending on the IPMI interface configured:
Agent exited with code 1: ERROR: 'Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory, Could not open device at /dev/ipmi0 or /dev/ipmi/0 or /dev/ipmidev/0: No such file or directory'.
OR Agent exited with code 1: ERROR: 'Error: Unable to establish LAN session Error: Unable to establish IPMI v1.5 / RMCP session, Error: Unable to establish LAN session Error: Unable to establish IPMI v1.5 / RMCP session'.
OR Agent exited with code 1: ERROR: 'Error: Unable to establish IPMI v2 / RMCP+ session, Error: Unable to establish IPMI v2 / RMCP+ session'.
OR <<<ipmi:sep(124)>>> Error: no IMB driver found at /dev/imb! <<<ipmi_discrete:sep(124)>>> Error: no IMB driver found at /dev/imb!
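For reference, the "/dev/ipmi0" and "/dev/imb" messages come from ipmitool's local interfaces ("open" and "imb"), which look for a BMC inside the machine running the command (here the Checkmk server), so they cannot reach a remote iDRAC or iLO. The "lan" interface speaks IPMI v1.5 / RMCP and "lanplus" speaks IPMI v2.0 / RMCP+, which matches the other two error messages. A minimal sketch for testing by hand, assuming placeholder host and credentials; the helper name is hypothetical and this is plain ipmitool usage, not necessarily the command line Checkmk's special agent builds:

```python
from __future__ import annotations


def ipmitool_argv(
    host: str,
    user: str,
    password: str,
    kg_hex: str | None = None,
    interface: str = "lanplus",
) -> list[str]:
    """Build an out-of-band ipmitool call (hypothetical helper for manual tests).

    "lanplus" = IPMI v2.0 / RMCP+, "lan" = IPMI v1.5 / RMCP. The local
    interfaces ("open" -> /dev/ipmi0, "imb" -> /dev/imb) only reach a BMC
    installed in the machine that runs ipmitool.
    """
    if interface not in ("lan", "lanplus"):
        raise ValueError("use 'lan' or 'lanplus' to reach a remote BMC")
    argv = ["ipmitool", "-I", interface, "-H", host, "-U", user, "-P", password]
    if kg_hex:
        # ipmitool's -y option expects the K_g key as plain hex digits,
        # so strip a leading "0x" if one was supplied.
        if kg_hex[:2].lower() == "0x":
            kg_hex = kg_hex[2:]
        argv += ["-y", kg_hex]
    argv += ["sensor", "list"]
    return argv
```

Running subprocess.run(ipmitool_argv("192.0.2.10", "monitor", "secret")) as the site user shows whether a plain lanplus session works at all, independently of Checkmk. Note that -P exposes the password in the process list; this is only meant for a quick manual test.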
After a lot of investigation and testing, here are my results:
With the management interface, everything has been broken since the update; nothing works.
With a dedicated host via SNMP, it works, but a lot of sensors are missing.
For my old Supermicro IPMI board, it works well via FreeIPMI.
For my iDRAC 9, nothing works with the IPMI configuration; only SNMP works.
For my iLO 4s, FreeIPMI finally worked with "LAN_2_0" as the driver type. The problem is that the checks take a very long time (about 2 minutes) and generate a lot of errors in WATO. I don't understand why, because when I test the connection, the IPMI agent's results are shown in less than a second, with correct values. Is this another bug? On these dedicated hosts, the "Check_MK Discovery" service always times out.
If you have any suggestions (including a way to repair the management board), I'm interested!
Some Python files seem to be corrupted in "/omd/sites/xxxxxxxxxx/lib/python3/cmk/base/modes/", and the management board (IPMI and SNMP) is broken whatever I try.
For iLO, the "new" method via FreeIPMI on a dedicated host seems buggy: the connection test works well and shows the IPMI information in less than a second, but with the discovery service or normal checks it takes almost 2 minutes before the plugin returns anything, so it times out every time.
Is there a way to repair the Python modules, and a way to reduce this huge latency when monitoring iLO?
Generally, you should do all troubleshooting for special agents or SNMP on the command line.
For all of your problems, the complete output of "cmk --debug -vvI hostname" should help a lot. Alternatively, run "cmk -D hostname" to get the complete command line that Checkmk executes to query your device. You can also run this command line manually.
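To compare the sub-second connection test with the roughly 2-minute scheduled checks, it can help to time the exact command outside the GUI. A minimal Python sketch, assuming it runs as the site user (so cmk is on PATH); "myhost" and the helper name are placeholders, not part of Checkmk:

```python
from __future__ import annotations

import subprocess
import sys
import time


def time_command(argv: list[str], timeout: float) -> tuple[int | None, float, bool]:
    """Run argv and return (exit code or None on timeout, elapsed seconds, timed_out)."""
    start = time.monotonic()
    try:
        # Output is captured (and discarded here) so it does not flood the terminal.
        result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the timeout.
        return None, time.monotonic() - start, True
    return result.returncode, time.monotonic() - start, False


# Example (hypothetical host name), run as the site user:
#   time_command(["cmk", "--debug", "-vvI", "myhost"], timeout=300)
# or paste the special agent command line shown by "cmk -D myhost".
```

If the raw agent command from "cmk -D" is fast but the full discovery is slow, the delay is more likely in Checkmk's processing than in the BMC itself; if the agent command alone is slow, the BMC or the network path is the place to look.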
Besides this, why not use the Redfish special agent on the management interfaces? With it, you can configure which information is fetched.
For Redfish, I tried with iDRAC, but the plugin returns: redfish.rest.v1.InvalidCredentialsError: HTTP 401 Unauthorized returned: Invalid credentials supplied
The user has no admin rights, since I think they are not needed.
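An HTTP 401 generally means the credentials themselves were not accepted, whereas a missing privilege usually shows up as 403 Forbidden. One way to separate the two outside Checkmk is a single authenticated request against the standard DMTF Redfish collection /redfish/v1/Systems, which normally requires a login. A minimal sketch using only the standard library; host, user and password are placeholders, the helper names are hypothetical, and this does not reproduce what Checkmk's Redfish agent queries:

```python
from __future__ import annotations

import base64
import ssl
import urllib.error
import urllib.request


def redfish_request(host: str, user: str, password: str,
                    path: str = "/redfish/v1/Systems") -> urllib.request.Request:
    """Build a GET request with HTTP Basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{host}{path}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def check_redfish_login(host: str, user: str, password: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code (200 = login accepted, 401 = credentials rejected)."""
    ctx = ssl.create_default_context()
    # BMCs usually ship self-signed certificates; skip verification for this test only.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(redfish_request(host, user, password),
                                    timeout=timeout, context=ctx) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

If this returns 401 with the same account that works in the iDRAC web UI, the account's login settings on the iDRAC side are worth checking before changing its privileges.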
For iLO, it's still very slow with Redfish: 4 minutes for the last scan. Moreover, I can't disable thresholds as I can with FreeIPMI, and by default iLO returns 4 false alarms: all PS (power supply) temperatures are CRIT (warn/crit at 0.0 °C/0.0 °C).
I think I will disable everything IPMI-related and use only SNMP, even if I miss some sensors.
I solved my iDRAC problem. Redfish never worked, but IPMI did with the following settings:
FreeIPMI
LAN driver type LAN_2_0
Set a BMC key and prefix it with "0x…", even though that prefix doesn't appear on the iDRAC side. A subtle detail!
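These settings correspond to FreeIPMI's own command-line options (--driver-type and --k-g). According to the FreeIPMI documentation, a K_g key is only read as hexadecimal when it is prefixed with "0x"; otherwise its characters are used literally, which would explain why the prefix matters. A minimal sketch that builds an ipmi-sensors call for testing by hand, assuming the key is entered as the hex string shown by the BMC; the helper name, host and credentials are placeholders, and Checkmk may build a different command line (check "cmk -D" for the real one):

```python
from __future__ import annotations


def ipmi_sensors_argv(
    host: str,
    user: str,
    password: str,
    kg_key: str | None = None,
    driver_type: str = "LAN_2_0",
) -> list[str]:
    """Build an out-of-band FreeIPMI ipmi-sensors call (hypothetical helper)."""
    argv = [
        "ipmi-sensors",
        f"--driver-type={driver_type}",  # LAN_2_0 = IPMI 2.0 / RMCP+
        f"--hostname={host}",
        f"--username={user}",
        f"--password={password}",
    ]
    if kg_key:
        # Assumes kg_key is the hex string shown by the BMC. FreeIPMI only
        # treats the key as hex with a "0x" prefix, so add it if missing.
        if not kg_key.lower().startswith("0x"):
            kg_key = "0x" + kg_key
        argv.append(f"--k-g={kg_key}")
    return argv
```

For example, subprocess.run(ipmi_sensors_argv("192.0.2.20", "monitor", "secret", "0123ABCD")) passes --k-g=0x0123ABCD, so the key is interpreted as hex rather than as the literal text "0123ABCD".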
However, for iLO it's disastrous. FreeIPMI generates a huge CPU load, Redfish returns hardly more information than SNMP, and in the end even SNMP times out on iLO. I don't know what else to do to monitor these interfaces.
If you have any ideas for things to test, I'm all ears. Hours of searching the internet have yielded nothing. Everything worked fine via the previous "management interfaces"; what a shame to have broken that.