Nutanix (Supermicro) IPMI and Redfish 2.3.70 stop working randomly on multiple hosts

CMK version: 2.3p19
OS version: RHEL 8.10
Redfish Package: 2.3.70

Hi @andreas-doehler

I’ve migrated all deprecated IPMI checks in Checkmk to the new Redfish plugin. This works fine most of the time for most hosts, but some hosts stop working completely every now and then. I have experimented with timeout settings, retries, and excluding some sections, but with no success so far.

This command…

./local/lib/python3/cmk_addons/plugins/redfish/libexec/agent_redfish -u SOMEUSER --password-id SOMEUSER:/omd/sites/SOMESITE/var/check_mk/passwords_merged -P https -n Memory,Power,Processors,Thermal,FirmwareInventory,NetworkAdapters,NetworkInterfaces 1.2.3.4

…gives me…

ERROR: too many retries for connection attempt:

Is there a more verbose method of finding out what the problem could be?

And there is another host that gives me this output…

<<<check_mk:sep(32)>>>
Version: 2.3.0
AgentOS: redfish
OSType: redfish
OSName: None
OSVersion: 08.02.02
OSPlatform: Supermicro
<<<redfish_manager:sep(0)>>>
[{"@odata.id": "/redfish/v1/Managers/1", "@odata.type": "#Manager.v1_7_0.Manager", "Actions": {"#Manager.Reset": {"ResetType@Redfish.AllowableValues": ["GracefulRestart"], "target": "/redfish/v1/Managers/1/Actions/Manager.Reset"}, "Oem": {"#SmcManagerConfig.Reset": {"@Redfish.ActionInfo": "/redfish/v1/Managers/1/Oem/Supermicro/ResetActionInfo", "target": "/redfish/v1/Managers/1/Actions/Oem/SmcManagerConfig.Reset"}}}, "CommandShell": {"ConnectTypesSupported": ["SSH"], "MaxConcurrentSessions": 0, "ServiceEnabled": false}, "DateTime": "2025-02-11T09:25:40Z", "DateTimeLocalOffset": "+00:00", "Description": "BMC", "EthernetInterfaces": {"@odata.id": "/redfish/v1/Managers/1/EthernetInterfaces"}, "FirmwareVersion": "08.02.02", "GraphicalConsole": {"ConnectTypesSupported": ["KVMIP"], "MaxConcurrentSessions": 4, "ServiceEnabled": true}, "HostInterfaces": {"@odata.id": "/redfish/v1/Managers/1/HostInterfaces"}, "Id": "1", "Links": {"ActiveSoftwareImage": {"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/BMC"}, "ManagerForChassis": [{"@odata.id": "/redfish/v1/Chassis/1"}], "ManagerForChassis@odata.count": 1, "ManagerForServers": [{"@odata.id": "/redfish/v1/Systems/1"}], "ManagerForServers@odata.count": 1, "ManagerInChassis": {"@odata.id": "/redfish/v1/Chassis/1/"}, "Oem": {}, "SoftwareImages": [{"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/BMC"}, {"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/Backup_BMC"}, {"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/Golden_BMC"}, {"@odata.id": "/redfish/v1/UpdateService/FirmwareInventory/Staging_BMC"}], "SoftwareImages@odata.count": 4}, "LogServices": {"@odata.id": "/redfish/v1/Managers/1/LogServices"}, "ManagerType": "BMC", "Model": "ASPEED", "Name": "Manager", "NetworkProtocol": {"@odata.id": "/redfish/v1/Managers/1/NetworkProtocol"}, "Oem": {"Supermicro": {"@odata.type": "#SmcManagerExtensions.v1_0_0.Manager", "FanMode": {"@odata.id": "/redfish/v1/Managers/1/Oem/Supermicro/FanMode"}, "IKVM": 
{"@odata.id": "/redfish/v1/Managers/1/Oem/Supermicro/IKVM"}, "IPAccessControl": {"@odata.id": "/redfish/v1/Managers/1/Oem/Supermicro/IPAccessControl"}, "KCSInterface": {"@odata.id": "/redfish/v1/Managers/1/Oem/Supermicro/KCSInterface"}, "LicenseManager": {"@odata.id": "/redfish/v1/Managers/1/LicenseManager"}, "MouseMode": {"@odata.id": "/redfish/v1/Managers/1/Oem/Supermicro/MouseMode"}, "NTP": {"@odata.id": "/redfish/v1/Managers/1/Oem/Supermicro/NTP"}, "RADIUS": {"@odata.id": "/redfish/v1/Managers/1/Oem/Supermicro/RADIUS"}, "SMCRAKP": {"@odata.id": "/redfish/v1/Managers/1/Oem/Supermicro/SMCRAKP"}, "Snooping": {"@odata.id": "/redfish/v1/Managers/1/Oem/Supermicro/Snooping"}, "SysLockdown": {"@odata.id": "/redfish/v1/Managers/1/Oem/Supermicro/SysLockdown"}, "Syslog": {"@odata.id": "/redfish/v1/Managers/1/Oem/Supermicro/Syslog"}}}, "SerialConsole": {"ConnectTypesSupported": ["IPMI"], "MaxConcurrentSessions": 1, "ServiceEnabled": true}, "SerialInterfaces": {"@odata.id": "/redfish/v1/Managers/1/SerialInterfaces"}, "Status": {"Health": "OK", "State": "Enabled"}, "UUID": "00000000-0000-0000-0000-7CC255813C09", "VirtualMedia": {"@odata.id": "/redfish/v1/Managers/1/VirtualMedia"}}]
<<<redfish_system:sep(0)>>>
[{"@odata.id": "/redfish/v1/Systems/1", "@odata.type": "#ComputerSystem.v1_8_0.ComputerSystem", "Actions": {"#ComputerSystem.Reset": {"@Redfish.ActionInfo": "/redfish/v1/Systems/1/ResetActionInfo", "target": "/redfish/v1/Systems/1/Actions/ComputerSystem.Reset"}, "Oem": {}}, "Bios": {"@odata.id": "/redfish/v1/Systems/1/Bios"}, "BiosVersion": "WU61.101", "Boot": {"BootNext": "", "BootOptions": {"@odata.id": "/redfish/v1/Systems/1/BootOptions"}, "BootOrder": ["Boot0002", "Boot0003", "Boot0004", "Boot0005", "Boot0006", "Boot0007", "Boot0008", "Boot0009", "Boot000A", "Boot000B", "Boot000C", "Boot000D", "Boot000E", "Boot0001"], "BootSourceOverrideEnabled": "Disabled", "BootSourceOverrideMode": "Legacy", "BootSourceOverrideTarget": "Cd", "BootSourceOverrideTarget@Redfish.AllowableValues": ["None", "Pxe", "Floppy", "Cd", "Usb", "Hdd", "BiosSetup", "UsbCd", "UefiBootNext"]}, "Description": "Description of server", "EthernetInterfaces": {"@odata.id": "/redfish/v1/Systems/1/EthernetInterfaces"}, "Id": "1", "IndicatorLED": "Off", "Links": {"Chassis": [{"@odata.id": "/redfish/v1/Chassis/1"}], "ManagedBy": [{"@odata.id": "/redfish/v1/Managers/1"}]}, "LogServices": {"@odata.id": "/redfish/v1/Systems/1/LogServices"}, "Manufacturer": "Nutanix", "Memory": {"@odata.id": "/redfish/v1/Systems/1/Memory"}, "MemorySummary": {"MemoryMirroring": "System", "Metrics": {"@odata.id": "/redfish/v1/Systems/1/MemorySummary/MemoryMetrics"}, "Status": {"Health": "OK", "HealthRollup": "OK", "State": "Enabled"}, "TotalSystemMemoryGiB": 1024}, "Model": "NX-8170-G8", "Name": "System", "NetworkInterfaces": {"@odata.id": "/redfish/v1/Systems/1/NetworkInterfaces"}, "Oem": {"Supermicro": {"@odata.type": "#SmcSystemExtensions.v1_0_0.System", "NodeManager": {"@odata.id": "/redfish/v1/Systems/1/Oem/Supermicro/NodeManager"}}}, "PartNumber": "NX-8170-G8", "PowerState": "On", "ProcessorSummary": {"Count": 2, "Metrics": {"@odata.id": "/redfish/v1/Systems/1/ProcessorSummary/ProcessorMetrics"}, "Model": "Intel(R) 
Xeon(R) processor", "Status": {"Health": "OK", "HealthRollup": "OK", "State": "Enabled"}}, "Processors": {"@odata.id": "/redfish/v1/Systems/1/Processors"}, "SKU": "To be filled by O.E.M.", "SecureBoot": {"@odata.id": "/redfish/v1/Systems/1/SecureBoot"}, "SerialNumber": "*************", "SimpleStorage": {"@odata.id": "/redfish/v1/Systems/1/SimpleStorage"}, "Status": {"Health": "OK", "State": "Enabled"}, "Storage": {"@odata.id": "/redfish/v1/Systems/1/Storage"}, "SystemType": "Physical", "UUID": "*****************************"}]
Agent failed - please submit a crash report! (Crash-ID: 620c513e-e85a-11ef-aabf-506b8db05e4d)

Traceback (most recent call last):
  File "/omd/sites/SOMESITE/local/lib/python3/urllib3/connectionpool.py", line 468, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/omd/sites/SOMESITE/local/lib/python3/urllib3/connectionpool.py", line 463, in _make_request
    httplib_response = conn.getresponse()
                       ^^^^^^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/lib/python3.12/http/client.py", line 1428, in getresponse
    response.begin()
  File "/omd/sites/SOMESITE/lib/python3.12/http/client.py", line 331, in begin
    version, status, reason = self._read_status()
                              ^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/lib/python3.12/http/client.py", line 292, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/lib/python3.12/socket.py", line 707, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/lib/python3.12/ssl.py", line 1252, in recv_into
    return self.read(nbytes, buffer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/lib/python3.12/ssl.py", line 1104, in read
    return self._sslobj.read(len, buffer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/omd/sites/SOMESITE/local/lib/python3/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/local/lib/python3/urllib3/connectionpool.py", line 802, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/local/lib/python3/urllib3/util/retry.py", line 552, in increment
    raise six.reraise(type(error), error, _stacktrace)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/local/lib/python3/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/omd/sites/SOMESITE/local/lib/python3/urllib3/connectionpool.py", line 716, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/local/lib/python3/urllib3/connectionpool.py", line 470, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/omd/sites/SOMESITE/local/lib/python3/urllib3/connectionpool.py", line 358, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='1.2.3.4', port=443): Read timed out. (read timeout=3)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/omd/sites/SOMESITE/local/lib/python3/redfish/rest/v1.py", line 910, in _rest_request
    resp = self._session.request(method.upper(), "{}{}".format(self.__base_url, reqpath), data=body,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/local/lib/python3/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/local/lib/python3/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/local/lib/python3/requests/adapters.py", line 713, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='1.2.3.4', port=443): Read timed out. (read timeout=3)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/omd/sites/SOMESITE/lib/python3/cmk/special_agents/v0_unstable/agent_common.py", line 149, in _special_agent_main_core
    return main_fn(args)
           ^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/local/lib/python3/cmk_addons/plugins/redfish/special_agents/agent_redfish.py", line 865, in agent_redfish_main
    get_information(redfishobj)
  File "/omd/sites/SOMESITE/local/lib/python3/cmk_addons/plugins/redfish/special_agents/agent_redfish.py", line 645, in get_information
    fetch_sections(redfishobj, resulting_sections, redfishobj.sections, system)
  File "/omd/sites/SOMESITE/local/lib/python3/cmk_addons/plugins/redfish/special_agents/agent_redfish.py", line 264, in fetch_sections
    section_data = fetch_data(
                   ^^^^^^^^^^^
  File "/omd/sites/SOMESITE/local/lib/python3/cmk_addons/plugins/redfish/special_agents/agent_redfish.py", line 188, in fetch_data
    response_url = redfishobj.redfish_connection.get(url, None)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/local/lib/python3/redfish/rest/v1.py", line 628, in get
    return self._rest_request(path, method='GET', args=args,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/local/lib/python3/redfish/rest/v1.py", line 1107, in _rest_request
    return super(HttpClient, self)._rest_request(path=path, method=method,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/local/lib/python3/redfish/rest/v1.py", line 954, in _rest_request
    raise RetriesExhaustedError() from cause_exception
redfish.rest.v1.RetriesExhaustedError
OMD[SOMESITE]:~$ ./local/lib/python3/cmk_addons/plugins/redfish/libexec/agent_redfish -u checkmkipmi --password-id ipmi_nutanix:/omd/sites/SOMESITE/var/check_mk/passwords_merged -P https -n Memory,Power,Processors,Thermal,FirmwareInventory,NetworkAdapters,NetworkInterfaces 10.3.3.151
ERROR: too many retries for connection attempt:
OMD[SOMESITE]:~$ ./local/lib/python3/cmk_addons/plugins/redfish/libexec/agent_redfish -u SOMEUSER --password-id SOMEUSER:/omd/sites/SOMESITE/var/check_mk/passwords_merged -P https -n Memory,Power,Processors,Thermal,FirmwareInventory,NetworkAdapters,NetworkInterfaces --timeout 40 1.2.3.4
ERROR 2025-02-11 10:32:47 redfish.rest.v1: Service responded with invalid JSON at URI /redfish/v1/SessionService/Sessions

Agent failed - please submit a crash report! (Crash-ID: 27a0486a-e85b-11ef-975c-506b8db05e4d)

Traceback (most recent call last):
  File "/omd/sites/SOMESITE/lib/python3/cmk/special_agents/v0_unstable/agent_common.py", line 149, in _special_agent_main_core
    return main_fn(args)
           ^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/local/lib/python3/cmk_addons/plugins/redfish/special_agents/agent_redfish.py", line 854, in agent_redfish_main
    redfishobj = get_session(args)
                 ^^^^^^^^^^^^^^^^^
  File "/omd/sites/SOMESITE/local/lib/python3/cmk_addons/plugins/redfish/special_agents/agent_redfish.py", line 827, in get_session
    redfishobj.redfish_connection.login(auth="session")
  File "/omd/sites/SOMESITE/local/lib/python3/redfish/rest/v1.py", line 1006, in login
    raise InvalidCredentialsError('HTTP 401 Unauthorized returned: Invalid credentials supplied')
redfish.rest.v1.InvalidCredentialsError: HTTP 401 Unauthorized returned: Invalid credentials supplied

To get more detailed output, you can insert “--debug” before the IP of your management interface.
You should also look in “~/tmp/check_mk/agents/agent_redfish/” and check whether the pkl files named IP_Port are older than a few hours. These files store the reused session information.
If a new session is created at every check interval, you may be running into an “out of sessions” problem on the BMC.
On my systems these pkl files are several days old, sometimes up to a month.
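
To check the age of those session files quickly, something like the following could be run as the site user. This is only a minimal sketch: the function name `session_file_ages` is made up for illustration, and the only thing taken from the post above is the cache directory and the `.pkl` extension.

```python
import time
from pathlib import Path
from typing import List, Optional, Tuple


def session_file_ages(directory: Path, now: Optional[float] = None) -> List[Tuple[str, float]]:
    """Return (file name, age in hours) for every *.pkl file, oldest first."""
    now = time.time() if now is None else now
    ages = [
        (path.name, (now - path.stat().st_mtime) / 3600.0)
        for path in directory.glob("*.pkl")
    ]
    # Oldest files first, so long-lived (reused) sessions show up at the top.
    return sorted(ages, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Directory as named in the reply above, relative to the site user's home.
    cache_dir = Path.home() / "tmp/check_mk/agents/agent_redfish"
    for name, hours in session_file_ages(cache_dir):
        print(f"{name}: {hours:.1f} h old")
```

If every file is only a few minutes old, the session is probably being recreated on each run rather than reused.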

If you run the agent with the “--debug” option, you will also see the time needed for every query.
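
Independently of the agent, the raw response time of the BMC can be measured with a plain request against the Redfish service root, which by specification answers without authentication. The sketch below is an assumption-laden helper, not part of the plugin: `time_get` and its `opener` parameter are hypothetical names, and certificate verification is switched off on the assumption that the BMC uses a self-signed certificate.

```python
import ssl
import time
import urllib.request
from typing import Callable, Tuple


def time_get(url: str, timeout: float,
             opener: Callable = urllib.request.urlopen) -> Tuple[int, float]:
    """GET the URL once and return (HTTP status, elapsed seconds)."""
    context = ssl.create_default_context()
    # Assumption: the BMC presents a self-signed certificate, so skip
    # verification for this diagnostic request only.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    start = time.monotonic()
    with opener(url, timeout=timeout, context=context) as response:
        response.read()  # include the body transfer in the measurement
        status = response.status
    return status, time.monotonic() - start


# Example call (placeholder address as used in this thread):
# status, seconds = time_get("https://1.2.3.4/redfish/v1/", timeout=40)
# print(status, f"{seconds:.2f} s")
```

If the measured time is regularly close to or above the 3-second read timeout shown in the traceback, a higher `--timeout` value is worth trying before looking at session handling.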
